LU-1308: 2.2 clients unable to mount upgraded MDT


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.3.0, Lustre 2.1.2
    • Affects Version/s: Lustre 2.2.0, Lustre 2.3.0
    • Component/s: None
    • Environment: Scientific Linux 5.5, Lustre 2.2.0 on servers, patchless 2.1.1 and 2.2.0 on clients

    Description

      We are hitting a strange bug while upgrading to 2.2.0. We have already moved all the servers and some clients to 2.2; however, our TCP clients are unable to mount the filesystem, because they cannot find a suitable NID to connect to the MDT. 2.1.1 clients work fine.
      In our case o2ib{0,1} are the first networks listed in all configs (MGS/MDT/OST), and the tcp one appears third. All the clients that use o2ib work fine, as the first MDT NID they get from the MGS works for them, but the TCP ones fail (at least that's what we suppose).
      Our MDS parameters are:
      Parameters: mgsnode=172.16.193.1@o2ib,172.16.126.1@tcp mgsnode=172.16.193.3@o2ib,172.16.126.2@tcp failover.node=172.16.193.3@o2ib,172.16.126.2@tcp mdd.quota_type=ug
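
      For reference, these values map directly onto mkfs.lustre/tunefs.lustre options; a minimal sketch of how they could have been set (the /dev/mdt0 device path is only illustrative):

      # illustrative device path; shows how the Parameters line above maps to tunefs.lustre options
      [root@mds ~]# tunefs.lustre \
          --mgsnode=172.16.193.1@o2ib,172.16.126.1@tcp \
          --mgsnode=172.16.193.3@o2ib,172.16.126.2@tcp \
          --failnode=172.16.193.3@o2ib,172.16.126.2@tcp \
          --param mdd.quota_type=ug \
          /dev/mdt0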

      Servers have:
      options lnet networks="o2ib0(ib0),o2ib1(ib1),tcp0(eth0)"
      TCP clients:
      options lnet networks="tcp0(eth0)"

      And the client gets this:
      [root@n1-4-1 ~]# lctl which_nid 172.16.126.1@tcp
      172.16.126.1@tcp

      [root@n1-4-1 ~]# lctl ping 172.16.126.1@tcp
      12345-0@lo
      12345-172.16.193.1@o2ib
      12345-192.168.193.2@o2ib1
      12345-172.16.126.1@tcp

      [root@n1-4-1 ~]# mount -t lustre 172.16.126.1@tcp:/scratch /mnt/lustre/scratch/
      mount.lustre: mount 172.16.126.1@tcp:/scratch at /mnt/lustre/scratch failed: No such file or directory
      Is the MGS specification correct?
      Is the filesystem name correct?
      If upgrading, is the copied client log valid? (see upgrade docs)
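
      One way to check which MDT NIDs actually ended up in the client configuration log would be to dump it on the MGS node; a sketch only, and the exact invocation may differ between versions:

      # on the MGS: print the scratch-client config log and look at the
      # add_uuid/add_conn records carrying the MDT NIDs
      [root@mds ~]# lctl --device MGS llog_print scratch-client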

      Dmesg says:
      Apr 11 16:55:53 n1-4-1 kernel: Lustre: MGC172.16.126.1@tcp: Reactivating import
      Apr 11 16:55:53 n1-4-1 kernel: LustreError: 2469:0:(ldlm_lib.c:381:client_obd_setup()) can't add initial connection
      Apr 11 16:55:53 n1-4-1 kernel: LustreError: 2469:0:(obd_config.c:521:class_setup()) setup scratch-MDT0000-mdc-ffff81018d9d6400 failed (-2)
      Apr 11 16:55:53 n1-4-1 kernel: LustreError: 2469:0:(obd_config.c:1362:class_config_llog_handler()) Err -2 on cfg command:
      Apr 11 16:55:53 n1-4-1 kernel: Lustre: cmd=cf003 0:scratch-MDT0000-mdc 1:scratch-MDT0000_UUID 2:172.16.193.1@o2ib
      Apr 11 16:55:53 n1-4-1 kernel: LustreError: 15c-8: MGC172.16.126.1@tcp: The configuration from log 'scratch-client' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
      Apr 11 16:55:53 n1-4-1 kernel: LustreError: 2457:0:(llite_lib.c:978:ll_fill_super()) Unable to process log: -2
      Apr 11 16:55:53 n1-4-1 kernel: LustreError: 2457:0:(obd_config.c:566:class_cleanup()) Device 3 not setup
      Apr 11 16:55:53 n1-4-1 kernel: LustreError: 2457:0:(ldlm_request.c:1170:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
      Apr 11 16:55:53 n1-4-1 kernel: LustreError: 2457:0:(ldlm_request.c:1796:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
      Apr 11 16:55:53 n1-4-1 kernel: Lustre: client ffff81018d9d6400 umount complete
      Apr 11 16:55:53 n1-4-1 kernel: LustreError: 2457:0:(obd_mount.c:2349:lustre_fill_super()) Unable to mount (-2)
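
      If the client log really carries only the o2ib NID for the MDT, the generic way to regenerate the configuration logs would be a writeconf on all targets; this is just the standard procedure sketched with illustrative device paths, not necessarily the right fix here:

      # with the filesystem stopped everywhere; device paths are illustrative
      [root@mds ~]# tunefs.lustre --writeconf /dev/mdt0
      [root@oss1 ~]# tunefs.lustre --writeconf /dev/ost0
      # then remount in order: MGS/MDT first, then OSTs, then clients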

      I'm also attaching two debug dumps (lctl dk), one for a 2.1.1 client (works fine) and one for a 2.2.0 client (fails).
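
      For anyone reproducing this, debug dumps of that kind can be collected roughly as follows (a generic sketch, not the exact command history):

      # enable full debugging, reproduce the failing mount, then dump the kernel debug buffer
      [root@n1-4-1 ~]# lctl set_param debug=-1
      [root@n1-4-1 ~]# mount -t lustre 172.16.126.1@tcp:/scratch /mnt/lustre/scratch/
      [root@n1-4-1 ~]# lctl dk > /tmp/mount.debug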

      Attachments

        1. debug_logs.tar.gz
          5.12 MB
          Marek Magrys
        2. dumps.tar.gz
          23 kB
          Marek Magrys
        3. mount.debug.gz
          115 kB
          Lukasz Flis
        4. mount-debug-patch2.log.gz
          1.45 MB
          Lukasz Flis


            People

              Assignee: Oleg Drokin (green)
              Reporter: Marek Magrys (m.magrys)
              Votes: 2
              Watchers: 9

              Dates

                Created:
                Updated:
                Resolved: