Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.4.0
    • Lustre 2.4.0
    • Lustre: 2.3.53-2chaos
    • 3
    • 4642

    Description

      After updating to 2.3.53-2chaos, the MDS is no longer able to mount its MDT. The relevant console messages:

      Lustre: Found index 0 for lstest-MDT0000, updating log
      LustreError: 33410:0:(sec_config.c:1024:sptlrpc_target_local_copy_conf()) missing llog context
      LustreError: 33836:0:(genops.c:316:class_newdev()) Device lstest-MDT0000-osp-MDT0000 already exists at 136, won't add
      LustreError: 33836:0:(obd_config.c:374:class_attach()) Cannot create device lstest-MDT0000-osp-MDT0000 of type osp : -17
      Lustre: lstest-MDT0000: Temporarily refusing client connection from 0@lo
      LustreError: 11-0: lstest-MDT0000-osp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11
      LustreError: 33836:0:(obd_mount.c:373:lustre_start_simple()) lstest-MDT0000-osp-MDT0000 attach error -17
      LustreError: 33836:0:(obd_mount.c:1135:lustre_osp_setup()) lstest-MDT0000-osp-MDT0000: setup up failed: rc -17
      LustreError: 15c-8: MGC172.20.5.2@o2ib500: The configuration from log 'lstest-client' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
      LustreError: 33405:0:(obd_mount.c:1865:server_start_targets()) lstest-MDT0000: failed to start OSP: -17
      Lustre: lstest-MDT0000: Unable to start target: -17
      Lustre: Failing over lstest-MDT0000
      LustreError: 32689:0:(client.c:1116:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff880faa7f0800 x1415277549978464/t0(0) o13->lstest-OST0181-osc-MDT0000@172.20.2.185@o2ib500:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
      LustreError: 32690:0:(osp_precreate.c:116:osp_statfs_interpret()) lstest-OST0182-osc-MDT0000: couldn't update statfs: rc = -5
      LustreError: 32689:0:(client.c:1116:ptlrpc_import_delay_req()) Skipped 253 previous similar messages
      Lustre: server umount lstest-MDT0000 complete
      LustreError: 33405:0:(obd_mount.c:2985:lustre_fill_super()) Unable to mount  (-17)
      

      I'm just about to start looking into the root cause.
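The return codes in the console messages above follow the Linux kernel convention of negated errno values, so -17 is EEXIST (matching the "already exists" message from class_newdev()), -11 is EAGAIN, and -5 is EIO. The sketch below is only an illustration of how such codes can be pulled out of log lines and named. The function name `decode_return_codes`, the lookup table, and the regular expression are assumptions made for this example and are not part of Lustre. The table is hard-coded because errno numbering differs between operating systems.

```python
import re

# Linux errno names for the three codes that appear in this ticket's log.
# Hard-coded on purpose: errno numbers are not the same on every OS.
LINUX_ERRNO = {5: "EIO", 11: "EAGAIN", 17: "EEXIST"}

# A minus sign followed by digits, but not when the minus is glued to a
# preceding word character or slash. This skips message IDs such as
# "15c-8" or "11-0" and the "rc 0/-1" field of request dumps.
NEGATIVE_CODE_RE = re.compile(r"(?<![\w/])-(\d+)\b")


def decode_return_codes(line):
    """Return (code, name) pairs for each negative return code in a log line."""
    results = []
    for match in NEGATIVE_CODE_RE.finditer(line):
        value = int(match.group(1))
        results.append((-value, LINUX_ERRNO.get(value, "unknown")))
    return results
```

Calling `decode_return_codes` on the final lustre_fill_super() line of the log yields the single pair for -17, which is consistent with the duplicate-device error reported earlier in the same log.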

        Activity

          [LU-2110] Unable to mount (-17) MDT

          bzzz Alex Zhuravlev added a comment - Can we close the ticket?

          ian Ian Colle (Inactive) added a comment - Patch landed to master.

          morrone Christopher Morrone (Inactive) added a comment - Ok, we've pulled in patch http://review.whamcloud.com/#change,4227 and will give it a try.

          bzzz Alex Zhuravlev added a comment -

          #12 (128)attach 0:lstest-MDT0000-mdc 1:mdc 2:lstest-clilmv_UUID
          ...
          #20 (088)add_uuid nid=172.20.2.185@o2ib500(0x501f4ac1402b9) 0: 1:172.20.2.185@o2ib500
          #21 (088)add_uuid nid=172.20.2.185@tcp(0x20000ac1402b9) 0: 1:172.20.2.185@o2ib500

          Record #21 resulted in a second instance of the OSP device.

          I think the patch above should help with the issue.
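The pattern pointed out in this comment, two add_uuid records that register different NIDs under the same UUID, can be looked for in an llog_reader dump with a short script. The following is a minimal sketch, not a Lustre tool. The regular expression is inferred solely from the two records quoted above, so other llog_reader versions may print records differently, and the function names are hypothetical. A UUID with several NIDs is not necessarily an error (multi-homed nodes have them legitimately); the script only lists candidates for inspection.

```python
import re
from collections import defaultdict

# Record layout inferred from the two add_uuid lines quoted in this ticket
# (assumption): "#<index> (<len>)add_uuid nid=<nid>(<hex>) 0: 1:<uuid>"
ADD_UUID_RE = re.compile(
    r"#(?P<index>\d+)\s+\(\d+\)add_uuid\s+"
    r"nid=(?P<nid>[^\s(]+)\(0x[0-9a-fA-F]+\)\s+"
    r"0:\s*1:(?P<uuid>\S+)"
)


def group_nids_by_uuid(lines):
    """Map each UUID to the (record index, NID) pairs registered for it."""
    groups = defaultdict(list)
    for line in lines:
        match = ADD_UUID_RE.search(line)
        if match:
            record = (int(match.group("index")), match.group("nid"))
            groups[match.group("uuid")].append(record)
    return dict(groups)


def uuids_with_multiple_nids(lines):
    """Keep only the UUIDs that have more than one add_uuid record."""
    grouped = group_nids_by_uuid(lines)
    return {uuid: recs for uuid, recs in grouped.items() if len(recs) > 1}


# The three records quoted in the comment above, used as sample input.
SAMPLE = [
    "#12 (128)attach 0:lstest-MDT0000-mdc 1:mdc 2:lstest-clilmv_UUID",
    "#20 (088)add_uuid nid=172.20.2.185@o2ib500(0x501f4ac1402b9) 0: 1:172.20.2.185@o2ib500",
    "#21 (088)add_uuid nid=172.20.2.185@tcp(0x20000ac1402b9) 0: 1:172.20.2.185@o2ib500",
]
```

Running `uuids_with_multiple_nids(SAMPLE)` reports the UUID 172.20.2.185@o2ib500 with records 20 and 21, the same pair identified in the comment.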

          prakash Prakash Surya (Inactive) added a comment - Here you go. The dump of

          # grove-mds2 /mnt/grove-mds2/mgs > llog_reader CONFIGS/lstest-client > lstest-client.llogreader

          bzzz Alex Zhuravlev added a comment - One way is to fetch the /CONFIGS/lstest-client file from the MDS and parse it with the llog_reader utility. It would help us if you attach it to the ticket as well. Thanks.

          prakash Prakash Surya (Inactive) added a comment - Actually, I'm not certain of that. At one point a failover NID was added using a writeconf, but the filesystem has been reformatted since then. I'm unsure whether both failover NIDs were specified at mkfs time during the reformat, or whether the writeconf method was used after mkfs. I can try to track down that information if it would be useful.

          bzzz Alex Zhuravlev added a comment - Prakash, please try with http://review.whamcloud.com/#change,4227

          If I understand correctly, the failover NID for the MDS was specified at mkfs.lustre time, not added later?

          bzzz Alex Zhuravlev added a comment - Thanks. I see the root cause and am working on the fix.

          People

            bzzz Alex Zhuravlev
            prakash Prakash Surya (Inactive)
            Votes: 0
            Watchers: 7