Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.4.0
    • Lustre 2.4.0
    • Lustre: 2.3.53-2chaos
    • 3
    • 4642

    Description

      After updating to 2.3.53-2chaos, the MDS is no longer able to mount its MDT. The relevant console messages:

      Lustre: Found index 0 for lstest-MDT0000, updating log
      LustreError: 33410:0:(sec_config.c:1024:sptlrpc_target_local_copy_conf()) missing llog context
      LustreError: 33836:0:(genops.c:316:class_newdev()) Device lstest-MDT0000-osp-MDT0000 already exists at 136, won't add
      LustreError: 33836:0:(obd_config.c:374:class_attach()) Cannot create device lstest-MDT0000-osp-MDT0000 of type osp : -17
      Lustre: lstest-MDT0000: Temporarily refusing client connection from 0@lo
      LustreError: 11-0: lstest-MDT0000-osp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11
      LustreError: 33836:0:(obd_mount.c:373:lustre_start_simple()) lstest-MDT0000-osp-MDT0000 attach error -17
      LustreError: 33836:0:(obd_mount.c:1135:lustre_osp_setup()) lstest-MDT0000-osp-MDT0000: setup up failed: rc -17
      LustreError: 15c-8: MGC172.20.5.2@o2ib500: The configuration from log 'lstest-client' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
      LustreError: 33405:0:(obd_mount.c:1865:server_start_targets()) lstest-MDT0000: failed to start OSP: -17
      Lustre: lstest-MDT0000: Unable to start target: -17
      Lustre: Failing over lstest-MDT0000
      LustreError: 32689:0:(client.c:1116:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff880faa7f0800 x1415277549978464/t0(0) o13->lstest-OST0181-osc-MDT0000@172.20.2.185@o2ib500:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
      LustreError: 32690:0:(osp_precreate.c:116:osp_statfs_interpret()) lstest-OST0182-osc-MDT0000: couldn't update statfs: rc = -5
      LustreError: 32689:0:(client.c:1116:ptlrpc_import_delay_req()) Skipped 253 previous similar messages
      Lustre: server umount lstest-MDT0000 complete
      LustreError: 33405:0:(obd_mount.c:2985:lustre_fill_super()) Unable to mount  (-17)
      

      I'm just about to start looking into the root cause.
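
      For readers triaging the console output above, the negative return codes are Linux errno values. The sketch below is an illustration, not part of the original report: it decodes the trailing code on a console line. The `decode_rc` helper and its regular expression are hypothetical names written for this example, and the table only covers the four codes that appear in this ticket.

```python
import re

# Linux errno values that appear in this ticket's console output.
# Hard-coded so the result does not depend on the platform running the script.
LINUX_ERRNO = {
    2: "ENOENT",   # no such file or directory
    5: "EIO",      # I/O error
    11: "EAGAIN",  # try again
    17: "EEXIST",  # object already exists
}

# Hypothetical pattern: a negative code at the end of the line, introduced by
# "rc", "rc =", ": ", "(" or "with " as in the messages quoted above.
RC_PATTERN = re.compile(r"(?:rc\s*=?\s*|:\s|\(|with\s)-(\d+)\)?\s*$")


def decode_rc(line):
    """Return (code, errno_name) for a trailing return code, or None."""
    match = RC_PATTERN.search(line.strip())
    if match is None:
        return None
    code = int(match.group(1))
    return code, LINUX_ERRNO.get(code, "UNKNOWN")


if __name__ == "__main__":
    sample = ("LustreError: 33836:0:(obd_config.c:374:class_attach()) Cannot "
              "create device lstest-MDT0000-osp-MDT0000 of type osp : -17")
    print(decode_rc(sample))
```

      Read this way, the -17 that propagates up to lustre_fill_super() is EEXIST, which matches the "already exists at 136" message from class_newdev().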

        Activity

          [LU-2110] Unable to mount (-17) MDT

          Patch landed to master.

          ian Ian Colle (Inactive) added a comment

          Ok, we've pulled in patch http://review.whamcloud.com/#change,4227 and will give it a try.

          morrone Christopher Morrone (Inactive) added a comment

          #12 (128)attach 0:lstest-MDT0000-mdc 1:mdc 2:lstest-clilmv_UUID
          ...
          #20 (088)add_uuid nid=172.20.2.185@o2ib500(0x501f4ac1402b9) 0: 1:172.20.2.185@o2ib500
          #21 (088)add_uuid nid=172.20.2.185@tcp(0x20000ac1402b9) 0: 1:172.20.2.185@o2ib500

          #21 resulted in a second instance of OSP device.

          I think the patch above should help with the issue.

          bzzz Alex Zhuravlev added a comment
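
          As a triage aid for the analysis above, the following sketch scans llog_reader output for add_uuid records and reports any UUID that has more than one NID registered, as records #20 and #21 do. It assumes the record layout shown in the excerpt above; the `nids_per_uuid` function name and the regular expression are hypothetical and written only for this example. Multiple NIDs per UUID is a normal configuration for multi-homed nodes, so a hit only shows where to look and does not by itself confirm this bug.

```python
import re
from collections import defaultdict

# Assumed record layout, taken from the excerpt in this ticket:
#   #20 (088)add_uuid nid=172.20.2.185@o2ib500(0x501f4ac1402b9) 0: 1:172.20.2.185@o2ib500
ADD_UUID = re.compile(
    r"#(\d+)\s+\(\d+\)add_uuid\s+nid=(\S+?)\(0x[0-9a-fA-F]+\)\s+0:\s+1:(\S+)"
)


def nids_per_uuid(dump_text):
    """Map each UUID to its (record number, NID) pairs, keeping only
    UUIDs that were registered with more than one add_uuid record."""
    groups = defaultdict(list)
    for match in ADD_UUID.finditer(dump_text):
        record, nid, uuid = match.groups()
        groups[uuid].append((int(record), nid))
    return {uuid: recs for uuid, recs in groups.items() if len(recs) > 1}


if __name__ == "__main__":
    excerpt = """\
#20 (088)add_uuid nid=172.20.2.185@o2ib500(0x501f4ac1402b9) 0: 1:172.20.2.185@o2ib500
#21 (088)add_uuid nid=172.20.2.185@tcp(0x20000ac1402b9) 0: 1:172.20.2.185@o2ib500
"""
    for uuid, records in nids_per_uuid(excerpt).items():
        print(uuid, records)
```

          To run it against a full dump, read the file written by llog_reader (for example lstest-client.llogreader) and pass its contents to nids_per_uuid().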

          Here you go. The dump of

          # grove-mds2 /mnt/grove-mds2/mgs > llog_reader CONFIGS/lstest-client > lstest-client.llogreader
          
          prakash Prakash Surya (Inactive) added a comment

          One way is to fetch the /CONFIGS/lstest-client file from the MDS and parse it with the llog_reader utility.
          It would help us if you attach it to the ticket as well. Thanks.

          bzzz Alex Zhuravlev added a comment

          Actually, I'm not certain of that. At one point a failover NID was added using a writeconf, but the filesystem has been reformatted since then. I'm unsure whether both failover NIDs were specified at mkfs time during the reformat, or whether the writeconf method was used after mkfs. I can try to track down that information if it would be useful.

          prakash Prakash Surya (Inactive) added a comment

          Prakash, please try with http://review.whamcloud.com/#change,4227

          If I understand correctly, the failover NID for the MDS was specified at mkfs.lustre time, not added later?

          bzzz Alex Zhuravlev added a comment

          Thanks. I see the root cause and am working on the fix.

          bzzz Alex Zhuravlev added a comment

          Rebooted and collected the lustre log file.

          prakash Prakash Surya (Inactive) added a comment

          Sorry you're seeing this. Could you please try again and attach the Lustre log to the ticket?

          bzzz Alex Zhuravlev added a comment

          Originally, this was on a clean reboot, but the messages I pasted in the description were from a manually retried mount after the first one failed.

          Here are all the messages from the console:

          Lustre: Lustre: Build Version: 2.3.53-2chaos-2chaos--PRISTINE-2.6.32-220.23.1.1chaos.ch5.x86_64
          Lustre: Found index 0 for lstest-MDT0000, updating log
          LustreError: 32758:0:(mgc_request.c:248:do_config_log_add()) failed processing sptlrpc log: -2
          LustreError: 32761:0:(sec_config.c:1024:sptlrpc_target_local_copy_conf()) missing llog context
          LustreError: 33225:0:(genops.c:316:class_newdev()) Device lstest-MDT0000-osp-MDT0000 already exists at 136, won't add
          LustreError: 33225:0:(obd_config.c:374:class_attach()) Cannot create device lstest-MDT0000-osp-MDT0000 of type osp : -17
          Lustre: lstest-MDT0000: Temporarily refusing client connection from 0@lo
          LustreError: 11-0: lstest-MDT0000-osp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11
          LustreError: 33225:0:(obd_mount.c:373:lustre_start_simple()) lstest-MDT0000-osp-MDT0000 attach error -17
          LustreError: 33225:0:(obd_mount.c:1135:lustre_osp_setup()) lstest-MDT0000-osp-MDT0000: setup up failed: rc -17
          LustreError: 15c-8: MGC172.20.5.2@o2ib500: The configuration from log 'lstest-client' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
          LustreError: 32758:0:(obd_mount.c:1865:server_start_targets()) lstest-MDT0000: failed to start OSP: -17
          Lustre: lstest-MDT0000: Unable to start target: -17
          Lustre: Failing over lstest-MDT0000
          LustreError: 32680:0:(client.c:1116:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff880f977bb800 x1415277549977961/t0(0) o13->lstest-OST0181-osc-MDT0000@172.20.2.185@o2ib500:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
          LustreError: 32680:0:(osp_precreate.c:116:osp_statfs_interpret()) lstest-OST0181-osc-MDT0000: couldn't update statfs: rc = -5
          LustreError: 32681:0:(client.c:1116:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff882016d06c00 x1415277549977962/t0(0) o13->lstest-OST0182-osc-MDT0000@172.20.2.186@o2ib500:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
          LustreError: 32682:0:(osp_precreate.c:116:osp_statfs_interpret()) lstest-OST0183-osc-MDT0000: couldn't update statfs: rc = -5
          LustreError: 32682:0:(osp_precreate.c:116:osp_statfs_interpret()) Skipped 1 previous similar message
          LustreError: 32680:0:(osp_precreate.c:116:osp_statfs_interpret()) Skipped 125 previous similar messages
          Lustre: server umount lstest-MDT0000 complete
          LustreError: 32758:0:(obd_mount.c:2985:lustre_fill_super()) Unable to mount  (-17)
          Lustre: Found index 0 for lstest-MDT0000, updating log
          LustreError: 33410:0:(sec_config.c:1024:sptlrpc_target_local_copy_conf()) missing llog context
          LustreError: 33836:0:(genops.c:316:class_newdev()) Device lstest-MDT0000-osp-MDT0000 already exists at 136, won't add
          LustreError: 33836:0:(obd_config.c:374:class_attach()) Cannot create device lstest-MDT0000-osp-MDT0000 of type osp : -17
          Lustre: lstest-MDT0000: Temporarily refusing client connection from 0@lo
          LustreError: 11-0: lstest-MDT0000-osp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11
          LustreError: 33836:0:(obd_mount.c:373:lustre_start_simple()) lstest-MDT0000-osp-MDT0000 attach error -17
          LustreError: 33836:0:(obd_mount.c:1135:lustre_osp_setup()) lstest-MDT0000-osp-MDT0000: setup up failed: rc -17
          LustreError: 15c-8: MGC172.20.5.2@o2ib500: The configuration from log 'lstest-client' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
          LustreError: 33405:0:(obd_mount.c:1865:server_start_targets()) lstest-MDT0000: failed to start OSP: -17
          Lustre: lstest-MDT0000: Unable to start target: -17
          Lustre: Failing over lstest-MDT0000
          LustreError: 32689:0:(client.c:1116:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff880faa7f0800 x1415277549978464/t0(0) o13->lstest-OST0181-osc-MDT0000@172.20.2.185@o2ib500:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
          LustreError: 32690:0:(osp_precreate.c:116:osp_statfs_interpret()) lstest-OST0182-osc-MDT0000: couldn't update statfs: rc = -5
          LustreError: 32689:0:(client.c:1116:ptlrpc_import_delay_req()) Skipped 253 previous similar messages
          Lustre: server umount lstest-MDT0000 complete
          LustreError: 33405:0:(obd_mount.c:2985:lustre_fill_super()) Unable to mount  (-17)
          

          Is it still worth rebooting and trying again?

          prakash Prakash Surya (Inactive) added a comment

          People

            bzzz Alex Zhuravlev
            prakash Prakash Surya (Inactive)
            Votes: 0
            Watchers: 7
