Lustre / LU-8044

class_process_config() no device for: lustre-MDT0021-mdtlov

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.9.0
    • Affects Version/s: Lustre 2.8.0
    • Severity: 3

    Description

      On startup for the first time after formatting, the MDT fails to process the config provided by the MGS. The MDT then fails to start.
      The config log on the MGS appears to be invalid: it contains duplicate setup and modify_mdc_tgts records for one of the other MDTs.

      The MDT which fails to start reports:

      Lustre: Lustre: Build Version: 2.8.0
      LustreError: 11797:0:(obd_config.c:1262:class_process_config()) no device for: lustre-MDT0021-mdtlov
      LustreError: 11797:0:(obd_config.c:1666:class_config_llog_handler()) MGC192.168.112.240@o2ib15: cfg command failed: rc = -22
      Lustre:    cmd=cf014 0:lustre-MDT0021-mdtlov  1:lustre-MDT0014_UUID  2:20  3:1
      
      LustreError: 15b-f: MGC192.168.112.240@o2ib15: The configuration from log 'lustre-MDT0021'failed from the MGS (-22).  Make sure this client and the MGS are running compatible versions of Lustre.
      LustreError: 11667:0:(obd_mount_server.c:1309:server_start_targets()) failed to start server lustre-MDT0021: -22
      LustreError: 11667:0:(obd_mount_server.c:1798:server_fill_super()) Unable to start targets: -22
      LustreError: 11667:0:(obd_mount_server.c:1512:server_put_super()) no obd lustre-MDT0021
      Lustre: server umount lustre-MDT0021 complete
      LustreError: 11667:0:(obd_mount.c:1426:lustre_fill_super()) Unable to mount  (-22)
      
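A minimal sketch for reading the failing record, assuming only the console line format shown above. The helper names `parse_cfg_cmd` and `describe_rc` are hypothetical and not part of Lustre; the snippet splits the `cmd=` line into its numbered arguments and maps the return code -22 to its errno name.

```python
import errno
import os
import re

# Console line copied from the report above.
LINE = "Lustre:    cmd=cf014 0:lustre-MDT0021-mdtlov  1:lustre-MDT0014_UUID  2:20  3:1"


def parse_cfg_cmd(line):
    """Split a 'cmd=...' console line into the command code and its numbered args."""
    m = re.search(r"cmd=(\w+)\s+(.*)", line)
    if m is None:
        raise ValueError("not a cfg command line")
    # Each argument is printed as '<index>:<value>'; the value may be empty.
    args = {int(i): v for i, v in re.findall(r"(?:^|\s)(\d+):(\S*)", m.group(2))}
    return m.group(1), args


def describe_rc(rc):
    """Map a negative kernel-style return code to (errno name, message)."""
    code = -rc
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


cmd, args = parse_cfg_cmd(LINE)
print(cmd, args)
print(describe_rc(-22))
```

Argument 0 is the device the record is applied to, which is the same name reported by the "no device for" message.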

      The config logs CONFIGS/lustre-MDT* do not all have the same number of records. lustre-MDT0021 has 2 more records than the other 29 MDTs.

      The suspicious llog records are:

      #04 (152)setup     0:lustre-MDT0014-osp-MDT0021  1:lustre-MDT0014_UUID  2:192.168.113.6@o2ib15
      #05 (136)modify_mdc_tgts add 0:lustre-MDT0021-mdtlov  1:lustre-MDT0014_UUID  2:20  3:1
      #179 (152)setup     0:lustre-MDT0014-osp-MDT0021  1:lustre-MDT0014_UUID  2:192.168.113.6@o2ib15
      #180 (136)modify_mdc_tgts add 0:lustre-MDT0021-mdtlov  1:lustre-MDT0014_UUID  2:20  3:1
      
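The duplicate check can be sketched as follows, assuming a text dump in the `#NN (size)command args` layout shown above. The function name `find_duplicate_records` is hypothetical. The sketch flags any record body that appears more than once; a real config log may contain legitimately repeated records, so the result is a list of candidates to inspect rather than a verdict.

```python
import re
from collections import defaultdict

# One record per line: '#<index> (<size>)<command and arguments>'.
RECORD_RE = re.compile(r"^#(\d+)\s+\((\d+)\)\s*(.*)$")

SAMPLE = """\
#04 (152)setup     0:lustre-MDT0014-osp-MDT0021  1:lustre-MDT0014_UUID  2:192.168.113.6@o2ib15
#05 (136)modify_mdc_tgts add 0:lustre-MDT0021-mdtlov  1:lustre-MDT0014_UUID  2:20  3:1
#179 (152)setup     0:lustre-MDT0014-osp-MDT0021  1:lustre-MDT0014_UUID  2:192.168.113.6@o2ib15
#180 (136)modify_mdc_tgts add 0:lustre-MDT0021-mdtlov  1:lustre-MDT0014_UUID  2:20  3:1
"""


def find_duplicate_records(dump):
    """Return {record body: [indices]} for bodies that occur more than once."""
    seen = defaultdict(list)
    for raw in dump.splitlines():
        m = RECORD_RE.match(raw.strip())
        if not m:
            continue  # skip blank lines and anything that is not a record
        index, _size, body = m.groups()
        key = " ".join(body.split())  # normalize runs of spaces
        seen[key].append(int(index))
    return {body: idx for body, idx in seen.items() if len(idx) > 1}


for body, indices in find_duplicate_records(SAMPLE).items():
    print(indices, body)
```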

      Attachments

        1. dk.catalyst240
          3.25 MB
        2. dmesg.catalyst240
          0.8 kB
        3. ldev.conf
          3 kB
        4. llog.MDT0021.onMGS
          41 kB
        5. llog.MDT0000.onMGS
          41 kB
        6. mgs.register_mdts.dk.gz
          0.2 kB

        Activity

          jgmitter Joseph Gmitter (Inactive) added a comment -

          Patch has landed to master for 2.9.0.

          gerrit Gerrit Updater added a comment -

          Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19658/
          Subject: LU-8044 mgs: Only add OSP for registered MDT
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: c67a74b55c126ec1be6c195cb2e8cb8c2e6cf868

          jgmitter Joseph Gmitter (Inactive) added a comment -

          Hi Di,
          Assigning to you as I see you have already commented and provided a fix in a new patch.
          Thanks.
          Joe

          di.wang Di Wang added a comment -

          Olaf: the new debug log does not seem to have caught the failure. Perhaps it was collected too late, or debug = -1 made the dk log too big to capture all of the information? In any case, patch 19658 should help here. Please try it when you have another chance. Thanks.

          gerrit Gerrit Updater added a comment -

          wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/19658
          Subject: LU-8044 mgs: Only add OSP for registered MDT
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 3ccd18da205192ec0ad527ec88b69793aa5e6670

          ofaaland Olaf Faaland added a comment -

          Attached a debug log from the MGS, captured with debug = -1 while the MDTs were coming up for the first time.

          In this log, MDT0002 (on catalyst243, NID 192.168.112.243@o2ib15) encountered the error.

          Lustre: Lustre: Build Version: 2.8.0
          LustreError: 11826:0:(obd_config.c:1262:class_process_config()) no device for: lustre-MDT0002-mdtlov
          LustreError: 11826:0:(obd_config.c:1666:class_config_llog_handler()) MGC192.168.112.240@o2ib15: cfg command failed: rc = -22
          Lustre:    cmd=cf014 0:lustre-MDT0002-mdtlov  1:lustre-MDT0023_UUID  2:35  3:1
          
          LustreError: 15b-f: MGC192.168.112.240@o2ib15: The configuration from log 'lustre-MDT0002'failed from the MGS (-22).  Make sure this client and the MGS are running compatible versions of Lustre.
          LustreError: 11696:0:(obd_mount_server.c:1309:server_start_targets()) failed to start server lustre-MDT0002: -22
          LustreError: 11696:0:(obd_mount_server.c:1798:server_fill_super()) Unable to start targets: -22
          LustreError: 11696:0:(obd_mount_server.c:1512:server_put_super()) no obd lustre-MDT0002
          Lustre: server umount lustre-MDT0002 complete
          LustreError: 11696:0:(obd_mount.c:1426:lustre_fill_super()) Unable to mount  (-22)
          
          di.wang Di Wang added a comment -

          Ah, it looks like a race when the MGS registers two MDTs at the same time. I will cook up a patch.

          di.wang Di Wang added a comment -

          According to the config log, it looks like the OSP (lustre-MDT0014-osp-MDT0021) setup record was added before the "lov setup" record, which is clearly wrong.

          #01 (224)marker 865 (flags=0x01, v2.8.0.0) lustre-MDT0014  'add osp' Tue Apr 19 08:55:48 2016-
          #02 (088)add_uuid  nid=192.168.113.6@o2ib15(0x5000fc0a87106)  0:  1:192.168.113.6@o2ib15
          #03 (144)attach    0:lustre-MDT0014-osp-MDT0021  1:osp  2:lustre-MDT0021-mdtlov_UUID
          #04 (152)setup     0:lustre-MDT0014-osp-MDT0021  1:lustre-MDT0014_UUID  2:192.168.113.6@o2ib15
          #05 (136)modify_mdc_tgts add 0:lustre-MDT0021-mdtlov  1:lustre-MDT0014_UUID  2:20  3:1
          #06 (224)END   marker 865 (flags=0x02, v2.8.0.0) lustre-MDT0014  'add osp' Tue Apr 19 08:55:48 2016-
          #07 (224)marker 873 (flags=0x01, v2.8.0.0) lustre-MDT0021  'add mdt' Tue Apr 19 08:55:48 2016-
          #08 (120)attach    0:lustre-MDT0021  1:mdt  2:lustre-MDT0021_UUID
          #09 (112)mount_option 0:  1:lustre-MDT0021  2:lustre-MDT0021-mdtlov
          #10 (160)setup     0:lustre-MDT0021  1:lustre-MDT0021_UUID  2:33  3:lustre-MDT0021-mdtlov  4:f
          #11 (224)END   marker 873 (flags=0x02, v2.8.0.0) lustre-MDT0021  'add mdt' Tue Apr 19 08:55:48 2016-
          
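The ordering problem can be illustrated with a small sketch. It is a simplified model and not how Lustre itself processes the log: it assumes a `modify_mdc_tgts` record is only valid once the device named in its argument 0 has appeared in an earlier `attach` record, which mirrors the "no device for" error. The function name `find_unattached_targets` is hypothetical, and the sample lines are a subset of the excerpt above.

```python
import re

# '#<index> (<size>)<command> <arguments>'
RECORD_RE = re.compile(r"^#(\d+)\s+\((\d+)\)\s*(\S+)\s*(.*)$")

SAMPLE = """\
#03 (144)attach    0:lustre-MDT0014-osp-MDT0021  1:osp  2:lustre-MDT0021-mdtlov_UUID
#04 (152)setup     0:lustre-MDT0014-osp-MDT0021  1:lustre-MDT0014_UUID  2:192.168.113.6@o2ib15
#05 (136)modify_mdc_tgts add 0:lustre-MDT0021-mdtlov  1:lustre-MDT0014_UUID  2:20  3:1
#08 (120)attach    0:lustre-MDT0021  1:mdt  2:lustre-MDT0021_UUID
#10 (160)setup     0:lustre-MDT0021  1:lustre-MDT0021_UUID  2:33  3:lustre-MDT0021-mdtlov  4:f
"""


def find_unattached_targets(dump):
    """Return (index, device) for modify_mdc_tgts records whose device
    (argument 0) was not attached by any earlier record in the dump."""
    attached = set()
    problems = []
    for raw in dump.splitlines():
        m = RECORD_RE.match(raw.strip())
        if not m:
            continue
        index, _size, command, rest = m.groups()
        # Arguments are printed as '<n>:<value>'; only argument 0 is used here.
        args = dict(re.findall(r"(?:^|\s)(\d+):(\S*)", rest))
        device = args.get("0", "")
        if command == "attach":
            attached.add(device)
        elif command == "modify_mdc_tgts" and device not in attached:
            problems.append((int(index), device))
    return problems


print(find_unattached_targets(SAMPLE))
```

On this sample the check reports record #05, because lustre-MDT0021-mdtlov has not been attached at that point in the log.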

          I am checking the debug log on the MGS to see why this happens.

          Olaf, could you please try to reproduce the problem with debug level = -1 on the MGS? It will help me figure out what happens there. Thanks.

          ofaaland Olaf Faaland added a comment -

          Di,

          For the next few hours, I can either gather more information for you, or experiment. About 4 hours from now, I'll have to put the nodes back to their production use and the filesystem will be destroyed.

          thanks,
          Olaf

          ofaaland Olaf Faaland added a comment -

          Config log for MDT0000.
          The log for MDT0021 is already attached.


          People

            Assignee: di.wang Di Wang
            Reporter: ofaaland Olaf Faaland