Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.4.0
    • Lustre 2.4.0
    • None
    • 3
    • 5533

    Description

      Our sysadmins were expanding one of our 2.1 filesystems yesterday and ran into a problem. Because of previous 1.8 problems that made out-of-order OST registration problematic, the admin was using a script to mount the OSTs one at a time, in sequential ID order.

      With each new OST registration the MGS lock is revoked. Lock timeouts were not infrequent.

      One OST hit a timeout while communicating with the MGS, and this proved to be a fairly non-recoverable event. I was forced to break out hexedit to get things working again reasonably.

      2012-03-22 10:07:58 LustreError: 166-1: MGC172.19.1.100@o2ib100: Connection to MGS (at 172.19.1.100@o2ib100) was lost; in progress operations using this service will fail
      2012-03-22 10:07:58 LustreError: 12150:0:(ldlm_request.c:115:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1332435772, 306s ago), entering recovery for MGS@MGC172.19.1.100@o2ib100_0 ns: MGC172.19.1.100@o2ib100 lock: ffff880314ba0480/0x54303899ba77f6ed lrc: 4/1,0 mode: --/CR res: 6517612/0 rrc: 1 type: PLN flags: 0x10000010 remote: 0x1d01b502e0a66a07 expref: -99 pid: 12150 timeout 0
      2012-03-22 10:07:59 LustreError: 12718:0:(obd_mount.c:1163:server_start_targets()) Required registration failed for lsc-OST0174: -4
      2012-03-22 10:07:59 LustreError: 12718:0:(obd_mount.c:1719:server_fill_super()) Unable to start targets: -4
      2012-03-22 10:07:59 LustreError: 12718:0:(obd_mount.c:1508:server_put_super()) no obd lsc-OST0174
      2012-03-22 10:07:59 LustreError: 12718:0:(obd_mount.c:141:server_deregister_mount()) lsc-OST0174 not registered
      2012-03-22 10:07:59 LustreError: 11-0: MGC172.19.1.100@o2ib100: Communicating with 172.19.1.100@o2ib100, operation mgs_connect failed with -16.
      2012-03-22 10:07:59 Lustre: server umount lsc-OST0174 complete
      2012-03-22 10:07:59 LustreError: 12718:0:(obd_mount.c:2160:lustre_fill_super()) Unable to mount  (-4)
      2012-03-22 10:08:24 LustreError: 11-0: MGC172.19.1.100@o2ib100: Communicating with 172.19.1.100@o2ib100, operation mgs_connect failed with -16.
      2012-03-22 10:08:49 LustreError: 11-0: MGC172.19.1.100@o2ib100: Communicating with 172.19.1.100@o2ib100, operation mgs_connect failed with -16.
      2012-03-22 10:09:14 LustreError: 11-0: MGC172.19.1.100@o2ib100: Communicating with 172.19.1.100@o2ib100, operation mgs_connect failed with -16.
      2012-03-22 10:09:38 LustreError: 11-0: MGC172.19.1.100@o2ib100: Communicating with 172.19.1.100@o2ib100, operation mgs_connect failed with -16.
      

      The admin reran the mount command several more times and each time got either -4 (EINTR), which corresponds to a lock timeout like the one above, or -5 (EIO).
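For readers less familiar with kernel-style return codes, here is a minimal sketch that decodes the negative values appearing in the console logs of this report. The table is hard-coded with the standard Linux errno numbering; the helper name `decode_rc` is made up for this illustration and is not part of Lustre.

```python
# Linux errno numbers for the return codes that show up in this report.
# Hard-coded so the output does not depend on the platform running the script.
LINUX_ERRNO = {
    4: ("EINTR", "Interrupted system call"),
    5: ("EIO", "Input/output error"),
    16: ("EBUSY", "Device or resource busy"),
    22: ("EINVAL", "Invalid argument"),
    98: ("EADDRINUSE", "Address already in use"),
}


def decode_rc(rc: int) -> str:
    """Turn a kernel-style negative return code into 'rc: NAME (description)'."""
    name, text = LINUX_ERRNO.get(abs(rc), ("UNKNOWN", "not in this table"))
    return f"{rc}: {name} ({text})"


if __name__ == "__main__":
    for rc in (-4, -5, -16, -22, -98):
        print(decode_rc(rc))
```

Anything outside the small table is reported as UNKNOWN rather than guessed.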

      When I was called in, I tried the mount myself to start gathering info. At that point it returned -98 (EADDRINUSE), and it was clear from the MGS logs that it HAD finally processed the OST's registration:

      2012-03-22 11:09:22 LustreError: 140-5: Server lsc-OST0174 requested index 372, but that index is already in use. Use --writeconf to force
      2012-03-22 11:09:22 LustreError: 12649:0:(mgs_llog.c:2710:mgs_write_log_target()) Can't get index (-98)
      2012-03-22 11:09:22 LustreError: 12649:0:(mgs_handler.c:520:mgs_handle_target_reg()) Failed to write lsc-OST0174 log (-98)
      

      But the OST does not know that the registration succeeded, so its mountdata still has the flag LDD_F_VIRGIN set. Because of that, the MGS will never let the OST connect.

      That left us two courses of action (to the best of my knowledge):

      1. Unmount the filesystem completely, use --writeconf on the MGS, restart everything
      2. Use a hex editor on the OST's mountdata file to clear the LDD_F_VIRGIN flag

      Since we did not want to cause a downtime for the filesystem, we chose the latter.
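The second option amounts to clearing one bit in a flags word stored in the mountdata file. The sketch below shows only that bit operation, on an in-memory buffer. The byte offset and bit value are assumptions chosen for illustration and have not been checked against any Lustre release; the real layout must be taken from the Lustre headers for the version in use, and nothing here should be run against a real target.

```python
import struct

# ASSUMPTIONS, for illustration only (not verified against Lustre sources):
EXAMPLE_FLAGS_OFFSET = 20    # hypothetical byte offset of the 32-bit flags word
EXAMPLE_VIRGIN_BIT = 0x0100  # hypothetical bit standing in for LDD_F_VIRGIN


def has_flag(buf: bytes, offset: int, bit: int) -> bool:
    """Return True if `bit` is set in the little-endian u32 at `offset`."""
    (flags,) = struct.unpack_from("<I", buf, offset)
    return bool(flags & bit)


def clear_flag(buf: bytes, offset: int, bit: int) -> bytes:
    """Return a copy of `buf` with `bit` cleared in the u32 at `offset`."""
    (flags,) = struct.unpack_from("<I", buf, offset)
    out = bytearray(buf)
    struct.pack_into("<I", out, offset, flags & ~bit & 0xFFFFFFFF)
    return bytes(out)


if __name__ == "__main__":
    # Fake 64-byte "mountdata" with the example bit and one other bit set.
    sample = bytearray(64)
    struct.pack_into("<I", sample, EXAMPLE_FLAGS_OFFSET, 0x0102)
    patched = clear_flag(bytes(sample), EXAMPLE_FLAGS_OFFSET, EXAMPLE_VIRGIN_BIT)
    print(has_flag(bytes(sample), EXAMPLE_FLAGS_OFFSET, EXAMPLE_VIRGIN_BIT))
    print(has_flag(patched, EXAMPLE_FLAGS_OFFSET, EXAMPLE_VIRGIN_BIT))
```

The function returns a modified copy and leaves every other bit and byte untouched, which is the property a manual hex edit has to preserve as well.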

      The mount of the OST seemed to mostly go well, and it appears to be functioning fine now, but I did see this error on the MGS/MDS console:

      2012-03-22 15:09:48 Lustre: Found index 372 for lsc-OST0174, updating log
      2012-03-22 15:12:26 LustreError: 6002:0:(obd_config.c:1019:class_process_config()) no device for: lsc-OST0174-osc
      2012-03-22 15:12:26 LustreError: 6002:0:(obd_config.c:1363:class_config_llog_handler()) Err -22 on cfg command:
      2012-03-22 15:12:26 Lustre:    cmd=cf00b 0:lsc-OST0174-osc  1:172.19.1.127@o2ib100  
      

      The OST appears to be working fine, so I am not sure how worried I should be about that llog error.

      I will attach some console logs to show what was going on when the OST registration failed.

          Activity

            [LU-1257] OST registration snafu

            niu Niu Yawei (Inactive) added a comment - Thank you, Chris.

            morrone Christopher Morrone (Inactive) added a comment - I suppose it can be closed. In the future though, I would prefer that the side issues be fixed in new tickets and we leave the original ticket open to deal with the root issue.

            niu Niu Yawei (Inactive) added a comment - LU-4966 has been created to track the improvement. Chris, can this be closed?

            niu Niu Yawei (Inactive) added a comment - Patch 2432 (inconsistent osc name problem) has landed. For the problem related to the registration design, I'll open another ticket to track it.

            niu Niu Yawei (Inactive) added a comment - Yes, Chris. I just rebased the patch, and will keep it moving forward. Thanks.

            morrone Christopher Morrone (Inactive) added a comment - This needs attention. Change 2432 for master has been sitting for a few months.

            morrone Christopher Morrone (Inactive) added a comment - That is a good start.

            niu Niu Yawei (Inactive) added a comment - Quoting Chris: "These are all just band-aids for what appears to be a fundamentally bad design for initial registration. Really, the OST should probably generate some kind of random number and use that to identify itself upon first connection. Then if a problem occurs during registration, the MDS will be able to identify the OST as really the same OST when it connects again and can allow registration to be replayed. Or something along those lines." I agree, but that would be feature enhancement work, and I'm not sure we have resources available to work on it for now. Let's fix the inconsistent osc name defect first (http://review.whamcloud.com/#change,2432); it is not related to the target registration design. What do you think, Chris?

            morrone Christopher Morrone (Inactive) added a comment - These are all just band-aids for what appears to be a fundamentally bad design for initial registration. Really, the OST should probably generate some kind of random number and use that to identify itself upon first connection. Then if a problem occurs during registration, the MDS will be able to identify the OST as really the same OST when it connects again and can allow registration to be replayed. Or something along those lines. Clearing the virgin flag on the OST requires a level of knowledge of Lustre's internals that few people have. I'm not sure that we should even mention it in a console message. The console message should probably be "consult your lustre support vendor", and only when an expert has decided that the conditions are really correct would they say to use the --clear-virgin option.
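The random-identifier idea in the comment above can be sketched as a toy model. This is not Lustre code and not the fix that landed for this ticket; `ToyMgs`, `IndexInUse`, and `new_target_token` are hypothetical names invented for this illustration. The assumption is that the target generates a token once, keeps it across retries, and sends it with every registration attempt, so the server can tell a replay from a genuine index conflict.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict


class IndexInUse(Exception):
    """Stand-in for the -98 (EADDRINUSE) refusal seen in the MGS log."""


def new_target_token() -> str:
    """Token a target would generate once, before its first registration."""
    return secrets.token_hex(16)


@dataclass
class ToyMgs:
    # Maps target index -> token of the target that registered it.
    owners: Dict[int, str] = field(default_factory=dict)

    def register(self, index: int, token: str) -> str:
        owner = self.owners.get(index)
        if owner is None:
            self.owners[index] = token
            return "registered"
        if owner == token:
            # Same target retrying after a lost reply: safe to replay.
            return "replayed"
        raise IndexInUse(f"index {index} belongs to a different target")


if __name__ == "__main__":
    mgs = ToyMgs()
    token = new_target_token()
    print(mgs.register(372, token))  # first attempt; imagine the reply is lost
    print(mgs.register(372, token))  # retry with the same token
    try:
        mgs.register(372, new_target_token())  # a different target, same index
    except IndexInUse as exc:
        print("refused:", exc)
```

In this model a timeout between the server recording the index and the target hearing about it is harmless, because the retry carries the same token and is accepted as a replay.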

            hudson Build Master (Inactive) added a comment - Integrated in lustre-reviews » x86_64,server,el6,inkernel #4626
            LU-1257 mgs: treat -EADDRINUSE as non-fatal error (Revision d14d0804d89154ff5a1cac0534f027c97d729b61)

            Result = SUCCESS
            Niu Yawei : d14d0804d89154ff5a1cac0534f027c97d729b61
            Files :

            • lustre/obdclass/obd_mount.c
            • lustre/mgs/mgs_handler.c
            • lustre/mgs/mgs_llog.c

            People

              niu Niu Yawei (Inactive)
              morrone Christopher Morrone (Inactive)