[LU-14403] lctl dl UP and lfs df problem with conf_param osc.active=0 after client remount

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.13.0, Lustre 2.12.6
    • Labels: None
    • Environment: CentOS 7.6
    • Severity: 3

    Description

      This is related to LU-7668. The Lustre Manual, in section 14.9.3, "Removing an OST from the File System", recommends using lctl conf_param ost_name.osc.active=0 to permanently disable OSTs.

      We are trying to permanently disable 12 old empty OSTs on our Oak filesystem. We used commands like these:

      lctl conf_param oak-OST0000.osc.active=0
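
      With twelve OSTs to disable, the per-OST commands can be generated rather than typed. This is a minimal sketch, assuming the OSTs follow the usual fsname-OSTxxxx naming with a four-digit hexadecimal index. The function name gen_deactivate_cmds and the index list in the usage comment are hypothetical, and the function only prints the commands so they can be reviewed before being run on the MGS:

```shell
#!/bin/sh
# Print (do not run) one "lctl conf_param" command per OST index.
# Usage: gen_deactivate_cmds <fsname> <index>...
# Indices are plain decimal numbers without leading zeros; they are
# rendered as 4-digit hex (assumption: standard OST naming).
# The body runs in a subshell so its variables stay local.
gen_deactivate_cmds() (
    fsname=$1
    shift
    for idx in "$@"; do
        printf 'lctl conf_param %s-OST%04x.osc.active=0\n' "$fsname" "$idx"
    done
)

# Example with a hypothetical index list; review the output before running it:
# gen_deactivate_cmds oak 0 1 2
```

      Printing instead of executing keeps the step reversible: nothing is written to the configuration logs until the reviewed commands are actually run.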
      

      Lustre logs seem to indicate it works OK:

      20000000:02000400:16.0:1612760209.795689:0:334624:0:(mgs_llog.c:3964:mgs_write_log_param()) Permanently deactivating oak-OST0000
      

      On already mounted clients, lctl dl shows the OBD status inactive:

      [root@oak-rbh01 ~]# lctl dl | grep oak-OST0000
        9 IN osc oak-OST0000-osc-ffff9125e10c3800 f532ae1d-6c67-fa34-deaa-5a130b24844f 4
      

      Also, lfs df works as expected for already mounted clients:

      [root@oak-rbh01 ~]# lfs df -v  /oak | grep OST0000
      OST0000             : inactive device
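
      The lctl dl check above can be scripted when many clients need to be verified. This is a rough sketch that assumes the column layout seen in the output above (index, state, type, name, UUID, refcount); the function name osc_states is hypothetical:

```shell
#!/bin/sh
# Read "lctl dl" output on stdin and print "<device name> <state>" for
# every osc device that belongs to the given OST (e.g. oak-OST0000).
# Assumed columns: index state type name uuid refcount.
osc_states() {
    awk -v ost="$1" '$3 == "osc" && index($4, ost "-osc-") == 1 { print $4, $2 }'
}

# Example on a client:
# lctl dl | osc_states oak-OST0000
```

      On a client where the deactivation has taken effect, the second field printed should be IN.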
      

      However, we have observed the following when using Lustre 2.12.6 after client remount:

      • the OBD state reported by lctl dl goes back to UP instead of IN
      [root@oak-h01v10 ~]# lctl dl | grep oak-OST0000
        9 UP osc oak-OST0000-osc-ffff9c5b6b90f800 523b8803-837d-acf8-a8e6-aae2d47585ac 3
      
      • the OSC state, however, is properly set to 0
      [root@oak-h01v10 ~]# cat /sys/fs/lustre/osc/oak-OST0000-osc-ffff9c5b6b90f800/active 
      0
      
      • lfs check osts reports the following error:
        [root@oak-h01v10 ~]# lfs check osts
        lfs check: error: check 'oak-OST0000-osc-ffff9c5b6b90f800': Cannot allocate memory (12)
        ...
        
      • lfs df shows the following error for the permanently deactivated OST:
        OST0000             : Invalid argument
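
      The inconsistency described in the list above (UP in lctl dl while the OSC active file reads 0) can be detected with a small script. This is a sketch only: the function name find_stale_up_oscs is hypothetical, and the base directory is a parameter that defaults to the /sys/fs/lustre/osc path shown above:

```shell
#!/bin/sh
# Read "lctl dl" output on stdin and print the name of every osc device
# that is reported as UP while its sysfs "active" file contains 0.
# $1: directory holding the per-OSC entries (default: /sys/fs/lustre/osc).
# The body runs in a subshell so its variables stay local.
find_stale_up_oscs() (
    base=${1:-/sys/fs/lustre/osc}
    awk '$3 == "osc" && $2 == "UP" { print $4 }' |
    while read -r name; do
        active_file=$base/$name/active
        # Skip devices without a readable "active" file.
        [ -r "$active_file" ] || continue
        if [ "$(cat "$active_file")" = "0" ]; then
            printf '%s\n' "$name"
        fi
    done
)

# Example on a freshly remounted client:
# lctl dl | find_stale_up_oscs
```

      An empty result means no OSC is in the mismatched state; any name printed is a device that lctl dl shows as UP even though it has been deactivated.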
        

      I'm attaching client logs of a remounting client. We can see that the OST is disabled:

      00020000:01000000:0.0:1612805236.515621:0:2155:0:(lov_obd.c:166:lov_connect_obd()) not connecting OSC oak-OST0000_UUID; administratively disabled
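
      To list every OST that a client skipped at mount time, the debug log can be filtered for this message. This is a sketch that assumes the message text is exactly as in the line above; the function name disabled_osts_from_log is hypothetical:

```shell
#!/bin/sh
# Read a Lustre debug log on stdin and print the name of each OST that
# the client did not connect to because it was administratively disabled.
disabled_osts_from_log() {
    sed -n 's/.*not connecting OSC \(.*\)_UUID; administratively disabled.*/\1/p' |
    sort -u
}

# Example with a saved debug log (the file name is a placeholder):
# disabled_osts_from_log < client-debug.log
```

      Comparing this list with the OSTs passed to conf_param shows whether the client saw every deactivation record at mount time.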
      

      It looks like at some point, the status of the OBD is not updated properly at mount time and this seems to be causing the confusion. Ideally, we would like to see the same behavior after client remount (IN in lctl dl and lfs df -v showing inactive device). Any ideas on how best to fix/improve this? Thanks!

          Activity

            adilger Andreas Dilger added a comment:
            The "lctl del_ost" command was included in Lustre 2.15 via LU-7668.

            sthiell Stephane Thiell added a comment:
            Andreas,

            Using lctl llog_cancel seems to work on my test system. We haven't tried it on Oak yet, though.

            I've also pushed a patch with a proposal for lctl del_ost as described in LU-7668. I'm happy to improve it and add some tests if you think this could make sense.

            sthiell Stephane Thiell added a comment:
            Thanks Andreas. I will try and report back!

            adilger Andreas Dilger added a comment:
            Stephane,
            It is possible to permanently remove/deactivate the configuration records for those OSTs from the config record itself, so that they are no longer even present in "lctl dl", rather than being present but inactive. Please see instructions in LU-7668, which I've just updated to have examples. If that process works for you, I can add this information into the manual, though it would be better in the long term to actually implement the logic for "lctl del_ost" as described in that ticket.

            People

              Assignee: sthiell Stephane Thiell
              Reporter: sthiell Stephane Thiell
              Votes: 0
              Watchers: 3