Lustre / LU-14403

lctl dl UP and lfs df problem with conf_param osc.active=0 after client remount

Details

    • Bug
    • Resolution: Duplicate
    • Major
    • None
    • Lustre 2.13.0, Lustre 2.12.6
    • None
    • CentOS 7.6
    • 3

    Description

      This is related to LU-7668. The Lustre Manual, in section 14.9.3, "Removing an OST from the File System", recommends using lctl conf_param ost_name.osc.active=0 to permanently disable OSTs.

      We are trying to permanently disable 12 old empty OSTs on our Oak filesystem. We used commands like these:

      lctl conf_param oak-OST0000.osc.active=0
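
For several OSTs, the same command can be generated in a loop. The following is only a sketch: the helper name print_deactivate_cmds and the indices passed to it are hypothetical, and it assumes OST names follow the fsname-OSTxxxx pattern with a four-digit hexadecimal index. It prints the commands instead of running them, so they can be reviewed before being executed on the MGS.

```shell
#!/bin/sh
# Hypothetical helper: print one "lctl conf_param" command per OST index.
# $1 is the filesystem name; remaining arguments are OST indices (decimal).
# Nothing is executed here; the output is meant to be reviewed first.
print_deactivate_cmds() {
    fsname="$1"
    shift
    for idx in "$@"; do
        # OST names carry the index as four hexadecimal digits (assumption).
        printf 'lctl conf_param %s-OST%04x.osc.active=0\n' "$fsname" "$idx"
    done
}
```

For example, print_deactivate_cmds oak 0 1 prints the commands for oak-OST0000 and oak-OST0001.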
      

      Lustre logs seem to indicate it works OK:

      20000000:02000400:16.0:1612760209.795689:0:334624:0:(mgs_llog.c:3964:mgs_write_log_param()) Permanently deactivating oak-OST0000
      

      On already mounted clients, lctl dl shows the OBD status inactive:

      [root@oak-rbh01 ~]# lctl dl | grep oak-OST0000
        9 IN osc oak-OST0000-osc-ffff9125e10c3800 f532ae1d-6c67-fa34-deaa-5a130b24844f 4
      

      Also, lfs df works as expected for already mounted clients:

      [root@oak-rbh01 ~]# lfs df -v  /oak | grep OST0000
      OST0000             : inactive device
      

      However, we have observed the following when using Lustre 2.12.6 after client remount:

      • the OBD state as reported by lctl dl comes back to UP instead of IN
      [root@oak-h01v10 ~]# lctl dl | grep oak-OST0000
        9 UP osc oak-OST0000-osc-ffff9c5b6b90f800 523b8803-837d-acf8-a8e6-aae2d47585ac 3
      
      • the OSC state, however, is properly set to 0
      [root@oak-h01v10 ~]# cat /sys/fs/lustre/osc/oak-OST0000-osc-ffff9c5b6b90f800/active 
      0
      
      • lfs check osts reports the following error:
        [root@oak-h01v10 ~]# lfs check osts
        lfs check: error: check 'oak-OST0000-osc-ffff9c5b6b90f800': Cannot allocate memory (12)
        ...
        
      • lfs df shows the following error for the permanently deactivated OST:
        OST0000             : Invalid argument
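
The mismatch between the lctl dl state and the sysfs active flag described above can be checked for on a client with a small script. This is a minimal sketch, not a supported tool: the function name check_osc_state is hypothetical, it assumes the lctl dl column layout shown in this report (index, state, type, name, UUID, refcount), and it assumes the OSC entries live under /sys/fs/lustre/osc as on our 2.12.6 clients.

```shell
#!/bin/sh
# Hypothetical helper: read "lctl dl" output on stdin and report OSC devices
# whose state column disagrees with their sysfs "active" flag.
# $1 optionally overrides the sysfs OSC directory.
check_osc_state() {
    sysfs_dir="${1:-/sys/fs/lustre/osc}"
    # Assumed columns: index, state, type, name, uuid, refcount
    while read -r idx state type name uuid refs; do
        [ "$type" = "osc" ] || continue
        active_file="$sysfs_dir/$name/active"
        # Skip devices without a readable "active" file.
        [ -r "$active_file" ] || continue
        active=$(cat "$active_file")
        if [ "$state" = "UP" ] && [ "$active" = "0" ]; then
            echo "MISMATCH $name: lctl dl says UP but active=0"
        elif [ "$state" = "IN" ] && [ "$active" = "1" ]; then
            echo "MISMATCH $name: lctl dl says IN but active=1"
        fi
    done
}
```

It would be used as lctl dl | check_osc_state; on the remounted client above, it should flag oak-OST0000-osc-ffff9c5b6b90f800.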
        

      I'm attaching client logs of a remounting client. We can see that the OST is disabled:

      00020000:01000000:0.0:1612805236.515621:0:2155:0:(lov_obd.c:166:lov_connect_obd()) not connecting OSC oak-OST0000_UUID; administratively disabled
      

      It looks like the OBD status is not updated properly at mount time, which seems to be causing the inconsistency. Ideally, we would like to see the same behavior after a client remount as on already mounted clients (IN in lctl dl, and lfs df -v showing inactive device). Any ideas on how best to fix or improve this? Thanks!

            People

              sthiell Stephane Thiell