Permanently disabled OST causes clients to hang on df (statfs)

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.6.0, Lustre 2.5.2
    • Affects Version/s: Lustre 2.4.1
    • None
    • Environment: RHEL6 2.6.32-358.18.1.el6.x86_64

    Description

      An OST that no longer exists has been permanently disabled on the MGS using

      lctl conf_param vegas-OST0059.osc.active=0

      After this, clients hang on df the next time Lustre is remounted. strace shows df blocked in a statfs call.

      We tried the Lustre mount option lazystatfs (undocumented?), which is supposed to work around this issue, but it did not help with Lustre 2.4.1: clients still hang on df. This applies to the standard df; lfs df seems to work OK.
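
A client can be checked for this hang without blocking the calling shell by running the statfs in the background and polling for completion. The following is a minimal sketch only: the function name probe_statfs, the marker-file approach, and the 10-second default are assumptions made for illustration, not part of Lustre or of this ticket.

```shell
# probe_statfs PATH [SECONDS]
# Returns 0 if a statfs on PATH returns within SECONDS (default 10),
# 1 if it is still blocked after that time, 2 if no temp file could be made.
probe_statfs() {
    path=$1
    limit=${2:-10}
    marker=$(mktemp) || return 2

    # "stat -f" issues a statfs call, the same call that df blocks on.
    # The subshell records completion in the marker file, whether or not
    # stat itself succeeded.
    ( stat -f "$path" >/dev/null 2>&1; echo done >"$marker" ) &

    waited=0
    while [ ! -s "$marker" ]; do
        if [ "$waited" -ge "$limit" ]; then
            echo "statfs on $path still blocked after ${limit}s" >&2
            # A probe that finishes later may recreate this file.
            rm -f "$marker"
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    rm -f "$marker"
    return 0
}
```

For example, `probe_statfs /path/to/lustre/mount 10 || echo "df would hang here"`. A probe that is stuck stays behind as a background process, which is the price of keeping the caller responsive.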

          Activity

            [LU-4397] Permanently disabled OST causes clients to hang on df (statfs)

            This seems to work. The syntax is

            lctl conf_param <fsname>.llite.lazystatfs=1

            So, is this expected behaviour, and is setting this option required on any Lustre filesystem with permanently deactivated OSTs? Or is it still a bug? The conf_param option does not seem to be documented.

            wbaudler Wolfgang Baudler added a comment

            Note that it is also possible to set lazystatfs permanently on all clients using:

            lctl conf_param llite.lazystatfs=1
            
            adilger Andreas Dilger added a comment
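
To see whether the setting has actually reached a given client, the output of `lctl get_param`, which prints one name=value line per parameter, can be filtered. The helper below is a hedged sketch: the function name lazystatfs_disabled is hypothetical, and it assumes the name=value output format.

```shell
# Print the parameter names whose lazystatfs value is not 1.
# Expects "name=value" lines on stdin, as printed by lctl get_param.
lazystatfs_disabled() {
    awk -F= '$1 ~ /lazystatfs$/ && $2 != 1 { print $1 }'
}

# Intended use on a client with Lustre mounted (needs root):
#   lctl get_param 'llite.*.lazystatfs' | lazystatfs_disabled
```

Empty output means every mounted Lustre filesystem on that client already has lazystatfs enabled.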

            I can confirm the behaviour described by Eric Kolb above. Problem exists on 2.5.3. I did not get a chance to test it with 2.5.2, so not sure if it was re-introduced or if it was never fixed.

            wbaudler Wolfgang Baudler added a comment
            ekolb Eric Kolb added a comment -

            Hello,

            We recently upgraded our clients to 2.5.3 and this very issue appears to have manifested itself again?

            Lustre: setting import RSF1-OST0007_UUID INACTIVE by administrator request
            Lustre: Layout lock feature supported.
            Lustre: Mounted RSF1-client

            $ strace -v -f df /RSF1
            statfs("/var/lib/nfs/rpc_pipefs", {f_type=0x67596969, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096}) = 0
            statfs("/RSF1",

            At which point the df hangs until reboot.

            Because some of these clients re-export Lustre via NFS and Samba, exportfs etc. hang on boot. For now we have put a hack in /etc/rc.local as a workaround:

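            # Stop the re-export services so they do not block on statfs
            /etc/init.d/samba stop
            /etc/init.d/nfs stop
            # Mount Lustre, then enable lazystatfs so statfs does not wait for the inactive OST
            mount -t lustre 10.82.0.15@tcp1:/RSF1 /RSF1
            /usr/sbin/lctl set_param llite.*.lazystatfs=1
            # Restart the services once statfs no longer hangs
            /etc/init.d/nfs start
            /etc/init.d/samba start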

            Perhaps we missed something, but this seems to fit our experience.


            Eric Kolb
            Data Centre Services
            University of Victoria
            Office: 250-721-7658
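
The rc.local workaround above could be wrapped so that a failed mount does not restart the export services on an empty directory, and so that the sequence can be previewed first. This is only a sketch under assumptions: the names run, start_lustre_exports, and DRY_RUN are hypothetical, and the service paths are taken from the comment above.

```shell
# With DRY_RUN set to a non-empty value, commands are printed, not executed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# start_lustre_exports MGS_TARGET MOUNTPOINT
start_lustre_exports() {
    mgs_target=$1    # e.g. 10.82.0.15@tcp1:/RSF1
    mountpoint=$2    # e.g. /RSF1

    run /etc/init.d/samba stop
    run /etc/init.d/nfs stop

    # Leave the services stopped if the mount or the parameter change fails.
    run mount -t lustre "$mgs_target" "$mountpoint" || return 1
    # Quoted so the shell passes the pattern to lctl unexpanded.
    run /usr/sbin/lctl set_param 'llite.*.lazystatfs=1' || return 1

    run /etc/init.d/nfs start
    run /etc/init.d/samba start
}
```

Setting `DRY_RUN=1` before calling `start_lustre_exports 10.82.0.15@tcp1:/RSF1 /RSF1` prints the six commands in order without touching the system.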

            pjones Peter Jones added a comment -

            Landed for 2.5.2 and 2.6


            The change 7762 was landed for 2.5.0, but I found another problem related to this on my test system: http://review.whamcloud.com/8949

            adilger Andreas Dilger added a comment

            Was this problem resolved with change 7762?
            Let us know if this ticket can be closed.

            jlevi Jodi Levi (Inactive) added a comment

            Seems this may be the same as LU-4010 - patch is http://review.whamcloud.com/7762

            adilger Andreas Dilger added a comment

            People

              wc-triage WC Triage
              wbaudler Wolfgang Baudler