[LU-4397] Permanently disabled OST causes clients to hang on df (statfs) Created: 19/Dec/13  Updated: 01/Dec/17  Resolved: 20/May/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.1
Fix Version/s: Lustre 2.6.0, Lustre 2.5.2

Type: Bug Priority: Major
Reporter: Wolfgang Baudler Assignee: WC Triage
Resolution: Fixed Votes: 0
Labels: None
Environment:

RHEL6 2.6.32-358.18.1.el6.x86_64


Issue Links:
Duplicate
duplicates LU-4010 Don't wait for active target if OBD_S... Resolved
Related
is related to LU-7668 permanently remove deactivated OSTs f... Resolved
Epic/Theme: OST, client
Severity: 4
Epic: client, hang
Rank (Obsolete): 12067

 Description   

A no longer existing OST has been permanently disabled on the MGS using

lctl conf_param vegas-OST0059.osc.active=0

After this, clients hang on df once Lustre is next remounted. strace shows df hanging in a statfs call.

Tried the Lustre mount option lazystatfs (undocumented?), which is supposed to work around this issue, but it did not help with Lustre 2.4.1; clients still hang on df. This is the standard df; lfs df seems to work OK.
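A minimal sketch for checking whether df on a mount point returns, without tying up the calling shell if it does not. The function name probe_statfs and the polling approach are illustrative assumptions and not part of Lustre; df runs in a background subshell because a process blocked in statfs may not respond to signals.

```shell
# probe_statfs MOUNTPOINT [SECONDS]
# Hypothetical helper: runs df in the background and polls for completion,
# so the caller gets control back even if df itself never returns.
probe_statfs() {
    mnt=$1
    secs=${2:-10}
    dir=$(mktemp -d) || return 2
    flag="$dir/done"
    # The subshell creates the flag file only after df has returned.
    ( df -P "$mnt"; : > "$flag" ) >/dev/null 2>&1 &
    i=0
    while [ "$i" -lt "$secs" ]; do
        if [ -e "$flag" ]; then
            rm -rf "$dir"
            echo "statfs on $mnt returned"
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    # The background df is left running; it may be impossible to kill.
    echo "statfs on $mnt still blocked after ${secs}s"
    return 1
}
```

For example, `probe_statfs /mnt/lustre 15` would report within about 15 seconds whether df came back; the mount point shown is a placeholder.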



 Comments   
Comment by Andreas Dilger [ 21/Dec/13 ]

Seems this may be the same as LU-4010 - patch is http://review.whamcloud.com/7762

Comment by Jodi Levi (Inactive) [ 08/Jan/14 ]

Was this problem resolved with change 7762?
Let us know if this ticket can be closed.

Comment by Andreas Dilger [ 21/Jan/14 ]

The change 7762 was landed for 2.5.0, but I found another problem related to this on my test system: http://review.whamcloud.com/8949

Comment by Peter Jones [ 20/May/14 ]

Landed for 2.5.2 and 2.6

Comment by Eric Kolb [ 11/Mar/15 ]

Hello,

We recently upgraded our clients to 2.5.3 and this very issue appears to have manifested itself again.

Lustre: setting import RSF1-OST0007_UUID INACTIVE by administrator request
Lustre: Layout lock feature supported.
Lustre: Mounted RSF1-client

$ strace -v -f df /RSF1
statfs("/var/lib/nfs/rpc_pipefs", {f_type=0x67596969, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096}) = 0
statfs("/RSF1",

At which point the df hangs until reboot.
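To see which targets a client has marked inactive, the kernel log line quoted above can be filtered. This is only a sketch: the function name inactive_targets is made up for illustration, and it assumes the message wording matches the "setting import ... INACTIVE by administrator request" line shown in this comment.

```shell
# Reads kernel log text on stdin and prints each target name that was
# set inactive by administrator request, one per line, without duplicates.
inactive_targets() {
    sed -n 's/.*setting import \([^ ]*\)_UUID INACTIVE by administrator request.*/\1/p' | sort -u
}
```

It could be fed from the kernel log, for example `dmesg | inactive_targets`.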

Because some of these clients re-export Lustre via NFS and Samba, exportfs etc. hang on boot. For now we put a hack in /etc/rc.local as a workaround:

/etc/init.d/samba stop
/etc/init.d/nfs stop
mount -t lustre 10.82.0.15@tcp1:/RSF1 /RSF1
/usr/sbin/lctl set_param llite.*.lazystatfs=1
/etc/init.d/nfs start
/etc/init.d/samba start

Perhaps we missed something, but this seems to fit our experience.


Eric Kolb
Data Centre Services
University of Victoria
Office: 250-721-7658

Comment by Wolfgang Baudler [ 07/Dec/15 ]

I can confirm the behaviour described by Eric Kolb above. Problem exists on 2.5.3. I did not get a chance to test it with 2.5.2, so not sure if it was re-introduced or if it was never fixed.

Comment by Andreas Dilger [ 08/Dec/15 ]

Note that it is also possible to set lazystatfs permanently on all clients using:

lctl conf_param llite.lazystatfs=1

Comment by Wolfgang Baudler [ 08/Dec/15 ]

This seems to work. The syntax is

lctl conf_param <fsname>.llite.lazystatfs=1

So, is this expected behaviour, and is setting this option required on any Lustre filesystem with permanently deactivated OSTs? Or is it still a bug? The conf_param option does not seem to be documented.
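One way to confirm on a client that the setting has taken effect is to read the parameter back. The sketch below assumes that `lctl get_param` prints one `name=value` line per mount; the function name lazystatfs_disabled is hypothetical.

```shell
# Reads "name=value" lines on stdin and prints the name of every
# lazystatfs parameter whose value is not 1.
lazystatfs_disabled() {
    awk -F= '/\.lazystatfs=/ && $2 != 1 { print $1 }'
}
```

Typical use would be `lctl get_param 'llite.*.lazystatfs' | lazystatfs_disabled`; no output means every mounted Lustre client instance on that node reports lazystatfs=1.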
