[LU-9210] 'ls' hung because of statahead delay Created: 14/Mar/17  Updated: 04/Nov/18  Due: 31/Mar/17  Resolved: 19/Jun/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: Lustre 2.10.0

Type: Bug Priority: Major
Reporter: Lai Siyao Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
Related
is related to LU-11616 Optimize handling statahead delay Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Our main problem is that ls -l sometimes takes a long time to respond, as shown below.

[sh-5-34 ~]$ alias ls
alias ls='ls --color=auto'

[sh-5-34 ~]$ echo $GROUP
/groups/alice

Sometimes ls takes 30 seconds to 1 minute:

[sthiell@sh-5-34 ~]$ time ls $GROUP
addiso feel_free_to_test foobar pictures preese-dir software some_group_datadir vmd-1.9.2
real 0m30.012s
user 0m0.001s
sys 0m0.004s

Note 1: when I enable Lustre debugging on the client, I cannot reproduce the issue, which is a bit annoying for the bug report.
Note 2: when I set statahead_max to 0 the problem goes away, and it is easily reproduced again as soon as I re-enable statahead (statahead_max set to 1 or more).
Note 3: it is also very difficult to reproduce the issue when running ls under strace, but I have seen ls block in lstat() once.
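
The workaround in Note 2 can be sketched as a small shell helper. This is a minimal sketch, not taken from the report: the function name statahead_check is hypothetical, it assumes a Lustre client with lctl available and root privileges, and the value 32 used to re-enable statahead is an assumption (use whatever get_param printed beforehand).

```shell
# Sketch only: needs a Lustre client, lctl, and root privileges.
# statahead_check is a hypothetical helper name, not part of Lustre.
statahead_check() {
    dir=$1

    # Show the current statahead window for every mounted Lustre filesystem.
    lctl get_param 'llite.*.statahead_max'

    # Disable statahead; per Note 2 the hang is no longer seen.
    lctl set_param 'llite.*.statahead_max=0'
    time ls "$dir"

    # Re-enable statahead; 32 is an assumed value, restore the one printed above.
    lctl set_param 'llite.*.statahead_max=32'
    time ls "$dir"
}
```

It would be called as statahead_check "$GROUP". The time keyword assumes a shell such as bash; in other shells an external time command is needed.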

The relevant part of the log is as below:

00000080:00400000:3.0:1489184551.164004:0:25907:0:(statahead.c:683:ll_statahead_interpret()) sa_entry software rc -13
00000080:00400000:0.0:1489184581.163471:0:25086:0:(statahead.c:1666:ll_statahead()) revalidate statahead software: -11.

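The statahead failure evidently did not notify the 'ls' process in time: the statahead entry for 'software' failed with rc -13 (EACCES), but ll_statahead() only returned -11 (EAGAIN) about 30 seconds later, which matches the observed delay.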
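
The gap between the two log lines can be checked directly from their timestamps. The following is a minimal sketch that takes the fourth colon-separated field of each debug log line to be the timestamp in seconds; the helper name sa_gap is hypothetical.

```shell
# The two statahead.c lines quoted in the description.
log='00000080:00400000:3.0:1489184551.164004:0:25907:0:(statahead.c:683:ll_statahead_interpret()) sa_entry software rc -13
00000080:00400000:0.0:1489184581.163471:0:25086:0:(statahead.c:1666:ll_statahead()) revalidate statahead software: -11.'

# sa_gap (hypothetical helper): print the number of seconds between the
# first and the last input line, using field 4 as the timestamp.
sa_gap() {
    awk -F: 'NR == 1 { first = $4 } { last = $4 } END { printf "%.3f\n", last - first }'
}

printf '%s\n' "$log" | sa_gap    # prints 29.999
```

The result is just under 30 seconds, in line with the 0m30.012s wall-clock time reported by time ls $GROUP.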



 Comments   
Comment by Gerrit Updater [ 30/May/17 ]

Bob Glossman (bob.glossman@intel.com) uploaded a new patch: https://review.whamcloud.com/27329
Subject: LU-9210 statahead: missing barrier before wake_up
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a291004810c0f17f0410da80238d82127b78d8d8

Comment by Gerrit Updater [ 30/May/17 ]

Bob Glossman (bob.glossman@intel.com) uploaded a new patch: https://review.whamcloud.com/27330
Subject: LU-9210 statahead: missing barrier before wake_up
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 1536a7b780d77fcfd11ce8e81b1ca4316a7040ad

Comment by Gerrit Updater [ 19/Jun/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27330/
Subject: LU-9210 statahead: missing barrier before wake_up
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: d7fe5b6152d9f01f003c93cc1367455c30dc35ed

Comment by Peter Jones [ 19/Jun/17 ]

Landed for 2.10
