[LU-5106] Test failure sanity test_123a: ls 10000 files is slower with statahead! Created: 27/May/14  Updated: 13/Feb/21  Resolved: 19/Jul/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.11.0

Type: Bug Priority: Major
Reporter: Maloo Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-8643 sanity test_123a: test failed to resp... Resolved
Severity: 3
Rank (Obsolete): 14086

 Description   

This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

This issue relates to the following test suite run:
http://maloo.whamcloud.com/test_sets/a4d76d66-baba-11e3-a27d-52540035b04c
http://maloo.whamcloud.com/test_sets/e3f486ac-e317-11e3-93d9-52540035b04c

The sub-test test_123a failed with the following error:

ls 10000 files is slower with statahead!

Info required for matching: sanity 123a



 Comments   
Comment by Andreas Dilger [ 27/May/14 ]

This is most likely related to the VM being slow or similar.

Comment by Jian Yu [ 10/Jun/14 ]

Lustre Build: http://build.whamcloud.com/job/lustre-b2_5/61/
Distro/Arch: RHEL6.5/x86_64 + SLES11SP3/x86_64 (Server + Client)

The same failure occurred: https://maloo.whamcloud.com/test_sets/82588a38-ef83-11e3-b8c2-52540035b04c

Comment by Bob Glossman (Inactive) [ 25/Sep/15 ]

another seen with el6.7 client/server on master:
https://testing.hpdd.intel.com/test_sets/69e7442c-6321-11e5-b25a-5254006e85c2

Comment by James Nunez (Inactive) [ 16/Nov/15 ]

More failures on master:
2015-11-14 06:00:54 - https://testing.hpdd.intel.com/test_sets/1d980b04-8add-11e5-86aa-5254006e85c2
2015-11-18 02:21:57 - https://testing.hpdd.intel.com/test_sets/588d486c-8de1-11e5-a4b1-5254006e85c2
2015-12-02 06:56:45 - https://testing.hpdd.intel.com/test_sets/ae2b6ed8-9913-11e5-aeec-5254006e85c2
2015-12-07 14:40:39 - https://testing.hpdd.intel.com/test_sets/17b4eaa2-9d40-11e5-ade6-5254006e85c2
2015-12-09 05:53:36 - https://testing.hpdd.intel.com/test_sets/cd3cb85e-9e86-11e5-87a9-5254006e85c2
2015-12-09 18:40:23 - https://testing.hpdd.intel.com/test_sets/24479cda-9ef6-11e5-ba94-5254006e85c2
2016-01-04 03:38:41 - https://testing.hpdd.intel.com/test_sets/21928502-b2df-11e5-aa1f-5254006e85c2
2016-01-14 21:54:15 - https://testing.hpdd.intel.com/test_sets/2efcd76e-bb50-11e5-acbb-5254006e85c2
2016-01-20 09:06:34 - https://testing.hpdd.intel.com/test_sets/642dee9c-bf9b-11e5-a659-5254006e85c2
2016-01-24 07:57:29 - https://testing.hpdd.intel.com/test_sets/e1d05f34-c2b5-11e5-8d4d-5254006e85c2
2016-01-26 09:25:22 - https://testing.hpdd.intel.com/test_sets/2a3e8606-c455-11e5-8866-5254006e85c2
2016-02-02 08:13:41 - https://testing.hpdd.intel.com/test_sets/264d4cb4-c9d0-11e5-b71a-5254006e85c2
2016-02-15 14:02:16 - https://testing.hpdd.intel.com/test_sets/5798cb4c-d437-11e5-aabf-5254006e85c2
2016-02-22 16:05:58 - https://testing.hpdd.intel.com/test_sets/b40373ea-d9c9-11e5-8b17-5254006e85c2

Comment by Richard Henwood (Inactive) [ 09/Mar/16 ]

Another failure on Master:

https://testing.hpdd.intel.com/test_sets/287c20f2-e48b-11e5-bbef-5254006e85c2

Comment by Lai Siyao [ 12/Jun/17 ]

I reproduced this on a local system and found the root cause: on a DNE system it takes a long time to prepare a page for readdir. The dirents are distributed across several stripes, but the directory page must be in hash order, so readdir iterates over every stripe for each single dirent until it fills a directory page. Statahead is often slower than 'ls' at building the directory page, so the 'lookup' from 'ls' can't find cached statahead entries, and statahead fails and quits later. In the end, 'ls' without statahead is often faster.

I'll see how this can be improved.

Comment by Andreas Dilger [ 15/Jun/17 ]

Lai, the generation of DNE2 readdir pages is something that I discussed with Di in the past. Essentially, the llite-level readdir is a merge sort of the individual readdir pages from the various MDTs, which are already in hash order. It appears that this is currently implemented in a sub-optimal manner, essentially as an O(n^2) sort.

One option would be to implement a secondary readdir cache at the llite level that maintains the entries in sorted order, possibly in a linked list of dentries. This would allow fixing the readdir vs. unlink problem in LU-3308, and would potentially allow unlinks to drop dentries from the readdir list at lock cancellation, rather than having to drop all whenever the directory lock is lost.
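
A minimal sketch of the sorted-cache idea described above, written in Python only for illustration. It is not Lustre code: the class name, its methods, and the use of a sorted array instead of a linked list of dentries are all assumptions made for the example. It shows entries kept in hash order, a single entry dropped (as an unlink might do) without discarding the rest, and reading resumed from a hash offset.

```python
import bisect


class SortedReaddirCache:
    """Hypothetical client-side cache of (hash, name) entries kept in hash order."""

    def __init__(self):
        # Sorted list of (hash, name) tuples; a stand-in for a dentry list.
        self._entries = []

    def add(self, hash_value, name):
        """Insert an entry at its sorted position, ignoring exact duplicates."""
        entry = (hash_value, name)
        pos = bisect.bisect_left(self._entries, entry)
        if pos == len(self._entries) or self._entries[pos] != entry:
            self._entries.insert(pos, entry)

    def drop(self, hash_value, name):
        """Remove one entry and keep everything else cached."""
        entry = (hash_value, name)
        pos = bisect.bisect_left(self._entries, entry)
        if pos < len(self._entries) and self._entries[pos] == entry:
            del self._entries[pos]
            return True
        return False

    def read_from(self, start_hash, count):
        """Return up to `count` entries whose hash is >= start_hash."""
        pos = bisect.bisect_left(self._entries, (start_hash, ""))
        return self._entries[pos:pos + count]


cache = SortedReaddirCache()
for h, n in [(30, "c"), (10, "a"), (20, "b")]:
    cache.add(h, n)
cache.drop(20, "b")               # only this entry is removed
page = cache.read_from(15, 1)     # resume reading from hash offset 15
```

Whether a real implementation would drop entries at lock cancellation in this way depends on details this ticket does not cover.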

Comment by Gerrit Updater [ 15/Jun/17 ]

Lai Siyao (lai.siyao@intel.com) uploaded a new patch: https://review.whamcloud.com/27663
Subject: LU-5106 readdir: improve striped readdir
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 596f4129e8d9763febe297f6c82cd95efb405fbd

Comment by Gerrit Updater [ 16/Jun/17 ]

Lai Siyao (lai.siyao@intel.com) uploaded a new patch: https://review.whamcloud.com/27683
Subject: LU-5106 statahead: support striped directory
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 852c199f31da0ae9a64ef0ee63398c973373cca4

Comment by Lai Siyao [ 16/Jun/17 ]

Andreas, this may not be so complicated. In https://review.whamcloud.com/27663 I introduced a struct lmv_dir_ctxt which saves the directory page and current dirent for all stripes, so to get the next dirent for this directory it only needs to compare the current dirent of each stripe and pick the one with the smallest hash. This is O(n) IMO.

BTW, to pass sanity.sh 123a, both patches are needed.
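
For readers following along, here is a rough Python sketch of the approach described in this comment. It is not the code from the patch. `StripeCursor`, `DirContext`, and `merged_readdir` are hypothetical names, and each stripe is modelled as a list of (hash, name) pairs already in hash order. The context remembers the current position in every stripe, and the next dirent is whichever current entry has the smallest hash.

```python
class StripeCursor:
    """Hypothetical per-stripe state: sorted entries plus the current position."""

    def __init__(self, entries):
        self.entries = list(entries)  # (hash, name) pairs, sorted by hash
        self.pos = 0

    def current(self):
        """Return the current dirent, or None when the stripe is exhausted."""
        if self.pos < len(self.entries):
            return self.entries[self.pos]
        return None

    def advance(self):
        self.pos += 1


class DirContext:
    """Hypothetical analogue of a per-directory context covering all stripes."""

    def __init__(self, stripes):
        self.cursors = [StripeCursor(s) for s in stripes]

    def next_dirent(self):
        # Compare the current dirent of every stripe and pick the smallest hash.
        best = None
        for cursor in self.cursors:
            ent = cursor.current()
            if ent is None:
                continue
            if best is None or ent[0] < best.current()[0]:
                best = cursor
        if best is None:
            return None  # every stripe is exhausted
        ent = best.current()
        best.advance()   # only the chosen stripe moves forward
        return ent


def merged_readdir(stripes):
    """Collect all dirents from all stripes in global hash order."""
    ctx = DirContext(stripes)
    merged = []
    ent = ctx.next_dirent()
    while ent is not None:
        merged.append(ent)
        ent = ctx.next_dirent()
    return merged


stripes = [[(1, "a"), (7, "d")], [(3, "b")], [(5, "c"), (9, "e")]]
listing = merged_readdir(stripes)
```

In this sketch each call to `next_dirent` looks at one entry per stripe, so the total work grows linearly with the number of dirents for a fixed stripe count, because no stripe is rescanned from its start.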

Comment by Gerrit Updater [ 19/Jul/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27683/
Subject: LU-5106 statahead: support striped directory
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 9e7952c045a3ce2041a2fa325cc4a147be6549bb

Comment by Gerrit Updater [ 19/Jul/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27663/
Subject: LU-5106 readdir: improve striped readdir
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 98fc9a77446a1539bca18215ad57f21712218ecc

Comment by Peter Jones [ 19/Jul/17 ]

Landed for 2.11

Comment by Gerrit Updater [ 04/Jun/20 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38826
Subject: LU-5106 readdir: improve striped readdir
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 23935cdc099876e5945979b11d9ef80863a5cb2f

Generated at Sat Feb 10 01:48:34 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.