[LU-5106] Test failure sanity test_123a: ls 10000 files is slower with statahead! Created: 27/May/14 Updated: 13/Feb/21 Resolved: 19/Jul/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.11.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Maloo | Assignee: | Lai Siyao |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 14086 |
| Description |
|
This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>. This issue relates to the following test suite run: The sub-test test_123a failed with the following error:
Info required for matching: sanity 123a |
| Comments |
| Comment by Andreas Dilger [ 27/May/14 ] |
|
This is most likely caused by the VM being slow, or a similar environmental issue. |
| Comment by Jian Yu [ 10/Jun/14 ] |
|
Lustre Build: http://build.whamcloud.com/job/lustre-b2_5/61/ The same failure occurred: https://maloo.whamcloud.com/test_sets/82588a38-ef83-11e3-b8c2-52540035b04c |
| Comment by Bob Glossman (Inactive) [ 25/Sep/15 ] |
|
Another occurrence, seen with an EL6.7 client/server on master: |
| Comment by James Nunez (Inactive) [ 16/Nov/15 ] |
|
More failures on master: |
| Comment by Richard Henwood (Inactive) [ 09/Mar/16 ] |
|
Another failure on master: https://testing.hpdd.intel.com/test_sets/287c20f2-e48b-11e5-bbef-5254006e85c2 |
| Comment by Lai Siyao [ 12/Jun/17 ] |
|
I reproduced this on a local system and found the root cause: on a DNE system it takes a long time to prepare a page for readdir. The dirents are distributed across several stripes, but the directory page must be in hash order, so building it iterates over every stripe for each single dirent until the page is full. Statahead often appears to be slower than 'ls' at building directory pages, so the 'lookup' issued by 'ls' can't find cached statahead entries; statahead counts these as failures and later quits. In the end, 'ls' without statahead is often faster. I'll see how this can be improved. |
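To illustrate the cost described above, here is a minimal Python model (not Lustre code). It assumes each stripe is an in-memory list of `(hash, name)` tuples sorted by hash, with unique hashes; `fill_page_naive()` is a hypothetical helper. For every dirent it emits, it rescans each stripe from the beginning, so the work per page grows with the number of entries already consumed.

```python
def fill_page_naive(stripes, start_hash, page_size):
    """Hypothetical model: return up to page_size (hash, name) entries with
    hash > start_hash in global hash order, plus how many stripe entries
    were examined to build them. Assumes unique hashes."""
    page = []
    last = start_hash
    examined = 0
    while len(page) < page_size:
        best = None
        # For every single dirent, walk each stripe from its beginning to
        # find that stripe's first entry past the last emitted hash.
        for stripe in stripes:
            for ent in stripe:
                examined += 1
                if ent[0] > last:
                    if best is None or ent[0] < best[0]:
                        best = ent
                    break
        if best is None:
            break  # every stripe is exhausted
        page.append(best)
        last = best[0]
    return page, examined


stripes = [[(1, "a"), (4, "d")], [(2, "b"), (5, "e")], [(3, "c")]]
page, examined = fill_page_naive(stripes, -1, 10)
print([name for _, name in page], examined)
```

With three tiny stripes holding five entries, building one page already examines 27 stripe entries; the rescanning cost grows roughly quadratically as the directory gets larger.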
| Comment by Andreas Dilger [ 15/Jun/17 ] |
|
Lai, the generation of DNE2 readdir pages is something that I discussed with Di in the past. Essentially, the llite-level readdir is a merge sort of the individual readdir pages from the various MDTs, which are already in hash order. It appears that this is currently implemented in a sub-optimal manner: essentially an O(n^2) sort. One option would be to implement a secondary readdir cache at the llite level that maintains the entries in sorted order, possibly in a linked list of dentries. This would allow fixing the readdir vs. unlink problem in LU-3308, and would potentially allow unlinks to drop individual dentries from the readdir list at lock cancellation, rather than having to drop all of them whenever the directory lock is lost. |
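The cache idea in the comment above can be sketched as follows. This is a hypothetical Python model, not the fix that landed: `ReaddirCache` and `_Node` are invented names, stripes are modelled as hash-sorted lists of `(hash, name)` tuples, and names are assumed unique. The standard-library `heapq.merge` performs the k-way merge of already-sorted stripes (O(n log k)), and a doubly linked list plus a name index lets a single entry be dropped without discarding the whole cache.

```python
import heapq


class _Node:
    """One cached dirent in the hypothetical hash-ordered list."""
    __slots__ = ("hash", "name", "prev", "next")

    def __init__(self, h, name):
        self.hash, self.name = h, name
        self.prev = self.next = None


class ReaddirCache:
    """Hypothetical llite-level cache: all dirents of a striped directory
    kept in a single doubly linked list in global hash order."""

    def __init__(self, stripes):
        self.head = None
        self.tail = None
        self.by_name = {}  # assumes names are unique within the directory
        # Each stripe is already sorted by hash, so one k-way merge pass
        # produces the global hash order.
        for h, name in heapq.merge(*stripes):
            self._append(_Node(h, name))

    def _append(self, node):
        node.prev = self.tail
        if self.tail is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node
        self.by_name[node.name] = node

    def unlink(self, name):
        """Drop one entry (e.g. when its lock is cancelled) in O(1)."""
        node = self.by_name.pop(name, None)
        if node is None:
            return False
        if node.prev is None:
            self.head = node.next
        else:
            node.prev.next = node.next
        if node.next is None:
            self.tail = node.prev
        else:
            node.next.prev = node.prev
        return True

    def entries(self):
        """Yield cached (hash, name) pairs in hash order."""
        node = self.head
        while node is not None:
            yield node.hash, node.name
            node = node.next


cache = ReaddirCache([[(1, "a"), (4, "d")], [(2, "b"), (5, "e")], [(3, "c")]])
cache.unlink("c")
print([name for _, name in cache.entries()])
```

Keeping the merged order in a list is what makes per-entry invalidation possible; a real kernel implementation would also need locking and memory-pressure handling, which this sketch ignores.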
| Comment by Gerrit Updater [ 15/Jun/17 ] |
|
Lai Siyao (lai.siyao@intel.com) uploaded a new patch: https://review.whamcloud.com/27663 |
| Comment by Gerrit Updater [ 16/Jun/17 ] |
|
Lai Siyao (lai.siyao@intel.com) uploaded a new patch: https://review.whamcloud.com/27683 |
| Comment by Lai Siyao [ 16/Jun/17 ] |
|
Andreas, this may not be so complicated. In https://review.whamcloud.com/27663 I introduced a struct lmv_dir_ctxt, which saves the directory page and current dirent for every stripe, so to get the next dirent of this directory it only needs to compare the current dirents of all stripes and pick the one with the smallest hash. This is O BTW, to pass sanity.sh 123a, both patches are needed. |
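The per-stripe cursor approach described above can be modelled roughly as follows. `StripeCursor` and `DirCtxt` are hypothetical Python stand-ins and do not reproduce the real `struct lmv_dir_ctxt` layout; each stripe is assumed to be a hash-sorted list of `(hash, name)` tuples. Each `next_dirent()` call compares only the current entry of each non-exhausted stripe, so the cost is proportional to the stripe count per returned entry, with no rescanning from the start of a stripe.

```python
class StripeCursor:
    """Hypothetical saved position within one stripe's sorted entries."""

    def __init__(self, entries):
        self.entries = entries  # list of (hash, name), sorted by hash
        self.pos = 0

    def current(self):
        """Current dirent of this stripe, or None when exhausted."""
        return self.entries[self.pos] if self.pos < len(self.entries) else None


class DirCtxt:
    """Hypothetical directory context holding one cursor per stripe."""

    def __init__(self, stripes):
        self.cursors = [StripeCursor(s) for s in stripes]
        self.compares = 0  # counts current dirents inspected

    def next_dirent(self):
        best = None
        # Look only at each stripe's current dirent and keep the one
        # with the smallest hash.
        for cur in self.cursors:
            ent = cur.current()
            if ent is None:
                continue
            self.compares += 1
            if best is None or ent[0] < best.current()[0]:
                best = cur
        if best is None:
            return None  # all stripes exhausted
        ent = best.current()
        best.pos += 1  # advance only the stripe we consumed from
        return ent


ctx = DirCtxt([[(1, "a"), (4, "d")], [(2, "b"), (5, "e")], [(3, "c")]])
out = []
while (ent := ctx.next_dirent()) is not None:
    out.append(ent[1])
print(out, ctx.compares)
```

For the three example stripes, repeated calls return a, b, c, d, e and then None, after 12 comparisons in total.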
| Comment by Gerrit Updater [ 19/Jul/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27683/ |
| Comment by Gerrit Updater [ 19/Jul/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27663/ |
| Comment by Peter Jones [ 19/Jul/17 ] |
|
Landed for 2.11 |
| Comment by Gerrit Updater [ 04/Jun/20 ] |
|
Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38826 |