[LU-16762] sanity 123ad fails on a local setup Created: 23/Apr/23  Updated: 20/Dec/23  Resolved: 20/Dec/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Bug Priority: Minor
Reporter: Alex Zhuravlev Assignee: Qian Yingjin
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

this test is introduced in LU-14139 statahead: add test for batch statahead processing

== sanity test 123ad: Verify batching statahead works correctly ========================================================== 09:48:12 (1682243292)

batching: statahead_max=32 statahead_batch_max=32
mdc.lustre-MDT0000-mdc-ffff94639828b000.batch_stats=0
mdc.lustre-MDT0001-mdc-ffff94639828b000.batch_stats=0
llite.lustre-ffff94639828b000.statahead_max=32
llite.lustre-ffff94639828b000.statahead_batch_max=32
seq.cli-lustre-OST0000-super.width=0x1ffffff
seq.cli-lustre-OST0001-super.width=0x1ffffff
bochs
mdc.lustre-MDT0000-mdc-ffff94639828b000.batch_stats=0
mdc.lustre-MDT0001-mdc-ffff94639828b000.batch_stats=0
striped dir -i1 -c2 -H crush2 /mnt/lustre/d123ad.sanity
total: 100 open/close in 0.21 seconds: 470.99 ops/second
llite.lustre-ffff94639828b000.statahead_max=0
101

real	0m0.176s
user	0m0.000s
sys	0m0.085s
ls -l 100 files without statahead: 0 sec
llite.lustre-ffff94639828b000.statahead_max=32
32
101

real	0m0.226s
user	0m0.022s
sys	0m0.135s
ls -l 100 files with statahead: 1 sec
statahead total: 32
statahead wrong: 7
agl total: 32
hit_total: 886
miss_total: 322
statahead was stopped, maybe too many locks held!
total: 900 open/close in 1.80 seconds: 499.58 ops/second
llite.lustre-ffff94639828b000.statahead_max=0
1001

real	0m0.870s
user	0m0.008s
sys	0m0.374s
ls -l 1000 files without statahead: 0 sec
llite.lustre-ffff94639828b000.statahead_max=32
32
1001

real	0m1.147s
user	0m0.077s
sys	0m0.396s
ls -l 1000 files with statahead: 1 sec
statahead total: 33
statahead wrong: 8
agl total: 33
hit_total: 886
miss_total: 335
statahead was stopped, maybe too many locks held!
 - open/close 5164 (time 1682243309.81 total 10.01 last 515.66)
total: 9000 open/close in 18.85 seconds: 477.41 ops/second
llite.lustre-ffff94639828b000.statahead_max=0
10001

real	0m14.353s
user	0m0.235s
sys	0m6.314s
ls -l 10000 files without statahead: 15 sec
llite.lustre-ffff94639828b000.statahead_max=32
32
10001

real	0m12.395s
user	0m0.193s
sys	0m5.184s
ls -l 10000 files with statahead: 13 sec
statahead total: 34
statahead wrong: 9
agl total: 34
hit_total: 1054
miss_total: 375
statahead was stopped, maybe too many locks held!
ls -l done
rm -r /mnt/lustre/d123ad.sanity/: 18 seconds
rm done
statahead total: 34
statahead wrong: 9
agl total: 34
hit_total: 1054
miss_total: 375
mdc.lustre-MDT0000-mdc-ffff94639828b000.batch_stats=
snapshot_time:            1682243365.822393085 secs.nsecs
start_time:               1682243292.581304155 secs.nsecs
elapsed_time:             73.241088930 secs.nsecs
subreqs per batch   batches   % cum %
1:		         4  40  40
2:		         1  10  50
4:		         0   0  50
8:		         0   0  50
16:		         3  30  80
32:		         2  20 100
mdc.lustre-MDT0001-mdc-ffff94639828b000.batch_stats=
snapshot_time:            1682243365.827668934 secs.nsecs
start_time:               1682243292.581363839 secs.nsecs
elapsed_time:             73.246305095 secs.nsecs
subreqs per batch   batches   % cum %
1:		         3  33  33
2:		         0   0  33
4:		         0   0  33
8:		         1  11  44
16:		         5  55 100
 - open/close 5281 (time 1682243376.01 total 10.00 last 528.07)
total: 10000 open/close in 18.26 seconds: 547.71 ops/second
llite.lustre-ffff94639828b000.statahead_batch_max=0
llite.lustre-ffff94639828b000.statahead_stats=clear
mdc.lustre-MDT0000-mdc-ffff94639828b000.stats=clear
mdc.lustre-MDT0001-mdc-ffff94639828b000.stats=clear
10001

real	0m12.165s
user	0m0.156s
sys	0m4.960s
 sanity test_123ad: @@@@@@ FAIL: unbatched statahead hit count (0) is too low 
  Trace dump:
  = ./../tests/test-framework.sh:6579:error()
  = sanity.sh:13740:test_batch_statahead()
  = sanity.sh:13780:test_123ad()
  = ./../tests/test-framework.sh:6919:run_one()
  = ./../tests/test-framework.sh:6968:run_one_logged()
  = ./../tests/test-framework.sh:6805:run_test()
  = sanity.sh:13783:main()
Dumping lctl log to /tmp/ltest-logs/sanity.test_123ad.*.1682243399.log
Dumping logs only on local client.
seq.cli-lustre-OST0000-super.width=65536
seq.cli-lustre-OST0001-super.width=65536
llite.lustre-ffff94639828b000.statahead_batch_max=0
llite.lustre-ffff94639828b000.statahead_max=32
FAIL 123ad (125s)


 Comments   
Comment by Peter Jones [ 23/Apr/23 ]

Yingjin

Can you please advise?

Thanks

Peter

Comment by Qian Yingjin [ 23/Apr/23 ]

It seems that the stat-ahead does not trigger...
I will try it on my local env.

Thanks,
Qian

Comment by Arshad Hussain [ 06/Sep/23 ]

+1 on Master (https://testing.whamcloud.com/test_sets/4b7b1d2a-a093-4534-9338-e3ede0aa54de)

 
== sanity test 123ad: Verify batching statahead works correctly ========================================================== 15:25:17 (1693927517)
...<snip>

CMD: trevis-40vm1,trevis-40vm2,trevis-40vm3 /usr/sbin/lctl set_param debug=\"trace inode super iotrace malloc cache info ioctl neterror net warning buffs other dentry nettrace page dlmtrace error emerg ha rpctrace vfstrace reada mmap config console quota sec lfsck hsm snapshot layout\"
llite.lustre-ffff000045311000.statahead_batch_max=0
llite.lustre-ffff000045311000.statahead_stats=clear
mdc.lustre-MDT0000-mdc-ffff000045311000.stats=clear
mdc.lustre-MDT0001-mdc-ffff000045311000.stats=clear
mdc.lustre-MDT0002-mdc-ffff000045311000.stats=clear
mdc.lustre-MDT0003-mdc-ffff000045311000.stats=clear
10001

real 0m24.957s
user 0m0.300s
sys 0m12.101s
llite.lustre-ffff000045311000.statahead_batch_max=32
llite.lustre-ffff000045311000.statahead_stats=clear
mdc.lustre-MDT0000-mdc-ffff000045311000.batch_stats=clear
mdc.lustre-MDT0001-mdc-ffff000045311000.batch_stats=clear
mdc.lustre-MDT0002-mdc-ffff000045311000.batch_stats=clear
mdc.lustre-MDT0003-mdc-ffff000045311000.batch_stats=clear
mdc.lustre-MDT0000-mdc-ffff000045311000.stats=clear
mdc.lustre-MDT0001-mdc-ffff000045311000.stats=clear
mdc.lustre-MDT0002-mdc-ffff000045311000.stats=clear
mdc.lustre-MDT0003-mdc-ffff000045311000.stats=clear
10001

real 0m47.010s
user 0m0.283s
sys 0m14.385s
sanity test_123ad: @@@@@@ FAIL: batched statahead hit count (6940) is too low
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:6674:error()
= /usr/lib64/lustre/tests/sanity.sh:14055:test_batch_statahead()
= /usr/lib64/lustre/tests/sanity.sh:14078:test_123ad()
= /usr/lib64/lustre/tests/test-framework.sh:7014:run_one()
= /usr/lib64/lustre/tests/test-framework.sh:7063:run_one_logged()
= /usr/lib64/lustre/tests/test-framework.sh:6900:run_test()
= /usr/lib64/lustre/tests/sanity.sh:14081:main()
 

Comment by Gerrit Updater [ 07/Sep/23 ]

"Qian Yingjin <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52302
Subject: LU-16762 statahead: wait until statahead thread quit
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0ab5cc9b1f04d3c89167a1760e0eb88f58cd025c

Comment by Gerrit Updater [ 20/Dec/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52302/
Subject: LU-16762 statahead: wait until statahead thread quit
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 76eddbeeace4684507650652ce4fa343c2d14e1a

Comment by Peter Jones [ 20/Dec/23 ]

Landed for 2.16

Generated at Sat Feb 10 03:29:47 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.