[LU-2245] failure on sanity.sh test_101c: Small 4k read IO 1! Created: 29/Oct/12  Updated: 17/Apr/14  Resolved: 17/Apr/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: Li Wei (Inactive)
Resolution: Fixed
Votes: 0
Labels: None
Environment:

server/client are both RHEL6


Issue Links:
Related
is related to LU-3963 cleanup libcfs wrappers Resolved
Severity: 3
Rank (Obsolete): 5317

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/34cdc80e-21c2-11e2-b552-52540035b04c.

The sub-test test_101c failed with the following error:

Small 4k read IO 1!

== sanity test 101c: check stripe_size aligned read-ahead =================== 00:07:32 (1351494452)
CMD: client-21-ib /usr/sbin/lctl set_param -n obdfilter.*.read_cache_enable=0
client-21-ib: error: set_param: /proc/{fs,sys}/{lnet,lustre}/obdfilter/*/read_cache_enable: Found no match
CMD: client-21-ib /usr/sbin/lctl set_param -n obdfilter.*.writethrough_cache_enable=0
client-21-ib: error: set_param: /proc/{fs,sys}/{lnet,lustre}/obdfilter/*/writethrough_cache_enable: Found no match
osc.lustre-OST0000-osc-ffff8803348f2000.rpc_stats=0
osc.lustre-OST0001-osc-ffff8803348f2000.rpc_stats=0
osc.lustre-OST0002-osc-ffff8803348f2000.rpc_stats=0
osc.lustre-OST0003-osc-ffff8803348f2000.rpc_stats=0
osc.lustre-OST0004-osc-ffff8803348f2000.rpc_stats=0
osc.lustre-OST0005-osc-ffff8803348f2000.rpc_stats=0
osc.lustre-OST0006-osc-ffff8803348f2000.rpc_stats=0

6.364788s, 102.967MB/s
osc.lustre-OST0000-osc-ffff8803348f2000 rpc check passed!
osc.lustre-OST0001-osc-ffff8803348f2000 rpc check passed!
 sanity test_101c: @@@@@@ FAIL: Small 4k read IO 1! 
  Trace dump:


 Comments   
Comment by Li Wei (Inactive) [ 29/Oct/12 ]

The procfs access problems should be fixed by updating the test to use set_obdfilter_param(), as the parameters in question are now exported by OSDs.
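
The lookup fallback described in this comment can be sketched roughly as follows. This is a minimal Python illustration, not the test-framework helper itself: `candidate_patterns` and `resolve_param` are hypothetical names, the parameter tree is made up for the example, and the `osd-*` prefix is an assumption about where the OSD-exported parameters appear. Only the `obdfilter.*.read_cache_enable` pattern is taken from the log above.

```python
from fnmatch import fnmatchcase


def candidate_patterns(name, device="*"):
    """Return the parameter patterns to try, legacy location first."""
    return [
        f"obdfilter.{device}.{name}",  # location the test used (from the log)
        f"osd-*.{device}.{name}",      # ASSUMED location for OSD-exported parameters
    ]


def resolve_param(name, available, device="*"):
    """Return the entries of `available` matched by the first pattern that matches anything."""
    for pattern in candidate_patterns(name, device):
        matches = sorted(p for p in available if fnmatchcase(p, pattern))
        if matches:
            return matches
    return []  # nothing matched under either location


# Hypothetical parameter tree on a server where the OSD exports the cache settings.
available = {
    "osd-ldiskfs.lustre-OST0000.read_cache_enable",
    "osd-ldiskfs.lustre-OST0001.read_cache_enable",
}
print(resolve_param("read_cache_enable", available))
```

Trying the old location first keeps the lookup working on servers that still export the parameters under obdfilter, while the second pattern covers the case reported here.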

Comment by Sarah Liu [ 31/Oct/12 ]
{comment removed, unrelated to this bug}
Comment by Sarah Liu [ 02/Nov/12 ]

SLES11 SP2 client also has this issue:
https://maloo.whamcloud.com/test_sets/d0072b9c-2534-11e2-9e7c-52540035b04c

Comment by Li Wei (Inactive) [ 18/Nov/12 ]

Wang Di, could you take a look at this?

Comment by Jian Yu [ 20/Nov/12 ]

More instances:
https://maloo.whamcloud.com/test_sets/313ec4a8-2fe2-11e2-866f-52540035b04c
https://maloo.whamcloud.com/test_sets/669d8a72-2fb4-11e2-b30e-52540035b04c

Comment by Di Wang [ 14/Jan/13 ]

I just checked these failures; sorry for the delay. The test fails because single-page RPCs are issued during read-ahead. One possible reason is that the test does random 64k reads here: if some pages are evicted because of memory pressure and those 64k areas are then read again, we might see single-page RPCs. We should probably add a no-duplicate option to the reads, i.e. no area should be read twice, even for random reads.

But I am not sure this is the real reason, since there is not enough debug log here. Is this reproducible?
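
The no-duplicate idea in this comment could look roughly like the sketch below. It is written in Python for illustration only and assumes the file is read in block-aligned 64k chunks; `unique_random_offsets` and its parameters are hypothetical and are not part of sanity.sh.

```python
import random


def unique_random_offsets(file_size, block_size=64 * 1024, seed=None):
    """Return every block-aligned offset in the file exactly once, in random order."""
    # Enumerate each full block once, then shuffle, so no area is visited twice.
    offsets = list(range(0, file_size - block_size + 1, block_size))
    rng = random.Random(seed)
    rng.shuffle(offsets)
    return offsets


# Example: a 1 MiB file read in 64k blocks.
offsets = unique_random_offsets(1024 * 1024, seed=1)
print(len(offsets), len(set(offsets)))
```

Because each offset appears only once, an area whose pages were evicted under memory pressure is never requested again, which would rule out the re-read explanation if small RPCs still show up.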

Comment by Andreas Dilger [ 08/Feb/13 ]

Found only two failures of test_101c() in the past 4 weeks:
https://maloo.whamcloud.com/sub_tests/313362c4-6bac-11e2-a7b9-52540035b04c
https://maloo.whamcloud.com/sub_tests/9e3f4c7a-5d83-11e2-8199-52540035b04c

Lowered priority, since there are more important bugs to fix for now.

Comment by James Nunez (Inactive) [ 15/Apr/14 ]

I'm hitting this error with different values for the RPCs at https://maloo.whamcloud.com/test_sets/ecfe391a-c41c-11e3-a793-52540035b04c

Small 32k read IO 240 !

Comment by Andreas Dilger [ 17/Apr/14 ]

This recent burst of failures was caused by patch 7803 landing.

Comment by Andreas Dilger [ 17/Apr/14 ]

Closing this issue since we haven't seen this failure recently, outside of the previously reported and unrelated issue.
