[LU-5240] Test failure sanity test_69: write error Created: 20/Jun/14  Updated: 11/Oct/14  Resolved: 02/Jul/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0, Lustre 2.5.3
Fix Version/s: Lustre 2.6.0, Lustre 2.5.4

Type: Bug
Priority: Critical
Reporter: Maloo
Assignee: Nathaniel Clark
Resolution: Fixed
Votes: 0
Labels: HB, llnl, revzfs

Issue Links:
Related
is related to LU-5164 Limit lu_object cache (ZFS and osd-zfs) Resolved
Severity: 3
Rank (Obsolete): 14612

 Description   

This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

This issue relates to the following test suite runs:
http://maloo.whamcloud.com/test_sets/136a09ee-d5c5-11e3-8758-52540035b04c
https://maloo.whamcloud.com/test_sets/8000796a-f787-11e3-b15e-52540035b04c
https://maloo.whamcloud.com/test_sets/90f9e2fc-f790-11e3-b0db-52540035b04c

The sub-test test_69 failed with the following error:

write error

Info required for matching: sanity 69



 Comments   
Comment by Andreas Dilger [ 20/Jun/14 ]

Looks like this is a very recent regression. Is it possible to track it down to a patch that landed in the last couple of days?

Comment by Andreas Dilger [ 20/Jun/14 ]

Lots of the failures are on the two patches:
http://review.whamcloud.com/10620 LU-5150 mdc: Handle empty but non-zero acl xattr
http://review.whamcloud.com/10741 LU-5221 vvp: release mmap_sem in error case

which points to the likely culprit:
http://review.whamcloud.com/10237 LU-5164 osd: Limit lu_object cache

This is reinforced by the fact that the first version of the http://review.whamcloud.com/10237 patch also failed sanity.sh once before it was landed (the only such failure in the past year). The second version of the patch did not fail so it wasn't noticed before landing.

Comment by Brian Behlendorf [ 23/Jun/14 ]

One suggestion would be to leave the limiting code in place but simply increase LU_CACHE_NR_ZFS_LIMIT. In practice we haven't seen any issues caused by setting it to 256, but perhaps, as Andreas suggested earlier, it needs to be at least as large as the number of service threads.

Comment by Nathaniel Clark [ 24/Jun/14 ]

Speculative patch:
http://review.whamcloud.com/10802

Comment by Andreas Dilger [ 26/Jun/14 ]

I've also submitted http://review.whamcloud.com/10855 to revert the original 10237 patch as a fallback measure in case 10802 doesn't solve the problem.

Comment by nasf (Inactive) [ 26/Jun/14 ]

Another failure instance:
https://testing.hpdd.intel.com/test_sets/06ab90ae-fd54-11e3-a270-5254006e85c2

Comment by nasf (Inactive) [ 30/Jun/14 ]

Another failure instance:

https://testing.hpdd.intel.com/sub_tests/34c145f8-ff87-11e3-af4b-5254006e85c2

Comment by Jodi Levi (Inactive) [ 02/Jul/14 ]

Patch landed to Master.

Comment by Jian Yu [ 20/Sep/14 ]

Here is the back-ported patch for Lustre b2_5 branch: http://review.whamcloud.com/12002
