[LU-168] Test failure on test suite sanityn, subtest test_40b Created: 26/Mar/11 Updated: 21/Jul/11 Resolved: 08/Jul/11 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.0 |
| Fix Version/s: | Lustre 2.1.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Rank (Obsolete): | 4967 |
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com> This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/b08fc4cc-5754-11e0-a272-52540025f9af. The sub-test test_40b failed with the following error:
|
| Comments |
| Comment by Mikhail Pershin [ 28/Mar/11 ] |
|
Can we track down when this bug started occurring? Did it appear after the latest commit? |
| Comment by nasf (Inactive) [ 28/Mar/11 ] |
|
The root ("/") is a special object in Lustre: it is in the dcache without a lookup, so we call "__ll_inode_revalidate_it()" to update its attributes. Before getattr-by-fid was enabled, this was a plain mds_getattr RPC with no lock granted, which meant an mds_getattr RPC was sent every time. With getattr-by-fid enabled, the client sends an intent_getattr RPC to fetch the attributes and is granted root's UPDATE lock. That is the difference. In this mode, sanityn test_40 is bound to fail because of the lock conflict on the root object. In other words, the PDO mechanism is almost ineffective under the root directory when getattr-by-fid is enabled. As far as I can see, the purpose of test_40 is to verify the PDO mechanism, and there is no need to create the conflict directly under the root directory. If we do not want to disable getattr-by-fid, we can run test_40 under a sub-directory instead, which avoids the cases above. Any ideas? |
| Comment by nasf (Inactive) [ 28/Mar/11 ] |
|
The patch: |
| Comment by Build Master (Inactive) [ 28/Mar/11 ] |
|
Integrated in nasf : de3edec5f7c477b7cbef0d47cb4f7de552cbfa17
|
| Comment by Oleg Drokin [ 28/Mar/11 ] |
|
Hm, in fact I think we might want to relieve a bit of stress on the root dir too. |
| Comment by Build Master (Inactive) [ 28/Mar/11 ] |
|
Integrated in nasf : bf9cd24165eebd34e52d3753f2f0857449d4d3f8
|
| Comment by nasf (Inactive) [ 29/Mar/11 ] |
|
I think we can adjust the lock policy for the root object. When revalidating the root object or checking permissions against it, we only need MDS_INODELOCK_LOOKUP, so the getattr-by-fid triggered by these two operations will not conflict with other modifications under the root directory. For stat(root), "ll_getattr_it()" will still request both MDS_INODELOCK_LOOKUP and MDS_INODELOCK_UPDATE. Does that make sense? The patch to be reviewed: |
| Comment by Build Master (Inactive) [ 29/Mar/11 ] |
|
Integrated in nasf : a88916a66f9f9b663677ab69803e42b421c92450
|
| Comment by Build Master (Inactive) [ 31/Mar/11 ] |
|
Integrated in nasf : a88916a66f9f9b663677ab69803e42b421c92450
|
| Comment by Build Master (Inactive) [ 31/Mar/11 ] |
|
Integrated in Oleg Drokin : ee7926c6e54892923ebd46a9a4088669cfdc7f7a
|
| Comment by nasf (Inactive) [ 31/Mar/11 ] |
|
The patch has landed on lustre-2.1. |
| Comment by Sarah Liu [ 31/Mar/11 ] |
|
Verified on the following build. Server: http://build.whamcloud.com/view/Lustre%202.x/job/lustre-master-centos5/178/ |
| Comment by Sarah Liu [ 31/Mar/11 ] |
|
This issue has been verified and the patch has landed on 2.1. |
| Comment by Build Master (Inactive) [ 01/Apr/11 ] |
|
Integrated in Oleg Drokin : ee7926c6e54892923ebd46a9a4088669cfdc7f7a
|
| Comment by Chris Gearing (Inactive) [ 05/Apr/11 ] |
|
Although this is marked fixed and was committed to master around 1 April, it still seems to fail when running regression. Here is an example: https://maloo.whamcloud.com/test_sets/c0f9153a-5f91-11e0-a2b4-52540025f9af. This could be a failure in the test system or an unfixed bug. Do we have an example of this code passing the test, and can someone work out why the sanityn test still fails? I would have expected Maloo to contain an example of sanityn passing, with a reference to that passing run in the issue. |
| Comment by Sarah Liu [ 05/Apr/11 ] |
|
Yes, sanityn has passed on Maloo; here is the link: |
| Comment by Build Master (Inactive) [ 07/Apr/11 ] |
|
Integrated in Oleg Drokin : ee7926c6e54892923ebd46a9a4088669cfdc7f7a
|
| Comment by Sarah Liu [ 27/Jun/11 ] |
|
It seems we are seeing this issue again. |
| Comment by nasf (Inactive) [ 28/Jun/11 ] |
|
According to the failure log, the second open_create operation was blocked by the first one, as expected. However, there is an interval between the first open_create and the first close. Within that interval, the second open_create completed, followed by the second close. So the subsequent "check_pdo_conflict" found the first "multiop" still running but the second "multiop" already finished. I will fix the test scripts so that they run as expected. |
| Comment by Build Master (Inactive) [ 08/Jul/11 ] |
|
Integrated in Oleg Drokin : 5c1e9f9dada5f2f44a4fc9f46c4bb0789b747df6
|
| Comment by Peter Jones [ 08/Jul/11 ] |
|
Patch landed for 2.1 |
| Comment by Niu Yawei (Inactive) [ 21/Jul/11 ] |
|
There is a sanityn test_40c failure: https://maloo.whamcloud.com/test_sets/69d07264-b345-11e0-b33f-52540025f9af. It looks like we still have defects in the pdirop tests. |
| Comment by nasf (Inactive) [ 21/Jul/11 ] |
|
This failure is the opposite of the earlier failure cases. I will look into it. |
| Comment by nasf (Inactive) [ 21/Jul/11 ] |
|
see |