[LU-168] Test failure on test suite sanityn, subtest test_40b Created: 26/Mar/11  Updated: 21/Jul/11  Resolved: 08/Jul/11

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.0
Fix Version/s: Lustre 2.1.0

Type: Bug Priority: Blocker
Reporter: Maloo Assignee: nasf (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 4967

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/b08fc4cc-5754-11e0-a272-52540025f9af.

The sub-test test_40b failed with the following error:

create is blocked



 Comments   
Comment by Mikhail Pershin [ 28/Mar/11 ]

Can we track when this bug started occurring? Did it begin after the latest commit?

Comment by nasf (Inactive) [ 28/Mar/11 ]

The root ("/") is a special object in Lustre: it sits in the dcache without ever being looked up. We call "__ll_inode_revalidate_it()" to update its attributes. Before getattr-by-fid was enabled, this was a plain mds_getattr RPC with no lock granted, which meant an mds_getattr RPC was sent every time. With getattr-by-fid enabled, the client sends an intent_getattr RPC to fetch the attributes, and the root's UPDATE lock is granted along with them. That is the difference. In this mode, sanityn test_40 is bound to fail because of the lock conflict on the root object. In other words, the PDO mechanism is largely ineffective in the root directory when getattr-by-fid is enabled.

As far as I can see, the purpose of test_40 is to verify the PDO mechanism, so there is no need to create the conflict directly under the root directory. If we do not want to disable getattr-by-fid, we can make test_40 run under a sub-directory, which avoids the case above.

Any idea?
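To make the conflict concrete, here is a minimal Python model of inodebits lock compatibility, under stated assumptions: two requests conflict only if they target the same object, their modes are incompatible under the standard DLM compatibility matrix, and their inode bit sets overlap. The bit values are assumed to mirror MDS_INODELOCK_LOOKUP/UPDATE; the modes assigned to each request (PR for the getattr intent on "/", CW for a create's lock on the parent) and all helper names are illustrative assumptions, not the actual MDT code.

```python
from dataclasses import dataclass

# Inode bits, assumed to mirror Lustre's MDS_INODELOCK_* values.
MDS_INODELOCK_LOOKUP = 0x1
MDS_INODELOCK_UPDATE = 0x2

# Standard DLM compatibility matrix: mode -> modes it can coexist with.
COMPAT = {
    "NL": {"NL", "CR", "CW", "PR", "PW", "EX"},
    "CR": {"NL", "CR", "CW", "PR", "PW"},
    "CW": {"NL", "CR", "CW"},
    "PR": {"NL", "CR", "PR"},
    "PW": {"NL", "CR"},
    "EX": {"NL"},
}


@dataclass(frozen=True)
class IbitsLock:
    fid: str   # object the lock covers (hypothetical string FID)
    mode: str  # one of the COMPAT keys
    bits: int  # OR of MDS_INODELOCK_* values


def conflicts(a: IbitsLock, b: IbitsLock) -> bool:
    """Conflict requires the same object, incompatible modes and a shared bit."""
    if a.fid != b.fid:
        return False
    if b.mode in COMPAT[a.mode]:
        return False
    return bool(a.bits & b.bits)


# Hypothetical requests: getattr-by-fid on "/" vs. creates in "/" and in a subdir.
root_getattr = IbitsLock("ROOT", "PR", MDS_INODELOCK_LOOKUP | MDS_INODELOCK_UPDATE)
create_in_root = IbitsLock("ROOT", "CW", MDS_INODELOCK_UPDATE)
create_in_subdir = IbitsLock("SUBDIR", "CW", MDS_INODELOCK_UPDATE)

print(conflicts(root_getattr, create_in_root))    # True: PR vs CW, shared UPDATE bit
print(conflicts(root_getattr, create_in_subdir))  # False: different object
```

The second case corresponds to running test_40 under a sub-directory: the create then locks the sub-directory, not the root, so the root's getattr lock is never in its way.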

Comment by nasf (Inactive) [ 28/Mar/11 ]

The patch:

http://review.whamcloud.com/#change,370

Comment by Build Master (Inactive) [ 28/Mar/11 ]

Integrated in reviews-centos5 #578
LU-168 Verify PDO mechanism under sub-directory of root object

nasf : de3edec5f7c477b7cbef0d47cb4f7de552cbfa17
Files :

  • lustre/tests/sanityn.sh
Comment by Oleg Drokin [ 28/Mar/11 ]

Hm, in fact I think we might want to relieve a bit of the stress on the root dir too.
Why do we update the attrs on the root dir at all? If it is to check permissions and such, we could request only a LOOKUP lock in that case, and the root would then behave like a normal dir on the MDT.
What do you think?

Comment by Build Master (Inactive) [ 28/Mar/11 ]

Integrated in reviews-centos5 #580
LU-168 Verify PDO mechanism under sub-directory of root object

nasf : bf9cd24165eebd34e52d3753f2f0857449d4d3f8
Files :

  • lustre/tests/sanityn.sh
Comment by nasf (Inactive) [ 29/Mar/11 ]

I think we can adjust the lock policy for the root object. When revalidating the root object or checking permissions against it, we only need MDS_INODELOCK_LOOKUP. The getattr-by-fid requests triggered by these two operations then no longer conflict with other modifications under the root directory. For stat(root), "ll_getattr_it()" will still claim both MDS_INODELOCK_LOOKUP and MDS_INODELOCK_UPDATE.

Does it make sense?

The patch to be reviewed:
http://review.whamcloud.com/#change,370
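The proposed policy boils down to choosing the inode bits per operation. A minimal sketch, assuming the bit values of MDS_INODELOCK_LOOKUP/UPDATE and using a hypothetical RootOp enum and helper names (the real change lives in lustre/llite/file.c and namei.c and is not reproduced here):

```python
from enum import Enum

# Assumed to mirror Lustre's MDS_INODELOCK_* values.
MDS_INODELOCK_LOOKUP = 0x1
MDS_INODELOCK_UPDATE = 0x2


class RootOp(Enum):
    REVALIDATE = "revalidate"  # dcache revalidation of "/"
    PERMISSION = "permission"  # permission check against "/"
    STAT = "stat"              # stat("/") via ll_getattr_it()


def root_lock_bits(op: RootOp) -> int:
    """Inode bits to request in the getattr-by-fid intent on the root."""
    if op in (RootOp.REVALIDATE, RootOp.PERMISSION):
        return MDS_INODELOCK_LOOKUP
    return MDS_INODELOCK_LOOKUP | MDS_INODELOCK_UPDATE


def may_block_creates(op: RootOp) -> bool:
    """A create under "/" locks the root's UPDATE bit (assumed), so only
    requests that include UPDATE can collide with it."""
    return bool(root_lock_bits(op) & MDS_INODELOCK_UPDATE)


for op in RootOp:
    print(op.value, hex(root_lock_bits(op)), may_block_creates(op))
```

Only stat("/") still asks for UPDATE, so the frequent revalidate and permission paths stop competing with modifications in the root directory.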

Comment by Build Master (Inactive) [ 29/Mar/11 ]

Integrated in lustre-reviews » client,el5 #19
LU-168 Claim MDS_INODELOCK_LOOKUP lock when revalidate root object

nasf : a88916a66f9f9b663677ab69803e42b421c92450
Files :

  • lustre/llite/file.c
  • lustre/llite/namei.c
  • lustre/llite/llite_internal.h
Comment by Build Master (Inactive) [ 29/Mar/11 ]

Integrated in reviews-centos5 #595
LU-168 Claim MDS_INODELOCK_LOOKUP lock when revalidate root object

nasf : a88916a66f9f9b663677ab69803e42b421c92450
Files :

  • lustre/llite/file.c
  • lustre/llite/namei.c
  • lustre/llite/llite_internal.h
Comment by Build Master (Inactive) [ 29/Mar/11 ]

Integrated in lustre-reviews » server,el6 #19
LU-168 Claim MDS_INODELOCK_LOOKUP lock when revalidate root object

nasf : a88916a66f9f9b663677ab69803e42b421c92450
Files :

  • lustre/llite/file.c
  • lustre/llite/namei.c
  • lustre/llite/llite_internal.h
Comment by Build Master (Inactive) [ 29/Mar/11 ]

Integrated in lustre-reviews » server,el5 #19
LU-168 Claim MDS_INODELOCK_LOOKUP lock when revalidate root object

nasf : a88916a66f9f9b663677ab69803e42b421c92450
Files :

  • lustre/llite/file.c
  • lustre/llite/namei.c
  • lustre/llite/llite_internal.h
Comment by Build Master (Inactive) [ 29/Mar/11 ]

Integrated in lustre-reviews » client,el6 #19
LU-168 Claim MDS_INODELOCK_LOOKUP lock when revalidate root object

nasf : a88916a66f9f9b663677ab69803e42b421c92450
Files :

  • lustre/llite/llite_internal.h
  • lustre/llite/namei.c
  • lustre/llite/file.c
Comment by Build Master (Inactive) [ 29/Mar/11 ]

Integrated in lustre-reviews » client,ubuntu #19
LU-168 Claim MDS_INODELOCK_LOOKUP lock when revalidate root object

nasf : a88916a66f9f9b663677ab69803e42b421c92450
Files :

  • lustre/llite/llite_internal.h
  • lustre/llite/namei.c
  • lustre/llite/file.c
Comment by Build Master (Inactive) [ 29/Mar/11 ]

Integrated in reviews-centos5 #607
LU-168 Claim MDS_INODELOCK_LOOKUP lock when revalidate root object

nasf : a88916a66f9f9b663677ab69803e42b421c92450
Files :

  • lustre/llite/file.c
  • lustre/llite/namei.c
  • lustre/llite/llite_internal.h
Comment by Build Master (Inactive) [ 31/Mar/11 ]

Integrated in lustre-reviews » client,ubuntu #32
LU-168 Claim MDS_INODELOCK_LOOKUP lock when revalidate root object

nasf : a88916a66f9f9b663677ab69803e42b421c92450
Files :

  • lustre/llite/llite_internal.h
  • lustre/llite/file.c
  • lustre/llite/namei.c
Comment by Build Master (Inactive) [ 31/Mar/11 ]

Integrated in lustre-reviews » client,el6 #32
LU-168 Claim MDS_INODELOCK_LOOKUP lock when revalidate root object

nasf : a88916a66f9f9b663677ab69803e42b421c92450
Files :

  • lustre/llite/namei.c
  • lustre/llite/file.c
  • lustre/llite/llite_internal.h
Comment by Build Master (Inactive) [ 31/Mar/11 ]

Integrated in lustre-reviews » server,el5 #32
LU-168 Claim MDS_INODELOCK_LOOKUP lock when revalidate root object

nasf : a88916a66f9f9b663677ab69803e42b421c92450
Files :

  • lustre/llite/namei.c
  • lustre/llite/llite_internal.h
  • lustre/llite/file.c
Comment by Build Master (Inactive) [ 31/Mar/11 ]

Integrated in lustre-reviews » server,el6 #32
LU-168 Claim MDS_INODELOCK_LOOKUP lock when revalidate root object

nasf : a88916a66f9f9b663677ab69803e42b421c92450
Files :

  • lustre/llite/namei.c
  • lustre/llite/llite_internal.h
  • lustre/llite/file.c
Comment by Build Master (Inactive) [ 31/Mar/11 ]

Integrated in lustre-reviews » client,el5 #32
LU-168 Claim MDS_INODELOCK_LOOKUP lock when revalidate root object

nasf : a88916a66f9f9b663677ab69803e42b421c92450
Files :

  • lustre/llite/file.c
  • lustre/llite/namei.c
  • lustre/llite/llite_internal.h
Comment by Build Master (Inactive) [ 31/Mar/11 ]

Integrated in lustre-master » client,el6-x86_64 #7
LU-168 Claim MDS_INODELOCK_LOOKUP lock when revalidate root object

Oleg Drokin : ee7926c6e54892923ebd46a9a4088669cfdc7f7a
Files :

  • lustre/llite/file.c
  • lustre/llite/llite_internal.h
  • lustre/llite/namei.c
Comment by Build Master (Inactive) [ 31/Mar/11 ]

Integrated in lustre-master » client,ubuntu-x86_64 #7
LU-168 Claim MDS_INODELOCK_LOOKUP lock when revalidate root object

Oleg Drokin : ee7926c6e54892923ebd46a9a4088669cfdc7f7a
Files :

  • lustre/llite/file.c
  • lustre/llite/llite_internal.h
  • lustre/llite/namei.c
Comment by Build Master (Inactive) [ 31/Mar/11 ]

Integrated in lustre-master » client,el6-i686 #7
LU-168 Claim MDS_INODELOCK_LOOKUP lock when revalidate root object

Oleg Drokin : ee7926c6e54892923ebd46a9a4088669cfdc7f7a
Files :

  • lustre/llite/namei.c
  • lustre/llite/file.c
  • lustre/llite/llite_internal.h
Comment by Build Master (Inactive) [ 31/Mar/11 ]

Integrated in lustre-master-centos5 #176
LU-168 Claim MDS_INODELOCK_LOOKUP lock when revalidate root object

Oleg Drokin : ee7926c6e54892923ebd46a9a4088669cfdc7f7a
Files :

  • lustre/llite/namei.c
  • lustre/llite/file.c
  • lustre/llite/llite_internal.h
Comment by Build Master (Inactive) [ 31/Mar/11 ]

Integrated in lustre-master » server,el6-x86_64 #7
LU-168 Claim MDS_INODELOCK_LOOKUP lock when revalidate root object

Oleg Drokin : ee7926c6e54892923ebd46a9a4088669cfdc7f7a
Files :

  • lustre/llite/namei.c
  • lustre/llite/llite_internal.h
  • lustre/llite/file.c
Comment by nasf (Inactive) [ 31/Mar/11 ]

The patch has landed on lustre-2.1.

Comment by Sarah Liu [ 31/Mar/11 ]

Verified on the following builds:

server: http://build.whamcloud.com/view/Lustre%202.x/job/lustre-master-centos5/178/
client: http://build.whamcloud.com/view/Lustre%202.x/job/lustre-master-client-centos5/139/

Comment by Sarah Liu [ 31/Mar/11 ]

This issue has been verified, and the patch has landed on 2.1.

Comment by Build Master (Inactive) [ 01/Apr/11 ]

Integrated in lustre-master » client,el5-x86_64 #7
LU-168 Claim MDS_INODELOCK_LOOKUP lock when revalidate root object

Oleg Drokin : ee7926c6e54892923ebd46a9a4088669cfdc7f7a
Files :

  • lustre/llite/namei.c
  • lustre/llite/file.c
  • lustre/llite/llite_internal.h
Comment by Build Master (Inactive) [ 01/Apr/11 ]

Integrated in lustre-master » server,el5-i686 #7
LU-168 Claim MDS_INODELOCK_LOOKUP lock when revalidate root object

Oleg Drokin : ee7926c6e54892923ebd46a9a4088669cfdc7f7a
Files :

  • lustre/llite/file.c
  • lustre/llite/llite_internal.h
  • lustre/llite/namei.c
Comment by Build Master (Inactive) [ 01/Apr/11 ]

Integrated in lustre-master » server,el5-x86_64 #7
LU-168 Claim MDS_INODELOCK_LOOKUP lock when revalidate root object

Oleg Drokin : ee7926c6e54892923ebd46a9a4088669cfdc7f7a
Files :

  • lustre/llite/llite_internal.h
  • lustre/llite/namei.c
  • lustre/llite/file.c
Comment by Build Master (Inactive) [ 01/Apr/11 ]

Integrated in lustre-master » client,el5-i686 #7
LU-168 Claim MDS_INODELOCK_LOOKUP lock when revalidate root object

Oleg Drokin : ee7926c6e54892923ebd46a9a4088669cfdc7f7a
Files :

  • lustre/llite/llite_internal.h
  • lustre/llite/file.c
  • lustre/llite/namei.c
Comment by Chris Gearing (Inactive) [ 05/Apr/11 ]

Although this is marked fixed and was committed to master around 1 April, it still seems to fail in regression runs. Here is an example: https://maloo.whamcloud.com/test_sets/c0f9153a-5f91-11e0-a2b4-52540025f9af.

This could be a failure in the test system or an unfixed bug. Do we have an example of this code passing the test, and can someone try to work out why the sanityn test still fails? I would have expected Maloo to contain an example of sanityn passing, with a reference to that passing run in the issue.

Comment by Sarah Liu [ 05/Apr/11 ]

Yes, sanityn has passed on Maloo; here are the links:
https://maloo.whamcloud.com/test_sets/87c3a5e2-5cd9-11e0-a272-52540025f9af
https://maloo.whamcloud.com/test_sets/f98e61fc-5c26-11e0-a272-52540025f9af

Comment by Build Master (Inactive) [ 07/Apr/11 ]

Integrated in lustre-master » server,el6-i686 #20
LU-168 Claim MDS_INODELOCK_LOOKUP lock when revalidate root object

Oleg Drokin : ee7926c6e54892923ebd46a9a4088669cfdc7f7a
Files :

  • lustre/llite/file.c
  • lustre/llite/namei.c
  • lustre/llite/llite_internal.h
Comment by Sarah Liu [ 27/Jun/11 ]

It seems we are seeing this issue again.
https://maloo.whamcloud.com/test_sets/c909f970-a031-11e0-aee5-52540025f9af

Comment by nasf (Inactive) [ 28/Jun/11 ]

According to the failure log:
https://maloo.whamcloud.com/test_sets/d2f9a67c-a0ba-11e0-aee5-52540025f9af

The second open_create operation was blocked by the first one, as expected. However, there is an interval between the first open_create and the first close, and within that interval the second open_create completed, followed by the second close. So the subsequent "check_pdo_conflict" found the first "multiop" still running while the second "multiop" had already finished. I will fix the test scripts so that they run as expected.

http://review.whamcloud.com/#change,1030
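The race above can be reproduced in miniature with Python threads. This is an illustrative model only, not the sanityn.sh change in the patch: a threading.Lock stands in for the parent-directory PDO lock, the first thread models the first open_create, and the hypothetical `hold_until_checked` flag models keeping that lock held until the conflict check has run. All names here are assumptions.

```python
import threading
import time


def check_blocked(hold_until_checked: bool, window: float = 0.2) -> bool:
    """Return True if the second operation is still blocked when checked."""
    pdo_lock = threading.Lock()        # stands in for the PDO lock
    first_holding = threading.Event()  # first op has taken the lock
    release = threading.Event()        # checker lets the first op finish
    second_done = threading.Event()    # second op completed

    def first_op() -> None:
        with pdo_lock:
            first_holding.set()
            if hold_until_checked:
                release.wait()         # keep the lock until the check is done
        # leaving the with-block ends the first op's lock window

    def second_op() -> None:
        first_holding.wait()           # start only after the first op holds it
        with pdo_lock:
            pass
        second_done.set()

    t1 = threading.Thread(target=first_op)
    t2 = threading.Thread(target=second_op)
    t1.start()
    t2.start()
    time.sleep(window)                 # the check runs after this delay
    blocked = not second_done.is_set()
    release.set()
    t1.join()
    t2.join()
    return blocked


print(check_blocked(hold_until_checked=True))  # True: second op still waiting
print(check_blocked(hold_until_checked=False))
```

With `hold_until_checked=False`, the first thread drops the lock immediately, so the second one normally finishes before the check and the result is usually False: the same kind of false failure the test hit. Holding the lock until after the check makes the outcome deterministic.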

Comment by Build Master (Inactive) [ 08/Jul/11 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #197
LU-168 Fix schedule race in sanityn PDO lock tests

Oleg Drokin : 5c1e9f9dada5f2f44a4fc9f46c4bb0789b747df6
Files :

  • lustre/tests/sanityn.sh
Comment by Peter Jones [ 08/Jul/11 ]

Patch landed for 2.1

Comment by Build Master (Inactive) [ 08/Jul/11 ]

Integrated in lustre-master » x86_64,client,el5,ofa #197
LU-168 Fix schedule race in sanityn PDO lock tests

Oleg Drokin : 5c1e9f9dada5f2f44a4fc9f46c4bb0789b747df6
Files :

  • lustre/tests/sanityn.sh
Comment by Build Master (Inactive) [ 08/Jul/11 ]

Integrated in lustre-master » i686,client,el6,inkernel #197
LU-168 Fix schedule race in sanityn PDO lock tests

Oleg Drokin : 5c1e9f9dada5f2f44a4fc9f46c4bb0789b747df6
Files :

  • lustre/tests/sanityn.sh
Comment by Build Master (Inactive) [ 08/Jul/11 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #197
LU-168 Fix schedule race in sanityn PDO lock tests

Oleg Drokin : 5c1e9f9dada5f2f44a4fc9f46c4bb0789b747df6
Files :

  • lustre/tests/sanityn.sh
Comment by Build Master (Inactive) [ 08/Jul/11 ]

Integrated in lustre-master » x86_64,client,sles11,inkernel #197
LU-168 Fix schedule race in sanityn PDO lock tests

Oleg Drokin : 5c1e9f9dada5f2f44a4fc9f46c4bb0789b747df6
Files :

  • lustre/tests/sanityn.sh
Comment by Build Master (Inactive) [ 08/Jul/11 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #197
LU-168 Fix schedule race in sanityn PDO lock tests

Oleg Drokin : 5c1e9f9dada5f2f44a4fc9f46c4bb0789b747df6
Files :

  • lustre/tests/sanityn.sh
Comment by Build Master (Inactive) [ 08/Jul/11 ]

Integrated in lustre-master » i686,client,el5,inkernel #197
LU-168 Fix schedule race in sanityn PDO lock tests

Oleg Drokin : 5c1e9f9dada5f2f44a4fc9f46c4bb0789b747df6
Files :

  • lustre/tests/sanityn.sh
Comment by Build Master (Inactive) [ 08/Jul/11 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #197
LU-168 Fix schedule race in sanityn PDO lock tests

Oleg Drokin : 5c1e9f9dada5f2f44a4fc9f46c4bb0789b747df6
Files :

  • lustre/tests/sanityn.sh
Comment by Build Master (Inactive) [ 08/Jul/11 ]

Integrated in lustre-master » i686,client,el5,ofa #197
LU-168 Fix schedule race in sanityn PDO lock tests

Oleg Drokin : 5c1e9f9dada5f2f44a4fc9f46c4bb0789b747df6
Files :

  • lustre/tests/sanityn.sh
Comment by Build Master (Inactive) [ 08/Jul/11 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #197
LU-168 Fix schedule race in sanityn PDO lock tests

Oleg Drokin : 5c1e9f9dada5f2f44a4fc9f46c4bb0789b747df6
Files :

  • lustre/tests/sanityn.sh
Comment by Build Master (Inactive) [ 08/Jul/11 ]

Integrated in lustre-master » i686,server,el6,inkernel #197
LU-168 Fix schedule race in sanityn PDO lock tests

Oleg Drokin : 5c1e9f9dada5f2f44a4fc9f46c4bb0789b747df6
Files :

  • lustre/tests/sanityn.sh
Comment by Build Master (Inactive) [ 08/Jul/11 ]

Integrated in lustre-master » i686,server,el5,inkernel #197
LU-168 Fix schedule race in sanityn PDO lock tests

Oleg Drokin : 5c1e9f9dada5f2f44a4fc9f46c4bb0789b747df6
Files :

  • lustre/tests/sanityn.sh
Comment by Build Master (Inactive) [ 08/Jul/11 ]

Integrated in lustre-master » i686,server,el5,ofa #197
LU-168 Fix schedule race in sanityn PDO lock tests

Oleg Drokin : 5c1e9f9dada5f2f44a4fc9f46c4bb0789b747df6
Files :

  • lustre/tests/sanityn.sh
Comment by Build Master (Inactive) [ 08/Jul/11 ]

Integrated in lustre-master » x86_64,server,el5,ofa #197
LU-168 Fix schedule race in sanityn PDO lock tests

Oleg Drokin : 5c1e9f9dada5f2f44a4fc9f46c4bb0789b747df6
Files :

  • lustre/tests/sanityn.sh
Comment by Niu Yawei (Inactive) [ 21/Jul/11 ]

There is a sanityn 40c failure:

https://maloo.whamcloud.com/test_sets/69d07264-b345-11e0-b33f-52540025f9af

It looks like we still have defects in the pdirop tests.

Comment by nasf (Inactive) [ 21/Jul/11 ]

This failure is the opposite of the earlier failure cases. I will check it.

Comment by nasf (Inactive) [ 21/Jul/11 ]

See the comment on LU-524.

Generated at Sat Feb 10 01:04:25 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.