[LU-10443] sanity - test_255c: Ladvise test 13, bad lock count, returned 100, actual 0 Created: 28/Dec/17 Updated: 02/Mar/23 Resolved: 22/Feb/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0 |
| Fix Version/s: | Lustre 2.11.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Patrick Farrell (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Environment: | Rolling Upgrade/Downgrade |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/515b6450-ebfb-11e7-8c23-52540065bddc

This issue occurred while performing rolling upgrade/downgrade testing for tag 2.10.56_62.

test logs:
== sanity test 255c: suite of ladvise lockahead tests ================================================ 07:51:57 (1514361117)
Starting test test10 at 1514361118
Finishing test test10 at 1514361118
Starting test test11 at 1514361118
Finishing test test11 at 1514361118
Starting test test12 at 1514361118
Finishing test test12 at 1514361119
Starting test test13 at 1514361119
Finishing test test13 at 1514361119
sanity test_255c: @@@@@@ FAIL: Ladvise test 13, bad lock count, returned 100, actual 0

Might be related to |
| Comments |
| Comment by Peter Jones [ 03/Jan/18 ] |
|
Patrick, could you please advise on this one? Peter |
| Comment by Patrick Farrell (Inactive) [ 03/Jan/18 ] |
|
Sure, will take a look. |
| Comment by Saurabh Tandan (Inactive) [ 03/Jan/18 ] |
|
Steps followed for Rolling Upgrade testing: |
| Comment by Peter Jones [ 06/Feb/18 ] |
|
paf when do you expect to have a chance to get to this? |
| Comment by Patrick Farrell (Inactive) [ 08/Feb/18 ] |
|
Tomorrow or early next week - sorry, I didn't realize this was urgent (it was passed on to me from the LWG today). I can try to reproduce the procedure described.

So, to be clear: sanity was run with everything at 2.10.2, then the OSSes were unmounted and upgraded - I assume the file system remains up for this, so it's a failover-type scenario? Then sanity was run again. Then the same for the MDS, followed by sanity. Finally, the clients were all unmounted, upgraded, and remounted, followed by sanity, which had this failure?

I can replicate most of this, but I strongly suspect I won't hit this issue. Lockahead creates no on-disk state, only LDLM state, which should be destroyed A) when the file is deleted, and B) when clients are unmounted and remounted. I think it's far more likely that we hit some other rare issue with the test than that this is upgrade related. First I'll dig through the logs and see if I can find anything. |
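For reference, one stage of the rolling-upgrade procedure described above generally looks like the sketch below. The hostnames, device paths, package names, and failover details here are illustrative assumptions, not the exact procedure used in this run.

```bash
# Sketch of one rolling-upgrade stage (OSS example); hostnames, device paths,
# package names, and failover details are illustrative assumptions.
ssh oss1 'umount /mnt/lustre-ost0'                      # targets fail over to the peer OSS
ssh oss1 'yum -y upgrade "lustre*" "kmod-lustre*"'      # install the new server packages
ssh oss1 'reboot'                                       # boot into the upgraded modules
ssh oss1 'mount -t lustre /dev/sdb /mnt/lustre-ost0'    # fail the targets back
# ...then re-run the sanity suite; repeat for the remaining OSSes, the MDS,
# and finally unmount/upgrade/remount the clients, running sanity each time.
```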
| Comment by Patrick Farrell (Inactive) [ 09/Feb/18 ] |
|
... woah. Well, this test should never pass and neither should any of the others, really. We unlink the file and check the lock count after that. I guess we're just consistently winning the race. I'll get a patch generated. This is a bug in the test, if that affects urgency. |
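To illustrate the race described above, here is a minimal sketch of the buggy ordering and the obvious fix. The helper functions, the `lctl` parameter name, the `lfs ladvise` arguments, and the file path are assumptions made for illustration; the real test lives in sanity.sh test_255c and uses its own helpers.

```bash
# Hypothetical sketch of the test_255c race; helper names, parameters, and
# paths are illustrative assumptions, not the actual sanity.sh code.

file=/mnt/lustre/f255c
expected=100

count_locks() {
    # Sum this client's OSC LDLM lock counts; the exact parameter is an assumption.
    lctl get_param -n ldlm.namespaces.*osc*.lock_count | awk '{s += $1} END {print s}'
}

request_locks() {
    # Request lockahead write locks on $expected consecutive 1 MiB extents.
    local i
    for ((i = 0; i < expected; i++)); do
        lfs ladvise -a lockahead -m WRITE \
            -s $((i * 1048576)) -e $(((i + 1) * 1048576)) "$file" || return 1
    done
}

# Buggy ordering (what the test did): unlink first, then count. The unlink
# cancels the LDLM locks asynchronously, so this check races with cancellation
# and can legitimately observe 0 locks.
touch "$file" && request_locks
rm -f "$file"
[ "$(count_locks)" -ge "$expected" ] || echo "FAIL: bad lock count"

# Fixed ordering: sample the lock count while the file (and its locks) still
# exist, then clean up.
touch "$file" && request_locks
[ "$(count_locks)" -ge "$expected" ] || echo "FAIL: bad lock count"
rm -f "$file"
```

Under the buggy ordering the test only passes while lock cancellation happens to lag behind the check, which is why it had been passing "consistently" until this run.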
| Comment by Gerrit Updater [ 09/Feb/18 ] |
|
Patrick Farrell (paf@cray.com) uploaded a new patch: https://review.whamcloud.com/31254 |
| Comment by Patrick Farrell (Inactive) [ 09/Feb/18 ] |
|
This bug is not specific to interop. Just a question of timing. Patch should resolve. |
| Comment by Peter Jones [ 09/Feb/18 ] |
|
Thanks Patrick! Good to know that this is not something that would affect those using the feature for real. |
| Comment by Gerrit Updater [ 22/Feb/18 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31254/ |
| Comment by Peter Jones [ 22/Feb/18 ] |
|
Landed for 2.11 |