[LU-12428] sanity-sec: test_13 nodemap_del failed with 1 Created: 12/Jun/19 Updated: 12/Sep/19 Resolved: 12/Sep/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.13.0 |
| Fix Version/s: | Lustre 2.13.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Sebastien Buisson |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | | |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for Li Xi <pkuelelixi@gmail.com> This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/2fe8e80e-8cce-11e9-abe3-52540065bddc The test logs show that in test_13, after running "lctl nodemap_del 48714_1", the test script immediately checks in delete_nodemaps() of sanity-sec.sh whether the nodemap has been deleted. However, "lctl get_param nodemap.48714_1.id" still prints a result, which delete_nodemaps() does not expect, so it exits with an error and reports test_13 as failed. test_14 and test_15 failed too, but only as a consequence of the test_13 failure: because delete_nodemaps() did not remove the nodemaps left over after 48714_1 in test_13, the subsequent nodemap_add of 48714_2 fails. I think we need to make improvements here. test_13, test_14 and test_15 are unrelated to each other, so before running each of these test cases, delete_nodemaps() needs to delete any existing nodemaps to avoid such failures. |
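For illustration, here is a minimal sketch of the kind of fix the description suggests: poll until the nodemap parameter actually disappears instead of checking once, immediately after "lctl nodemap_del". The helper name wait_nodemap_deleted and the timeout are hypothetical, not the actual sanity-sec.sh code; only the lctl nodemap_del and lctl get_param commands are real.

```bash
#!/bin/bash
# Hypothetical helper (not the real sanity-sec.sh code): poll until
# the nodemap's proc parameter disappears, rather than asserting its
# absence immediately after "lctl nodemap_del" returns.
wait_nodemap_deleted() {
	local nodemap=$1
	local timeout=${2:-20}
	local i

	for ((i = 0; i < timeout; i++)); do
		# get_param fails once the nodemap is really gone
		lctl get_param "nodemap.${nodemap}.id" &>/dev/null ||
			return 0
		sleep 1
	done
	echo "nodemap ${nodemap} still present after ${timeout}s" >&2
	return 1
}

lctl nodemap_del 48714_1
wait_nodemap_deleted 48714_1 || exit 1
```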
| Comments |
| Comment by James Nunez (Inactive) [ 03/Jul/19 ] |
|
We’re seeing a varying number of sanity-sec tests fail with “nodemap_del failed with 1” followed by several tests failing with “nodemap_add failed with 1”. This looks like it started on June 9, 2019 with Lustre version 2.12.54.52, for review-dne-zfs-part-2 and review-dne-part-2 only. Some examples of failures are:
https://testing.whamcloud.com/test_sets/cdd12596-9c81-11e9-8dbe-52540065bddc - sanity-sec test_8 fails with “nodemap_del failed with 1” and then tests 9, 10a, 11, 12, 13, 14, 15 fail with “nodemap_add failed with 1” |
| Comment by Peter Jones [ 03/Jul/19 ] |
|
Sebastien can you please investigate? |
| Comment by Gerrit Updater [ 04/Jul/19 ] |
|
Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/35418 |
| Comment by Gerrit Updater [ 05/Jul/19 ] |
|
Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/35421 |
| Comment by Gerrit Updater [ 15/Aug/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35421/ |
| Comment by Peter Jones [ 15/Aug/19 ] |
|
Landed for 2.13 |
| Comment by Jian Yu [ 28/Aug/19 ] |
|
The failure occurred 5 times on the master branch last week: |
| Comment by Sebastien Buisson [ 29/Aug/19 ] |
|
Hmm, comparing the test log from one of the recent failures (https://testing.whamcloud.com/test_sets/969517b8-c9a9-11e9-9fc9-52540065bddc) with the test log from patch https://review.whamcloud.com/35421/ when it passed Maloo (https://testing.whamcloud.com/sub_tests/7f7dee14-9f70-11e9-9e3d-52540065bddc), it appears that there is no message such as "On MGS 10.9.4.124, 40996_0.id = nodemap.40996_0.id=1" in the failure case. |
| Comment by Gerrit Updater [ 29/Aug/19 ] |
|
Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/35990 |
| Comment by Sebastien Buisson [ 30/Aug/19 ] |
|
Oh, I finally found out the reason behind this strange behavior with wait_nm_sync() in sanity-sec.sh. |
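For context, wait_nm_sync() waits until a nodemap parameter set on the MGS has propagated to the other nodes; the "On MGS ..." message quoted above comes from that kind of check. Below is a rough sketch of such a synchronization loop, assuming do_facet()/do_nodes()/comma_list()/all_server_nodes-style helpers from the Lustre test framework; it is illustrative only, not the real wait_nm_sync() implementation.

```bash
# Rough sketch of a wait_nm_sync()-style loop: read the parameter on
# the MGS, then poll the server nodes until they all report the same
# value or a timeout expires. do_facet()/do_nodes()/comma_list()/
# all_server_nodes are assumed test-framework helpers; the names,
# messages and timeout here are illustrative, not sanity-sec.sh code.
wait_nm_sync_sketch() {
	local proc_param=$1
	local timeout=${2:-20}
	local mgs_val out i

	mgs_val=$(do_facet mgs "lctl get_param -n nodemap.${proc_param}")
	echo "On MGS, ${proc_param} = ${mgs_val}"

	for ((i = 0; i < timeout; i++)); do
		out=$(do_nodes $(comma_list $(all_server_nodes)) \
			"lctl get_param -n nodemap.${proc_param}" 2>/dev/null)
		# done once every server line reports the MGS value
		echo "$out" | grep -qv "$mgs_val" || return 0
		sleep 1
	done
	return 1
}
```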
| Comment by Gerrit Updater [ 30/Aug/19 ] |
|
Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/36009 |
| Comment by Gerrit Updater [ 12/Sep/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36009/ |
| Comment by Peter Jones [ 12/Sep/19 ] |
|
Landed for 2.13 |