[LU-8305] sanity-sec test_27: @@@@@@ FAIL: fileset not cleared on nodemap c0
Created: 20/Jun/16 Updated: 20/Apr/17 Resolved: 12/Apr/17 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.0 |
| Fix Version/s: | Lustre 2.9.0, Lustre 2.10.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Sebastien Buisson (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
This issue was created by maloo for parinay <parinay_kondekar@xyratex.com>
Please provide additional information about the failure here.
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/aba1b6c8-34f8-11e6-80b9-5254006e85c2.
sanity-sec test_27: @@@@@@ FAIL: fileset not cleared on nodemap c0
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:4789:error()
= /usr/lib64/lustre/tests/sanity-sec.sh:1784:test_27()
= /usr/lib64/lustre/tests/test-framework.sh:5054:run_one()
= /usr/lib64/lustre/tests/test-framework.sh:5093:run_one_logged()
= /usr/lib64/lustre/tests/test-framework.sh:4939:run_test()
= /usr/lib64/lustre/tests/sanity-sec.sh:1789:main()
Dumping lctl log to /logdir/test_logs/2016-06-17/lustre-reviews-el7-x86_64--review-dne-part-2--1_7_1__39912__-69959014310400-174851/sanity-sec.test_27.*.1466214942.log
CMD: trevis-55vm1.trevis.hpdd.intel.com,trevis-55vm2,trevis-55vm3,trevis-55vm7,trevis-55vm8 /usr/sbin/lctl dk > /logdir/test_logs/2016-06-17/lustre-reviews-el7-x86_64--review-dne-part-2--1_7_1__39912__-69959014310400-174851/sanity-sec.test_27.debug_log.\$(hostname -s).1466214942.log;
dmesg > /logdir/test_logs/2016-06-17/lustre-reviews-el7-x86_64--review-dne-part-2--1_7_1__39912__-69959014310400-174851/sanity-sec.test_27.dmesg.\$(hostname -s).1466214942.log
CMD: trevis-55vm3,trevis-55vm7,trevis-55vm8 /usr/sbin/lctl set_param debug=\"\"
trevis-55vm3: error: set_param: setting debug: no value
trevis-55vm8: error: set_param: setting debug: no value
trevis-55vm7: error: set_param: setting debug: no value
Resetting fail_loc on all nodes...CMD: trevis-55vm1.trevis.hpdd.intel.com,trevis-55vm2,trevis-55vm3,trevis-55vm7,trevis-55vm8 lctl set_param -n fail_loc=0 fail_val=0 2>/dev/null
done.
parinay@osh:~/Code/lustre-intel$ git show 5042d4c --pretty=fuller
commit 5042d4c8ecad287276e04c52c6a1fee9c9b597a9
Author: Sebastien Buisson <sbuisson@ddn.com>
AuthorDate: Sat Apr 30 18:27:57 2016 +0200
Commit: Oleg Drokin <oleg.drokin@intel.com>
CommitDate: Thu Jun 16 22:15:58 2016 +0000
LU-7846 tests: test for nodemap fileset in sanity-sec
Add new tests in sanity-sec.sh to test for fileset in nodemap
|
| Comments |
| Comment by parinay v kondekar (Inactive) [ 20/Jun/16 ] |
|
The test was added by the commit of |
| Comment by Gerrit Updater [ 27/Jun/16 ] |
|
Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: http://review.whamcloud.com/20990 |
| Comment by Gerrit Updater [ 11/Aug/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20990/ |
| Comment by Peter Jones [ 11/Aug/16 ] |
|
Landed for 2.9 |
| Comment by nasf (Inactive) [ 07/Nov/16 ] |
|
Hit on master again: |
| Comment by James Nunez (Inactive) [ 09/Nov/16 ] |
|
The patch that landed only added trace/debug output to sanity-sec. This test is still failing, and we have several instances with the debug information in the client test_log: https://testing.hpdd.intel.com/test_sets/7f5995f6-a5c1-11e6-bf77-5254006e85c2 In the past week or so, it looks like sanity-sec test_27 has been failing with this error about once every two days. |
| Comment by Gerrit Updater [ 10/Nov/16 ] |
|
Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: http://review.whamcloud.com/23693 |
| Comment by Sebastien Buisson (Inactive) [ 10/Nov/16 ] |
|
The problem is that the fileset is not properly cleared on the MGS side first. I pushed a new patch to improve fileset cleanup on the MGS side. |
| Comment by Bob Glossman (Inactive) [ 13/Nov/16 ] |
|
another on master: |
| Comment by Gerrit Updater [ 17/Nov/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/23693/ |
| Comment by Peter Jones [ 17/Nov/16 ] |
|
Landed for 2.9 |
| Comment by Niu Yawei (Inactive) [ 23/Nov/16 ] |
|
Still hitting the problem on master, though the message is slightly different: "On MGS, fileset cannot be cleared" |
| Comment by Sebastien Buisson (Inactive) [ 23/Nov/16 ] |
|
Hi, Thanks to the new messages recently added to sanity-sec, I can see that the error is due to the command "lctl nodemap_set_fileset --name c0 --fileset ''" failing on the MGS. This is weird, as it is a local command whose only purpose is to modify the fileset info on a nodemap, and especially weird because the command does not return any error. I am able to reproduce the problem on my test system, but not systematically, and the only configuration in which I can trigger it is a combined MGT/MDT and a regular MDT on the same node. I am debugging the issue at the moment. Thanks, |
| Comment by Sebastien Buisson (Inactive) [ 24/Nov/16 ] |
|
The problem with sanity-sec test_27 stems from the way the fileset info is set up on the nodemap entry. If we do 'lctl set_param' and then 'lctl set_param -P', the fileset info is set twice on the MGS side. Under certain circumstances, the propagation of 'lctl set_param -P' can be delayed and take effect in test_27 after we have reset the fileset info with 'lctl nodemap_set_fileset', so the fileset appears not to have been cleared. |
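The ordering problem described in this comment can be illustrated with a toy simulation. This is only a sketch: a temporary file stands in for the fileset stored on the MGS, no real lctl command is run, and all function names (set_fileset, set_fileset_delayed, get_fileset, racy_sequence, single_set_sequence) are hypothetical. Setting the value only once is shown as one way to avoid the race; it is an assumption, not necessarily what the uploaded patch does.

```shell
#!/bin/sh
# Toy simulation only: a temp file stands in for the nodemap fileset on the MGS.
state=$(mktemp)

# Write the value immediately (stand-in for 'lctl set_param').
set_fileset() { printf '%s' "$1" > "$state"; }

# Write the value after a delay of $2 seconds, in the background
# (stand-in for a delayed 'lctl set_param -P' propagation).
set_fileset_delayed() { ( sleep "$2"; printf '%s' "$1" > "$state" ) & }

get_fileset() { cat "$state"; }

# Racy sequence: the value is set twice, and the delayed second write
# lands after the fileset has already been cleared.
racy_sequence() {
    set_fileset "/subdir"
    set_fileset_delayed "/subdir" 1
    set_fileset ""    # stand-in for 'lctl nodemap_set_fileset --fileset'
    wait              # let the delayed write land
    get_fileset
}

# Single-set sequence: nothing is pending, so the clear is final.
single_set_sequence() {
    set_fileset "/subdir"
    set_fileset ""
    get_fileset
}
```

Running racy_sequence prints /subdir even though the fileset was cleared, which mirrors the "fileset not cleared" symptom; single_set_sequence prints an empty string.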
| Comment by Gerrit Updater [ 24/Nov/16 ] |
|
Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: http://review.whamcloud.com/23936 |
| Comment by nasf (Inactive) [ 10/Dec/16 ] |
|
+1 on master: |
| Comment by Gerrit Updater [ 17/Dec/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/23936/ |
| Comment by James Casper [ 10/Apr/17 ] |
|
This issue has resurfaced in the latest tag (2.9.55, b3550). https://testing.hpdd.intel.com/test_sessions/630a4991-3c6d-4755-b50e-9932f0cf69fb |
| Comment by Sebastien Buisson (Inactive) [ 12/Apr/17 ] |
|
> This issue has resurfaced in the latest tag (2.9.55, b3550).
Not exactly, as clients in this test are running Lustre 2.9.0. Patch https://review.whamcloud.com/23936 landed after 2.9.0, so sanity-sec.sh executed in this run does not contain the fix (the purpose of the patch was only to address a test script issue). I think this ticket can be closed again. Thanks, |
| Comment by Peter Jones [ 12/Apr/17 ] |
|
Do we just need to adjust the test script to skip running in interop mode with older releases? |
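One possible shape for such an interop guard is sketched below. It assumes plain dotted numeric version strings and uses hypothetical helper names (version_to_int, client_lacks_fix); it does not use the test framework's own version helpers. The 2.9.0 cutoff is taken only from the comment above stating that the fix landed after 2.9.0.

```shell
#!/bin/sh
# Hypothetical helper: turn "a.b.c.d" into one comparable integer.
# Assumes each field is a plain decimal number below 256 with no leading zeros.
version_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the version string on dots
    IFS=$old_ifs
    # Missing fields default to 0.
    echo $(( ((${1:-0} * 256 + ${2:-0}) * 256 + ${3:-0}) * 256 + ${4:-0} ))
}

# Hypothetical check: succeeds (returns 0) when the client is 2.9.0 or older,
# i.e. when its copy of sanity-sec.sh predates the fix.
client_lacks_fix() {
    [ "$(version_to_int "$1")" -le "$(version_to_int 2.9.0)" ]
}

# Illustrative guard inside a test function (message text is an assumption):
# client_lacks_fix "$client_version" && { echo "skip: client too old"; return 0; }
```

With this sketch, client_lacks_fix succeeds for 2.9.0 and fails for 2.9.55, so only the older client would skip the test.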
| Comment by James Casper [ 12/Apr/17 ] |
|
Will open a new ticket to modify the sanity-sec test on 2.9.0 clients. |