[LU-8305] sanity-sec test_27: @@@@@@ FAIL: fileset not cleared on nodemap c0 Created: 20/Jun/16  Updated: 20/Apr/17  Resolved: 12/Apr/17

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0
Fix Version/s: Lustre 2.9.0, Lustre 2.10.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Sebastien Buisson (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for parinay <parinay_kondekar@xyratex.com>

Please provide additional information about the failure here.

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/aba1b6c8-34f8-11e6-80b9-5254006e85c2.

sanity-sec test_27: @@@@@@ FAIL: fileset not cleared on nodemap c0 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:4789:error()
  = /usr/lib64/lustre/tests/sanity-sec.sh:1784:test_27()
  = /usr/lib64/lustre/tests/test-framework.sh:5054:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:5093:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:4939:run_test()
  = /usr/lib64/lustre/tests/sanity-sec.sh:1789:main()
Dumping lctl log to /logdir/test_logs/2016-06-17/lustre-reviews-el7-x86_64--review-dne-part-2--1_7_1__39912__-69959014310400-174851/sanity-sec.test_27.*.1466214942.log
CMD: trevis-55vm1.trevis.hpdd.intel.com,trevis-55vm2,trevis-55vm3,trevis-55vm7,trevis-55vm8 /usr/sbin/lctl dk > /logdir/test_logs/2016-06-17/lustre-reviews-el7-x86_64--review-dne-part-2--1_7_1__39912__-69959014310400-174851/sanity-sec.test_27.debug_log.\$(hostname -s).1466214942.log;
         dmesg > /logdir/test_logs/2016-06-17/lustre-reviews-el7-x86_64--review-dne-part-2--1_7_1__39912__-69959014310400-174851/sanity-sec.test_27.dmesg.\$(hostname -s).1466214942.log
CMD: trevis-55vm3,trevis-55vm7,trevis-55vm8 /usr/sbin/lctl set_param debug=\"\"
trevis-55vm3: error: set_param: setting debug: no value
trevis-55vm8: error: set_param: setting debug: no value
trevis-55vm7: error: set_param: setting debug: no value
Resetting fail_loc on all nodes...CMD: trevis-55vm1.trevis.hpdd.intel.com,trevis-55vm2,trevis-55vm3,trevis-55vm7,trevis-55vm8 lctl set_param -n fail_loc=0 	    fail_val=0 2>/dev/null
done.
parinay@osh:~/Code/lustre-intel$ git show 5042d4c --pretty=fuller 
commit 5042d4c8ecad287276e04c52c6a1fee9c9b597a9
Author:     Sebastien Buisson <sbuisson@ddn.com>
AuthorDate: Sat Apr 30 18:27:57 2016 +0200
Commit:     Oleg Drokin <oleg.drokin@intel.com>
CommitDate: Thu Jun 16 22:15:58 2016 +0000

    LU-7846 tests: test for nodemap fileset in sanity-sec
    
    Add new tests in sanity-sec.sh to test for fileset in nodemap


 Comments   
Comment by parinay v kondekar (Inactive) [ 20/Jun/16 ]

The test was added by the commit of LU-7846

Comment by Gerrit Updater [ 27/Jun/16 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: http://review.whamcloud.com/20990
Subject: LU-8305 tests: add traces for sanity-sec
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 03c4796dd2809e01b236035ffb81d57a66091f22

Comment by Gerrit Updater [ 11/Aug/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20990/
Subject: LU-8305 tests: add traces for sanity-sec
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: aa9938a26b2fd58afc53bd25020e8940464d1855

Comment by Peter Jones [ 11/Aug/16 ]

Landed for 2.9

Comment by nasf (Inactive) [ 07/Nov/16 ]

Hit on master again:
https://testing.hpdd.intel.com/test_sets/fdfba8e0-a45e-11e6-8a63-5254006e85c2

Comment by James Nunez (Inactive) [ 09/Nov/16 ]

The patch that landed only added trace/debug output to sanity-sec. This test is still failing, and we have several instances with the debug information in the client test_log:

https://testing.hpdd.intel.com/test_sets/7f5995f6-a5c1-11e6-bf77-5254006e85c2
https://testing.hpdd.intel.com/test_sets/fdfba8e0-a45e-11e6-8a63-5254006e85c2
https://testing.hpdd.intel.com/test_sets/e5dee85c-a2cf-11e6-8e31-5254006e85c2
https://testing.hpdd.intel.com/test_sets/32ccc30a-a0af-11e6-a5bb-5254006e85c2

In the past week or so, it looks like sanity-sec test_27 has been failing with this error about once every two days.

Comment by Gerrit Updater [ 10/Nov/16 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: http://review.whamcloud.com/23693
Subject: LU-8305 tests: strengthen fileset cleanup in sanity-sec
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7f23055e66fbbe234c0ad9b00552473bf554690e

Comment by Sebastien Buisson (Inactive) [ 10/Nov/16 ]

The problem is that the fileset is not properly cleared on the MGS side first. I pushed a new patch to improve fileset cleanup on the MGS side.

Comment by Bob Glossman (Inactive) [ 13/Nov/16 ]

another on master:
https://testing.hpdd.intel.com/test_sets/3b325832-a8b8-11e6-8969-5254006e85c2

Comment by Gerrit Updater [ 17/Nov/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/23693/
Subject: LU-8305 tests: strengthen fileset cleanup in sanity-sec
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 4827d4e3a3712ca74dbe276aee4ba3dc6069a78b

Comment by Peter Jones [ 17/Nov/16 ]

Landed for 2.9

Comment by Niu Yawei (Inactive) [ 23/Nov/16 ]

Still hit the problem on master, though the message is slightly different: "On MGS, fileset cannnot be cleared"
https://testing.hpdd.intel.com/test_sets/30023954-b0b1-11e6-9c4b-5254006e85c2

Comment by Sebastien Buisson (Inactive) [ 23/Nov/16 ]

Hi,

Thanks to the new messages recently added to sanity-sec, I can see that the error is due to the fact that the command "lctl nodemap_set_fileset --name c0 --fileset ''" fails on the MGS. This is weird, as it is a local command whose only purpose is to modify the fileset info on a nodemap, and especially weird because the command does not return any error.

I am able to reproduce the problem on my test system, but it does not occur systematically, and the only configuration in which I am able to trigger it is with a combo MGT/MDT and a regular MDT on the same node. I am trying to debug the issue at the moment.

Thanks,
Sebastien.
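
Since the clearing command reports no error even when the fileset stays set, the only reliable check is to read the value back. The following is a minimal sketch of that read-back idea, not the code used in sanity-sec: `clear_and_verify`, `run_clear` and `read_fileset` are hypothetical names, and the two callables stand in for whatever actually issues the clear and queries the nodemap.

```python
import time


def clear_and_verify(run_clear, read_fileset, timeout=5.0, interval=0.1):
    """Run a clear action, then poll until the fileset reads back empty.

    run_clear and read_fileset are hypothetical callables supplied by the
    caller; this helper does not talk to Lustre itself.
    Returns True if the fileset is observed empty before the timeout.
    """
    run_clear()
    deadline = time.monotonic() + timeout
    while True:
        if read_fileset() == "":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Usage with in-memory stand-ins (assumed values, for illustration only).
state = {"fileset": "/subdir"}

# A clear that silently does nothing, as described in the comment above.
silent_failure = clear_and_verify(
    run_clear=lambda: None,
    read_fileset=lambda: state["fileset"],
    timeout=0.2,
    interval=0.05,
)

# A clear that really resets the value.
real_clear = clear_and_verify(
    run_clear=lambda: state.update(fileset=""),
    read_fileset=lambda: state["fileset"],
    timeout=0.2,
    interval=0.05,
)
```

Here the first call times out and reports failure even though the clear action itself raised nothing, which is the situation the exit status alone cannot detect.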

Comment by Sebastien Buisson (Inactive) [ 24/Nov/16 ]

The problem with sanity-sec test_27 stems from the way the fileset info is set up on the nodemap entry. If we do 'lctl set_param' and then 'lctl set_param -P', the fileset info is set twice on the MGS side. Under certain circumstances, the propagation of 'lctl set_param -P' can be delayed and take effect in test_27 after we have already reset the fileset info with 'lctl nodemap_set_fileset'.
So we fix test_27 by setting the fileset on the nodemap only with the 'set_param -P' command. But we then need to wait in wait_nm_sync() for it to be set on the MGS as well.
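
The ordering problem described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyMGS` and its methods are hypothetical stand-ins, not Lustre code, the fileset value "/subdir" is made up, and `propagate()` stands in for the delayed application of 'set_param -P' (and, in the fixed sequence, for the wait in wait_nm_sync()).

```python
from dataclasses import dataclass, field


@dataclass
class ToyMGS:
    """Hypothetical in-memory model of one nodemap's fileset on the MGS."""
    fileset: str = ""
    pending: list = field(default_factory=list)

    def set_direct(self, value):
        # Models 'lctl set_param': takes effect immediately.
        self.fileset = value

    def set_persistent(self, value):
        # Models 'lctl set_param -P': queued, applied at a later time.
        self.pending.append(value)

    def propagate(self):
        # Apply every queued persistent update, oldest first.
        while self.pending:
            self.fileset = self.pending.pop(0)

    def clear(self):
        # Models resetting the fileset with 'lctl nodemap_set_fileset'.
        self.fileset = ""


def original_sequence():
    mgs = ToyMGS()
    mgs.set_direct("/subdir")
    mgs.set_persistent("/subdir")
    mgs.clear()      # the test resets the fileset...
    mgs.propagate()  # ...then the delayed update lands and sets it again
    return mgs.fileset


def fixed_sequence():
    mgs = ToyMGS()
    mgs.set_persistent("/subdir")
    mgs.propagate()  # wait until the persistent value is visible on the MGS
    mgs.clear()
    mgs.propagate()  # nothing is pending, so the clear sticks
    return mgs.fileset
```

In this model `original_sequence()` ends with the fileset still set, while `fixed_sequence()` ends with it empty, because no update is left in flight when the clear runs.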

Comment by Gerrit Updater [ 24/Nov/16 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: http://review.whamcloud.com/23936
Subject: LU-8305 tests: fix fileset setup in sanity-sec
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8dcfaf64f1d62580c3e8cc3d4b3b64542e02bdd2

Comment by nasf (Inactive) [ 10/Dec/16 ]

+1 on master:
https://testing.hpdd.intel.com/test_sets/ae2f0e8a-be33-11e6-9b28-5254006e85c2

Comment by Gerrit Updater [ 17/Dec/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/23936/
Subject: LU-8305 tests: fix fileset setup in sanity-sec
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 947d81daa1ba218bb027b17061c1c24ccaf85c22

Comment by James Casper [ 10/Apr/17 ]

This issue has resurfaced in the latest tag (2.9.55, b3550).

https://testing.hpdd.intel.com/test_sessions/630a4991-3c6d-4755-b50e-9932f0cf69fb

Comment by Sebastien Buisson (Inactive) [ 12/Apr/17 ]

> This issue has resurfaced in the latest tag (2.9.55, b3550).
> https://testing.hpdd.intel.com/test_sessions/630a4991-3c6d-4755-b50e-9932f0cf69fb

Not exactly, as clients in this test are running Lustre 2.9.0. Patch https://review.whamcloud.com/23936 landed after 2.9.0, so sanity-sec.sh executed in this run does not contain the fix (the purpose of the patch was only to address a test script issue).

I think this ticket can be closed again.

Thanks,
Sebastien.

Comment by Peter Jones [ 12/Apr/17 ]

Do we just need to adjust the test script to skip running in interop mode with older releases?
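
A version gate of the kind suggested here could look roughly like the sketch below. It is an assumption-laden illustration, not the test-framework's own mechanism: `parse_version` and `should_skip_test_27` are hypothetical names, and the threshold only encodes what the ticket states, namely that 2.9.0 does not contain the fix. A real check would compare against the first version that actually carries the patch, which the ticket does not give.

```python
def parse_version(text):
    """Turn a dotted version string such as '2.9.55' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Per the ticket, the test script fix landed after 2.9.0.
LAST_WITHOUT_FIX = parse_version("2.9.0")


def should_skip_test_27(client_version):
    """Return True when the client's test script is assumed to lack the fix."""
    return parse_version(client_version) <= LAST_WITHOUT_FIX
```

Comparing tuples of integers avoids the string-comparison trap where "2.10.0" would sort before "2.9.0".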

Comment by James Casper [ 12/Apr/17 ]

Will open a new ticket to modify the sanity-sec test on 2.9.0 clients.
