[LU-7133] Interop 2.7.0 <-> master - conf-sanity test_43: check lustre-MDTall.mdt.nosquash_nids failed!
Created: 10/Sep/15 Updated: 10/Sep/18 |
|
| Status: | Reopened |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Maloo | Assignee: | Bob Glossman (Inactive) |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Environment: | Client: 2.7.0 |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/d46e0c62-514d-11e5-9f68-5254006e85c2.

The sub-test test_43 failed with the following error:

check lustre-MDTall.mdt.nosquash_nids failed!

Test log:
Setting lustre.mdt.root_squash from 0:0 to 500:500
CMD: shadow-18vm12 /usr/sbin/lctl conf_param lustre.mdt.root_squash='500:500'
CMD: shadow-18vm12 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.root_squash
CMD: shadow-18vm12 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.root_squash
Waiting 90 secs for update
CMD: shadow-18vm12 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.root_squash
CMD: shadow-18vm12 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.root_squash
Updated after 2s: wanted '500:500' got '500:500'
CMD: shadow-18vm5.shadow.whamcloud.com /usr/sbin/lctl get_param -n llite.lustre*.root_squash
CMD: shadow-18vm5.shadow.whamcloud.com /usr/sbin/lctl get_param -n llite.lustre*.root_squash
/mnt/lustre/f43.conf-sanity-userfile: owner uid 500 (-rw-------): root read permission is granted - ok
/mnt/lustre/f43.conf-sanity-userfile: owner uid 500 (-rw-------): root write permission is granted - ok
/mnt/lustre/f43.conf-sanity-rootfile: owner uid 0 (-rw-------): root read permission is denied - ok
/mnt/lustre/f43.conf-sanity-rootfile: owner uid 0 (-rw-------): root write permission is denied - ok
/mnt/lustre/d43.conf-sanity-rootdir: owner uid 0 (drwx------): root unlink permission is denied - ok
/mnt/lustre/d43.conf-sanity-rootdir: owner uid 0 (drwx------): root create permission is denied - ok
/mnt/lustre/f43.conf-sanity-user1file: owner uid 501 (-rw-------): root read permission is denied - ok
/mnt/lustre/f43.conf-sanity-user1file: owner uid 501 (-rw-------): root write permission is denied - ok
/usr/lib64/lustre/tests/conf-sanity.sh: line 2844: 29182 Terminated runas -u $ID1 tail -f $DIR/$tfile-user1file > /dev/null 2>&1
CMD: shadow-18vm12 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.nosquash_nids
Setting lustre-MDTall.mdt.nosquash_nids from NONE to 2@elan 0@lo 10.1.4.215@tcp 192.168.0.[2,10]@tcp
CMD: shadow-18vm12 /usr/sbin/lctl conf_param lustre-MDTall.mdt.nosquash_nids='2@elan 0@lo 10.1.4.215@tcp 192.168.0.[2,10]@tcp'
CMD: shadow-18vm12 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.nosquash_nids
CMD: shadow-18vm12 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.nosquash_nids
Waiting 90 secs for update
CMD: shadow-18vm12 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.nosquash_nids
[... the same get_param poll repeats about once per second, with "Waiting 80 secs for update" through "Waiting 10 secs for update" interleaved ...]
Update not seen after 90s: wanted '2@elan 0@lo 10.1.4.215@tcp 192.168.0.[2,10]@tcp' got 'NONE'
conf-sanity test_43: @@@@@@ FAIL: check lustre-MDTall.mdt.nosquash_nids failed!

Console:
09:31:40:Lustre: DEBUG MARKER: == conf-sanity test 43: check root_squash and nosquash_nids == 09:28:26 (1441099706)
09:31:40:Lustre: DEBUG MARKER: mkdir -p /mnt/lustre
09:31:40:Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock shadow-18vm12@tcp:/lustre /mnt/lustre
09:31:40:LustreError: 28945:0:(obd_config.c:1322:class_process_proc_param()) llite: lustre-client-ffff8800795b0800 unknown param some_wrong_param=10
09:31:40:Lustre: Mounted lustre-client
09:31:40:Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n llite.lustre*.root_squash
09:31:40:Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n llite.lustre*.root_squash
09:31:40:Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n llite.lustre*.nosquash_nids
09:31:40:Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n llite.lustre*.nosquash_nids
09:31:40:Lustre: lustre: nosquash_nids is cleared
09:31:40:Lustre: lustre: root_squash is set to 500:500
09:31:40:Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n llite.lustre*.root_squash
09:31:40:Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n llite.lustre*.root_squash
09:31:40:Lustre: lustre: nosquash_nids set to 2@elan 0@lo 10.1.4.215@tcp 192.168.0.[2,10]@tcp
09:31:40:Lustre: DEBUG MARKER: /usr/sbin/lctl mark conf-sanity test_43: @@@@@@ FAIL: check lustre-MDTall.mdt.nosquash_nids failed!
09:31:40:Lustre: DEBUG MARKER: conf-sanity test_43: @@@@@@ FAIL: check lustre-MDTall.mdt.nosquash_nids failed! |
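The log shows the test setting a parameter with `lctl conf_param` and then re-running `lctl get_param -n` on shadow-18vm12 roughly once a second for up to 90 seconds, printing a countdown every 10 seconds, until the value matches or the wait expires. The following is a minimal POSIX sh sketch of that polling pattern, not the real test-framework.sh code: `wait_for_value` and `POLL_INTERVAL` are hypothetical names, and only the message texts are taken from the log.

```shell
# Hypothetical sketch of the polling seen in the test log; not test-framework.sh.
# wait_for_value MAX EXPECTED CMD [ARGS...]
# Re-runs CMD until its output equals EXPECTED or MAX polls have been made.
# With the assumed default POLL_INTERVAL of 1 second, MAX is roughly seconds.
wait_for_value() (
	max=$1 expected=$2
	shift 2
	interval=${POLL_INTERVAL:-1}
	waited=0
	result=
	while [ "$waited" -lt "$max" ]; do
		result=$("$@")
		if [ "$result" = "$expected" ]; then
			echo "Updated after ${waited}s: wanted '$expected' got '$result'"
			exit 0
		fi
		# Countdown message every 10 polls, as in the log.
		if [ $((waited % 10)) -eq 0 ]; then
			echo "Waiting $((max - waited)) secs for update"
		fi
		sleep "$interval"
		waited=$((waited + 1))
	done
	echo "Update not seen after ${max}s: wanted '$expected' got '$result'"
	exit 1
)
```

Against a real MDS, something like `wait_for_value 90 '2@elan 0@lo 10.1.4.215@tcp 192.168.0.[2,10]@tcp' lctl get_param -n mdt.lustre-MDT0000.nosquash_nids` would be expected to fail the same way, because the server kept reporting `NONE`.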
| Comments |
| Comment by Andreas Dilger [ 29/Sep/15 ] |
|
This is one of the top failing autotests: |
| Comment by Peter Jones [ 29/Sep/15 ] |
|
Bob, could you please look into this one? Thanks, Peter |
| Comment by Andreas Dilger [ 29/Sep/15 ] |
|
This just started before 2.7.59, so it may be possible to trace it to a specific patch landing. It might just be a test failure caused by a feature change, but we need to verify that it isn't an interop regression. |
| Comment by Bob Glossman (Inactive) [ 02/Oct/15 ] |
|
Here's the problem. From the dmesg log of mds1, running the new (master) version:
[29879.051694] LNet: 14647:0:(nidstrings.c:271:parse_nidrange()) can't parse nidrange: "2@elan"
[29879.053687] Lustre: 14647:0:(lprocfs_status.c:1981:lprocfs_wr_nosquash_nids()) lustre-MDT0000: failed to set nosquash_nids to "2@elan 0@lo 10.1.4.215@tcp 192.168.0.[2,10]@tcp", can't parse rc = -22
[29879.057391] LustreError: 14647:0:(obd_config.c:1389:class_process_proc_param()) mdt.: error writing proc entry 'nosquash_nids': rc = -22
elan is one of the obsolete LNDs removed from master. However, it is still used in the example test nidlist in the old version of conf-sanity.sh in v2.7.0. The master server code can't parse it, so it just throws up its hands and rejects the whole list (rc = -22, i.e. -EINVAL), leaving nosquash_nids at NONE. I don't see this as easily fixable on the server side in master. It could be fixed by moving part of the master fix in conf-sanity.sh into b2_7, but that won't fix interop between the currently released 2.7 and master. |
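To illustrate why a single `2@elan` entry makes the whole write fail, here is a rough POSIX sh sketch of a nidlist check that looks up each NID's network type in a table of supported LNDs and returns 22 (EINVAL) on the first unknown one. It is not the real `parse_nidrange()` logic in nidstrings.c; `check_nidlist`, `SUPPORTED_NETS`, and the contents of that list are assumptions for illustration only.

```shell
# Hypothetical illustration, not the nidstrings.c parser.
# Assumed set of LND network types a master server still parses; elan is
# absent because it was removed from master.
SUPPORTED_NETS="lo tcp o2ib gni"
EINVAL=22

# check_nidlist NIDLIST
# Exit 0 if every NID's network type is in SUPPORTED_NETS; otherwise print
# an error for the first bad NID and exit with EINVAL.
check_nidlist() (
	set -f  # don't glob-expand patterns such as 192.168.0.[2,10]@tcp
	for nid in $1; do
		net=${nid##*@}  # "tcp" from "10.1.4.215@tcp"
		type=$net
		# Strip a trailing network number: "tcp1" -> "tcp", "o2ib3" -> "o2ib".
		while :; do
			case $type in
			*[0-9]) type=${type%?} ;;
			*) break ;;
			esac
		done
		case " $SUPPORTED_NETS " in
		*" $type "*) ;;
		*)
			echo "can't parse nidrange: \"$nid\"" >&2
			exit "$EINVAL"
			;;
		esac
	done
	exit 0
)
```

Because the check stops at the first unparseable entry, the valid `0@lo` and `@tcp` NIDs in the same list are never applied either, which matches the MDS still reporting `NONE`.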
| Comment by Bob Glossman (Inactive) [ 02/Oct/15 ] |
|
From the commit header of "Remove old LND types from the netstrfns table, as they are ..."
Clearly this was a misstatement. At least one obsolete LND is still needed for interop, as there's a reference to it embedded in the old conf-sanity.sh. |
| Comment by Bob Glossman (Inactive) [ 02/Oct/15 ] |
|
A possible fix might be to just put back an entry for the otherwise unsupported elan LND in the libcfs_netstrfns[] table. This would allow "2@elan" to be parsed. However, I'm not sure whether putting an unsupported nidlist entry into LNet data structures would have bad side effects: it might get referenced by code that assumes a functional LND is really there underneath. |
| Comment by James A Simmons [ 02/Oct/15 ] |
|
You are correct: putting the elan LND support back would have negative effects. The proper fix is to update the test, as we did for master, to test gnilnd instead of elan. |
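On the test side, the fix amounts to changing the example nidlist so it no longer contains an elan NID. As a hypothetical sketch, the helper below rewrites elan NIDs to gni NIDs while keeping the address and any network number; `replace_elan_nids` is an illustrative name, and `2@gni` is an assumption rather than the exact literal used by the master version of conf-sanity.sh.

```shell
# Hypothetical helper, not part of conf-sanity.sh.
# replace_elan_nids NIDLIST
# Print NIDLIST with every elan NID rewritten to a gni NID, keeping the
# address and any network number ("2@elan" -> "2@gni", "3@elan1" -> "3@gni1").
replace_elan_nids() (
	set -f  # keep patterns such as 192.168.0.[2,10]@tcp literal
	out=
	for nid in $1; do
		net=${nid##*@}
		case $net in
		elan|elan[0-9]*) nid="${nid%@*}@gni${net#elan}" ;;
		esac
		out="${out:+$out }$nid"
	done
	printf '%s\n' "$out"
)

# Nidlist from the 2.7.0 test log, rewritten for a server without elan support.
NIDLIST=$(replace_elan_nids "2@elan 0@lo 10.1.4.215@tcp 192.168.0.[2,10]@tcp")
echo "$NIDLIST"
```

In practice the test would more likely just hard-code the new literal; the helper only makes the substitution explicit.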
| Comment by James A Simmons [ 02/Oct/15 ] |
|
I pushed a patch: http://review.whamcloud.com/#/c/16717. I assume we need a patch for 2.6 and 2.5 as well? Let's land this in 2.7.1 before it is officially released; then we will have no further interop issues. |
| Comment by Peter Jones [ 02/Oct/15 ] |
|
James, we only test 2.8 interop with 2.5.x and 2.7.x releases, so I think that is the limit of what is needed. Peter |
| Comment by James A Simmons [ 02/Oct/15 ] |
|
I see you pushed a patch, Bob, so I will abandon mine. |
| Comment by Peter Jones [ 02/Oct/15 ] |
|
To summarize, though, I think we can drop 2.8 as a fix version for this and just plan to tidy up the tests on the maintenance branches for future interop testing. As such, I think we can close this ticket and track that effort separately. |
| Comment by Andreas Dilger [ 04/Oct/15 ] |
|
The patch for b2_7 still needs to land. |
| Comment by Peter Jones [ 04/Oct/15 ] |
|
...which will be tracked separately. |
| Comment by Saurabh Tandan (Inactive) [ 29/Oct/15 ] |
|
Encountered the same issue in interop testing for tag 2.7.62: https://testing.hpdd.intel.com/test_sets/44cc8dd8-7b67-11e5-a83c-5254006e85c2 |
| Comment by Saurabh Tandan (Inactive) [ 15/Dec/15 ] |
|
Another instance for the following interop config |
| Comment by Saurabh Tandan (Inactive) [ 16/Dec/15 ] |
|
Server: Master, Build# 3266, Tag 2.7.64, RHEL 7 |
| Comment by Saurabh Tandan (Inactive) [ 19/Jan/16 ] |
|
Another instance found for interop: EL6.7 Server/2.5.5 Client |
| Comment by Saurabh Tandan (Inactive) [ 08/Feb/16 ] |
|
This issue has been seen 21 times in the past 30 days. |
| Comment by Saurabh Tandan (Inactive) [ 10/Feb/16 ] |
|
Another instance found for interop tag 2.7.66 - EL6.7 Server/2.5.5 Client, build# 3316
Another instance found for interop tag 2.7.66 - EL7 Server/2.5.5 Client, build# 3316 |
| Comment by Saurabh Tandan (Inactive) [ 24/Feb/16 ] |
|
Another instance found for interop - EL6.7 Server/2.5.5 Client, tag 2.7.90. |
| Comment by James A Simmons [ 10/Sep/18 ] |
|
Can we close this? |