[LU-664] 1.8<->2.1 interop: recovery-small test_59 FAIL: Failed to mount /mnt/lustre2
Created: 07/Sep/11
Updated: 03/Jun/15
Resolved: 03/Jun/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Jian Yu
Assignee: Minh Diep
Resolution: Won't Fix
Votes: 0
Labels: None
Environment:

Old Lustre Versions: 1.8.5 and 1.8.6-wc1
Lustre 1.8.6-wc1 Build: http://newbuild.whamcloud.com/job/lustre-b1_8/100/

New Lustre Version: 2.1.0
Lustre 2.1.0 Build: http://newbuild.whamcloud.com/job/lustre-master/276/
Network: TCP (1GigE)

Clean upgrade (Lustre servers and clients were all upgraded at once) from Lustre 1.8.5 and 1.8.6-wc1 to Lustre 2.1.0 in the following configuration:
OSS1: RHEL5/x86_64 upgrade from 1.8.6-wc1 to 2.1.0
OSS2: RHEL5/x86_64 upgrade from 1.8.5 to 2.1.0
MDS: RHEL5/x86_64 upgrade from 1.8.6-wc1 to 2.1.0
Client1: RHEL6/x86_64 upgrade from 1.8.6-wc1 to 2.1.0
Client2: RHEL5/x86_64 upgrade from 1.8.5 to 2.1.0

Test nodes:
OSS1: fat-amd-2 10.10.4.133
OSS2: fat-amd-3 10.10.4.134
MDS: fat-amd-1 10.10.4.132
Client1: client-12 10.10.4.12
Client2: client-13 10.10.4.13


Issue Links:
Related
is related to LU-1193 test script incompatibility when runn... Resolved
Severity: 3
Rank (Obsolete): 6957

 Description   

After the clean upgrade, recovery-small test 59 failed as follows:

client-12: == recovery-small test 59: Read cancel race on client eviction == 01:24:02 (1315383842)
client-12: Starting client: client-12: -o user_xattr,acl,flock fat-amd-1:/lustre /mnt/lustre2
client-12: mount.lustre: mount fat-amd-1:/lustre at /mnt/lustre2 failed: File exists
client-12:  recovery-small test_59: @@@@@@ FAIL: Failed to mount /mnt/lustre2 
client-12: Dumping lctl log to /home/yujian/test_logs/1315383773/recovery-small.test_59.*.1315383843.log
client-12: tar: Removing leading `/' from member names
client-12: /home/yujian/test_logs/1315383773/recovery-small-1315383843.tar.bz2
client-12: FAIL 59 (4s)
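
The number in parentheses on the test banner (1315383842) is a Unix epoch timestamp. The following is a minimal Python sketch for converting it when correlating this log with server-side logs. The UTC-7 offset is an assumption inferred from the 01:24:02 wall-clock time printed next to it; the ticket does not state the node's time zone.

```python
from datetime import datetime, timedelta, timezone

# Epoch value copied from the test 59 banner in the log above.
EPOCH_FROM_LOG = 1315383842

# Interpret the epoch value as an absolute point in time (UTC).
utc_time = datetime.fromtimestamp(EPOCH_FROM_LOG, tz=timezone.utc)

# Assumption: the test node's clock was at UTC-7. This is inferred from
# the banner's wall-clock time, not stated anywhere in the ticket.
assumed_local = utc_time.astimezone(timezone(timedelta(hours=-7)))

print(utc_time.isoformat())
print(assumed_local.strftime("%H:%M:%S"))
```

Under that assumed offset, the converted time agrees with the wall-clock time on the banner.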

Dmesg on client-12 showed:

Lustre: DEBUG MARKER: == recovery-small test 59: Read cancel race on client eviction == 01:24:02 (1315383842)
LustreError: 10592:0:(genops.c:304:class_newdev()) Device MGC10.10.4.132@tcp already exists, won't add
LustreError: 10592:0:(obd_config.c:327:class_attach()) Cannot create device MGC10.10.4.132@tcp of type mgc : -17
LustreError: 10592:0:(obd_mount.c:512:lustre_start_simple()) MGC10.10.4.132@tcp attach error -17
LustreError: 10592:0:(obd_mount.c:2160:lustre_fill_super()) Unable to mount  (-17)
Lustre: DEBUG MARKER: recovery-small test_59: @@@@@@ FAIL: Failed to mount /mnt/lustre2
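
The -17 in these messages is a negated Linux errno value. Errno 17 is EEXIST, whose message text is "File exists", the same string mount.lustre printed. The following is a minimal Python sketch, assuming a Linux host, for decoding the code and splitting one of the LustreError lines into fields. The helper names `decode_kernel_errno` and `parse_lustre_error` are hypothetical, and the regular expression is fitted only to the lines quoted in this ticket, not to Lustre log output in general.

```python
import errno
import os
import re

# Pattern fitted to the LustreError lines quoted in this ticket only;
# it is an illustration, not a general Lustre log grammar.
LUSTRE_ERROR = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) (?P<msg>.*)"
)


def decode_kernel_errno(code):
    """Hypothetical helper: map a negated kernel errno to (name, message)."""
    number = abs(code)
    return errno.errorcode.get(number, "UNKNOWN"), os.strerror(number)


def parse_lustre_error(text):
    """Hypothetical helper: split one LustreError line into its fields."""
    match = LUSTRE_ERROR.match(text)
    if match is None:
        return None
    fields = match.groupdict()
    fields["pid"] = int(fields["pid"])
    fields["line"] = int(fields["line"])
    return fields


# Sample line copied from the dmesg output above.
sample = (
    "LustreError: 10592:0:(obd_config.c:327:class_attach()) "
    "Cannot create device MGC10.10.4.132@tcp of type mgc : -17"
)

parsed = parse_lustre_error(sample)
name, message = decode_kernel_errno(-17)
print(parsed["file"], parsed["line"], parsed["func"])
print(name, message)
```

Read this way, the chain from class_newdev() up to lustre_fill_super() reports a single EEXIST condition: the MGC device for 10.10.4.132@tcp was already registered when the second mount tried to create it.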

The same issue also occurred in test 57.

Maloo report: https://maloo.whamcloud.com/test_sets/8fd0ba26-d92b-11e0-8d02-52540025f9af



 Comments   
Comment by Andreas Dilger [ 25/May/12 ]

https://maloo.whamcloud.com/test_sets/7fd57656-a686-11e1-90f2-52540035b04c

It makes sense to land a patch on b1_8 and/or b2_1 (and/or master) that preferably fixes this test, if that is easily done, or otherwise skips it (as was done in LU-1193).

Generated at Sat Feb 10 01:09:14 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.