Details
-
Bug
-
Resolution: Fixed
-
Minor
-
Lustre 2.1.0, Lustre 1.8.6
-
None
-
Lustre Branch: b1_8
Lustre Build: http://newbuild.whamcloud.com/job/lustre-b1_8/61/
Distro/Arch: RHEL6/x86_64(patchless client, in-kernel OFED), RHEL5/x86_64(server, OFED 1.5.3, ext4)
MGS/MDS Nodes: client-1-ib(active), client-2-ib(passive)
\ /
1 combined MGS/MDT
OSS Nodes: client-6-ib(active), client-8-ib(active)
\ /
OST1 (active in client-6-ib)
OST2 (active in client-8-ib)
OST3 (active in client-6-ib)
OST4 (active in client-8-ib)
OST5 (active in client-6-ib)
OST6 (active in client-8-ib)
Client Nodes: client-4-ib, client-12-ib
-
3
-
4942
Description
While running replay-single tests under the failover configuration, test 61d failed as follows:
== test 61d: error in llog_setup should cleanup the llog context correctly == 08:53:13
fail_loc=0x80000605
Starting mgs: -o user_xattr,acl /dev/disk/by-id/scsi-1IET_00010001 /mnt/mds
client-1-ib: mount.lustre: mount /dev/disk/by-id/scsi-1IET_00010001 at /mnt/mds failed: Invalid argument
client-1-ib: This may have multiple causes.
client-1-ib: Are the mount options correct?
client-1-ib: Check the syslog for more info.
mount -t lustre /dev/disk/by-id/scsi-1IET_00010001 /mnt/mds
Start of /dev/disk/by-id/scsi-1IET_00010001 on mgs failed 22
fail_loc=0
Starting mgs: -o user_xattr,acl /dev/disk/by-id/scsi-1IET_00010001 /mnt/mds
client-1-ib: mount.lustre: mount /dev/disk/by-id/scsi-1IET_00010001 at /mnt/mds failed: Invalid argument
client-1-ib: This may have multiple causes.
client-1-ib: Are the mount options correct?
client-1-ib: Check the syslog for more info.
mount -t lustre /dev/disk/by-id/scsi-1IET_00010001 /mnt/mds
Start of /dev/disk/by-id/scsi-1IET_00010001 on mgs failed 22
replay-single test_61d: @@@@@@ FAIL: cannot restart mgs
Dumping lctl log to /home/yujian/test_logs/2011-05-25/072205/replay-single.test_61d.*.1306338816.log
tar: Removing leading `/' from member names
/home/yujian/test_logs/2011-05-25/072205/replay-single-1306338816.tar.bz2
Resetting fail_loc on all nodes...done.
FAIL (33s)
Maloo report: https://maloo.whamcloud.com/test_sets/172b0dd4-8745-11e0-b4df-52540025f9af
This is a test script issue: "do_facet mgs" did not determine the active MGS node when the MGS and MDS were combined and shared the same failover pair.
From the Maloo report, we can see that the MDS had been failed over to client-2-ib in test 61b. However, the "do_facet mgs" called by "stop mgs" and "start mgs" in test 61d still treated client-1-ib as the active node. We need to add a $TMP/mgsactive file to indicate which partner is active for the combined MGS/MDS node, so that "facet_active mgs", called by "do_facet mgs", can determine the active MGS node correctly.
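The proposed fix can be illustrated with a minimal shell sketch. This is not the actual test framework code: the helper names record_active, facet_active_host and fail_mds, the mds_HOST/mgs_HOST variables, and the one-host-name-per-file layout are all assumptions made for illustration. Only the $TMP/mgsactive file name and the host names come from this ticket. The idea shown is that failing over the MDS also updates the MGS state file when the two share a device, so a later lookup for the mgs facet resolves to the new active node.

```shell
# Illustrative sketch only; helper names and file layout are assumptions,
# not the code from the Lustre test framework.

TMP=${TMP:-/tmp}

# Hypothetical primary hosts, matching the configuration in this ticket.
mds_HOST=client-1-ib
mgs_HOST=client-1-ib

# record_active FACET HOST: remember which host currently serves FACET.
record_active() {
    local facet=$1 host=$2
    echo "$host" > "$TMP/${facet}active"
}

# facet_active_host FACET: print the recorded active host for FACET,
# or its primary host if no state file has been written yet.
facet_active_host() {
    local facet=$1 default
    if [ -s "$TMP/${facet}active" ]; then
        cat "$TMP/${facet}active"
    else
        eval "default=\$${facet}_HOST"
        echo "$default"
    fi
}

# fail_mds HOST: record a failover of the MDS to HOST. With a combined
# MGS/MDT the MGS moves with it, so both state files are updated.
fail_mds() {
    local new_host=$1
    record_active mds "$new_host"
    record_active mgs "$new_host"
}
```

In this sketch, calling "fail_mds client-2-ib" (as happens in test 61b) makes a later "facet_active_host mgs" (as needed in test 61d) report client-2-ib instead of the stale client-1-ib.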