[LU-372] replay-single test_61d: FAIL: cannot restart mgs Created: 30/May/11 Updated: 12/Oct/11 Resolved: 12/Oct/11 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.0, Lustre 1.8.6 |
| Fix Version/s: | Lustre 2.1.0, Lustre 1.8.6 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Jian Yu | Assignee: | Jian Yu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Lustre Branch: b1_8; MGS/MDS Nodes: client-1-ib (active), client-2-ib (passive); OSS Nodes: client-6-ib (active), client-8-ib (active); Client Nodes: client-4-ib, client-12-ib |
||
| Severity: | 3 |
| Rank (Obsolete): | 4942 |
| Description |
|
While running replay-single tests under the failover configuration, test 61d failed as follows:

== test 61d: error in llog_setup should cleanup the llog context correctly == 08:53:13
fail_loc=0x80000605
Starting mgs: -o user_xattr,acl /dev/disk/by-id/scsi-1IET_00010001 /mnt/mds
client-1-ib: mount.lustre: mount /dev/disk/by-id/scsi-1IET_00010001 at /mnt/mds failed: Invalid argument
client-1-ib: This may have multiple causes.
client-1-ib: Are the mount options correct?
client-1-ib: Check the syslog for more info.
mount -t lustre /dev/disk/by-id/scsi-1IET_00010001 /mnt/mds
Start of /dev/disk/by-id/scsi-1IET_00010001 on mgs failed 22
fail_loc=0
Starting mgs: -o user_xattr,acl /dev/disk/by-id/scsi-1IET_00010001 /mnt/mds
client-1-ib: mount.lustre: mount /dev/disk/by-id/scsi-1IET_00010001 at /mnt/mds failed: Invalid argument
client-1-ib: This may have multiple causes.
client-1-ib: Are the mount options correct?
client-1-ib: Check the syslog for more info.
mount -t lustre /dev/disk/by-id/scsi-1IET_00010001 /mnt/mds
Start of /dev/disk/by-id/scsi-1IET_00010001 on mgs failed 22
replay-single test_61d: @@@@@@ FAIL: cannot restart mgs
Dumping lctl log to /home/yujian/test_logs/2011-05-25/072205/replay-single.test_61d.*.1306338816.log
tar: Removing leading `/' from member names
/home/yujian/test_logs/2011-05-25/072205/replay-single-1306338816.tar.bz2
Resetting fail_loc on all nodes...done.
FAIL (33s)

Maloo report: https://maloo.whamcloud.com/test_sets/172b0dd4-8745-11e0-b4df-52540025f9af

This is a test script issue: "do_facet mgs" did not determine the active MGS node when the MGS and MDS were combined and shared the same failover pair. The Maloo report shows that the MDS had already been failed over to client-2-ib in test 61b. However, the "do_facet mgs" calls made by "stop mgs" and "start mgs" in test 61d still treated client-1-ib as the active node.

We need to add a $TMP/mgsactive file that records which partner is active for the combined MGS/MDS node, so that "facet_active mgs" (called by "do_facet mgs") can determine the active MGS node correctly. |
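The proposed lookup can be sketched roughly as follows. This is a hypothetical, simplified model, not the code from the actual patch: the helper record_failover, the COMBINED_MGS_MDS flag, and storing host names (rather than facet aliases) in the state files are assumptions made for illustration. It shows facet_active sourcing $TMP/<facet>active, and an MDS failover also writing $TMP/mgsactive when the MGS shares the MDS nodes.

```shell
#!/bin/bash
# Hypothetical, simplified model of the $TMP/<facet>active mechanism.
# Not the actual test-framework.sh code; record_failover and
# COMBINED_MGS_MDS are assumed names used only for illustration.

TMP=$(mktemp -d)    # scratch directory for this demo only

# Print the active node for a facet. If $TMP/<facet>active exists, it is
# sourced and is expected to define the variable <facet>active.
facet_active() {
    local facet=$1
    local activevar=${facet}active

    if [ -f "$TMP/$activevar" ]; then
        source "$TMP/$activevar"
    fi
    # Fall back to the facet name itself when nothing has been recorded.
    echo "${!activevar:-$facet}"
}

# Hypothetical helper: record that $facet is now served by $node.
# With a combined MGS/MDS (assumed flag COMBINED_MGS_MDS=yes), the MGS
# moves together with the MDS, so its state file is written as well,
# using the "mgsactive=" name that facet_active expects for "mgs".
record_failover() {
    local facet=$1 node=$2

    echo "${facet}active=$node" > "$TMP/${facet}active"
    if [ "$facet" = mds ] && [ "${COMBINED_MGS_MDS:-no}" = yes ]; then
        echo "mgsactive=$node" > "$TMP/mgsactive"
    fi
}
```

With COMBINED_MGS_MDS=yes, calling record_failover mds client-2-ib (the situation left behind by test 61b) makes facet_active mgs print client-2-ib instead of falling back to the plain "mgs" facet, which in this setup would map to client-1-ib.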
| Comments |
| Comment by Jian Yu [ 30/May/11 ] |
|
Patch for b1_8 is in http://review.whamcloud.com/871. |
| Comment by Chris Gearing (Inactive) [ 30/May/11 ] |
|
Yu Jian, this is a good addition. Does it also need to be applied to master, or is it already implemented there? Sorry, I can't easily check from where I am. Would it also be possible to write a Wiki page about the use of environment variables and the $TMP files for failover? I realise you based this code on what happens for the MDS, but that behaviour is not really documented. I'm trying to collect some information on the following page, and maybe you could create a child page on this particular failover topic: http://wiki.whamcloud.com/display/PUB/Lustre+Test+Tools+Environment+Variables
Thanks, Chris |
| Comment by Chris Gearing (Inactive) [ 30/May/11 ] |
|
I've been thinking about this and can't see why we need a separate file for a combined MDS/MGS. If the MDS and MGS are combined, the active MGS will be recorded in the /tmp/[mds] file; if they are not, it will just be the MGS. I'll look at the code some more, but this seems to add more complication than is needed. |
| Comment by Jian Yu [ 30/May/11 ] |
This is also needed on the master branch. I'll make a patch for it.
OK, I'll write the Wiki page. |
| Comment by Jian Yu [ 30/May/11 ] |
This is because "do_facet mgs" calls "facet_active_host mgs", which in turn calls "facet_active mgs", and that function checks:
if [ -f $TMP/mgsactive ] ; then
So it sources the $TMP/mgsactive file, not the $TMP/mdsactive file. Moreover, even if we changed this to source the $TMP/mdsactive file for a combined MGS/MDS node, the file's content would be wrong for the "mgs" facet: it contains "mdsactive=xxx" rather than "mgsactive=xxx". |
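The variable-name mismatch described above can be demonstrated in a few lines. This is a hypothetical demo: the host name client-2-ib is taken from the failover scenario, and the file content follows the "mdsactive=xxx" format mentioned in the comment. Sourcing the MDS state file defines mdsactive but leaves mgsactive unset, so a lookup for the "mgs" facet still finds nothing.

```shell
#!/bin/bash
# Hypothetical demo: the MDS state file cannot stand in for the MGS one.
TMP=$(mktemp -d)    # scratch directory for this demo only

# State file as it would look after the MDS failed over (host assumed).
echo "mdsactive=client-2-ib" > "$TMP/mdsactive"

unset mdsactive mgsactive
source "$TMP/mdsactive"    # reuse the MDS file for the "mgs" facet

echo "mdsactive=${mdsactive:-<unset>}"    # prints mdsactive=client-2-ib
echo "mgsactive=${mgsactive:-<unset>}"    # prints mgsactive=<unset>
```

This is the reason given above for keeping a separate $TMP/mgsactive file that defines mgsactive itself.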
| Comment by Build Master (Inactive) [ 07/Jun/11 ] |
|
Integrated in Johann Lombardi : 997c1e4b251867deedf5f9ee97beea72505ee36f
|
| Comment by Peter Jones [ 07/Jun/11 ] |
|
YuJian Do we need this same change for master? Peter |
| Comment by Jian Yu [ 07/Jun/11 ] |
Yes, Peter, I'll make a patch for the master branch. |
| Comment by Jian Yu [ 08/Jun/11 ] |
|
Patch for master is in: http://review.whamcloud.com/913. |
| Comment by Build Master (Inactive) [ 27/Jul/11 ] |
|
Integrated in Oleg Drokin : 07d01fce91f2412322b94e2fd2b021689fc0c035
|
| Comment by Jian Yu [ 12/Oct/11 ] |
|
Patches have been pushed to both the b1_8 and master branches. The issue is resolved. |