[LU-372] replay-single test_61d: FAIL: cannot restart mgs
Created: 30/May/11  Updated: 12/Oct/11  Resolved: 12/Oct/11

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.0, Lustre 1.8.6
Fix Version/s: Lustre 2.1.0, Lustre 1.8.6

Type: Bug
Priority: Minor
Reporter: Jian Yu
Assignee: Jian Yu
Resolution: Fixed
Votes: 0
Labels: None
Environment:

Lustre Branch: b1_8
Lustre Build: http://newbuild.whamcloud.com/job/lustre-b1_8/61/
Distro/Arch: RHEL6/x86_64(patchless client, in-kernel OFED), RHEL5/x86_64(server, OFED 1.5.3, ext4)

MGS/MDS Nodes: client-1-ib(active), client-2-ib(passive)
\ /
1 combined MGS/MDT

OSS Nodes: client-6-ib(active), client-8-ib(active)
\ /
OST1 (active in client-6-ib)
OST2 (active in client-8-ib)
OST3 (active in client-6-ib)
OST4 (active in client-8-ib)
OST5 (active in client-6-ib)
OST6 (active in client-8-ib)

Client Nodes: client-4-ib, client-12-ib


Severity: 3
Rank (Obsolete): 4942

 Description   

While running replay-single tests under the failover configuration, test 61d failed as follows:

== test 61d: error in llog_setup should cleanup the llog context correctly == 08:53:13
fail_loc=0x80000605
Starting mgs: -o user_xattr,acl  /dev/disk/by-id/scsi-1IET_00010001 /mnt/mds
client-1-ib: mount.lustre: mount /dev/disk/by-id/scsi-1IET_00010001 at /mnt/mds failed: Invalid argument
client-1-ib: This may have multiple causes.
client-1-ib: Are the mount options correct?
client-1-ib: Check the syslog for more info.
mount -t lustre  /dev/disk/by-id/scsi-1IET_00010001 /mnt/mds
Start of /dev/disk/by-id/scsi-1IET_00010001 on mgs failed 22
fail_loc=0
Starting mgs: -o user_xattr,acl  /dev/disk/by-id/scsi-1IET_00010001 /mnt/mds
client-1-ib: mount.lustre: mount /dev/disk/by-id/scsi-1IET_00010001 at /mnt/mds failed: Invalid argument
client-1-ib: This may have multiple causes.
client-1-ib: Are the mount options correct?
client-1-ib: Check the syslog for more info.
mount -t lustre  /dev/disk/by-id/scsi-1IET_00010001 /mnt/mds
Start of /dev/disk/by-id/scsi-1IET_00010001 on mgs failed 22
 replay-single test_61d: @@@@@@ FAIL: cannot restart mgs 
Dumping lctl log to /home/yujian/test_logs/2011-05-25/072205/replay-single.test_61d.*.1306338816.log
tar: Removing leading `/' from member names
/home/yujian/test_logs/2011-05-25/072205/replay-single-1306338816.tar.bz2
Resetting fail_loc on all nodes...done.
FAIL   (33s)

Maloo report: https://maloo.whamcloud.com/test_sets/172b0dd4-8745-11e0-b4df-52540025f9af

This is a test script issue: "do_facet mgs" did not determine the active MGS node when the MGS and MDS were combined and shared the same failover pair.

From the Maloo report we can see that the MDS node had been failed over to client-2-ib in test 61b. However, the "do_facet mgs" called by "stop mgs" and "start mgs" in test 61d still treated client-1-ib as the active node. We need to add a $TMP/mgsactive file to indicate which partner is active for the combined MGS/MDS node, so that "facet_active mgs", called by "do_facet mgs", can determine the active MGS node correctly.
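
For illustration, the lookup can be sketched roughly as follows. This is a simplified sketch, not the actual test-framework.sh code: the *_sketch function names and the "mdsfailover"/"mgsfailover" values are hypothetical, a private temporary directory stands in for $TMP, and the mapping from partner name to host (done by "facet_active_host") is left out.

```shell
# Simplified sketch; the *_sketch names are hypothetical, not the real
# test-framework.sh helpers. A private temp dir stands in for $TMP.
TMP=$(mktemp -d)

# Record which partner currently serves a facet, e.g. "mgsactive=mgsfailover".
set_active_sketch() {
    echo "${1}active=${2}" > "$TMP/${1}active"
}

# Print the active partner for a facet; fall back to the facet name itself
# when no state file exists. The body runs in a subshell so that variables
# set by the sourced file do not leak into the caller.
facet_active_sketch() (
    facet=$1
    unset "${facet}active"
    if [ -f "$TMP/${facet}active" ]; then
        . "$TMP/${facet}active"
    fi
    eval "active=\${${facet}active}"
    printf '%s' "${active:-$facet}"
)

before=$(facet_active_sketch mgs)          # no state file yet
set_active_sketch mds mdsfailover          # failover recorded for the mds facet only
after_mds_only=$(facet_active_sketch mgs)  # mgs lookup still sees the primary
set_active_sketch mgs mgsfailover          # proposed fix: record the mgs facet too
after_fix=$(facet_active_sketch mgs)

echo "$before $after_mds_only $after_fix"
rm -rf "$TMP"
```

The middle lookup corresponds to the failure above: the failover was recorded only for the mds facet, so the mgs lookup kept returning the primary partner.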



 Comments   
Comment by Jian Yu [ 30/May/11 ]

Patch for b1_8 is in http://review.whamcloud.com/871.

Comment by Chris Gearing (Inactive) [ 30/May/11 ]

Yu Jian,

This is a good addition. Does it also need to be applied to master, or is this already implemented in master? I'm sorry, but I can't easily look where I am.

Would it also be possible to write a Wiki page about the use of environment variables and the $TMP files for failover? I have realised that you based this code on what happens for the mds, but that behaviour is not really documented.

I'm trying to create some information here: http://wiki.whamcloud.com/display/PUB/Lustre+Test+Tools+Environment+Variables. Maybe you could create a child page on this particular failover topic.

Thanks

Chris

Comment by Chris Gearing (Inactive) [ 30/May/11 ]

I've been thinking about this and can't see why we need a separate file for combined mds-mgs. If we have a combined mds-mgs then the active mgs will be in the /tmp/[mds] file and if we don't then it will just be the mgs.

I'll look some more at the code, but this seems to add more complication than is needed.

Comment by Jian Yu [ 30/May/11 ]

> This is a good addition. Does it also need to be applied to master, or is this already implemented in master?

This is also needed on the master branch. I'll make a patch for it.

> I'm trying to create some information here: http://wiki.whamcloud.com/display/PUB/Lustre+Test+Tools+Environment+Variables. Maybe you could create a child page on this particular failover topic.

OK, I'll do this.
FYI, here is a wiki page describing some of the variables used in the test suite: http://wiki.lustre.org/index.php/Acceptance_Small_%28acc-sm%29_Testing_on_Lustre.

Comment by Jian Yu [ 30/May/11 ]

> I've been thinking about this and can't see why we need a separate file for combined mds-mgs. If we have a combined mds-mgs then the active mgs will be in the /tmp/[mds] file and if we don't then it will just be the mgs.

This is because "do_facet mgs" calls "facet_active_host mgs", which calls "facet_active mgs", and that does:

    if [ -f $TMP/mgsactive ] ; then
        source $TMP/mgsactive
    fi

It sources the $TMP/mgsactive file, not the $TMP/mdsactive file.

What's more, if we changed this to source the $TMP/mdsactive file for a combined MGS/MDS node, the content of that file would be wrong for the "mgs" facet: it contains "mdsactive=xxx" instead of "mgsactive=xxx".
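
The variable-name mismatch can be shown with a small standalone sketch. It is an illustration only, not code from test-framework.sh: the value "mdsfailover" is made up, and a private temporary directory stands in for $TMP.

```shell
# Illustration only; "mdsfailover" is a made-up value and the temp dir
# stands in for $TMP.
TMP=$(mktemp -d)
echo "mdsactive=mdsfailover" > "$TMP/mdsactive"

# Resolve the "mgs" facet, but (hypothetically) source the MDS state file.
result=$(
    unset mgsactive mdsactive
    facet=mgs
    . "$TMP/mdsactive"
    # The lookup reads the variable named after the facet: mgsactive.
    eval "active=\${${facet}active}"
    printf 'mds=%s mgs=%s' "$mdsactive" "${active:-unset}"
)
echo "$result"
rm -rf "$TMP"
```

Sourcing the file defines only mdsactive, so the variable that the mgs lookup reads stays empty.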

Comment by Build Master (Inactive) [ 07/Jun/11 ]

Integrated in lustre-b1_8 » x86_64,client,ubuntu1004,inkernel #66
LU-372 add $TMP/mgsactive to indicate the active combined MGS/MDS node

Johann Lombardi : 997c1e4b251867deedf5f9ee97beea72505ee36f
Files :

  • lustre/tests/recovery-small.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 07/Jun/11 ]

Integrated in lustre-b1_8 » x86_64,client,el5,ofa #66
LU-372 add $TMP/mgsactive to indicate the active combined MGS/MDS node

Johann Lombardi : 997c1e4b251867deedf5f9ee97beea72505ee36f
Files :

  • lustre/tests/recovery-small.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 07/Jun/11 ]

Integrated in lustre-b1_8 » x86_64,server,el5,ofa #66
LU-372 add $TMP/mgsactive to indicate the active combined MGS/MDS node

Johann Lombardi : 997c1e4b251867deedf5f9ee97beea72505ee36f
Files :

  • lustre/tests/test-framework.sh
  • lustre/tests/recovery-small.sh
Comment by Build Master (Inactive) [ 07/Jun/11 ]

Integrated in lustre-b1_8 » x86_64,client,el6,inkernel #66
LU-372 add $TMP/mgsactive to indicate the active combined MGS/MDS node

Johann Lombardi : 997c1e4b251867deedf5f9ee97beea72505ee36f
Files :

  • lustre/tests/recovery-small.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 07/Jun/11 ]

Integrated in lustre-b1_8 » x86_64,server,el5,inkernel #66
LU-372 add $TMP/mgsactive to indicate the active combined MGS/MDS node

Johann Lombardi : 997c1e4b251867deedf5f9ee97beea72505ee36f
Files :

  • lustre/tests/recovery-small.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 07/Jun/11 ]

Integrated in lustre-b1_8 » x86_64,client,el5,inkernel #66
LU-372 add $TMP/mgsactive to indicate the active combined MGS/MDS node

Johann Lombardi : 997c1e4b251867deedf5f9ee97beea72505ee36f
Files :

  • lustre/tests/test-framework.sh
  • lustre/tests/recovery-small.sh
Comment by Build Master (Inactive) [ 07/Jun/11 ]

Integrated in lustre-b1_8 » i686,client,el6,inkernel #66
LU-372 add $TMP/mgsactive to indicate the active combined MGS/MDS node

Johann Lombardi : 997c1e4b251867deedf5f9ee97beea72505ee36f
Files :

  • lustre/tests/test-framework.sh
  • lustre/tests/recovery-small.sh
Comment by Build Master (Inactive) [ 07/Jun/11 ]

Integrated in lustre-b1_8 » i686,client,el5,inkernel #67
LU-372 add $TMP/mgsactive to indicate the active combined MGS/MDS node

Johann Lombardi : 997c1e4b251867deedf5f9ee97beea72505ee36f
Files :

  • lustre/tests/test-framework.sh
  • lustre/tests/recovery-small.sh
Comment by Build Master (Inactive) [ 07/Jun/11 ]

Integrated in lustre-b1_8 » i686,client,el5,ofa #67
LU-372 add $TMP/mgsactive to indicate the active combined MGS/MDS node

Johann Lombardi : 997c1e4b251867deedf5f9ee97beea72505ee36f
Files :

  • lustre/tests/recovery-small.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 07/Jun/11 ]

Integrated in lustre-b1_8 » i686,server,el5,ofa #67
LU-372 add $TMP/mgsactive to indicate the active combined MGS/MDS node

Johann Lombardi : 997c1e4b251867deedf5f9ee97beea72505ee36f
Files :

  • lustre/tests/recovery-small.sh
  • lustre/tests/test-framework.sh
Comment by Peter Jones [ 07/Jun/11 ]

YuJian

Do we need this same change for master?

Peter

Comment by Build Master (Inactive) [ 07/Jun/11 ]

Integrated in lustre-b1_8 » i686,server,el5,inkernel #67
LU-372 add $TMP/mgsactive to indicate the active combined MGS/MDS node

Johann Lombardi : 997c1e4b251867deedf5f9ee97beea72505ee36f
Files :

  • lustre/tests/recovery-small.sh
  • lustre/tests/test-framework.sh
Comment by Jian Yu [ 07/Jun/11 ]

> Do we need this same change for master?

Yes, Peter, I'll make a patch for the master branch.

Comment by Jian Yu [ 08/Jun/11 ]

Patch for master is in: http://review.whamcloud.com/913.

Comment by Build Master (Inactive) [ 27/Jul/11 ]

Integrated in lustre-master » x86_64,client,el5,ofa #232
LU-372 add $TMP/mgsactive to indicate the active combined MGS/MDS node

Oleg Drokin : 07d01fce91f2412322b94e2fd2b021689fc0c035
Files :

  • lustre/tests/test-framework.sh
  • lustre/tests/recovery-small.sh
Comment by Build Master (Inactive) [ 27/Jul/11 ]

Integrated in lustre-master » i686,client,el6,inkernel #232
LU-372 add $TMP/mgsactive to indicate the active combined MGS/MDS node

Oleg Drokin : 07d01fce91f2412322b94e2fd2b021689fc0c035
Files :

  • lustre/tests/recovery-small.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 27/Jul/11 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #232
LU-372 add $TMP/mgsactive to indicate the active combined MGS/MDS node

Oleg Drokin : 07d01fce91f2412322b94e2fd2b021689fc0c035
Files :

  • lustre/tests/recovery-small.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 27/Jul/11 ]

Integrated in lustre-master » x86_64,server,el5,ofa #232
LU-372 add $TMP/mgsactive to indicate the active combined MGS/MDS node

Oleg Drokin : 07d01fce91f2412322b94e2fd2b021689fc0c035
Files :

  • lustre/tests/recovery-small.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 27/Jul/11 ]

Integrated in lustre-master » x86_64,client,sles11,inkernel #232
LU-372 add $TMP/mgsactive to indicate the active combined MGS/MDS node

Oleg Drokin : 07d01fce91f2412322b94e2fd2b021689fc0c035
Files :

  • lustre/tests/test-framework.sh
  • lustre/tests/recovery-small.sh
Comment by Build Master (Inactive) [ 27/Jul/11 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #232
LU-372 add $TMP/mgsactive to indicate the active combined MGS/MDS node

Oleg Drokin : 07d01fce91f2412322b94e2fd2b021689fc0c035
Files :

  • lustre/tests/recovery-small.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 27/Jul/11 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #232
LU-372 add $TMP/mgsactive to indicate the active combined MGS/MDS node

Oleg Drokin : 07d01fce91f2412322b94e2fd2b021689fc0c035
Files :

  • lustre/tests/recovery-small.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 27/Jul/11 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #232
LU-372 add $TMP/mgsactive to indicate the active combined MGS/MDS node

Oleg Drokin : 07d01fce91f2412322b94e2fd2b021689fc0c035
Files :

  • lustre/tests/test-framework.sh
  • lustre/tests/recovery-small.sh
Comment by Build Master (Inactive) [ 27/Jul/11 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #232
LU-372 add $TMP/mgsactive to indicate the active combined MGS/MDS node

Oleg Drokin : 07d01fce91f2412322b94e2fd2b021689fc0c035
Files :

  • lustre/tests/recovery-small.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 27/Jul/11 ]

Integrated in lustre-master » i686,server,el5,inkernel #232
LU-372 add $TMP/mgsactive to indicate the active combined MGS/MDS node

Oleg Drokin : 07d01fce91f2412322b94e2fd2b021689fc0c035
Files :

  • lustre/tests/test-framework.sh
  • lustre/tests/recovery-small.sh
Comment by Build Master (Inactive) [ 27/Jul/11 ]

Integrated in lustre-master » i686,server,el6,inkernel #232
LU-372 add $TMP/mgsactive to indicate the active combined MGS/MDS node

Oleg Drokin : 07d01fce91f2412322b94e2fd2b021689fc0c035
Files :

  • lustre/tests/test-framework.sh
  • lustre/tests/recovery-small.sh
Comment by Build Master (Inactive) [ 27/Jul/11 ]

Integrated in lustre-master » i686,client,el5,ofa #232
LU-372 add $TMP/mgsactive to indicate the active combined MGS/MDS node

Oleg Drokin : 07d01fce91f2412322b94e2fd2b021689fc0c035
Files :

  • lustre/tests/recovery-small.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 27/Jul/11 ]

Integrated in lustre-master » i686,client,el5,inkernel #232
LU-372 add $TMP/mgsactive to indicate the active combined MGS/MDS node

Oleg Drokin : 07d01fce91f2412322b94e2fd2b021689fc0c035
Files :

  • lustre/tests/recovery-small.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 27/Jul/11 ]

Integrated in lustre-master » i686,server,el5,ofa #232
LU-372 add $TMP/mgsactive to indicate the active combined MGS/MDS node

Oleg Drokin : 07d01fce91f2412322b94e2fd2b021689fc0c035
Files :

  • lustre/tests/test-framework.sh
  • lustre/tests/recovery-small.sh
Comment by Jian Yu [ 12/Oct/11 ]

Patches have been pushed to both the b1_8 and master branches. The issue is resolved.
