  1. Lustre
  2. LU-372

replay-single test_61d: FAIL: cannot restart mgs

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.1.0, Lustre 1.8.6
    • Affects Version/s: Lustre 2.1.0, Lustre 1.8.6
    • Labels: None
    • 3
    • 4942

    Description

      While running replay-single tests under the failover configuration, test 61d failed as follows:

      == test 61d: error in llog_setup should cleanup the llog context correctly == 08:53:13
      fail_loc=0x80000605
      Starting mgs: -o user_xattr,acl  /dev/disk/by-id/scsi-1IET_00010001 /mnt/mds
      client-1-ib: mount.lustre: mount /dev/disk/by-id/scsi-1IET_00010001 at /mnt/mds failed: Invalid argument
      client-1-ib: This may have multiple causes.
      client-1-ib: Are the mount options correct?
      client-1-ib: Check the syslog for more info.
      mount -t lustre  /dev/disk/by-id/scsi-1IET_00010001 /mnt/mds
      Start of /dev/disk/by-id/scsi-1IET_00010001 on mgs failed 22
      fail_loc=0
      Starting mgs: -o user_xattr,acl  /dev/disk/by-id/scsi-1IET_00010001 /mnt/mds
      client-1-ib: mount.lustre: mount /dev/disk/by-id/scsi-1IET_00010001 at /mnt/mds failed: Invalid argument
      client-1-ib: This may have multiple causes.
      client-1-ib: Are the mount options correct?
      client-1-ib: Check the syslog for more info.
      mount -t lustre  /dev/disk/by-id/scsi-1IET_00010001 /mnt/mds
      Start of /dev/disk/by-id/scsi-1IET_00010001 on mgs failed 22
       replay-single test_61d: @@@@@@ FAIL: cannot restart mgs 
      Dumping lctl log to /home/yujian/test_logs/2011-05-25/072205/replay-single.test_61d.*.1306338816.log
      tar: Removing leading `/' from member names
      /home/yujian/test_logs/2011-05-25/072205/replay-single-1306338816.tar.bz2
      Resetting fail_loc on all nodes...done.
      FAIL   (33s)
      

      Maloo report: https://maloo.whamcloud.com/test_sets/172b0dd4-8745-11e0-b4df-52540025f9af

      This is a test script issue: "do_facet mgs" failed to determine the active MGS node when the MGS and MDS were combined on one node and therefore shared the same failover pair.

      The Maloo report shows that the MDS node had been failed over to client-2-ib in test 61b. However, "do_facet mgs", which is called by "stop mgs" and "start mgs" in test 61d, still treated client-1-ib as the active node. We need to add a $TMP/mgsactive file that indicates the active partner for the combined MGS/MDS node, so that "facet_active mgs", which is called by "do_facet mgs", can determine the active MGS node correctly.
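
      The idea can be illustrated with a minimal shell sketch. It is not the actual test-framework code: the function names sketch_record_active and sketch_active_host, the SKETCH_COMBINED_MGS_MDS variable, and the one-line file format (just the host name) are all assumptions made for illustration. Only the $TMP/mgsactive file name and the host names client-1-ib and client-2-ib come from this report.

```shell
# Hypothetical sketch; every sketch_* name and the file format are assumptions.
TMP=${TMP:-/tmp}

# Record which host is now active for a facet after a failover.
# If the MGS is combined with the MDS, mirror the record to $TMP/mgsactive
# so that a later lookup for "mgs" sees the same active partner.
sketch_record_active() {
    local facet=$1 host=$2
    echo "$host" > "$TMP/${facet}active"
    if [ "$facet" = "mds" ] && [ "${SKETCH_COMBINED_MGS_MDS:-no}" = "yes" ]; then
        echo "$host" > "$TMP/mgsactive"
    fi
}

# Print the active host for a facet: the recorded one if a non-empty
# record file exists, otherwise the given default (primary) host.
sketch_active_host() {
    local facet=$1 default_host=$2
    local file="$TMP/${facet}active"
    if [ -s "$file" ]; then
        cat "$file"
    else
        echo "$default_host"
    fi
}
```

      In this sketch, calling "sketch_record_active mds client-2-ib" with SKETCH_COMBINED_MGS_MDS=yes makes a later "sketch_active_host mgs client-1-ib" print client-2-ib. Without the mirrored mgsactive record, the same lookup would fall back to client-1-ib, which matches the failure seen in test 61d.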

          People

            Assignee: yujian Jian Yu
            Reporter: yujian Jian Yu