[LU-3101] Interop 1.8.9<->2.4 failure on test suite replay-single test_61d: cannot restart mgs

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.4.1, Lustre 2.5.0
    • Affects Version/s: Lustre 2.4.0
    • Labels: None
    • Environment: client: 1.8.9; server: lustre-master build #1346
    • Severity: 3
    • Rank (Obsolete): 7538

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/a0617196-9725-11e2-9ec7-52540035b04c.

      The sub-test test_61d failed with the following error:

      cannot restart mgs

      MDS console shows:

      00:09:37:Lustre: DEBUG MARKER: == replay-single test 61d: error in llog_setup should cleanup the llog context correctly == 00:09:35 (1364368175)
      00:09:37:Lustre: DEBUG MARKER: grep -c /mnt/mds' ' /proc/mounts
      00:09:37:Lustre: DEBUG MARKER: umount -d /mnt/mds
      00:09:49:Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
      00:09:50:Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000605
      00:09:50:Lustre: DEBUG MARKER: mkdir -p /mnt/mds
      00:09:50:Lustre: DEBUG MARKER: mkdir -p /mnt/mds; mount -t lustre -o loop  /dev/lvm-MDS/P1 /mnt/mds
      00:09:50:LDISKFS-fs (loop0): mounted filesystem with ordered data mode. quota=on. Opts: 
      00:09:50:Lustre: *** cfs_fail_loc=605, val=0***
      00:09:50:LustreError: 5059:0:(llog_obd.c:207:llog_setup()) MGS: ctxt 0 lop_setup=ffffffffa0631ce0 failed: rc = -95
      00:09:50:LustreError: 5059:0:(obd_config.c:572:class_setup()) setup MGS failed (-95)
      00:09:50:LustreError: 5059:0:(obd_mount.c:378:lustre_start_simple()) MGS setup error -95
      00:09:50:LustreError: 15e-a: Failed to start MGS 'MGS' (-95). Is the 'mgs' module loaded?
      00:09:50:LustreError: 5059:0:(obd_mount.c:1379:lustre_disconnect_lwp()) lustre-MDT0000-lwp-MDT0000: Can't end config log lustre-client.
      00:09:50:LustreError: 5059:0:(obd_mount.c:2115:server_put_super()) lustre-MDT0000: failed to disconnect lwp. (rc=-2)
      00:09:50:LustreError: 5059:0:(obd_mount.c:2145:server_put_super()) no obd lustre-MDT0000
      00:09:51:LustreError: 5059:0:(obd_mount.c:139:server_deregister_mount()) lustre-MDT0000 not registered
      00:09:51:LustreError: 5059:0:(obd_mount.c:2989:lustre_fill_super()) Unable to mount /dev/loop0 (-95)
      00:09:51:Lustre: DEBUG MARKER: lctl set_param fail_loc=0
      00:09:51:Lustre: DEBUG MARKER: mkdir -p /mnt/mds
      00:09:51:Lustre: DEBUG MARKER: mkdir -p /mnt/mds; mount -t lustre -o loop  /dev/lvm-MDS/P1 /mnt/mds
      00:09:51:LustreError: 15d-9: The MGS service was already started from server
      00:09:51:LustreError: 5228:0:(obd_mount.c:1379:lustre_disconnect_lwp()) lustre-MDT0000-lwp-MDT0000: Can't end config log lustre-client.
      00:09:51:LustreError: 5228:0:(obd_mount.c:2115:server_put_super()) lustre-MDT0000: failed to disconnect lwp. (rc=-2)
      00:09:51:LustreError: 5228:0:(obd_mount.c:2145:server_put_super()) no obd lustre-MDT0000
      00:09:51:LustreError: 5228:0:(obd_mount.c:139:server_deregister_mount()) lustre-MDT0000 not registered
      00:09:51:LustreError: 5228:0:(obd_mount.c:2989:lustre_fill_super()) Unable to mount  (-114)
      00:09:51:Lustre: DEBUG MARKER: /usr/sbin/lctl mark  replay-single test_61d: @@@@@@ FAIL: cannot restart mgs 
      00:09:51:Lustre: DEBUG MARKER: replay-single test_61d: @@@@@@ FAIL: cannot restart mgs
      00:09:51:Lustre: DEBUG MARKER: /usr/sbin/lctl dk > /logdir/test_logs/2013-03-26/lustre-master-el6-x86_64-vs-lustre-b1_8-el6-x86_64--full--2_4_1__1346__-70011898121780-141237/replay-single.test_61d.debug_log.$(hostname -s).1364368184.log;
      00:09:51:         dmesg > /logdir/test_logs/2013-03-26/lu
      00:09:51:Lustre: DEBUG MARKER: lctl set_param -n fail_loc=0 2>/dev/null || true
      00:09:51:Lustre: DEBUG MARKER: rc=$([ -f /proc/sys/lnet/catastrophe ] && echo $(< /proc/sys/lnet/catastrophe) || echo 0);
      00:09:51:if [ $rc -ne 0 ]; then echo $(hostname): $rc; fi
      00:09:51:exit $rc;
      00:09:51:Lustre: DEBUG MARKER: /usr/sbin/lctl mark == replay-single test 62: don\'t mis-drop resent replay == 00:09:46 \(1364368186\)
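
      The return codes in the console log above are negated errno values. As a reading aid, the following minimal sketch pulls the codes out of log lines and maps them to names. It assumes standard Linux errno numbering; the table, the regular expression, and the function name are illustrative and not part of Lustre or its test framework.

```python
import re

# Assumption: standard Linux errno numbering for the codes seen in the log.
LINUX_ERRNO = {2: "ENOENT", 95: "EOPNOTSUPP", 114: "EALREADY"}

# Matches the suffix styles used in the log:
# "rc = -95", "(rc=-2)", "error -95" and "(-114)".
RC_PATTERN = re.compile(r"(?:rc\s*=\s*|error\s+|\()(-\d+)")


def decode_return_codes(log_text):
    """Return (code, errno_name) pairs in order of first appearance."""
    seen = []
    for match in RC_PATTERN.finditer(log_text):
        code = int(match.group(1))
        name = LINUX_ERRNO.get(-code, "UNKNOWN")
        if (code, name) not in seen:
            seen.append((code, name))
    return seen


# Three lines copied from the MDS console output above.
SAMPLE = """\
LustreError: 5059:0:(obd_config.c:572:class_setup()) setup MGS failed (-95)
LustreError: 5059:0:(obd_mount.c:2115:server_put_super()) lustre-MDT0000: failed to disconnect lwp. (rc=-2)
LustreError: 5228:0:(obd_mount.c:2989:lustre_fill_super()) Unable to mount  (-114)
"""

if __name__ == "__main__":
    for code, name in decode_return_codes(SAMPLE):
        print(code, name)
```

      Read this way, the first mount (with fail_loc set) fails with EOPNOTSUPP, and the retry after clearing fail_loc fails with EALREADY, matching the "MGS service was already started" message.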
      

        Activity

          yujian Jian Yu added a comment -

          The patch was also cherry-picked to Lustre b2_4 branch.


          hongchao.zhang Hongchao Zhang added a comment -

          The patch has landed on master.

          yujian Jian Yu added a comment -

          Lustre client build: http://build.whamcloud.com/job/lustre-b1_8/258/ (1.8.9-wc1)
          Lustre server build: http://build.whamcloud.com/job/lustre-b2_4/31/

          replay-single test 61d hit the same failure:
          https://maloo.whamcloud.com/test_sets/cf9987d6-0486-11e3-90ba-52540035b04c

          Hi Oleg,
          Could you please cherry-pick the patch to Lustre b2_4 branch? Thanks.


          hongchao.zhang Hongchao Zhang added a comment -

          The issue can be reproduced locally on master. It is caused by incorrect cleanup after the MGS fails to start.
          The patch is tracked at http://review.whamcloud.com/#change,6035

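
          The failure mode described in this comment can be illustrated with a toy model. This is only a sketch of the general pattern (state recorded before setup completes and not rolled back on error); it is not the Lustre code and not the patch under review. The class, its fields, and the hard-coded Linux errno values are hypothetical.

```python
# Assumption: Linux errno values, matching the codes in the console log.
EOPNOTSUPP = 95
EALREADY = 114


class ToyMgsState:
    """Hypothetical stand-in for per-node MGS bookkeeping (not Lustre code)."""

    def __init__(self, cleanup_on_failure):
        self.started = False
        self.cleanup_on_failure = cleanup_on_failure

    def start_mgs(self, inject_failure=False):
        """Return 0 on success or a negated errno value on failure."""
        if self.started:
            # Mirrors "The MGS service was already started from server".
            return -EALREADY
        # The service is marked as started before setup has finished.
        self.started = True
        if inject_failure:
            # Models the injected llog_setup error from the test.
            if self.cleanup_on_failure:
                self.started = False  # roll back the bookkeeping
            return -EOPNOTSUPP
        return 0


if __name__ == "__main__":
    for cleanup in (False, True):
        state = ToyMgsState(cleanup_on_failure=cleanup)
        first = state.start_mgs(inject_failure=True)
        second = state.start_mgs()
        print(cleanup, first, second)
```

          Without the rollback, the retry reports -114 even though nothing is running, which is the sequence seen in the MDS console log. With the rollback, the retry succeeds.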
          pjones Peter Jones added a comment -

          Hongchao

          Could you please investigate?

          Thanks

          Peter


          People

            Assignee: hongchao.zhang Hongchao Zhang
            Reporter: maloo Maloo
            Votes: 0
            Watchers: 6
