[LU-3101] Interop 1.8.9<->2.4 failure on test suite replay-single test_61d: cannot restart mgs Created: 03/Apr/13 Updated: 19/Aug/13 Resolved: 19/Aug/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.4.1, Lustre 2.5.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Maloo | Assignee: | Hongchao Zhang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
client: 1.8.9 |
||
| Severity: | 3 |
| Rank (Obsolete): | 7538 |
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com>.
This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/a0617196-9725-11e2-9ec7-52540035b04c.
The sub-test test_61d failed with the following error: cannot restart mgs
MDS console shows:
00:09:37:Lustre: DEBUG MARKER: == replay-single test 61d: error in llog_setup should cleanup the llog context correctly == 00:09:35 (1364368175)
00:09:37:Lustre: DEBUG MARKER: grep -c /mnt/mds' ' /proc/mounts
00:09:37:Lustre: DEBUG MARKER: umount -d /mnt/mds
00:09:49:Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
00:09:50:Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000605
00:09:50:Lustre: DEBUG MARKER: mkdir -p /mnt/mds
00:09:50:Lustre: DEBUG MARKER: mkdir -p /mnt/mds; mount -t lustre -o loop /dev/lvm-MDS/P1 /mnt/mds
00:09:50:LDISKFS-fs (loop0): mounted filesystem with ordered data mode. quota=on. Opts:
00:09:50:Lustre: *** cfs_fail_loc=605, val=0***
00:09:50:LustreError: 5059:0:(llog_obd.c:207:llog_setup()) MGS: ctxt 0 lop_setup=ffffffffa0631ce0 failed: rc = -95
00:09:50:LustreError: 5059:0:(obd_config.c:572:class_setup()) setup MGS failed (-95)
00:09:50:LustreError: 5059:0:(obd_mount.c:378:lustre_start_simple()) MGS setup error -95
00:09:50:LustreError: 15e-a: Failed to start MGS 'MGS' (-95). Is the 'mgs' module loaded?
00:09:50:LustreError: 5059:0:(obd_mount.c:1379:lustre_disconnect_lwp()) lustre-MDT0000-lwp-MDT0000: Can't end config log lustre-client.
00:09:50:LustreError: 5059:0:(obd_mount.c:2115:server_put_super()) lustre-MDT0000: failed to disconnect lwp. (rc=-2)
00:09:50:LustreError: 5059:0:(obd_mount.c:2145:server_put_super()) no obd lustre-MDT0000
00:09:51:LustreError: 5059:0:(obd_mount.c:139:server_deregister_mount()) lustre-MDT0000 not registered
00:09:51:LustreError: 5059:0:(obd_mount.c:2989:lustre_fill_super()) Unable to mount /dev/loop0 (-95)
00:09:51:Lustre: DEBUG MARKER: lctl set_param fail_loc=0
00:09:51:Lustre: DEBUG MARKER: mkdir -p /mnt/mds
00:09:51:Lustre: DEBUG MARKER: mkdir -p /mnt/mds; mount -t lustre -o loop /dev/lvm-MDS/P1 /mnt/mds
00:09:51:LustreError: 15d-9: The MGS service was already started from server
00:09:51:LustreError: 5228:0:(obd_mount.c:1379:lustre_disconnect_lwp()) lustre-MDT0000-lwp-MDT0000: Can't end config log lustre-client.
00:09:51:LustreError: 5228:0:(obd_mount.c:2115:server_put_super()) lustre-MDT0000: failed to disconnect lwp. (rc=-2)
00:09:51:LustreError: 5228:0:(obd_mount.c:2145:server_put_super()) no obd lustre-MDT0000
00:09:51:LustreError: 5228:0:(obd_mount.c:139:server_deregister_mount()) lustre-MDT0000 not registered
00:09:51:LustreError: 5228:0:(obd_mount.c:2989:lustre_fill_super()) Unable to mount (-114)
00:09:51:Lustre: DEBUG MARKER: /usr/sbin/lctl mark replay-single test_61d: @@@@@@ FAIL: cannot restart mgs
00:09:51:Lustre: DEBUG MARKER: replay-single test_61d: @@@@@@ FAIL: cannot restart mgs
00:09:51:Lustre: DEBUG MARKER: /usr/sbin/lctl dk > /logdir/test_logs/2013-03-26/lustre-master-el6-x86_64-vs-lustre-b1_8-el6-x86_64--full--2_4_1__1346__-70011898121780-141237/replay-single.test_61d.debug_log.$(hostname -s).1364368184.log;
00:09:51: dmesg > /logdir/test_logs/2013-03-26/lu
00:09:51:Lustre: DEBUG MARKER: lctl set_param -n fail_loc=0 2>/dev/null || true
00:09:51:Lustre: DEBUG MARKER: rc=$([ -f /proc/sys/lnet/catastrophe ] && echo $(< /proc/sys/lnet/catastrophe) || echo 0);
00:09:51:if [ $rc -ne 0 ]; then echo $(hostname): $rc; fi
00:09:51:exit $rc;
00:09:51:Lustre: DEBUG MARKER: /usr/sbin/lctl mark == replay-single test 62: don\'t mis-drop resent replay == 00:09:46 \(1364368186\) |
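For readers triaging the console output above, the negative return codes can be mapped to Linux errno names. The following is a minimal sketch, not part of the original report: the helper name `decode_return_codes`, the regular expression, and the small errno table are illustrative assumptions, and the table covers only the three codes that appear in this log (using their Linux values).

```python
import re

# Linux errno values for the three codes seen in this console log
# (assumption: the MDS is a Linux host, as the el6 build names suggest).
LINUX_ERRNO = {
    2: "ENOENT",        # No such file or directory
    95: "EOPNOTSUPP",   # Operation not supported
    114: "EALREADY",    # Operation already in progress
}

# Matches the forms used in the log: "rc = -95", "(rc=-2)", and "(-114)".
RC_PATTERN = re.compile(r"rc\s*=\s*-(\d+)|\(-(\d+)\)")


def decode_return_codes(line):
    """Return the errno names for every negative return code found in a log line."""
    names = []
    for match in RC_PATTERN.finditer(line):
        code = int(match.group(1) or match.group(2))
        names.append(LINUX_ERRNO.get(code, f"errno {code}"))
    return names


# Three lines copied from the console log above.
SAMPLE_LINES = [
    "00:09:50:LustreError: 5059:0:(llog_obd.c:207:llog_setup()) MGS: ctxt 0 lop_setup=ffffffffa0631ce0 failed: rc = -95",
    "00:09:51:LustreError: 5228:0:(obd_mount.c:2115:server_put_super()) lustre-MDT0000: failed to disconnect lwp. (rc=-2)",
    "00:09:51:LustreError: 5228:0:(obd_mount.c:2989:lustre_fill_super()) Unable to mount (-114)",
]

if __name__ == "__main__":
    for line in SAMPLE_LINES:
        print(decode_return_codes(line), "<-", line[:60])
```

Read this way, the first mount fails with EOPNOTSUPP while fail_loc is set, and the second mount, issued after fail_loc is reset to 0, fails with EALREADY alongside the "MGS service was already started" message.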
| Comments |
| Comment by Peter Jones [ 04/Apr/13 ] |
|
Hongchao, could you please investigate? Thanks, Peter |
| Comment by Hongchao Zhang [ 12/Apr/13 ] |
|
The issue is reproduced locally on master. It is caused by incorrect cleanup after the MGS fails to start. |
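The failure mode described in this comment can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyServiceRegistry`, `replay`, and the `fail_once` flag are hypothetical names invented for illustration, and nothing here is Lustre code or the actual patch. It shows why a failed start that leaves its registration behind makes the next start report "already started", matching the -95 then -114 sequence in the console log.

```python
import errno


class ToyServiceRegistry:
    """Hypothetical stand-in for a table of started services (not Lustre code)."""

    def __init__(self, cleanup_on_failure):
        self.cleanup_on_failure = cleanup_on_failure
        self.started = set()
        self.fail_once = False  # plays the role of the injected fail_loc

    def start(self, name):
        if name in self.started:
            return -errno.EALREADY          # service still registered
        self.started.add(name)              # step 1: register the service
        if self.fail_once:                  # step 2: a setup step that fails once
            self.fail_once = False
            if self.cleanup_on_failure:
                self.started.discard(name)  # unwind step 1 on error
            return -errno.EOPNOTSUPP
        return 0


def replay(cleanup_on_failure):
    """Start with an injected failure, then retry, as test_61d does."""
    registry = ToyServiceRegistry(cleanup_on_failure)
    registry.fail_once = True
    first = registry.start("MGS")
    second = registry.start("MGS")
    return first, second


if __name__ == "__main__":
    print("without cleanup:", replay(cleanup_on_failure=False))
    print("with cleanup:   ", replay(cleanup_on_failure=True))
```

In the toy model, unwinding the registration in the error path is what lets the retry succeed; which state the real fix unwinds is not stated in this ticket.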
| Comment by Jian Yu [ 14/Aug/13 ] |
|
Lustre client build: http://build.whamcloud.com/job/lustre-b1_8/258/ (1.8.9-wc1)
replay-single test 61d hit the same failure:
Hi Oleg, |
| Comment by Hongchao Zhang [ 19/Aug/13 ] |
|
The patch has landed on master.
| Comment by Jian Yu [ 19/Aug/13 ] |
|
The patch was also cherry-picked to Lustre b2_4 branch. |