Details
-
Bug
-
Resolution: Cannot Reproduce
-
Critical
-
None
-
Lustre 2.4.0
-
None
-
2 Lustre servers + 1 client server
-
3
-
11824
Description
1. Mount 1 MDT and 4 OSTs on Lustre Server1.
2. Mount 4 OSTs on Lustre Server2.
3. Configure Lustre failover between the two Lustre servers.
4. Mount the Lustre file system on the Lustre client.
5. Write and read data on the Lustre client.
6. Simulate a crash of Lustre Server1.
7. Lustre Server2 crashed unexpectedly while taking over the services of Lustre Server1. The call trace is as follows:
LustreError: 137-5: lustre-OST0000_UUID: not available for connect from 192.168.22.202@tcp (no target)
LustreError: Skipped 3 previous similar messages
LDISKFS-fs (sde): recovery complete
LDISKFS-fs (sde): mounted filesystem with ordered data mode. quota=on. Opts:
LustreError: 10026:0:(genops.c:320:class_newdev()) Device MGC192.168.22.50@tcp already exists at 2, won't add
LustreError: 10026:0:(obd_config.c:374:class_attach()) Cannot create device MGC192.168.22.50@tcp of type mgc : -17
LustreError: 10026:0:(obd_mount.c:196:lustre_start_simple()) MGC192.168.22.50@tcp attach error -17
LustreError: 10026:0:(obd_mount_server.c:844:lustre_disconnect_lwp()) lustre-MDT0000-lwp-MDT0000: Can't end config log lustre-client.
LustreError: 10026:0:(obd_mount_server.c:1426:server_put_super()) lustre-MDT0000: failed to disconnect lwp. (rc=-2)
LustreError: 10026:0:(obd_mount_server.c:1456:server_put_super()) no obd lustre-MDT0000
LustreError: 10026:0:(obd_mount_server.c:135:server_deregister_mount()) lustre-MDT0000 not registered
LustreError: 10026:0:(genops.c:1570:obd_exports_barrier()) ASSERTION( list_empty(&obd->obd_exports) ) failed:
LustreError: 10026:0:(genops.c:1570:obd_exports_barrier()) LBUG
Pid: 10026, comm: mount.lustre
Call Trace:
[<ffffffffa070f8a5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa070feb7>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa0818d91>] obd_exports_barrier+0x181/0x190 [obdclass]
[<ffffffffa0f23886>] mgs_device_fini+0xf6/0x5c0 [mgs]
[<ffffffffa0843837>] class_cleanup+0x817/0xe00 [obdclass]
[<ffffffffa081ce2c>] ? class_name2dev+0x7c/0xe0 [obdclass]
[<ffffffffa0847e9b>] class_process_config+0x1b6b/0x2f60 [obdclass]
[<ffffffffa0710b90>] ? cfs_alloc+0x30/0x60 [libcfs]
[<ffffffffa0849723>] class_manual_cleanup+0x493/0xe80 [obdclass]
[<ffffffff8147a1fe>] ? _read_unlock+0xe/0x10
[<ffffffffa081ce2c>] ? class_name2dev+0x7c/0xe0 [obdclass]
[<ffffffffa0884b9d>] server_put_super+0x42d/0x2580 [obdclass]
[<ffffffffa0887440>] server_fill_super+0x750/0x1580 [obdclass]
[<ffffffffa0854c98>] lustre_fill_super+0x1d8/0x530 [obdclass]
[<ffffffffa0854ac0>] ? lustre_fill_super+0x0/0x530 [obdclass]
[<ffffffff8114d21f>] get_sb_nodev+0x5f/0xa0
[<ffffffffa084c3f5>] lustre_get_sb+0x25/0x30 [obdclass]
[<ffffffff8114c74b>] vfs_kern_mount+0x7b/0x1b0
[<ffffffff8114c8f2>] do_kern_mount+0x52/0x130
[<ffffffff81168912>] do_mount+0x2d2/0x8c0
[<ffffffff81168f90>] sys_mount+0x90/0xe0
[<ffffffff81002f5b>] system_call_fastpath+0x16/0x1b
Message from syslogd@50:B3:42:00:01:01 at Sep 22 12:53:12 ..
Kernel panic - not syncing: LBUG
Pid: 10026, comm: mount.lustre Tainted: PF --------------- 2.6.32-358.6.2.l2.08 #2
Call Trace:
[<ffffffff81476fa7>] ? panic+0xa1/0x163
[<ffffffffa070ff0b>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
[<ffffffffa0818d91>] ? obd_exports_barrier+0x181/0x190 [obdclass]
[<ffffffffa0f23886>] ? mgs_device_fini+0xf6/0x5c0 [mgs]
[<ffffffffa0843837>] ? class_cleanup+0x817/0xe00 [obdclass]
[<ffffffffa081ce2c>] ? class_name2dev+0x7c/0xe0 [obdclass]
[<ffffffffa0847e9b>] ? class_process_config+0x1b6b/0x2f60 [obdclass]
[<ffffffffa0710b90>] ? cfs_alloc+0x30/0x60 [libcfs]
[<ffffffffa0849723>] ? class_manual_cleanup+0x493/0xe80 [obdclass]
[<ffffffff8147a1fe>] ? _read_unlock+0xe/0x10
[<ffffffffa081ce2c>] ? class_name2dev+0x7c/0xe0 [obdclass]
[<ffffffffa0884b9d>] ? server_put_super+0x42d/0x2580 [obdclass]
[<ffffffffa0887440>] ? server_fill_super+0x750/0x1580 [obdclass]
[<ffffffffa0854c98>] ? lustre_fill_super+0x1d8/0x530 [obdclass]
[<ffffffffa0854ac0>] ? lustre_fill_super+0x0/0x530 [obdclass]
[<ffffffff8114d21f>] ? get_sb_nodev+0x5f/0xa0
[<ffffffffa084c3f5>] ? lustre_get_sb+0x25/0x30 [obdclass]
[<ffffffff8114c74b>] ? vfs_kern_mount+0x7b/0x1b0
[<ffffffff8114c8f2>] ? do_kern_mount+0x52/0x130
[<ffffffff81168912>] ? do_mount+0x2d2/0x8c0
[<ffffffff81168f90>] ? sys_mount+0x90/0xe0
[<ffffffff81002f5b>] ? system_call_fastpath+0x16/0x1b
*******show para for nt_memcpy16*******
src: ffff880285fc4f00, dst: ffffc90112030e70, len: 56
*******show para for panic done*******
ODSP:MSG:BUGON: This stack is bug.
ODSP:MSG:BUGON: Local was taken over by peer. Suspend CPU.
ODSP:MSG:BUGON: Local was taken over by peer. Suspend CPU.
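
The numeric codes in the LustreError lines above (-17, rc=-2) are negated Linux errno values, so decoding them can make the sequence of failures easier to follow. The following is a minimal sketch, not part of the original report; the function name `decode` and both regular expressions are illustrative assumptions that cover only the message shapes seen in this log.

```python
import errno
import os
import re

# Assumed layout of a LustreError line with a source location:
#   LustreError: <pid>:<n>:(<file>:<line>:<function>()) <message>
LINE_RE = re.compile(
    r"LustreError: \d+:\d+:\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) (?P<msg>.*)"
)
# Trailing return code, written as "rc=-2)", "error -17" or ": -17" in this log.
RC_RE = re.compile(r"(?:rc=|error |: )-(?P<rc>\d+)\)?\s*$")


def decode(line):
    """Return (function, errno name, description) or None if no code is found."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    rc = RC_RE.search(m.group("msg"))
    if rc is None:
        return None
    code = int(rc.group("rc"))
    return (m.group("func"), errno.errorcode.get(code, "?"), os.strerror(code))


if __name__ == "__main__":
    sample = [
        "LustreError: 10026:0:(obd_config.c:374:class_attach()) "
        "Cannot create device MGC192.168.22.50@tcp of type mgc : -17",
        "LustreError: 10026:0:(obd_mount_server.c:1426:server_put_super()) "
        "lustre-MDT0000: failed to disconnect lwp. (rc=-2)",
    ]
    for entry in sample:
        print(decode(entry))
```

Under these assumptions, -17 decodes to EEXIST, which matches the "already exists" message from class_newdev(), and -2 decodes to ENOENT.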
Thank you for your attention to the two problems.
You are right, the two problems have the same stack trace. I was paying more attention to the symptoms of each problem than to the stack trace. Because the symptoms looked different to me, I treated them as two different problems.