Details
-
Bug
-
Resolution: Fixed
-
Minor
-
None
-
lustre-master tag-2.12.54
-
3
-
9223372036854775807
Description
The MDS hit an LBUG the first time routers were included in the configuration.
To see whether the problem could be reproduced, I cleaned the update log on the MDS and tried again; this time soak ran normally.
[365459.643165] Lustre: 58072:0:(ldlm_lib.c:1777:extend_recovery_timer()) soaked-MDT0000: extended recovery timer reaching hard limit: 900, extend: 1
[365459.657843] Lustre: 58072:0:(ldlm_lib.c:1777:extend_recovery_timer()) Skipped 2 previous similar messages
[365485.628240] LNet: 57059:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 192.168.1.110@o2ib: 7 seconds
[365513.040207] Lustre: 58072:0:(ldlm_lib.c:1777:extend_recovery_timer()) soaked-MDT0000: extended recovery timer reaching hard limit: 900, extend: 1
[365513.054890] Lustre: 58072:0:(ldlm_lib.c:1777:extend_recovery_timer()) Skipped 2 previous similar messages
[365554.587216] Lustre: MGS: Connection restored to 192.168.1.110@o2ib (at 192.168.1.110@o2ib)
[365555.266913] Lustre: 58072:0:(ldlm_lib.c:1777:extend_recovery_timer()) soaked-MDT0000: extended recovery timer reaching hard limit: 900, extend: 1
[365555.281597] Lustre: 58072:0:(ldlm_lib.c:1777:extend_recovery_timer()) Skipped 3 previous similar messages
[365579.714847] Lustre: 58072:0:(ldlm_lib.c:1777:extend_recovery_timer()) soaked-MDT0000: extended recovery timer reaching hard limit: 900, extend: 1
[365579.729526] Lustre: 58072:0:(ldlm_lib.c:1777:extend_recovery_timer()) Skipped 3 previous similar messages
[365586.653587] Lustre: soaked-MDT0000: recovery is timed out, evict stale exports
[365586.664897] Lustre: soaked-MDT0000: Recovery over after 6:14, of 3 clients 3 recovered and 0 were evicted.
[365587.461671] Lustre: soaked-MDT0000: Connection restored to 192.168.1.107@o2ib (at 192.168.1.107@o2ib)
[365587.472122] Lustre: Skipped 20 previous similar messages
[366665.834308] Lustre: MGS: Connection restored to 3e1153ef-cd3c-4 (at 172.16.1.36@o2ib1)
[366665.843291] Lustre: Skipped 8 previous similar messages
[366714.243215] Lustre: 57971:0:(mdd_device.c:1811:mdd_changelog_clear()) soaked-MDD0000: No entry for user 1
[366796.507210] Lustre: MGS: Connection restored to a82300e0-23e7-4 (at 172.16.1.40@o2ib1)
[366796.516214] Lustre: Skipped 7 previous similar messages
[366812.601393] Lustre: MGS: Connection restored to f35a7a93-e477-4 (at 172.16.1.23@o2ib1)
[366812.610382] Lustre: Skipped 13 previous similar messages
[366894.739493] Lustre: MGS: Connection restored to 205e25f3-b54e-4 (at 172.16.1.17@o2ib1)
[366894.748458] Lustre: Skipped 19 previous similar messages
[367624.449318] LustreError: 58673:0:(osd_handler.c:2146:osd_object_release()) ASSERTION( !(o->oo_destroyed == 0 && o->oo_inode && o->oo_inode->i_nlink == 0) ) failed:
[367624.465857] LustreError: 58673:0:(osd_handler.c:2146:osd_object_release()) LBUG
[367624.474135] Pid: 58673, comm: mdt_out01_002 3.10.0-957.12.2.el7_lustre.x86_64 #1 SMP Wed Jun 5 07:00:13 UTC 2019
[367624.485598] Call Trace:
[367624.488439] [<ffffffffc0a017cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[367624.495875] [<ffffffffc0a0187c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[367624.502903] [<ffffffffc132268c>] osd_object_release+0x7c/0x80 [osd_ldiskfs]
[367624.510909] [<ffffffffc0bf7430>] lu_object_put+0x190/0x3d0 [obdclass]
[367624.518362] [<ffffffffc0f400ec>] out_tx_end+0x1ec/0x5c0 [ptlrpc]
[367624.525380] [<ffffffffc0f442b2>] out_handle+0x1452/0x1bc0 [ptlrpc]
[367624.532547] [<ffffffffc0f3a6da>] tgt_request_handle+0x91a/0x15c0 [ptlrpc]
[367624.540382] [<ffffffffc0ede7ee>] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc]
[367624.549104] [<ffffffffc0ee22dc>] ptlrpc_main+0xbac/0x1560 [ptlrpc]
[367624.556257] [<ffffffff818c1d21>] kthread+0xd1/0xe0
[367624.561844] [<ffffffff81f75c37>] ret_from_fork_nospec_end+0x0/0x39
[367624.568961] [<ffffffffffffffff>] 0xffffffffffffffff
[367624.574638] Kernel panic - not syncing: LBUG
[367624.579502] CPU: 25 PID: 58673 Comm: mdt_out01_002 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.12.2.el7_lustre.x86_64 #1
[367624.593766] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
[367624.606390] Call Trace:
[367624.609223] [<ffffffff81f63041>] dump_stack+0x19/0x1b
[367624.615061] [<ffffffff81f5c750>] panic+0xe8/0x21f
[367624.620514] [<ffffffffc0a018cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[367624.627513] [<ffffffffc132268c>] osd_object_release+0x7c/0x80 [osd_ldiskfs]
[367624.635501] [<ffffffffc0bf7430>] lu_object_put+0x190/0x3d0 [obdclass]
[367624.642919] [<ffffffffc0f400ec>] out_tx_end+0x1ec/0x5c0 [ptlrpc]
[367624.649853] [<ffffffffc0f442b2>] out_handle+0x1452/0x1bc0 [ptlrpc]
[367624.656967] [<ffffffffc0e8a650>] ? target_send_reply_msg+0x170/0x170 [ptlrpc]
[367624.665156] [<ffffffffc0f3a6da>] tgt_request_handle+0x91a/0x15c0 [ptlrpc]
[367624.672955] [<ffffffffc0f143e1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
[367624.681496] [<ffffffffc0a01bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
[367624.689487] [<ffffffffc0ede7ee>] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc]
[367624.698137] [<ffffffff818ced54>] ? __wake_up+0x44/0x50
[367624.704092] [<ffffffffc0ee22dc>] ptlrpc_main+0xbac/0x1560 [ptlrpc]
[367624.711209] [<ffffffffc0ee1730>] ? ptlrpc_register_service+0xfa0/0xfa0 [ptlrpc]
[367624.719560] [<ffffffff818c1d21>] kthread+0xd1/0xe0
[367624.725099] [<ffffffff818c1c50>] ? insert_kthread_work+0x40/0x40
[367624.731997] [<ffffffff81f75c37>] ret_from_fork_nospec_begin+0x21/0x21
[367624.739378] [<ffffffff818c1c50>] ? insert_kthread_work+0x40/0x40
[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Initializing cgroup subsys cpuacct
[ 0.000000] Linux version 3.10.0-957.12.2.el7_lustr