Details
Bug
Resolution: Duplicate
Minor
None
Lustre 2.10.6
3
9223372036854775807
Description
2.10.6-RC3 EL7.6 mlx build #91
Hit an LBUG on one MDS, soak-9, after a system reboot while it was in recovery.
[2018-12-07T07:44:37+00:00] INFO: Report handlers complete
[ 330.682671] LNet: HW NUMA nodes: 2, HW CPU cores: 32, npartitions: 2
[ 330.693009] alg: No test for adler32 (adler32-zlib)
[ 331.546478] Lustre: Lustre: Build Version: 2.10.6_RC3
[ 331.781927] LNet: Added LNI 192.168.1.109@o2ib [8/256/0/180]
[ 332.277308] LDISKFS-fs (dm-2): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,user_xattr,no_mbcache,nodelalloc
[ 332.597960] LustreError: 137-5: soaked-MDT0001_UUID: not available for connect from 192.168.1.105@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
[ 332.617898] LustreError: Skipped 1 previous similar message
[ 332.685778] Lustre: soaked-MDT0001: Not available for connect from 192.168.1.110@o2ib (not set up)
[ 332.796813] Lustre: soaked-MDT0001: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
[ 332.828088] LustreError: 12736:0:(llog_osd.c:978:llog_osd_next_block()) soaked-MDT0003-osp-MDT0001: missed desired record? 3 > 1
[ 332.841078] LustreError: 12736:0:(lod_dev.c:419:lod_sub_recovery_thread()) soaked-MDT0003-osp-MDT0001 getting update log failed: rc = -2
[ 337.090525] Lustre: soaked-MDT0001: Connection restored to 192.168.1.105@o2ib (at 192.168.1.105@o2ib)
[ 337.100943] Lustre: Skipped 1 previous similar message
[ 338.169752] Lustre: soaked-MDT0001: Connection restored to b2e346f6-066a-02d8-6774-b6710264a342 (at 192.168.1.123@o2ib)
[ 338.184884] Lustre: 12737:0:(ldlm_lib.c:2059:target_recovery_overseer()) recovery is aborted, evict exports in recovery
[ 338.197356] Lustre: soaked-MDT0001: disconnecting 27 stale clients
[ 339.169010] LustreError: 12748:0:(lu_object.h:862:lu_object_attr()) ASSERTION( ((o)->lo_header->loh_attr & LOHA_EXISTS) != 0 ) failed:
[ 339.182782] LustreError: 12748:0:(lu_object.h:862:lu_object_attr()) LBUG
[ 339.190342] Pid: 12748, comm: mdt00_005 3.10.0-957.el7_lustre.x86_64 #1 SMP Fri Nov 30 18:46:05 UTC 2018
[ 339.201005] Call Trace:
[ 339.203791] [<ffffffffc0c687cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[ 339.211192] [<ffffffffc0c6887c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[ 339.218193] [<ffffffffc14959f7>] mo_invalidate.part.29+0x0/0x36 [mdt]
[ 339.225606] [<ffffffffc1455d5a>] mdt_intent_layout+0xfca/0xfe0 [mdt]
[ 339.232927] [<ffffffffc1459681>] mdt_intent_policy+0x441/0xc70 [mdt]
[ 339.240246] [<ffffffffc0fa12ba>] ldlm_lock_enqueue+0x38a/0x980 [ptlrpc]
[ 339.247934] [<ffffffffc0fcab53>] ldlm_handle_enqueue0+0x9d3/0x16a0 [ptlrpc]
[ 339.255982] [<ffffffffc10504f2>] tgt_enqueue+0x62/0x210 [ptlrpc]
[ 339.262969] [<ffffffffc105442a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
[ 339.270812] [<ffffffffc0ffce5b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
[ 339.279529] [<ffffffffc10005a2>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
[ 339.286697] [<ffffffff8f8c1c31>] kthread+0xd1/0xe0
[ 339.292269] [<ffffffff8ff74c37>] ret_from_fork_nospec_end+0x0/0x39
[ 339.299373] [<ffffffffffffffff>] 0xffffffffffffffff
[ 339.305038] Kernel panic - not syncing: LBUG
[ 339.309856] CPU: 7 PID: 12748 Comm: mdt00_005 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.el7_lustre.x86_64 #1
[ 339.323162] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
[ 339.338236] Call Trace:
[ 339.343554] [<ffffffff8ff61dc1>] dump_stack+0x19/0x1b
[ 339.351863] [<ffffffff8ff5b4d0>] panic+0xe8/0x21f
[ 339.359694] [<ffffffffc0c688cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[ 339.369077] [<ffffffffc14959f7>] lu_object_attr.isra.26.part.27+0x36/0x36 [mdt]
[ 339.379806] [<ffffffffc1455d5a>] mdt_intent_layout+0xfca/0xfe0 [mdt]
[ 339.389463] [<ffffffffc1459681>] mdt_intent_policy+0x441/0xc70 [mdt]
[ 339.399140] [<ffffffffc0fa81db>] ? ldlm_resource_get+0xab/0xa60 [ptlrpc]
[ 339.409196] [<ffffffffc0fa12ba>] ldlm_lock_enqueue+0x38a/0x980 [ptlrpc]
[ 339.413469] Lustre: soaked-MDT0001: Connection restored to 0f40ef87-b54f-1f70-d2cc-cb9f522aad77 (at 192.168.1.119@o2ib)
[ 339.413472] Lustre: Skipped 6 previous similar messages
[ 339.441901] [<ffffffffc0fcab53>] ldlm_handle_enqueue0+0x9d3/0x16a0 [ptlrpc]
[ 339.452226] [<ffffffffc0ff2e10>] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc]
[ 339.463035] [<ffffffffc10504f2>] tgt_enqueue+0x62/0x210 [ptlrpc]
[ 339.472295] [<ffffffffc105442a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
[ 339.482324] [<ffffffffc0ffce5b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
[ 339.493185] [<ffffffffc0ff9488>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
[ 339.502977] [<ffffffff8f8d67c2>] ? default_wake_function+0x12/0x20
[ 339.512214] [<ffffffff8f8cba9b>] ? __wake_up_common+0x5b/0x90
[ 339.520955] [<ffffffffc10005a2>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
[ 339.530165] [<ffffffffc0fffb10>] ? ptlrpc_register_service+0xe30/0xe30 [ptlrpc]
[ 339.540551] [<ffffffff8f8c1c31>] kthread+0xd1/0xe0
[ 339.548088] [<ffffffff8f8c1b60>] ? insert_kthread_work+0x40/0x40
[ 339.556962] [<ffffffff8ff74c37>] ret_from_fork_nospec_begin+0x21/0x21
[ 339.566286] [<ffffffff8f8c1b60>] ? insert_kthread_work+0x40/0x40
[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Initializing cgroup subsys cpuacct