Details
-
Bug
-
Resolution: Fixed
-
Critical
-
Lustre 2.12.0, Lustre 2.10.4
-
Soak cluster - lustre-master-ib build 81
-
3
-
9223372036854775807
Description
This also occurred under 2.10.4, but I missed it due to some syslog issues.
System failed over soak-8 (MDS) to soak-9, upon failback, soak-9 crashed.
vmcore available.
[67387.761484] LustreError: 167-0: soaked-MDT0000-osp-MDT0001: This client was evicted by soaked-MDT0000; in progress operations using this service will fail. [67387.810434] LustreError: 8279:0:(mdt_reint.c:2181:mdt_reint_rename_or_migrate()) soaked-MDT0001: can't lock FS for rename: rc = -5 [67387.810699] LNet: Service thread pid 8242 completed after 336.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). [67387.810700] LNet: Service thread pid 8615 completed after 335.77s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). [67387.811812] LustreError: 8615:0:(lu_object.h:688:lu_object_get()) ASSERTION( atomic_read(&o->lo_header->loh_ref) > 0 ) failed: [67387.811814] LustreError: 8615:0:(lu_object.h:688:lu_object_get()) LBUG [67387.811815] Pid: 8615, comm: mdt01_014 [67387.811816] Call Trace: [67387.811831] [<ffffffffc0a567ae>] libcfs_call_trace+0x4e/0x60 [libcfs] [67387.811837] [<ffffffffc0a5683c>] lbug_with_loc+0x4c/0xb0 [libcfs] [67387.811847] LustreError: 8692:0:(lu_object.h:688:lu_object_get()) ASSERTION( atomic_read(&o->lo_header->loh_ref) > 0 ) failed: [67387.811850] LustreError: 8692:0:(lu_object.h:688:lu_object_get()) LBUG [67387.811851] Pid: 8692, comm: mdt01_017 [67387.811851] Call Trace: [67387.811860] [<ffffffffc12527c9>] err_serious.part.38+0x0/0x36 [mdt] [67387.811861] [<ffffffffc0a567ae>] libcfs_call_trace+0x4e/0x60 [libcfs] [67387.811867] [<ffffffffc0a5683c>] lbug_with_loc+0x4c/0xb0 [libcfs] [67387.811873] [<ffffffffc1204146>] mdt_remote_object_lock_try+0x736/0x770 [mdt] [67387.811884] [<ffffffffc12527c9>] err_serious.part.38+0x0/0x36 [mdt] [67387.811895] [<ffffffffc1204146>] mdt_remote_object_lock_try+0x736/0x770 [mdt] [67387.811901] [<ffffffffc0b74be4>] ? lprocfs_stats_lock+0x24/0xd0 [obdclass] [67387.811912] [<ffffffffc12041aa>] mdt_remote_object_lock+0x2a/0x30 [mdt] [67387.811918] [<ffffffffc0b74be4>] ? lprocfs_stats_lock+0x24/0xd0 [obdclass] [67387.811927] [<ffffffffc122ccf9>] mdt_reint_open+0x2769/0x3270 [mdt] [67387.811928] [<ffffffffc12041aa>] mdt_remote_object_lock+0x2a/0x30 [mdt] [67387.811940] [<ffffffffc122ccf9>] mdt_reint_open+0x2769/0x3270 [mdt] [67387.811949] [<ffffffffc0baae51>] ? upcall_cache_get_entry+0x211/0x8e0 [obdclass] [67387.811965] [<ffffffffc0baae51>] ? upcall_cache_get_entry+0x211/0x8e0 [obdclass] [67387.811968] [<ffffffffc0bb0b2e>] ? lu_ucred+0x1e/0x30 [obdclass] [67387.811980] [<ffffffffc120fd35>] ? mdt_ucred+0x15/0x20 [mdt] [67387.811984] [<ffffffffc0bb0b2e>] ? lu_ucred+0x1e/0x30 [obdclass] [67387.811990] [<ffffffffc12105c1>] ? mdt_root_squash+0x21/0x430 [mdt] [67387.811995] [<ffffffffc120fd35>] ? mdt_ucred+0x15/0x20 [mdt] [67387.812002] [<ffffffffc12208e3>] mdt_reint_rec+0x83/0x210 [mdt] [67387.812006] [<ffffffffc12105c1>] ? mdt_root_squash+0x21/0x430 [mdt] [67387.812012] [<ffffffffc120020b>] mdt_reint_internal+0x5fb/0x9c0 [mdt] [67387.812017] [<ffffffffc12208e3>] mdt_reint_rec+0x83/0x210 [mdt] [67387.812022] [<ffffffffc120c787>] mdt_intent_reint+0x157/0x420 [mdt] [67387.812027] [<ffffffffc120020b>] mdt_reint_internal+0x5fb/0x9c0 [mdt] [67387.812032] [<ffffffffc1203365>] mdt_intent_opc+0x455/0xae0 [mdt] [67387.812036] [<ffffffffc120c787>] mdt_intent_reint+0x157/0x420 [mdt] [67387.812046] [<ffffffffc1203365>] mdt_intent_opc+0x455/0xae0 [mdt] [67387.812090] [<ffffffffc0dd3920>] ? lustre_swab_ldlm_intent+0x0/0x20 [ptlrpc] [67387.812099] [<ffffffffc0dd3920>] ? lustre_swab_ldlm_intent+0x0/0x20 [ptlrpc] [67387.812102] [<ffffffffc120afb3>] mdt_intent_policy+0x1a3/0x360 [mdt] [67387.812111] [<ffffffffc120afb3>] mdt_intent_policy+0x1a3/0x360 [mdt] [67387.812127] [<ffffffffc0d83f1e>] ldlm_lock_enqueue+0x34e/0xa50 [ptlrpc] [67387.812136] [<ffffffffc0a6a63e>] ? cfs_hash_add+0xbe/0x1a0 [libcfs] [67387.812137] [<ffffffffc0d83f1e>] ldlm_lock_enqueue+0x34e/0xa50 [ptlrpc] [67387.812145] [<ffffffffc0a6a63e>] ? cfs_hash_add+0xbe/0x1a0 [libcfs] [67387.812164] [<ffffffffc0dabb13>] ldlm_handle_enqueue0+0x8f3/0x1410 [ptlrpc] [67387.812173] [<ffffffffc0dabb13>] ldlm_handle_enqueue0+0x8f3/0x1410 [ptlrpc] [67387.812198] [<ffffffffc0dd39a0>] ? lustre_swab_ldlm_request+0x0/0x30 [ptlrpc] [67387.812204] [<ffffffffc0dd39a0>] ? lustre_swab_ldlm_request+0x0/0x30 [ptlrpc] [67387.812238] [<ffffffffc0e33392>] tgt_enqueue+0x62/0x210 [ptlrpc] [67387.812238] [<ffffffffc0e33392>] tgt_enqueue+0x62/0x210 [ptlrpc] [67387.812245] [<ffffffffc0e33392>] tgt_enqueue+0x62/0x210 [ptlrpc] [67387.812288] [<ffffffffc0e3994a>] tgt_request_handle+0x92a/0x13b0 [ptlrpc] [67387.812294] [<ffffffffc0e3994a>] tgt_request_handle+0x92a/0x13b0 [ptlrpc] [67387.812322] [<ffffffffc0ddda53>] ptlrpc_server_handle_request+0x253/0xab0 [ptlrpc] [67387.812330] [<ffffffffc0ddda53>] ptlrpc_server_handle_request+0x253/0xab0 [ptlrpc] [67387.812355] [<ffffffffc0ddb5d8>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc] [67387.812361] [<ffffffff810c7c82>] ? default_wake_function+0x12/0x20 [67387.812364] [<ffffffff810bdc4b>] ? __wake_up_common+0x5b/0x90 [67387.812366] [<ffffffffc0ddb5d8>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc] [67387.812370] [<ffffffff810c7c82>] ? default_wake_function+0x12/0x20 [67387.812372] [<ffffffff810bdc4b>] ? __wake_up_common+0x5b/0x90 [67387.812397] [<ffffffffc0de1202>] ptlrpc_main+0xa92/0x1f40 [ptlrpc] [67387.812405] [<ffffffffc0de1202>] ptlrpc_main+0xa92/0x1f40 [ptlrpc] [67387.812429] [<ffffffffc0de0770>] ? ptlrpc_main+0x0/0x1f40 [ptlrpc] [67387.812433] [<ffffffff810b4031>] kthread+0xd1/0xe0 [67387.812435] [<ffffffff810b3f60>] ? kthread+0x0/0xe0 [67387.812439] [<ffffffff816c0577>] ret_from_fork+0x77/0xb0 [67387.812440] [<ffffffffc0de0770>] ? ptlrpc_main+0x0/0x1f40 [ptlrpc] [67387.812441] [<ffffffff810b3f60>] ? kthread+0x0/0xe0 [67387.812442] [67387.812443] [<ffffffff810b4031>] kthread+0xd1/0xe0 [67387.812443] Kernel panic - not syncing: LBUG [67387.812446] CPU: 25 PID: 8615 Comm: mdt01_014 Tainted: P W OE ------------ 3.10.0-693.21.1.el7_lustre.x86_64 #1 [67387.812447] [<ffffffff810b3f60>] ? kthread+0x0/0xe0 [67387.812448] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013 [67387.812449] Call Trace: [67387.812450] [<ffffffff816c0577>] ret_from_fork+0x77/0xb0 [67387.812468] [<ffffffff816ae7c8>] dump_stack+0x19/0x1b [67387.812469] [<ffffffff810b3f60>] ? kthread+0x0/0xe0 [67387.812471] [<ffffffff816a8634>] panic+0xe8/0x21f [67387.812472] [67387.812478] [<ffffffffc0a56854>] lbug_with_loc+0x64/0xb0 [libcfs] [67387.812492] [<ffffffffc12527c9>] lu_object_get.isra.33.part.34+0x36/0x36 [mdt] [67387.812502] [<ffffffffc1204146>] mdt_remote_object_lock_try+0x736/0x770 [mdt] [67387.812521] [<ffffffffc0b74be4>] ? lprocfs_stats_lock+0x24/0xd0 [obdclass] [67387.812531] [<ffffffffc12041aa>] mdt_remote_object_lock+0x2a/0x30 [mdt] [67387.812543] [<ffffffffc122ccf9>] mdt_reint_open+0x2769/0x3270 [mdt] [67387.812563] [<ffffffffc0baae51>] ? upcall_cache_get_entry+0x211/0x8e0 [obdclass] [67387.812582] [<ffffffffc0bb0b2e>] ? lu_ucred+0x1e/0x30 [obdclass] [67387.812592] [<ffffffffc120fd35>] ? mdt_ucred+0x15/0x20 [mdt] .....
Full vmcore at /scratch/dumps/soak-9.spirit.hpdd.intel.com/10.10.1.109-2018-04-24-18:20:36
Attachments
Issue Links
- mentioned in
-
Page Loading...