Details
Bug
Resolution: Fixed
Critical
Lustre 2.12.0
None
CentOS 7.6, Kernel 3.10.0-957.1.3.el7_lustre.x86_64, all clients are 2.12.0
3
Description
We just hit the following MDS crash on Fir (2.12), server fir-md1-s1:
[497493.075367] Lustre: fir-MDT0000: Client 691c85d2-0e39-9e6d-1bfd-ecbaccae5366 (at 10.8.2.27@o2ib6) reconnecting
[497594.956880] LustreError: 12324:0:(osp_object.c:1458:osp_declare_create()) ASSERTION( o->opo_reserved == 0 ) failed:
[497594.967490] LustreError: 12324:0:(osp_object.c:1458:osp_declare_create()) LBUG
[497594.974807] Pid: 12324, comm: mdt01_074 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[497594.984636] Call Trace:
[497594.987187] [<ffffffffc0c5e7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[497594.993859] [<ffffffffc0c5e87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[497595.000177] [<ffffffffc17cdcc5>] osp_declare_create+0x5a5/0x5b0 [osp]
[497595.006833] [<ffffffffc171539f>] lod_sub_declare_create+0xdf/0x210 [lod]
[497595.013748] [<ffffffffc1714904>] lod_qos_prep_create+0x15d4/0x1890 [lod]
[497595.020662] [<ffffffffc16f5bba>] lod_declare_instantiate_components+0x9a/0x1d0 [lod]
[497595.028614] [<ffffffffc17084d5>] lod_declare_layout_change+0xb65/0x10f0 [lod]
[497595.035988] [<ffffffffc177a102>] mdd_declare_layout_change+0x62/0x120 [mdd]
[497595.043172] [<ffffffffc1782e52>] mdd_layout_change+0x882/0x1000 [mdd]
[497595.049830] [<ffffffffc15ea317>] mdt_layout_change+0x337/0x430 [mdt]
[497595.056398] [<ffffffffc15f242e>] mdt_intent_layout+0x7ee/0xcc0 [mdt]
[497595.062968] [<ffffffffc15efa18>] mdt_intent_policy+0x2e8/0xd00 [mdt]
[497595.069549] [<ffffffffc0f41ec6>] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
[497595.076400] [<ffffffffc0f6a8a7>] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
[497595.083597] [<ffffffffc0ff1302>] tgt_enqueue+0x62/0x210 [ptlrpc]
[497595.089851] [<ffffffffc0ff835a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[497595.096881] [<ffffffffc0f9c92b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[497595.104679] [<ffffffffc0fa025c>] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[497595.111095] [<ffffffff9d6c1c31>] kthread+0xd1/0xe0
[497595.116096] [<ffffffff9dd74c24>] ret_from_fork_nospec_begin+0xe/0x21
[497595.122657] [<ffffffffffffffff>] 0xffffffffffffffff
[497595.127761] Kernel panic - not syncing: LBUG
[497595.132122] CPU: 41 PID: 12324 Comm: mdt01_074 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1
[497595.144451] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018
[497595.152106] Call Trace:
[497595.154649] [<ffffffff9dd61e41>] dump_stack+0x19/0x1b
[497595.159882] [<ffffffff9dd5b550>] panic+0xe8/0x21f
[497595.164763] [<ffffffffc0c5e8cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[497595.171040] [<ffffffffc17cdcc5>] osp_declare_create+0x5a5/0x5b0 [osp]
[497595.177668] [<ffffffffc171539f>] lod_sub_declare_create+0xdf/0x210 [lod]
[497595.184541] [<ffffffff9d994d0d>] ? list_del+0xd/0x30
[497595.189693] [<ffffffffc1714904>] lod_qos_prep_create+0x15d4/0x1890 [lod]
[497595.196569] [<ffffffff9d81a849>] ? ___slab_alloc+0x209/0x4f0
[497595.202421] [<ffffffffc0d87f7b>] ? class_handle_hash+0xab/0x2f0 [obdclass]
[497595.209474] [<ffffffff9d6d67b0>] ? wake_up_state+0x20/0x20
[497595.215152] [<ffffffffc0da7138>] ? lu_buf_alloc+0x48/0x320 [obdclass]
[497595.221803] [<ffffffffc0f5be0d>] ? ldlm_cli_enqueue_local+0x27d/0x870 [ptlrpc]
[497595.229208] [<ffffffffc16f5bba>] lod_declare_instantiate_components+0x9a/0x1d0 [lod]
[497595.237131] [<ffffffffc17084d5>] lod_declare_layout_change+0xb65/0x10f0 [lod]
[497595.244442] [<ffffffffc177a102>] mdd_declare_layout_change+0x62/0x120 [mdd]
[497595.251584] [<ffffffffc1782e52>] mdd_layout_change+0x882/0x1000 [mdd]
[497595.258213] [<ffffffffc15e9b30>] ? mdt_object_lock_internal+0x70/0x3e0 [mdt]
[497595.265444] [<ffffffffc15ea317>] mdt_layout_change+0x337/0x430 [mdt]
[497595.271978] [<ffffffffc15f242e>] mdt_intent_layout+0x7ee/0xcc0 [mdt]
[497595.278543] [<ffffffffc0f8e2f7>] ? lustre_msg_buf+0x17/0x60 [ptlrpc]
[497595.285083] [<ffffffffc15efa18>] mdt_intent_policy+0x2e8/0xd00 [mdt]
[497595.291637] [<ffffffffc0f40524>] ? ldlm_lock_create+0xa4/0xa40 [ptlrpc]
[497595.298442] [<ffffffffc15f1c40>] ? mdt_intent_open+0x350/0x350 [mdt]
[497595.304999] [<ffffffffc0f41ec6>] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
[497595.311794] [<ffffffffc0c69fa3>] ? cfs_hash_bd_add_locked+0x63/0x80 [libcfs]
[497595.319018] [<ffffffffc0c6d72e>] ? cfs_hash_add+0xbe/0x1a0 [libcfs]
[497595.325490] [<ffffffffc0f6a8a7>] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
[497595.332662] [<ffffffffc0f927f0>] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc]
[497595.340268] [<ffffffffc0ff1302>] tgt_enqueue+0x62/0x210 [ptlrpc]
[497595.346488] [<ffffffffc0ff835a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[497595.353481] [<ffffffffc0fd1a51>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
[497595.361139] [<ffffffffc0c5ebde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
[497595.368309] [<ffffffffc0f9c92b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[497595.376083] [<ffffffffc0f997b5>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
[497595.382959] [<ffffffff9d6d67c2>] ? default_wake_function+0x12/0x20
[497595.389311] [<ffffffff9d6cba9b>] ? __wake_up_common+0x5b/0x90
[497595.395263] [<ffffffffc0fa025c>] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[497595.401650] [<ffffffffc0f9f760>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
[497595.409132] [<ffffffff9d6c1c31>] kthread+0xd1/0xe0
[497595.414097] [<ffffffff9d6c1b60>] ? insert_kthread_work+0x40/0x40
[497595.420279] [<ffffffff9dd74c24>] ret_from_fork_nospec_begin+0xe/0x21
[497595.426803] [<ffffffff9d6c1b60>] ? insert_kthread_work+0x40/0x40
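For triage, the failed assertion and the reliable frames of a trace like the one above can be pulled out of a console log with a short script. This is only a sketch: the function name `parse_lbug`, the `SAMPLE` excerpt, and the two regular expressions are illustrative assumptions based on the log format shown in this ticket, not part of Lustre or any crash tooling.

```python
import re

# Assumed shape of an LBUG assertion line, taken from the log in this ticket:
# LustreError: <pid>:0:(<file>:<line>:<func>()) ASSERTION( <expr> ) failed:
ASSERT_RE = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"ASSERTION\( (?P<expr>.*?) \) failed"
)

# Assumed shape of a stack frame: [<addr>] [? ]symbol+0xoff/0xsize [module]
# A "?" before the symbol marks a frame the kernel considers unreliable.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\] (?P<guess>\? )?(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<mod>\w+)\])?"
)

def parse_lbug(log):
    """Return (assertion fields or None, list of (symbol, module) for reliable frames)."""
    m = ASSERT_RE.search(log)
    assertion = m.groupdict() if m else None
    frames = [
        (f.group("sym"), f.group("mod"))
        for f in FRAME_RE.finditer(log)
        if not f.group("guess")  # skip "?" frames
    ]
    return assertion, frames

# Short excerpt of the log from this ticket, used only to exercise the parser.
SAMPLE = """\
[497594.956880] LustreError: 12324:0:(osp_object.c:1458:osp_declare_create()) ASSERTION( o->opo_reserved == 0 ) failed:
[497595.171040] [<ffffffffc17cdcc5>] osp_declare_create+0x5a5/0x5b0 [osp]
[497595.184541] [<ffffffff9d994d0d>] ? list_del+0xd/0x30
[497595.177668] [<ffffffffc171539f>] lod_sub_declare_create+0xdf/0x210 [lod]
[497595.409132] [<ffffffff9d6c1c31>] kthread+0xd1/0xe0
"""

if __name__ == "__main__":
    assertion, frames = parse_lbug(SAMPLE)
    print(assertion)
    for sym, mod in frames:
        print(sym, mod or "(core kernel)")
```

Frames without a module suffix are reported with `None` as the module, which here corresponds to symbols in the core kernel.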
I do have a vmcore.
Fir has two MDSes: fir-md1-s1 with MDT0 and MDT2, and fir-md1-s2 with MDT1 and MDT3.
DoM and PFL are both enabled and in use.
Please let me know if you have any idea how to avoid this.
Thanks
Stephane
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36376/
Subject: LU-11967 mdt: reint layout_change in standard way
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 6a806133a3a53987dbd9c207e0ed82dcd4035bbd