[LU-11967] MDS LBUG ASSERTION( o->opo_reserved == 0 ) failed Created: 12/Feb/19 Updated: 21/Nov/19 Resolved: 04/Oct/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0 |
| Fix Version/s: | Lustre 2.13.0, Lustre 2.12.4 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Stephane Thiell | Assignee: | Lai Siyao |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Environment: | CentOS 7.6, Kernel 3.10.0-957.1.3.el7_lustre.x86_64, all clients are 2.12.0 |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
We just hit the following MDS crash on Fir (2.12), server fir-md1-s1:

[497493.075367] Lustre: fir-MDT0000: Client 691c85d2-0e39-9e6d-1bfd-ecbaccae5366 (at 10.8.2.27@o2ib6) reconnecting
[497594.956880] LustreError: 12324:0:(osp_object.c:1458:osp_declare_create()) ASSERTION( o->opo_reserved == 0 ) failed:
[497594.967490] LustreError: 12324:0:(osp_object.c:1458:osp_declare_create()) LBUG
[497594.974807] Pid: 12324, comm: mdt01_074 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[497594.984636] Call Trace:
[497594.987187] [<ffffffffc0c5e7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[497594.993859] [<ffffffffc0c5e87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[497595.000177] [<ffffffffc17cdcc5>] osp_declare_create+0x5a5/0x5b0 [osp]
[497595.006833] [<ffffffffc171539f>] lod_sub_declare_create+0xdf/0x210 [lod]
[497595.013748] [<ffffffffc1714904>] lod_qos_prep_create+0x15d4/0x1890 [lod]
[497595.020662] [<ffffffffc16f5bba>] lod_declare_instantiate_components+0x9a/0x1d0 [lod]
[497595.028614] [<ffffffffc17084d5>] lod_declare_layout_change+0xb65/0x10f0 [lod]
[497595.035988] [<ffffffffc177a102>] mdd_declare_layout_change+0x62/0x120 [mdd]
[497595.043172] [<ffffffffc1782e52>] mdd_layout_change+0x882/0x1000 [mdd]
[497595.049830] [<ffffffffc15ea317>] mdt_layout_change+0x337/0x430 [mdt]
[497595.056398] [<ffffffffc15f242e>] mdt_intent_layout+0x7ee/0xcc0 [mdt]
[497595.062968] [<ffffffffc15efa18>] mdt_intent_policy+0x2e8/0xd00 [mdt]
[497595.069549] [<ffffffffc0f41ec6>] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
[497595.076400] [<ffffffffc0f6a8a7>] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
[497595.083597] [<ffffffffc0ff1302>] tgt_enqueue+0x62/0x210 [ptlrpc]
[497595.089851] [<ffffffffc0ff835a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[497595.096881] [<ffffffffc0f9c92b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[497595.104679] [<ffffffffc0fa025c>] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[497595.111095] [<ffffffff9d6c1c31>] kthread+0xd1/0xe0
[497595.116096] [<ffffffff9dd74c24>] ret_from_fork_nospec_begin+0xe/0x21
[497595.122657] [<ffffffffffffffff>] 0xffffffffffffffff
[497595.127761] Kernel panic - not syncing: LBUG
[497595.132122] CPU: 41 PID: 12324 Comm: mdt01_074 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1
[497595.144451] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018
[497595.152106] Call Trace:
[497595.154649] [<ffffffff9dd61e41>] dump_stack+0x19/0x1b
[497595.159882] [<ffffffff9dd5b550>] panic+0xe8/0x21f
[497595.164763] [<ffffffffc0c5e8cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[497595.171040] [<ffffffffc17cdcc5>] osp_declare_create+0x5a5/0x5b0 [osp]
[497595.177668] [<ffffffffc171539f>] lod_sub_declare_create+0xdf/0x210 [lod]
[497595.184541] [<ffffffff9d994d0d>] ? list_del+0xd/0x30
[497595.189693] [<ffffffffc1714904>] lod_qos_prep_create+0x15d4/0x1890 [lod]
[497595.196569] [<ffffffff9d81a849>] ? ___slab_alloc+0x209/0x4f0
[497595.202421] [<ffffffffc0d87f7b>] ? class_handle_hash+0xab/0x2f0 [obdclass]
[497595.209474] [<ffffffff9d6d67b0>] ? wake_up_state+0x20/0x20
[497595.215152] [<ffffffffc0da7138>] ? lu_buf_alloc+0x48/0x320 [obdclass]
[497595.221803] [<ffffffffc0f5be0d>] ? ldlm_cli_enqueue_local+0x27d/0x870 [ptlrpc]
[497595.229208] [<ffffffffc16f5bba>] lod_declare_instantiate_components+0x9a/0x1d0 [lod]
[497595.237131] [<ffffffffc17084d5>] lod_declare_layout_change+0xb65/0x10f0 [lod]
[497595.244442] [<ffffffffc177a102>] mdd_declare_layout_change+0x62/0x120 [mdd]
[497595.251584] [<ffffffffc1782e52>] mdd_layout_change+0x882/0x1000 [mdd]
[497595.258213] [<ffffffffc15e9b30>] ? mdt_object_lock_internal+0x70/0x3e0 [mdt]
[497595.265444] [<ffffffffc15ea317>] mdt_layout_change+0x337/0x430 [mdt]
[497595.271978] [<ffffffffc15f242e>] mdt_intent_layout+0x7ee/0xcc0 [mdt]
[497595.278543] [<ffffffffc0f8e2f7>] ? lustre_msg_buf+0x17/0x60 [ptlrpc]
[497595.285083] [<ffffffffc15efa18>] mdt_intent_policy+0x2e8/0xd00 [mdt]
[497595.291637] [<ffffffffc0f40524>] ? ldlm_lock_create+0xa4/0xa40 [ptlrpc]
[497595.298442] [<ffffffffc15f1c40>] ? mdt_intent_open+0x350/0x350 [mdt]
[497595.304999] [<ffffffffc0f41ec6>] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
[497595.311794] [<ffffffffc0c69fa3>] ? cfs_hash_bd_add_locked+0x63/0x80 [libcfs]
[497595.319018] [<ffffffffc0c6d72e>] ? cfs_hash_add+0xbe/0x1a0 [libcfs]
[497595.325490] [<ffffffffc0f6a8a7>] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
[497595.332662] [<ffffffffc0f927f0>] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc]
[497595.340268] [<ffffffffc0ff1302>] tgt_enqueue+0x62/0x210 [ptlrpc]
[497595.346488] [<ffffffffc0ff835a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[497595.353481] [<ffffffffc0fd1a51>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
[497595.361139] [<ffffffffc0c5ebde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
[497595.368309] [<ffffffffc0f9c92b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[497595.376083] [<ffffffffc0f997b5>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
[497595.382959] [<ffffffff9d6d67c2>] ? default_wake_function+0x12/0x20
[497595.389311] [<ffffffff9d6cba9b>] ? __wake_up_common+0x5b/0x90
[497595.395263] [<ffffffffc0fa025c>] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[497595.401650] [<ffffffffc0f9f760>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
[497595.409132] [<ffffffff9d6c1c31>] kthread+0xd1/0xe0
[497595.414097] [<ffffffff9d6c1b60>] ? insert_kthread_work+0x40/0x40
[497595.420279] [<ffffffff9dd74c24>] ret_from_fork_nospec_begin+0xe/0x21
[497595.426803] [<ffffffff9d6c1b60>] ? insert_kthread_work+0x40/0x40

I do have a vmcore. Fir has 2 MDS, fir-md1-s1 with MDT0 and MDT2 and fir-md1-s2 with MDT1 and MDT3.
DOM and PFL are enabled and in use.
Please let me know if you have any idea how to avoid this. Thanks |
| Comments |
| Comment by Stephane Thiell [ 13/Feb/19 ] |
|
It's worth noting that shortly before the LBUG, the server was heavily loaded and that looked like |
| Comment by Peter Jones [ 13/Feb/19 ] |
|
Lai, could you please investigate? Thanks, Peter |
| Comment by Lai Siyao [ 15/Feb/19 ] |
|
The backtrace shows this file is creating OST objects, so it should not be related to DOM. The log also shows a client reconnecting, so I suspect a race in lod_declare_update_plain(): one client replays layout_change, but another client may have instantiated the component already; however, lod_declare_update_plain() only checks whether the replay request contains init-ed components, and doesn't check whether this file already has components instantiated by another client. I'll update a patch soon. |
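As a rough illustration of the gap described in this comment (a minimal sketch with hypothetical struct, flag, and helper names, not the actual Lustre lod_declare_update_plain() code): the replay path skips a component only if the replay request marks it initialized, and never asks whether the component was already instantiated on behalf of another client, so the same object can be declared for creation twice and trip ASSERTION( o->opo_reserved == 0 ).

```c
/* Hypothetical sketch only -- names are illustrative, not Lustre internals. */
#include <stddef.h>

#define LCME_FL_INIT 0x10	/* "component is instantiated" flag */

struct layout_comp {
	unsigned int lc_flags;
};

static void declare_create_stripes(size_t idx) { (void)idx; /* stand-in */ }

static void update_plain(const struct layout_comp *replay_layout,  /* from client RPC */
			 const struct layout_comp *current_layout, /* already on the MDT */
			 size_t count, int replay)
{
	for (size_t i = 0; i < count; i++) {
		/* current behaviour: only the replay buffer is consulted */
		if (replay && (replay_layout[i].lc_flags & LCME_FL_INIT))
			continue;

		/* missing check (the suspected race): another client may have
		 * instantiated this component already, so it must be skipped too */
		if (replay && (current_layout[i].lc_flags & LCME_FL_INIT))
			continue;

		/* declaring creation twice for the same object is fatal */
		declare_create_stripes(i);
	}
}
```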
| Comment by Gerrit Updater [ 15/Feb/19 ] |
|
Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34265 |
| Comment by Lai Siyao [ 20/Feb/19 ] |
|
The patch above is not correct. Stephane, can you post the backtraces of all processes in the vmcore? I want to see whether two clients are trying to instantiate the same component. |
| Comment by Stephane Thiell [ 20/Feb/19 ] |
|
Hi Lai, Argh, I just tried to open the vmcore but it looks like it is corrupted, as I got the following errors:

crash: seek error: kernel virtual address: d2085c00d2085c type: "possible"
WARNING: cannot read cpu_possible_map
crash: page excluded: kernel virtual address: ffffffffc13328d0 type: "present"
WARNING: cannot read cpu_present_map
crash: seek error: kernel virtual address: ffff8995bee70810 type: "active"
WARNING: cannot read cpu_active_map
WARNING: cannot read linux_banner string
crash: /usr/lib/debug/lib/modules/3 and vmcore do not match!

This is unfortunate, as it has been the only occurrence of this issue so far, but I will let you know if it happens again. |
| Comment by Stephane Thiell [ 16/Jun/19 ] |
|
The filesystem had been running fine for about 2 weeks until a new occurrence of this MDS crash last night. Additionally, it took us several hours this morning to bring the filesystem back online after the crash because the MDT recovery was stuck at 0 sec and wouldn't finish. Even a manual abort_recovery didn't work; only disconnecting all clients resolved the issue. This has already been reported as LU-12360 (and I posted an update there too). For the MDS LBUG, this time I do have a proper vmcore. First, the LBUG:

[Sat Jun 15 00:46:30 2019][1411481.220356] LustreError: 21484:0:(tgt_grant.c:742:tgt_grant_check()) fir-MDT0001: cli e70978d5-4079-f244-8878-d79e9ac6d1e1 claims 155648 GRANT, real grant 0
[Sat Jun 15 00:46:30 2019][1411481.234534] LustreError: 21484:0:(tgt_grant.c:742:tgt_grant_check()) Skipped 9824 previous similar messages
[Sat Jun 15 00:49:13 2019][1411644.396226] Lustre: 23817:0:(llog_cat.c:874:llog_cat_process_or_fork()) fir-MDD0001: catlog [0x5:0xa:0x0] crosses index zero
[Sat Jun 15 00:49:13 2019][1411644.407621] Lustre: 23817:0:(llog_cat.c:874:llog_cat_process_or_fork()) Skipped 77 previous similar messages
[Sat Jun 15 00:56:30 2019][1412081.314545] LustreError: 21403:0:(tgt_grant.c:742:tgt_grant_check()) fir-MDT0001: cli e70978d5-4079-f244-8878-d79e9ac6d1e1 claims 155648 GRANT, real grant 0
[Sat Jun 15 00:56:30 2019][1412081.328707] LustreError: 21403:0:(tgt_grant.c:742:tgt_grant_check()) Skipped 9658 previous similar messages
[Sat Jun 15 00:59:14 2019][1412245.465371] Lustre: 23731:0:(llog_cat.c:874:llog_cat_process_or_fork()) fir-MDD0001: catlog [0x5:0xa:0x0] crosses index zero
[Sat Jun 15 00:59:14 2019][1412245.476773] Lustre: 23731:0:(llog_cat.c:874:llog_cat_process_or_fork()) Skipped 52 previous similar messages
[Sat Jun 15 01:06:30 2019][1412681.380333] LustreError: 73335:0:(tgt_grant.c:742:tgt_grant_check()) fir-MDT0001: cli e70978d5-4079-f244-8878-d79e9ac6d1e1 claims 155648 GRANT, real grant 0
[Sat Jun 15 01:06:30 2019][1412681.394502] LustreError: 73335:0:(tgt_grant.c:742:tgt_grant_check()) Skipped 9471 previous similar messages
[Sat Jun 15 01:09:30 2019][1412861.615161] Lustre: 23727:0:(llog_cat.c:874:llog_cat_process_or_fork()) fir-MDD0001: catlog [0x5:0xa:0x0] crosses index zero
[Sat Jun 15 01:09:30 2019][1412861.626555] Lustre: 23727:0:(llog_cat.c:874:llog_cat_process_or_fork()) Skipped 80 previous similar messages
[Sat Jun 15 01:16:30 2019][1413281.448578] LustreError: 73327:0:(tgt_grant.c:742:tgt_grant_check()) fir-MDT0001: cli e70978d5-4079-f244-8878-d79e9ac6d1e1 claims 155648 GRANT, real grant 0
[Sat Jun 15 01:16:30 2019][1413281.462751] LustreError: 73327:0:(tgt_grant.c:742:tgt_grant_check()) Skipped 9698 previous similar messages
[Sat Jun 15 01:19:30 2019][1413461.904197] Lustre: 21344:0:(llog_cat.c:874:llog_cat_process_or_fork()) fir-MDD0001: catlog [0x5:0xa:0x0] crosses index zero
[Sat Jun 15 01:19:31 2019][1413461.915593] Lustre: 21344:0:(llog_cat.c:874:llog_cat_process_or_fork()) Skipped 125 previous similar messages
[Sat Jun 15 01:19:41 2019][1413472.771519] LustreError: 49210:0:(osp_object.c:1458:osp_declare_create()) ASSERTION( o->opo_reserved == 0 ) failed:
[Sat Jun 15 01:19:41 2019][1413472.782226] LustreError: 49210:0:(osp_object.c:1458:osp_declare_create()) LBUG
[Sat Jun 15 01:19:41 2019][1413472.789629] Pid: 49210, comm: mdt03_089 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[Sat Jun 15 01:19:41 2019][1413472.799538] Call Trace:
[Sat Jun 15 01:19:41 2019][1413472.802173] [<ffffffffc09587cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[Sat Jun 15 01:19:41 2019][1413472.808916] [<ffffffffc095887c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[Sat Jun 15 01:19:41 2019][1413472.815311] [<ffffffffc154ecc5>] osp_declare_create+0x5a5/0x5b0 [osp]
[Sat Jun 15 01:19:41 2019][1413472.822046] [<ffffffffc149639f>] lod_sub_declare_create+0xdf/0x210 [lod]
[Sat Jun 15 01:19:41 2019][1413472.829065] [<ffffffffc1495904>] lod_qos_prep_create+0x15d4/0x1890 [lod]
[Sat Jun 15 01:19:41 2019][1413472.836069] [<ffffffffc1476bba>] lod_declare_instantiate_components+0x9a/0x1d0 [lod]
[Sat Jun 15 01:19:41 2019][1413472.844103] [<ffffffffc14894d5>] lod_declare_layout_change+0xb65/0x10f0 [lod]
[Sat Jun 15 01:19:41 2019][1413472.851557] [<ffffffffc14fb102>] mdd_declare_layout_change+0x62/0x120 [mdd]
[Sat Jun 15 01:19:41 2019][1413472.858838] [<ffffffffc1503e52>] mdd_layout_change+0x882/0x1000 [mdd]
[Sat Jun 15 01:19:41 2019][1413472.865569] [<ffffffffc136a327>] mdt_layout_change+0x337/0x430 [mdt]
[Sat Jun 15 01:19:41 2019][1413472.872234] [<ffffffffc137243e>] mdt_intent_layout+0x7ee/0xcc0 [mdt]
[Sat Jun 15 01:19:41 2019][1413472.878907] [<ffffffffc136fa28>] mdt_intent_policy+0x2e8/0xd00 [mdt]
[Sat Jun 15 01:19:41 2019][1413472.885563] [<ffffffffc0cc6d46>] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
[Sat Jun 15 01:19:41 2019][1413472.892512] [<ffffffffc0cef707>] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
[Sat Jun 15 01:19:41 2019][1413472.899795] [<ffffffffc0d766e2>] tgt_enqueue+0x62/0x210 [ptlrpc]
[Sat Jun 15 01:19:42 2019][1413472.906131] [<ffffffffc0d7d73a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[Sat Jun 15 01:19:42 2019][1413472.913245] [<ffffffffc0d21d0b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[Sat Jun 15 01:19:42 2019][1413472.921141] [<ffffffffc0d2563c>] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[Sat Jun 15 01:19:42 2019][1413472.927653] [<ffffffff950c1c31>] kthread+0xd1/0xe0
[Sat Jun 15 01:19:42 2019][1413472.932747] [<ffffffff95774c24>] ret_from_fork_nospec_begin+0xe/0x21
[Sat Jun 15 01:19:42 2019][1413472.939394] [<ffffffffffffffff>] 0xffffffffffffffff
[Sat Jun 15 01:19:42 2019][1413472.944595] Kernel panic - not syncing: LBUG

Files provided for this occurrence:
|
| Comment by Lai Siyao [ 20/Jun/19 ] |
|
Thanks, I found that layout_change replay is different from normal create replay: it reads the on-disk LOV EA to decide whether an object is initialized upon replay, but this on-disk LOV EA may be stale and inconsistent with the object stripes in memory (which were initialized from the LOV in the replay request). That is, the on-disk LOV says a stripe is not initialized while that stripe has already been initialized in memory, so this assert is triggered:

if (!replay) {
if (lod_comp_inited(lod_comp))
continue;
} else {
/**
* In replay path, lod_comp is the EA passed by
* client replay buffer, comp_v1 is the pre-recovery
* on-disk EA, we'd sift out those components which
* were init-ed in the on-disk EA.
*/
if (le32_to_cpu(comp_v1->lcm_entries[i].lcme_flags) &
LCME_FL_INIT)
continue;
}
IMO layout_change can be replayed the same way as a normal create, i.e. use the LOV in the replay request to create stripes, and never read the on-disk LOV, which may be stale. I'll write a patch and verify this. |
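To illustrate the direction proposed above (a minimal sketch with hypothetical names, not the actual patch): on replay, treat the layout carried in the replay request as the single source of truth, as a normal create replay does, and never re-read the on-disk LOV EA, which may already reflect earlier work and disagree with the in-memory object.

```c
/* Illustrative sketch only -- hypothetical types and helper, not Lustre code. */
#include <stddef.h>

#define LCME_FL_INIT 0x10	/* "component is instantiated" flag */

struct layout_comp {
	unsigned int lc_flags;
};

/* Collect indices of components that still need OST objects created,
 * consulting only the layout from the replay request. */
static size_t components_to_instantiate(const struct layout_comp *replay_layout,
					size_t count, size_t *out_idx)
{
	size_t n = 0;

	for (size_t i = 0; i < count; i++) {
		/* Skip components the replay layout already marks as init-ed;
		 * every remaining component gets exactly one declare-create,
		 * so the opo_reserved assertion cannot fire. */
		if (replay_layout[i].lc_flags & LCME_FL_INIT)
			continue;
		out_idx[n++] = i;
	}
	return n;
}
```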
| Comment by Stephane Thiell [ 20/Jun/19 ] |
|
It's great that you found this! Thanks for the explanation! |
| Comment by Gerrit Updater [ 08/Jul/19 ] |
|
Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35439 |
| Comment by Lai Siyao [ 11/Jul/19 ] |
|
I didn't find what race triggered this assert, but I found that layout_change is not handled the same way as other reint operations, so I'm afraid there are hidden races in layout_change resend/replay. I'll commit a patch to reint layout_change in the standard way. |
| Comment by Gerrit Updater [ 11/Jul/19 ] |
|
Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35465 |
| Comment by Gerrit Updater [ 04/Oct/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35465/ |
| Comment by Peter Jones [ 04/Oct/19 ] |
|
Landed for 2.13 |
| Comment by Gerrit Updater [ 04/Oct/19 ] |
|
Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36376 |
| Comment by Gerrit Updater [ 21/Nov/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36376/ |