[LU-11967] MDS LBUG ASSERTION( o->opo_reserved == 0 ) failed Created: 12/Feb/19  Updated: 21/Nov/19  Resolved: 04/Oct/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0
Fix Version/s: Lustre 2.13.0, Lustre 2.12.4

Type: Bug Priority: Critical
Reporter: Stephane Thiell Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: None
Environment:

CentOS 7.6, Kernel 3.10.0-957.1.3.el7_lustre.x86_64, all clients are 2.12.0


Attachments: fir-md1-s1-vmcore-dmesg.txt, fir-md1-s2-crash-foreach-bt-2019-06-15-01-19-53.log, fir-md1-s2-crash-ps-2019-06-15-01-19-53.log, vmcore-dmesg-fir-md1-s2-2019-06-15-01-19-53.txt
Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

We just hit the following MDS crash on Fir (2.12), server fir-md1-s1:

[497493.075367] Lustre: fir-MDT0000: Client 691c85d2-0e39-9e6d-1bfd-ecbaccae5366 (at 10.8.2.27@o2ib6) reconnecting
[497594.956880] LustreError: 12324:0:(osp_object.c:1458:osp_declare_create()) ASSERTION( o->opo_reserved == 0 ) failed: 
[497594.967490] LustreError: 12324:0:(osp_object.c:1458:osp_declare_create()) LBUG
[497594.974807] Pid: 12324, comm: mdt01_074 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[497594.984636] Call Trace:
[497594.987187]  [<ffffffffc0c5e7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[497594.993859]  [<ffffffffc0c5e87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[497595.000177]  [<ffffffffc17cdcc5>] osp_declare_create+0x5a5/0x5b0 [osp]
[497595.006833]  [<ffffffffc171539f>] lod_sub_declare_create+0xdf/0x210 [lod]
[497595.013748]  [<ffffffffc1714904>] lod_qos_prep_create+0x15d4/0x1890 [lod]
[497595.020662]  [<ffffffffc16f5bba>] lod_declare_instantiate_components+0x9a/0x1d0 [lod]
[497595.028614]  [<ffffffffc17084d5>] lod_declare_layout_change+0xb65/0x10f0 [lod]
[497595.035988]  [<ffffffffc177a102>] mdd_declare_layout_change+0x62/0x120 [mdd]
[497595.043172]  [<ffffffffc1782e52>] mdd_layout_change+0x882/0x1000 [mdd]
[497595.049830]  [<ffffffffc15ea317>] mdt_layout_change+0x337/0x430 [mdt]
[497595.056398]  [<ffffffffc15f242e>] mdt_intent_layout+0x7ee/0xcc0 [mdt]
[497595.062968]  [<ffffffffc15efa18>] mdt_intent_policy+0x2e8/0xd00 [mdt]
[497595.069549]  [<ffffffffc0f41ec6>] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
[497595.076400]  [<ffffffffc0f6a8a7>] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
[497595.083597]  [<ffffffffc0ff1302>] tgt_enqueue+0x62/0x210 [ptlrpc]
[497595.089851]  [<ffffffffc0ff835a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[497595.096881]  [<ffffffffc0f9c92b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[497595.104679]  [<ffffffffc0fa025c>] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[497595.111095]  [<ffffffff9d6c1c31>] kthread+0xd1/0xe0
[497595.116096]  [<ffffffff9dd74c24>] ret_from_fork_nospec_begin+0xe/0x21
[497595.122657]  [<ffffffffffffffff>] 0xffffffffffffffff
[497595.127761] Kernel panic - not syncing: LBUG
[497595.132122] CPU: 41 PID: 12324 Comm: mdt01_074 Kdump: loaded Tainted: G           OEL ------------   3.10.0-957.1.3.el7_lustre.x86_64 #1
[497595.144451] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018
[497595.152106] Call Trace:
[497595.154649]  [<ffffffff9dd61e41>] dump_stack+0x19/0x1b
[497595.159882]  [<ffffffff9dd5b550>] panic+0xe8/0x21f
[497595.164763]  [<ffffffffc0c5e8cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[497595.171040]  [<ffffffffc17cdcc5>] osp_declare_create+0x5a5/0x5b0 [osp]
[497595.177668]  [<ffffffffc171539f>] lod_sub_declare_create+0xdf/0x210 [lod]
[497595.184541]  [<ffffffff9d994d0d>] ? list_del+0xd/0x30
[497595.189693]  [<ffffffffc1714904>] lod_qos_prep_create+0x15d4/0x1890 [lod]
[497595.196569]  [<ffffffff9d81a849>] ? ___slab_alloc+0x209/0x4f0
[497595.202421]  [<ffffffffc0d87f7b>] ? class_handle_hash+0xab/0x2f0 [obdclass]
[497595.209474]  [<ffffffff9d6d67b0>] ? wake_up_state+0x20/0x20
[497595.215152]  [<ffffffffc0da7138>] ? lu_buf_alloc+0x48/0x320 [obdclass]
[497595.221803]  [<ffffffffc0f5be0d>] ? ldlm_cli_enqueue_local+0x27d/0x870 [ptlrpc]
[497595.229208]  [<ffffffffc16f5bba>] lod_declare_instantiate_components+0x9a/0x1d0 [lod]
[497595.237131]  [<ffffffffc17084d5>] lod_declare_layout_change+0xb65/0x10f0 [lod]
[497595.244442]  [<ffffffffc177a102>] mdd_declare_layout_change+0x62/0x120 [mdd]
[497595.251584]  [<ffffffffc1782e52>] mdd_layout_change+0x882/0x1000 [mdd]
[497595.258213]  [<ffffffffc15e9b30>] ? mdt_object_lock_internal+0x70/0x3e0 [mdt]
[497595.265444]  [<ffffffffc15ea317>] mdt_layout_change+0x337/0x430 [mdt]
[497595.271978]  [<ffffffffc15f242e>] mdt_intent_layout+0x7ee/0xcc0 [mdt]
[497595.278543]  [<ffffffffc0f8e2f7>] ? lustre_msg_buf+0x17/0x60 [ptlrpc]
[497595.285083]  [<ffffffffc15efa18>] mdt_intent_policy+0x2e8/0xd00 [mdt]
[497595.291637]  [<ffffffffc0f40524>] ? ldlm_lock_create+0xa4/0xa40 [ptlrpc]
[497595.298442]  [<ffffffffc15f1c40>] ? mdt_intent_open+0x350/0x350 [mdt]
[497595.304999]  [<ffffffffc0f41ec6>] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
[497595.311794]  [<ffffffffc0c69fa3>] ? cfs_hash_bd_add_locked+0x63/0x80 [libcfs]
[497595.319018]  [<ffffffffc0c6d72e>] ? cfs_hash_add+0xbe/0x1a0 [libcfs]
[497595.325490]  [<ffffffffc0f6a8a7>] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
[497595.332662]  [<ffffffffc0f927f0>] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc]
[497595.340268]  [<ffffffffc0ff1302>] tgt_enqueue+0x62/0x210 [ptlrpc]
[497595.346488]  [<ffffffffc0ff835a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[497595.353481]  [<ffffffffc0fd1a51>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
[497595.361139]  [<ffffffffc0c5ebde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
[497595.368309]  [<ffffffffc0f9c92b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[497595.376083]  [<ffffffffc0f997b5>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
[497595.382959]  [<ffffffff9d6d67c2>] ? default_wake_function+0x12/0x20
[497595.389311]  [<ffffffff9d6cba9b>] ? __wake_up_common+0x5b/0x90
[497595.395263]  [<ffffffffc0fa025c>] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[497595.401650]  [<ffffffffc0f9f760>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
[497595.409132]  [<ffffffff9d6c1c31>] kthread+0xd1/0xe0
[497595.414097]  [<ffffffff9d6c1b60>] ? insert_kthread_work+0x40/0x40
[497595.420279]  [<ffffffff9dd74c24>] ret_from_fork_nospec_begin+0xe/0x21
[497595.426803]  [<ffffffff9d6c1b60>] ? insert_kthread_work+0x40/0x40

I do have a vmcore.

Fir has two MDS nodes: fir-md1-s1 with MDT0 and MDT2, and fir-md1-s2 with MDT1 and MDT3.

 

DOM, PFL are enabled and used.

 

Please let me know if you have any idea how to avoid this.

Thanks
Stephane



 Comments   
Comment by Stephane Thiell [ 13/Feb/19 ]

It's worth noting that shortly before the LBUG, the server was heavily loaded, in a way that looked like LU-11964. We always have the option of temporarily disabling DOM for new files if that becomes a problem; let me know what you think. Thanks much.

Comment by Peter Jones [ 13/Feb/19 ]

Lai

Could you please investigate?

Thanks

Peter

Comment by Lai Siyao [ 15/Feb/19 ]

The backtrace shows this file is creating OST objects, so it should not be related to DOM. The log also shows a client reconnecting, so I suspect a race in lod_declare_update_plain(): one client replays layout_change, but another client may have instantiated the component already. lod_declare_update_plain() only checks whether the replay request contains an init-ed component; it doesn't check whether the file already has a component instantiated (by the other client). I'll upload a patch soon.
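
To make the suspected interleaving concrete, here is a minimal userspace sketch (a hypothetical model, not the actual lod_declare_update_plain() code): if the replay check consults only the flags snapshotted in the client's replay buffer, a component that another client instantiated in the meantime gets declared for creation a second time.

/* Hypothetical timeline model of the suspected race -- not Lustre code. */
#include <stdio.h>

int main(void)
{
	int on_disk_init = 0;    /* is the component instantiated on disk? */
	int replay_buf_init = 0; /* what client A's replay buffer recorded */

	/* 1. Client A sends layout_change; its replay buffer snapshots
	 *    the component as not yet instantiated. */

	/* 2. Client B's layout_change wins the race and instantiates the
	 *    component; on-disk state changes under A's feet. */
	on_disk_init = 1;

	/* 3. The server restarts and client A replays layout_change. If
	 *    the check uses only replay_buf_init, the component is
	 *    declared for creation a second time. */
	if (!replay_buf_init)
		printf("declare create again -> ASSERTION(o->opo_reserved == 0)\n");
	if (on_disk_init)
		printf("but the component already exists -> should be skipped\n");
	return 0;
}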

Comment by Gerrit Updater [ 15/Feb/19 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34265
Subject: LU-11967 lod: check comp inited for layout_change replay
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: aa37c3f76c432ee8a5eaf81a689c4a2390744388

Comment by Lai Siyao [ 20/Feb/19 ]

The patch above is not correct. Stephane, can you post the backtraces of all processes from the vmcore? I want to see whether there are two clients trying to instantiate the same component.

Comment by Stephane Thiell [ 20/Feb/19 ]

Hi Lai,

Argh, I just tried to open the vmcore, but it looks like it is corrupted, as I got the following errors:

crash: seek error: kernel virtual address: d2085c00d2085c  type: "possible"
WARNING: cannot read cpu_possible_map
crash: page excluded: kernel virtual address: ffffffffc13328d0  type: "present"
WARNING: cannot read cpu_present_map
crash: seek error: kernel virtual address: ffff8995bee70810  type: "active"
WARNING: cannot read cpu_active_map
WARNING: cannot read linux_banner string
crash: /usr/lib/debug/lib/modules/3 and vmcore do not match!

This is unfortunate as it has been the only occurrence of this issue so far. But I will let you know if it happens again.

Comment by Stephane Thiell [ 16/Jun/19 ]

The filesystem had been running fine for about 2 weeks until a new occurrence of this MDS crash last night. Additionally, it took us several hours this morning to bring the filesystem back online after the crash because the MDT recovery was stuck at 0 sec and wouldn't finish. Even a manual abort_recovery didn't work; only disconnecting all clients resolved the issue. This has already been reported as LU-12360 (and I posted an update there too).

So for the MDS LBUG, this time I do have a proper vmcore. First, the LBUG:

[Sat Jun 15 00:46:30 2019][1411481.220356] LustreError: 21484:0:(tgt_grant.c:742:tgt_grant_check()) fir-MDT0001: cli e70978d5-4079-f244-8878-d79e9ac6d1e1 claims 155648 GRANT, real grant 0
[Sat Jun 15 00:46:30 2019][1411481.234534] LustreError: 21484:0:(tgt_grant.c:742:tgt_grant_check()) Skipped 9824 previous similar messages
[Sat Jun 15 00:49:13 2019][1411644.396226] Lustre: 23817:0:(llog_cat.c:874:llog_cat_process_or_fork()) fir-MDD0001: catlog [0x5:0xa:0x0] crosses index zero
[Sat Jun 15 00:49:13 2019][1411644.407621] Lustre: 23817:0:(llog_cat.c:874:llog_cat_process_or_fork()) Skipped 77 previous similar messages
[Sat Jun 15 00:56:30 2019][1412081.314545] LustreError: 21403:0:(tgt_grant.c:742:tgt_grant_check()) fir-MDT0001: cli e70978d5-4079-f244-8878-d79e9ac6d1e1 claims 155648 GRANT, real grant 0
[Sat Jun 15 00:56:30 2019][1412081.328707] LustreError: 21403:0:(tgt_grant.c:742:tgt_grant_check()) Skipped 9658 previous similar messages
[Sat Jun 15 00:59:14 2019][1412245.465371] Lustre: 23731:0:(llog_cat.c:874:llog_cat_process_or_fork()) fir-MDD0001: catlog [0x5:0xa:0x0] crosses index zero
[Sat Jun 15 00:59:14 2019][1412245.476773] Lustre: 23731:0:(llog_cat.c:874:llog_cat_process_or_fork()) Skipped 52 previous similar messages
[Sat Jun 15 01:06:30 2019][1412681.380333] LustreError: 73335:0:(tgt_grant.c:742:tgt_grant_check()) fir-MDT0001: cli e70978d5-4079-f244-8878-d79e9ac6d1e1 claims 155648 GRANT, real grant 0
[Sat Jun 15 01:06:30 2019][1412681.394502] LustreError: 73335:0:(tgt_grant.c:742:tgt_grant_check()) Skipped 9471 previous similar messages
[Sat Jun 15 01:09:30 2019][1412861.615161] Lustre: 23727:0:(llog_cat.c:874:llog_cat_process_or_fork()) fir-MDD0001: catlog [0x5:0xa:0x0] crosses index zero
[Sat Jun 15 01:09:30 2019][1412861.626555] Lustre: 23727:0:(llog_cat.c:874:llog_cat_process_or_fork()) Skipped 80 previous similar messages
[Sat Jun 15 01:16:30 2019][1413281.448578] LustreError: 73327:0:(tgt_grant.c:742:tgt_grant_check()) fir-MDT0001: cli e70978d5-4079-f244-8878-d79e9ac6d1e1 claims 155648 GRANT, real grant 0
[Sat Jun 15 01:16:30 2019][1413281.462751] LustreError: 73327:0:(tgt_grant.c:742:tgt_grant_check()) Skipped 9698 previous similar messages
[Sat Jun 15 01:19:30 2019][1413461.904197] Lustre: 21344:0:(llog_cat.c:874:llog_cat_process_or_fork()) fir-MDD0001: catlog [0x5:0xa:0x0] crosses index zero
[Sat Jun 15 01:19:31 2019][1413461.915593] Lustre: 21344:0:(llog_cat.c:874:llog_cat_process_or_fork()) Skipped 125 previous similar messages
[Sat Jun 15 01:19:41 2019][1413472.771519] LustreError: 49210:0:(osp_object.c:1458:osp_declare_create()) ASSERTION( o->opo_reserved == 0 ) failed: 
[Sat Jun 15 01:19:41 2019][1413472.782226] LustreError: 49210:0:(osp_object.c:1458:osp_declare_create()) LBUG
[Sat Jun 15 01:19:41 2019][1413472.789629] Pid: 49210, comm: mdt03_089 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[Sat Jun 15 01:19:41 2019][1413472.799538] Call Trace:
[Sat Jun 15 01:19:41 2019][1413472.802173]  [<ffffffffc09587cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[Sat Jun 15 01:19:41 2019][1413472.808916]  [<ffffffffc095887c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[Sat Jun 15 01:19:41 2019][1413472.815311]  [<ffffffffc154ecc5>] osp_declare_create+0x5a5/0x5b0 [osp]
[Sat Jun 15 01:19:41 2019][1413472.822046]  [<ffffffffc149639f>] lod_sub_declare_create+0xdf/0x210 [lod]
[Sat Jun 15 01:19:41 2019][1413472.829065]  [<ffffffffc1495904>] lod_qos_prep_create+0x15d4/0x1890 [lod]
[Sat Jun 15 01:19:41 2019][1413472.836069]  [<ffffffffc1476bba>] lod_declare_instantiate_components+0x9a/0x1d0 [lod]
[Sat Jun 15 01:19:41 2019][1413472.844103]  [<ffffffffc14894d5>] lod_declare_layout_change+0xb65/0x10f0 [lod]
[Sat Jun 15 01:19:41 2019][1413472.851557]  [<ffffffffc14fb102>] mdd_declare_layout_change+0x62/0x120 [mdd]
[Sat Jun 15 01:19:41 2019][1413472.858838]  [<ffffffffc1503e52>] mdd_layout_change+0x882/0x1000 [mdd]
[Sat Jun 15 01:19:41 2019][1413472.865569]  [<ffffffffc136a327>] mdt_layout_change+0x337/0x430 [mdt]
[Sat Jun 15 01:19:41 2019][1413472.872234]  [<ffffffffc137243e>] mdt_intent_layout+0x7ee/0xcc0 [mdt]
[Sat Jun 15 01:19:41 2019][1413472.878907]  [<ffffffffc136fa28>] mdt_intent_policy+0x2e8/0xd00 [mdt]
[Sat Jun 15 01:19:41 2019][1413472.885563]  [<ffffffffc0cc6d46>] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
[Sat Jun 15 01:19:41 2019][1413472.892512]  [<ffffffffc0cef707>] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
[Sat Jun 15 01:19:41 2019][1413472.899795]  [<ffffffffc0d766e2>] tgt_enqueue+0x62/0x210 [ptlrpc]
[Sat Jun 15 01:19:42 2019][1413472.906131]  [<ffffffffc0d7d73a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[Sat Jun 15 01:19:42 2019][1413472.913245]  [<ffffffffc0d21d0b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[Sat Jun 15 01:19:42 2019][1413472.921141]  [<ffffffffc0d2563c>] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[Sat Jun 15 01:19:42 2019][1413472.927653]  [<ffffffff950c1c31>] kthread+0xd1/0xe0
[Sat Jun 15 01:19:42 2019][1413472.932747]  [<ffffffff95774c24>] ret_from_fork_nospec_begin+0xe/0x21
[Sat Jun 15 01:19:42 2019][1413472.939394]  [<ffffffffffffffff>] 0xffffffffffffffff
[Sat Jun 15 01:19:42 2019][1413472.944595] Kernel panic - not syncing: LBUG

Files provided for this occurrence are attached: fir-md1-s2-crash-foreach-bt-2019-06-15-01-19-53.log, fir-md1-s2-crash-ps-2019-06-15-01-19-53.log, and vmcore-dmesg-fir-md1-s2-2019-06-15-01-19-53.txt.

Comment by Lai Siyao [ 20/Jun/19 ]

Thanks, I found that layout_change replay is different from normal create replay: it reads the on-disk LOV EA to decide whether the object is initialized upon replay. However, this LOV EA may be stale and inconsistent with the object stripes in memory (which were initialized from the LOV in the replay request). That is, the on-disk LOV says some stripe is not initialized, but that stripe has already been initialized in memory, so it triggers this assert.

                if (!replay) {
                        if (lod_comp_inited(lod_comp))
                                continue;
                } else {
                        /**
                         * In replay path, lod_comp is the EA passed by
                         * client replay buffer,  comp_v1 is the pre-recovery
                         * on-disk EA, we'd sift out those components which
                         * were init-ed in the on-disk EA.
                         */
                        if (le32_to_cpu(comp_v1->lcm_entries[i].lcme_flags) &
                            LCME_FL_INIT)
                                continue;
                }

IMO layout_change can be replayed the same way as a normal create, i.e., use the LOV in the replay request to create stripes, and never read the on-disk LOV, which may be stale. I'll write a patch and verify this.
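
A minimal sketch of this stale-EA scenario and the proposed rule (hypothetical helper names; compare the quoted snippet above): pre-patch, replay consults the pre-recovery on-disk EA, so when that EA is stale and claims a stripe is uninitialized while the in-memory object (built from the replay request's LOV) already has it instantiated, the stripe is declared for creation again and the opo_reserved assertion fires. The proposed behaviour decides from the replay request's LOV only, like a normal create replay.

/* Userspace model only -- not the real lod code. */
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

#define LCME_FL_INIT 0x10 /* "component instantiated"; value arbitrary here */

/* One component as seen from two sources of truth during replay. */
struct comp_state {
	uint32_t request_flags; /* LOV carried in the replay request */
	uint32_t disk_flags;    /* pre-recovery on-disk EA (may be stale) */
};

/* Pre-patch behaviour (per the snippet above): replay trusts the
 * on-disk EA, so a stripe already instantiated in memory can be
 * declared for creation again -> ASSERTION(o->opo_reserved == 0). */
static bool skip_component_old(const struct comp_state *c, bool replay)
{
	if (!replay)
		return c->request_flags & LCME_FL_INIT;
	return c->disk_flags & LCME_FL_INIT;
}

/* Proposed behaviour: replay is driven by the replay request's LOV
 * only, exactly like a normal create replay. */
static bool skip_component_new(const struct comp_state *c)
{
	return c->request_flags & LCME_FL_INIT;
}

int main(void)
{
	/* Stale on-disk EA: request/memory says init-ed, disk says not. */
	struct comp_state c = { .request_flags = LCME_FL_INIT, .disk_flags = 0 };

	printf("old replay path re-creates stripe: %s\n",
	       skip_component_old(&c, true) ? "no" : "yes (LBUG)");
	printf("new replay path re-creates stripe: %s\n",
	       skip_component_new(&c) ? "no" : "yes");
	return 0;
}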

Comment by Stephane Thiell [ 20/Jun/19 ]

It's great that you found this! Thanks for the explanation!

Comment by Gerrit Updater [ 08/Jul/19 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35439
Subject: LU-11967 obdclass: add barrier() before checking dying object
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 26c774c44d09be0a03c079d5c707800ffc79932d

Comment by Lai Siyao [ 11/Jul/19 ]

I didn't find the exact race that triggered this assert, but I found that layout_change is not handled the same way as other reint operations, so I'm afraid there are hidden races in layout_change resend/replay. I'll commit a patch to reint layout_change in the standard way.
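
As a rough illustration of what "the standard way" means here (a hedged sketch with hypothetical names; the real MDT dispatches reint operations through its own handler table): each reint opcode goes through one common entry point, so resend/replay handling is applied uniformly instead of being special-cased in the intent path.

/* Hypothetical miniature of reint-style dispatch -- illustrative only. */
#include <stdio.h>

enum reint_op { REINT_OP_SETATTR, REINT_OP_LAYOUT, REINT_OP_NR };

struct reint_req {
	enum reint_op op;
	int is_replay; /* determined once, by common code, for every op */
};

typedef int (*reint_handler_t)(struct reint_req *);

static int reint_setattr(struct reint_req *r)
{
	printf("setattr (replay=%d)\n", r->is_replay);
	return 0;
}

static int reint_layout(struct reint_req *r)
{
	/* Handled like any other reint op: the same replay/resend
	 * checks have already run before we get here. */
	printf("layout_change (replay=%d)\n", r->is_replay);
	return 0;
}

static reint_handler_t handlers[REINT_OP_NR] = {
	[REINT_OP_SETATTR] = reint_setattr,
	[REINT_OP_LAYOUT]  = reint_layout,
};

static int reint_dispatch(struct reint_req *r)
{
	/* Common resend/replay handling lives here, once, for all ops. */
	return handlers[r->op](r);
}

int main(void)
{
	struct reint_req r = { .op = REINT_OP_LAYOUT, .is_replay = 1 };
	return reint_dispatch(&r);
}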

Comment by Gerrit Updater [ 11/Jul/19 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35465
Subject: LU-11967 mdt: reint layout_change in standard way
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: bce42b9cebb70f99d4e90bca8e6c99c4e7290794

Comment by Gerrit Updater [ 04/Oct/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35465/
Subject: LU-11967 mdt: reint layout_change in standard way
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e1bebbdf53c8490a3be35793070f45f1a68721b1

Comment by Peter Jones [ 04/Oct/19 ]

Landed for 2.13

Comment by Gerrit Updater [ 04/Oct/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36376
Subject: LU-11967 mdt: reint layout_change in standard way
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: cf47eb6b28174d280b82154981ff15b80517a8c9

Comment by Gerrit Updater [ 21/Nov/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36376/
Subject: LU-11967 mdt: reint layout_change in standard way
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 6a806133a3a53987dbd9c207e0ed82dcd4035bbd
