[LU-11786] 6752:0:(osd_handler.c:2308:osd_read_lock()) ASSERTION( obj->oo_owner == ((void *)0) ) failed: Created: 15/Dec/18 Updated: 09/Jun/20 Resolved: 09/Jun/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Mahmoud Hanafi | Assignee: | Yang Sheng |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Environment: |
kernel: 3.10.0-693.21.1.el7 |
||
| Issue Links: |
|
| Severity: | 2 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
MDS hit LBUG.

[2470859.924802] LustreError: 6752:0:(osd_handler.c:2308:osd_read_lock()) ASSERTION( obj->oo_owner == ((void *)0) ) failed:
[2470859.957599] LustreError: 6752:0:(osd_handler.c:2308:osd_read_lock()) LBUG
[2470859.978428] Pid: 6752, comm: mdt00_055 3.10.0-693.21.1.el7.20180508.x86_64.lustre2105 #1 SMP Mon Aug 27 23:04:41 UTC 2018
[2470859.978429] Call Trace:
[2470859.978443] [<ffffffff8103a1f2>] save_stack_trace_tsk+0x22/0x40
[2470859.996970] [<ffffffffa08a27cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[2470859.996975] [<ffffffffa08a287c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[2470859.996987] [<ffffffffa1040a2a>] osd_read_lock+0xda/0xe0 [osd_ldiskfs]
[2470859.996997] [<ffffffffa12d61ca>] lod_read_lock+0x3a/0xd0 [lod]
[2470859.997006] [<ffffffffa134e8ea>] mdd_read_lock+0x3a/0xd0 [mdd]
[2470859.997016] [<ffffffffa135183c>] mdd_xattr_get+0x6c/0x390 [mdd]
[2470859.997032] [<ffffffffa11f23fd>] mdt_stripe_get+0xcd/0x3c0 [mdt]
[2470859.997043] [<ffffffffa11f29e9>] mdt_attr_get_complex+0x2f9/0xb10 [mdt]
[2470859.997055] [<ffffffffa1219e74>] mdt_reint_open+0x1374/0x3190 [mdt]
[2470859.997067] [<ffffffffa120faf3>] mdt_reint_rec+0x83/0x210 [mdt]
[2470859.997076] [<ffffffffa11f133b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
[2470859.997084] [<ffffffffa11f1862>] mdt_intent_reint+0x162/0x430 [mdt]
[2470859.997093] [<ffffffffa11fc631>] mdt_intent_policy+0x441/0xc70 [mdt]
[2470859.997138] [<ffffffffa0bab2ba>] ldlm_lock_enqueue+0x38a/0x980 [ptlrpc]
[2470859.997170] [<ffffffffa0bd4b53>] ldlm_handle_enqueue0+0x9d3/0x16a0 [ptlrpc]
[2470859.997214] [<ffffffffa0c5a262>] tgt_enqueue+0x62/0x210 [ptlrpc]
[2470859.997253] [<ffffffffa0c5deca>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
[2470859.997286] [<ffffffffa0c064bb>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
[2470859.997318] [<ffffffffa0c0a4a2>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
[2470859.997324] [<ffffffff810b1131>] kthread+0xd1/0xe0
[2470859.997327] [<ffffffff816a14f7>] ret_from_fork+0x77/0xb0
[2470859.997345] [<ffffffffffffffff>] 0xffffffffffffffff

We have the crash dump if needed. |
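For context on what is being asserted: osd_read_lock() takes the object's rw_semaphore (oo_sem) in shared mode, and oo_owner is only supposed to be set by a thread holding that semaphore exclusively. Below is only a minimal sketch of the helper, simplified from osd_handler.c; it is not the verbatim source for this exact build and omits the per-thread bookkeeping:

    /* Simplified sketch: oo_owner records an exclusive (write) holder of
     * oo_sem and is cleared on unlock, so a reader should never see it set,
     * neither before nor after down_read() -- that is the LASSERT that fired. */
    static void osd_read_lock(const struct lu_env *env, struct dt_object *dt,
                              unsigned role)
    {
            struct osd_object *obj = osd_dt_obj(dt);

            LASSERT(obj->oo_owner == NULL);          /* no exclusive holder expected */
            down_read_nested(&obj->oo_sem, role);    /* take oo_sem shared */
            LASSERT(obj->oo_owner == NULL);          /* still none after acquiring */
    }

Since the check itself is trivial, a failure here points at either a missed write-unlock on the Lustre side or misbehaviour in the kernel's rw_semaphore implementation, which is where the discussion below ends up.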
| Comments |
| Comment by Peter Jones [ 16/Dec/18 ] |
|
Yang Sheng Is this a duplicate of Peter |
| Comment by Yang Sheng [ 17/Dec/18 ] |
|
Yes, it is exactly the same issue. |
| Comment by Mahmoud Hanafi [ 17/Dec/18 ] |
|
We just hit the same LBUG on a different filesystem. This has become a critical issue for us. |
| Comment by Yang Sheng [ 17/Dec/18 ] |
|
Hi Mahmoud, could you help by running a debug patch and reproducing this issue? Thanks, |
| Comment by Mahmoud Hanafi [ 17/Dec/18 ] |
|
We don't have a specific reproducer but are willing to install a debug patch. Do you think there is any info in the crash dump that can help? This was an older filesystem (2.4 format) that was upgraded to 2.7.3 and now 2.10.5. |
| Comment by Gerrit Updater [ 18/Dec/18 ] |
|
Yang Sheng (ys@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33885 |
| Comment by Yang Sheng [ 18/Dec/18 ] |
|
Hi Mahmoud, could you please apply this patch and provide a vmcore after hitting this issue? Does this issue appear frequently? Was it only hit after the update to the 693 kernel? Thanks, |
| Comment by Mahmoud Hanafi [ 18/Dec/18 ] |
|
We have only seen it 2 times on our large filesystem since the update to CentOS 7 / Lustre 2.10.5. We updated to Lustre 2.10.5 at the end of October. We will try out the debug patch.
|
| Comment by Mahmoud Hanafi [ 13/Feb/19 ] |
|
We finally got a crash again on this bug. I can upload the vmcore, but it would have to be viewed by a US citizen only.

[1594255.296920] LustreError: 12134:0:(osd_handler.c:2309:osd_read_lock()) ASSERTION( obj->oo_owner == NULL ) failed: owner=ffff883eb1f787c0, obj=ffff8811b9838b00
[1594255.296922] LustreError: 10199:0:(osd_handler.c:2309:osd_read_lock()) ASSERTION( obj->oo_owner == NULL ) failed: owner=ffff883eb1f787c0, obj=ffff8811b9838b00
[1594255.296926] LustreError: 10199:0:(osd_handler.c:2309:osd_read_lock()) LBUG

crash> struct osd_object ffff8811b9838b00
  , },
  do_ops = 0xffffffffa1046f60 <osd_obj_ops>,
  do_body_ops = 0xffffffffa1048b20 <osd_body_ops>,
  do_index_ops = 0x0 },
  oo_inode = 0xffff880db0057708,
  oo_hl_head = 0xffff881429684000,
  oo_ext_idx_sem = { { count = { counter = 0 }, __UNIQUE_ID_rh_kabi_hide2 = { count = 0 }, {<No data fields>} }, } },
  osq = { tail = { counter = 0 } },
  wait_list = { next = 0xffff8811b9838b60 },
  owner = 0x0 },
  oo_sem = { { count = { counter = -4294967294 }, __UNIQUE_ID_rh_kabi_hide2 = { count = -4294967294 }, {<No data fields>} },
  wait_lock = { raw_lock = { val = { counter = 0 } } }, , } } |
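A side note on the dump above: the oo_sem count of -4294967294 (0xffffffff00000002) can be sanity-checked against the conventional rwsem-xadd bias constants. The constants in the snippet below are the usual x86_64 values and are an assumption on my part rather than something verified against this exact RHEL kernel, and the encoding admits more than one decomposition, so treat this as a rough decode rather than a diagnosis:

    /* Rough decode of the oo_sem.count value from the crash dump, using the
     * conventional rwsem-xadd bias constants (assumed, not verified against
     * the exact kernel in this ticket). */
    #include <stdio.h>

    #define RWSEM_ACTIVE_BIAS        1LL
    #define RWSEM_ACTIVE_MASK        0xffffffffLL
    #define RWSEM_WAITING_BIAS       (-RWSEM_ACTIVE_MASK - 1)                 /* -0x100000000 */
    #define RWSEM_ACTIVE_WRITE_BIAS  (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)

    int main(void)
    {
            long long count = -4294967294LL;   /* oo_sem count from the dump */

            printf("count                        = %lld (0x%llx)\n",
                   count, (unsigned long long)count);
            /* Reading 1: waiters queued plus two active readers. */
            printf("WAITING_BIAS + 2*ACTIVE_BIAS = %lld\n",
                   RWSEM_WAITING_BIAS + 2 * RWSEM_ACTIVE_BIAS);
            /* Reading 2: an active writer plus one reader increment in flight.
             * The bias encoding alone cannot distinguish the two readings. */
            printf("WRITE_BIAS + ACTIVE_BIAS     = %lld\n",
                   RWSEM_ACTIVE_WRITE_BIAS + RWSEM_ACTIVE_BIAS);
            return 0;
    }

Either reading leaves the semaphore believing it has active holders at assertion time, which is at least consistent with this later being treated as a Red Hat kernel issue rather than a Lustre-side bug.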
| Comment by Peter Jones [ 14/Feb/19 ] |
|
Mahmoud, we actually got a crash dump from a site without restrictions around citizenship a couple of days ago, so there is no immediate need to investigate your crash dump. We're working out next steps ATM. Peter |
| Comment by Yang Sheng [ 15/Feb/19 ] |
|
Hi Mahmoud, could you please tell me your CPU type, e.g. Haswell or other? Please ignore this if it is sensitive information. Thanks, |
| Comment by Mahmoud Hanafi [ 20/Feb/19 ] |
|
The servers are E5-2670 (Sandy Bridge).
|
| Comment by Taizeng Wu [ 12/Apr/19 ] |
|
We have the same problem. Environment:
Dmesg: [394034.327556] LustreError: 251105:0:(osd_handler.c:2294:osd_read_lock()) ASSERTION( obj->oo_owner == ((void *)0) ) failed: |
| Comment by Peter Jones [ 11/Jun/19 ] |
|
Red Hat issue |
| Comment by Mahmoud Hanafi [ 26/Jul/19 ] |
|
We hit this issue with kernel-3.10.0-957.21.3.el7.x86_64, so it doesn't appear to be fixed in the latest CentOS 7.6 kernel. |
| Comment by Mahmoud Hanafi [ 18/Feb/20 ] |
|
We hit this again with 3.10.0-957.21.3.el7 and Lustre 2.12.2. |
| Comment by Xiao Zhenggang [ 02/Apr/20 ] |
|
We hit this with 3.10.0-957.10.1.el7_lustre.x86_64 and Lustre 2.12.2 last night. |
| Comment by Andreas Dilger [ 09/Jun/20 ] |
|
This is fixed in RHEL7.7 and later kernels, see |