[LU-10678] LBUG: osd_handler.c:2353:osd_read_lock()) ASSERTION( obj->oo_owner == ((void *)0) ) failed: Created: 16/Feb/18 Updated: 27/Oct/21 Resolved: 11/Jun/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Cliff White (Inactive) | Assignee: | Yang Sheng |
| Resolution: | Not a Bug | Votes: | 1 |
| Labels: | soak | ||
| Environment: |
Soak stress cluster - Lustre version=2.10.57_58_gf24340c. |
||
| Issue Links: |
|
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||||||||||
| Description |
|
Soak MDT was in normal operation. Sudden LBUG:
Feb 16 09:28:39 soak-8 kernel: LustreError: 2688:0:(osd_handler.c:2353:osd_read_lock()) ASSERTION( obj->oo_owner == ((void *)0) ) failed:
Feb 16 09:28:39 soak-8 kernel: LustreError: 2688:0:(osd_handler.c:2353:osd_read_lock()) LBUG
Feb 16 09:28:39 soak-8 kernel: Pid: 2688, comm: mdt00_028
Feb 16 09:28:39 soak-8 kernel: #012Call Trace:
Feb 16 09:28:39 soak-8 kernel: [<ffffffffc0dbc7ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
Feb 16 09:28:39 soak-8 kernel: [<ffffffffc0dbc83c>] lbug_with_loc+0x4c/0xb0 [libcfs]
Feb 16 09:28:39 soak-8 kernel: [<ffffffffc140599a>] osd_read_lock+0xda/0xe0 [osd_ldiskfs]
Feb 16 09:28:40 soak-8 kernel: [<ffffffffc1691287>] lod_read_lock+0x37/0xd0 [lod]
Feb 16 09:28:40 soak-8 kernel: [<ffffffffc17125c7>] mdd_read_lock+0x37/0xd0 [mdd]
Feb 16 09:28:40 soak-8 kernel: [<ffffffffc1715dcc>] mdd_xattr_get+0x6c/0x390 [mdd]
Feb 16 09:28:40 soak-8 kernel: [<ffffffffc1593c3f>] mdt_pack_acl2body+0x1af/0x800 [mdt]
Feb 16 09:28:40 soak-8 kernel: [<ffffffffc15beaf9>] mdt_finish_open+0x289/0x690 [mdt]
Feb 16 09:28:40 soak-8 kernel: [<ffffffffc15c120b>] mdt_reint_open+0x230b/0x3260 [mdt]
Feb 16 09:28:40 soak-8 kernel: [<ffffffffc0f27d2e>] ? upcall_cache_get_entry+0x20e/0x8f0 [obdclass]
Feb 16 09:28:40 soak-8 kernel: [<ffffffffc15a4b43>] ? ucred_set_jobid+0x53/0x70 [mdt]
Feb 16 09:28:40 soak-8 kernel: [<ffffffffc15b5400>] mdt_reint_rec+0x80/0x210 [mdt]
Feb 16 09:28:40 soak-8 kernel: [<ffffffffc1594f8b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
Feb 16 09:28:40 soak-8 kernel: [<ffffffffc15a1437>] mdt_intent_reint+0x157/0x420 [mdt]
Feb 16 09:28:40 soak-8 kernel: [<ffffffffc15980b2>] mdt_intent_opc+0x442/0xad0 [mdt]
Feb 16 09:28:41 soak-8 kernel: [<ffffffffc1144470>] ? lustre_swab_ldlm_intent+0x0/0x20 [ptlrpc]
Feb 16 09:28:41 soak-8 kernel: [<ffffffffc159fc63>] mdt_intent_policy+0x1a3/0x360 [mdt]
Feb 16 09:28:41 soak-8 kernel: [<ffffffffc10f4202>] ldlm_lock_enqueue+0x382/0x8f0 [ptlrpc]
Feb 16 09:28:41 soak-8 kernel: [<ffffffffc111c753>] ldlm_handle_enqueue0+0x8f3/0x13e0 [ptlrpc]
Feb 16 09:28:41 soak-8 kernel: [<ffffffffc11444f0>] ? lustre_swab_ldlm_request+0x0/0x30 [ptlrpc]
Feb 16 09:28:41 soak-8 kernel: [<ffffffffc11a2202>] tgt_enqueue+0x62/0x210 [ptlrpc]
Feb 16 09:28:41 soak-8 kernel: [<ffffffffc11aa405>] tgt_request_handle+0x925/0x13b0 [ptlrpc]
Feb 16 09:28:41 soak-8 kernel: [<ffffffffc114e58e>] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc]
Feb 16 09:28:41 soak-8 kernel: [<ffffffffc114b448>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
Feb 16 09:28:41 soak-8 kernel: [<ffffffff810c6440>] ? default_wake_function+0x0/0x20
Feb 16 09:28:41 soak-8 kernel: [<ffffffffc1151d42>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
Feb 16 09:28:41 soak-8 kernel: [<ffffffffc11512b0>] ? ptlrpc_main+0x0/0x1e40 [ptlrpc]
Feb 16 09:28:41 soak-8 kernel: [<ffffffff810b252f>] kthread+0xcf/0xe0
Feb 16 09:28:42 soak-8 kernel: [<ffffffff810b2460>] ? kthread+0x0/0xe0
Feb 16 09:28:42 soak-8 kernel: [<ffffffff816b8798>] ret_from_fork+0x58/0x90
Feb 16 09:28:42 soak-8 kernel: [<ffffffff810b2460>] ? kthread+0x0/0xe0 |
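For context, the assertion that fired enforces the invariant that an osd object's oo_owner field is non-NULL only while a thread holds the object's rw-semaphore (oo_sem) for write, so any thread that successfully takes the read lock should observe oo_owner == NULL. The following is a minimal userspace model of that locking pattern, not the actual osd_handler.c code; the struct and function names (osd_object_model, model_read_lock, etc.) are illustrative only:

    #include <assert.h>
    #include <pthread.h>
    #include <stdio.h>

    /*
     * Minimal model of the invariant behind the LBUG: oo_owner is assigned
     * only while the write side of the rw-semaphore is held, so a reader
     * that successfully takes the read lock must always see oo_owner == NULL.
     * (Illustrative userspace sketch only; not the Lustre osd code.)
     */
    struct osd_object_model {
            pthread_rwlock_t oo_sem;   /* stands in for the kernel rw_semaphore */
            void            *oo_owner; /* writer identity, NULL when not write-locked */
    };

    static void model_read_lock(struct osd_object_model *obj)
    {
            pthread_rwlock_rdlock(&obj->oo_sem);
            /* Counterpart of LASSERT(obj->oo_owner == NULL) in osd_read_lock():
             * a correct rwsem never admits a reader while a writer owns it. */
            assert(obj->oo_owner == NULL);
    }

    static void model_read_unlock(struct osd_object_model *obj)
    {
            pthread_rwlock_unlock(&obj->oo_sem);
    }

    static void model_write_lock(struct osd_object_model *obj, void *env)
    {
            pthread_rwlock_wrlock(&obj->oo_sem);
            obj->oo_owner = env;       /* mark the object as write-owned */
    }

    static void model_write_unlock(struct osd_object_model *obj)
    {
            obj->oo_owner = NULL;
            pthread_rwlock_unlock(&obj->oo_sem);
    }

    int main(void)
    {
            struct osd_object_model obj = { .oo_owner = NULL };
            int env;                   /* stands in for the lu_env owner token */

            pthread_rwlock_init(&obj.oo_sem, NULL);
            model_write_lock(&obj, &env);
            model_write_unlock(&obj);
            model_read_lock(&obj);     /* passes: oo_owner is NULL again */
            model_read_unlock(&obj);
            pthread_rwlock_destroy(&obj.oo_sem);
            printf("invariant held\n");
            return 0;
    }

If the read lock can ever be granted while a writer still owns oo_sem, the assert in model_read_lock() fires, which is the shape of the LBUG above.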
| Comments |
| Comment by Peter Jones [ 16/Feb/18 ] |
|
Yang Sheng, can you please advise? Peter |
| Comment by Sebastien Piechurski [ 22/Feb/18 ] |
|
Hi, we have seen this a couple of times on 2.7.21.2 recently. Crashdump collection failed on the first occurrence, and I am waiting for confirmation about the second occurrence. Would you be interested in a dump if we get one? |
| Comment by Yang Sheng [ 22/Feb/18 ] |
|
Hi Sebastien, it would be very helpful if you could get a crash dump. TIA. Thanks, |
| Comment by Sebastien Piechurski [ 23/Feb/18 ] |
|
Unfortunately, the dump collection failed because the crashkernel=auto parameter does not reserve enough memory for our configuration. I have requested this to be adjusted. Let's hope we can get a dump at the next crash.
Regards,
Sebastien. |
| Comment by Johann Peyrard (Inactive) [ 22/Oct/18 ] |
|
Hi, I have hit this LBUG on a server running lustre-el7.3-2.7.21.3-255.ddn20.g10dd357.el7.x86_64:
[75345.251740] LustreError: 7714:0:(osd_handler.c:1751:osd_object_read_lock()) ASSERTION( obj->oo_owner == ((void *)0) ) failed:
Do I need to open a new Jira ticket for this one, or can we use this one? It seems to be similar, but I prefer to ask. I will try to get the crash file and the whole dmesg this week.
Regards, Johann |
| Comment by Lixin Liu [ 17/Nov/18 ] |
|
We have had a few similar crashes on version 2.10.1, kernel 3.10.0-693.2.2.el7_lustre.x86_64. We have an incomplete kernel dump, which I have uploaded to ftp.whamcloud.com in /uploads/. Thanks.
Lixin Liu Simon Fraser University
|
| Comment by Sebastien Piechurski [ 19/Nov/18 ] |
|
We have one complete vmcore from an MDS running kernel 3.10.0-693.11.1.el7 and lustre 2.7.21.2. I have uploaded it to ftp.whamcloud.com/uploads/
|
| Comment by Yang Sheng [ 19/Nov/18 ] |
|
Hi Sebastien, it looks like you are using a non-standard combination of Lustre & kernel? 2.7.21.2 should use a 3.10.0-514.xx kernel. Can you provide the debuginfo rpms? Thanks, |
| Comment by Sebastien Piechurski [ 20/Nov/18 ] |
|
Hi Yang Sheng, I have just uploaded the corresponding lustre and kernel debuginfo packages to the same directory. Regards,
Sebastien. |
| Comment by rajgautam [ 27/Apr/19 ] |
|
Also seen on a server running Lustre version 2.11.0.201 and kernel version 3.10.0-693.21.1.x3.1.11.x86_64:
Apr 25 11:49:02 hostname-n03 kernel: Pid: 26722, comm: mdt03_004 |
| Comment by Andrew Perepechko [ 29/May/19 ] |
|
We, at Cray, encountered a number of similar crashes, which we associated with the broken rwsem implementation in certain RHEL7 kernels: https://access.redhat.com/solutions/3393611 |
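To connect that to the assertion in this ticket: assuming, as in the locking model sketched under the description, that oo_owner is assigned only while oo_sem is write-held, the LBUG implies a reader was admitted while a writer still owned the semaphore, which is exactly the failure mode a broken rwsem produces. A hedged sketch of the interleaving (thread names are illustrative):

    /*
     * MDT thread A (writer)                MDT thread B (reader)
     * ---------------------                ---------------------
     * down_write(&obj->oo_sem)
     * obj->oo_owner = env;
     *                                      down_read(&obj->oo_sem)
     *                                        -- with the buggy rwsem this can
     *                                           succeed while the writer still
     *                                           owns the lock --
     *                                      LASSERT(obj->oo_owner == NULL)
     *                                        -> fires, LBUG as in the trace above
     */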
| Comment by Yang Sheng [ 29/May/19 ] |
|
Hi Andrew, thanks for the info. It is a really tricky one. Thanks, |
| Comment by Peter Jones [ 11/Jun/19 ] |
|
Red Hat bug, not a Lustre bug. |