[LU-9280] coral-beta-combined build 134 (osd_object.c:745:osd_attr_get()) ASSERTION( obj->oo_db ) failed Created: 30/Mar/17 Updated: 14/Jun/18 Resolved: 23/Apr/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.9.0 |
| Fix Version/s: | Lustre 2.10.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | John Salinas (Inactive) | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | LS_RZ, prod | ||
| Environment: |
Running Lustre 2.9 + coral-betal-combined branch based on RC3:
kmod-lustre-tests-2.9.0_dirty-1.el7.centos.x86_64
Pool configuration: Example from ost1.
NAME STATE READ WRITE CKSUM |
||
| Issue Links: |
|
||||
| Severity: | 1 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
Running Lustre 2.9 + coral-betal-combined branch based on RC3:

IOR tests:
Began: Thu Mar 30 00:07:33 2017
Test 0 started: Thu Mar 30 00:07:33 2017
access bw(MiB/s) block(KiB) xfer(KiB) open(s) wr/rd(s) close(s) total(s) iter

While IOR was writing we hit the following error:

[19744.556366] LustreError: 84625:0:(osd_object.c:597:osd_object_destroy()) lsdraid-OST0000: failed to remove [0x100000000:0x1c:0x0] from accounting ZAP for usr 0: rc = -5
[19744.580303] LustreError: 84625:0:(osd_object.c:597:osd_object_destroy()) Skipped 1 previous similar message
[19745.014350] LustreError: 84625:0:(osd_object.c:603:osd_object_destroy()) lsdraid-OST0000: failed to remove [0x100000000:0x1c:0x0] from accounting ZAP for grp 0: rc = -5
[19745.037113] LustreError: 84625:0:(osd_object.c:603:osd_object_destroy()) Skipped 2 previous similar messages
[19768.423554] LustreError: 84625:0:(osd_object.c:597:osd_object_destroy()) lsdraid-OST0000: failed to remove [0x100000000:0x1f:0x0] from accounting ZAP for usr 0: rc = -52
[19768.586567] LustreError: 84625:0:(osd_object.c:603:osd_object_destroy()) lsdraid-OST0000: failed to remove [0x100000000:0x1f:0x0] from accounting ZAP for grp 0: rc = -52
[19779.750997] LustreError: 52432:0:(osd_object.c:745:osd_attr_get()) ASSERTION( obj->oo_db ) failed:
[19779.751007] LustreError: 50225:0:(osd_object.c:745:osd_attr_get()) ASSERTION( obj->oo_db ) failed:
[19779.751010] LustreError: 50225:0:(osd_object.c:745:osd_attr_get()) LBUG
[19779.751012] Pid: 50225, comm: ll_ost01_002
[19779.751012] Call Trace:
[19779.751043] [<ffffffffa0a1b7d3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
[19779.751054] [<ffffffffa0a1b841>] lbug_with_loc+0x41/0xb0 [libcfs]
[19779.751072] [<ffffffffa0968210>] osd_attr_set+0x0/0xce0 [osd_zfs]
[19779.751096] [<ffffffffa0f1b405>] ofd_attr_get+0xa5/0x230 [ofd]
[19779.751111] [<ffffffffa0f29bfd>] ofd_lvbo_init+0x42d/0xb02 [ofd]
[19779.751248] [<ffffffffa0cd22d9>] ldlm_handle_enqueue0+0x8f9/0x1680 [ptlrpc]
[19779.751322] [<ffffffffa0cfa0f0>] ? lustre_swab_ldlm_request+0x0/0x30 [ptlrpc]
[19779.751407] [<ffffffffa0d52dc2>] tgt_enqueue+0x62/0x210 [ptlrpc]
[19779.751483] [<ffffffffa0d57225>] tgt_request_handle+0x915/0x1320 [ptlrpc]
[19779.751545] [<ffffffffa0d031ab>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
[19779.751563] [<ffffffffa0a28128>] ? lc_watchdog_touch+0x68/0x180 [libcfs]
[19779.751621] [<ffffffffa0d00d68>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
[19779.751635] [<ffffffff810b8952>] ? default_wake_function+0x12/0x20
[19779.751639] [<ffffffff810af0b8>] ? __wake_up_common+0x58/0x90
[19779.751708] [<ffffffffa0d07260>] ptlrpc_main+0xaa0/0x1de0 [ptlrpc]
[19779.751765] [<ffffffffa0d067c0>] ? ptlrpc_main+0x0/0x1de0 [ptlrpc]
[19779.751775] [<ffffffff810a5b8f>] kthread+0xcf/0xe0
[19779.751779] [<ffffffff810a5ac0>] ? kthread+0x0/0xe0
[19779.751789] [<ffffffff81646a98>] ret_from_fork+0x58/0x90
[19779.751794] [<ffffffff810a5ac0>] ? kthread+0x0/0xe0
[19779.751795]
[19779.751797] Kernel panic - not syncing: LBUG
[19779.751801] CPU: 26 PID: 50225 Comm: ll_ost01_002 Tainted: G IOE ------------ 3.10.0-327.36.3.el7.x86_64 #1
[19779.751803] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015
[19779.751813] ffffffffa0a38d4c 00000000e5fc8e4d ffff880fe9b33a78 ffffffff81636431
[19779.751820] ffff880fe9b33af8 ffffffff8162fcc0 ffffffff00000008 ffff880fe9b33b08
[19779.751827] ffff880fe9b33aa8 00000000e5fc8e4d 00000000e5fc8e4d 0000000000000092
[19779.751828] Call Trace:
[19779.751843] [<ffffffff81636431>] dump_stack+0x19/0x1b
[19779.751847] [<ffffffff8162fcc0>] panic+0xd8/0x1e7
[19779.751859] [<ffffffffa0a1b859>] lbug_with_loc+0x59/0xb0 [libcfs]
[19779.751871] [<ffffffffa0968210>] osd_attr_get+0x2d0/0x2d0 [osd_zfs]
[19779.751885] [<ffffffffa0f1b405>] ofd_attr_get+0xa5/0x230 [ofd]
[19779.751898] [<ffffffffa0f29bfd>] ofd_lvbo_init+0x42d/0xb02 [ofd]
[19779.751952] [<ffffffffa0cd22d9>] ldlm_handle_enqueue0+0x8f9/0x1680 [ptlrpc]
[19779.752010] [<ffffffffa0cfa0f0>] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc]
[19779.752084] [<ffffffffa0d52dc2>] tgt_enqueue+0x62/0x210 [ptlrpc]
[19779.752165] [<ffffffffa0d57225>] tgt_request_handle+0x915/0x1320 [ptlrpc]
[19779.752238] [<ffffffffa0d031ab>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
[19779.752255] [<ffffffffa0a28128>] ? lc_watchdog_touch+0x68/0x180 [libcfs]
[19779.752326] [<ffffffffa0d00d68>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
[19779.752333] [<ffffffff810b8952>] ? default_wake_function+0x12/0x20
[19779.752337] [<ffffffff810af0b8>] ? __wake_up_common+0x58/0x90
[19779.752409] [<ffffffffa0d07260>] ptlrpc_main+0xaa0/0x1de0 [ptlrpc]
[19779.752482] [<ffffffffa0d067c0>] ? ptlrpc_register_service+0xe40/0xe40 [ptlrpc]
[19779.752489] [<ffffffff810a5b8f>] kthread+0xcf/0xe0
[19779.752494] [<ffffffff810a5ac0>] ? kthread_create_on_node+0x140/0x140
[19779.752500] [<ffffffff81646a98>] ret_from_fork+0x58/0x90
[19779.752505] [<ffffffff810a5ac0>] ? kthread_create_on_node+0x140/0x140

osd-zfs/osd_object.c:937
932  * dmu_tx_hold_bonus(tx, oid) called and then assigned
933  * to a transaction group.
934  */
935 static int osd_attr_set(const struct lu_env *env, struct dt_object *dt,
936                         const struct lu_attr *la, struct thandle *handle)
937 {
938         struct osd_thread_info  *info = osd_oti_get(env);
939         sa_bulk_attr_t          *bulk = osd_oti_get(env)->oti_attr_bulk;
940         struct osd_object       *obj = osd_dt_obj(dt);
941         struct osd_device       *osd = osd_obj2dev(obj);

ofd/ofd_objects.c:780
775  * \retval 0 if successful
776  * \retval negative value on error
777  */
778 int ofd_attr_get(const struct lu_env *env, struct ofd_object *fo,
779                  struct lu_attr *la)
780 {
781         int rc = 0;
782
783         ENTRY;
784

Dump is at: |
| Comments |
| Comment by Peter Jones [ 31/Mar/17 ] |
|
Niu, could you please advise? Thanks, Peter
| Comment by Niu Yawei (Inactive) [ 01/Apr/17 ] |
|
I see it's "Lustre 2.9 + coral-betal-combined branch based on RC3". Where can I get the source code?
| Comment by John Salinas (Inactive) [ 01/Apr/17 ] |
|
Lustre is stock Lustre 2.9.0 |
| Comment by John Salinas (Inactive) [ 04/Apr/17 ] |
|
Should I retry this with checksums on? Aren't the LNet checksums turned off by default?
| Comment by Niu Yawei (Inactive) [ 05/Apr/17 ] |
|
From the stack trace, this seems unlikely to be related to checksums. I'll look into the coral changes to see if there is anything suspicious.
| Comment by John Salinas (Inactive) [ 12/Apr/17 ] |
|
Have you looked at the code? Do you have any questions we can get answered for you? |
| Comment by Niu Yawei (Inactive) [ 13/Apr/17 ] |
|
Yes, but I haven't found the root cause yet. Is this a clean Lustre 2.9.0, or are any patches applied?
| Comment by John Salinas (Inactive) [ 13/Apr/17 ] |
|
No patches to 2.9.0. We are using 16MB RPCs from the Lustre client to the OSS, and have set the BRW size to 16 as well.
| Comment by Gerrit Updater [ 14/Apr/17 ] |
|
Niu Yawei (yawei.niu@intel.com) uploaded a new patch: https://review.whamcloud.com/26617 |
| Comment by Niu Yawei (Inactive) [ 14/Apr/17 ] |
|
There is a defect in osd_object_create() that is likely related to this bug. I pushed a patch to master for review; once it passes review, I'll backport it to b2_9.
| Comment by Gerrit Updater [ 17/Apr/17 ] |
|
Niu Yawei (yawei.niu@intel.com) uploaded a new patch: https://review.whamcloud.com/26653 |
| Comment by Niu Yawei (Inactive) [ 17/Apr/17 ] |
|
Ported to b2_9: https://review.whamcloud.com/26653
| Comment by Gerrit Updater [ 23/Apr/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/26617/ |
| Comment by Peter Jones [ 23/Apr/17 ] |
|
Landed for 2.10 |