[LU-10601] lov_stripe_lock()) ASSERTION( md->lsm_lock_owner == 0 ) failed Created: 04/Feb/18 Updated: 30/May/23 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Oleg Drokin | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
Hit this on current master-next, some sort of use after free it appears. [119243.473752] Lustre: DEBUG MARKER: == racer test 1: racer on clients: centos-90.localnet DURATION=2700 ================================== 09:38:17 (1517668697) [119292.154075] Lustre: lfs: using old ioctl(LL_IOC_LOV_GETSTRIPE) on [0x200000401:0xfa:0x0], use llapi_layout_get_by_path() [119298.778891] Lustre: DEBUG MARKER: racer test_1: @@@@@@ FAIL: generate lss conf (mds1) [119363.272972] LustreError: 2970:0:(mdt_lvb.c:402:mdt_lvbo_fill()) lustre-MDT0000: small buffer size 448 for EA 472 (max_mdsize 472): rc = -34 [121710.001204] LustreError: 6319:0:(lov_obd.c:1308:lov_stripe_lock()) ASSERTION( md->lsm_lock_owner == 0 ) failed: [121710.002273] Slab corruption (Tainted: P OE ------------ ): kmalloc-192 start=ffff8800b7853140, len=192 [121710.002278] 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6d 6b 6b 6b 6b 6b kkkkkkkkkkmkkkkk [121710.002280] 010: 0e 00 00 00 6b 6b 6b 6b c0 04 ef 9e 02 88 ff ff ....kkkk........ [121710.020795] LustreError: 6319:0:(lov_obd.c:1308:lov_stripe_lock()) LBUG [121710.021476] Pid: 6319, comm: cat [121710.022363] Call Trace: [121710.023546] [<ffffffffa020e7ce>] libcfs_call_trace+0x4e/0x60 [libcfs] [121710.024252] [<ffffffffa020e85c>] lbug_with_loc+0x4c/0xb0 [libcfs] [121710.024971] [<ffffffffa0870c79>] lov_stripe_lock+0xa9/0xb0 [lov] [121710.025674] [<ffffffffa0871283>] lov_attr_get_raid0+0x103/0x1f0 [lov] [121710.026383] [<ffffffffa1478c04>] ? ll_inode_size_lock+0x24/0x60 [lustre] [121710.027106] [<ffffffffa087429e>] lov_attr_get_composite+0x15e/0x3d0 [lov] [121710.027838] [<ffffffffa087103e>] lov_attr_get+0x1e/0x60 [lov] [121710.028705] [<ffffffffa0354a28>] cl_object_attr_get+0x78/0x150 [obdclass] [121710.029720] [<ffffffffa1463c7f>] ll_merge_attr+0x15f/0x420 [lustre] [121710.030566] [<ffffffffa14654e2>] ll_file_write_iter+0x4b2/0x540 [lustre] [121710.031423] [<ffffffffa146564b>] ll_file_aio_write+0xdb/0x180 [lustre] [121710.032751] [<ffffffffa1465794>] ll_file_write+0xa4/0x170 [lustre] [121710.033601] [<ffffffff811ed36d>] vfs_write+0xbd/0x1e0 [121710.034398] [<ffffffff811ede34>] SyS_write+0x84/0xf0 [121710.035191] [<ffffffff8170fc49>] system_call_fastpath+0x16/0x1b [121710.036001] [121710.036584] Kernel panic - not syncing: LBUG Crashdump on onyx-68 in 192.168.123.190-2018-02-03-10:1* |
| Comments |
| Comment by Oleg Drokin [ 05/Feb/18 ] |
|
I guess a somewhat related crash, also in current master-next: [167021.604141] Lustre: DEBUG MARKER: racer test_1: @@@@@@ FAIL: generate lss conf (mds1) [167032.628234] Lustre: lfs: using old ioctl(LL_IOC_LOV_GETSTRIPE) on [0x200000401:0x165:0x0], use llapi_layout_get_by_path() [167189.832873] 2[2699]: segfault at 8 ip 00007f4c554a8728 sp 00007fff5cb268b0 error 4 in ld-2.17.so[7f4c5549d000+20000] [168793.524664] 7[25061]: segfault at 0 ip (null) sp 00007ffd859bb338 error 14 in 7[400000+6000] [169230.995638] 17[5237]: segfault at 8 ip 00007f5d2f872728 sp 00007ffe51ae4160 error 4 in ld-2.17.so[7f5d2f867000+20000] [169542.140240] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 [169542.143263] IP: [<ffffffffa047d266>] lov_attr_get_composite+0x126/0x3d0 [lov] [169542.146519] PGD 2c9d66067 PUD 2dce99067 PMD 0 [169542.147640] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC [169542.149186] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_zfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) zlib_deflate jbd2 syscopyarea sysfillrect sysimgblt ttm ata_generic drm_kms_helper pata_acpi drm i2c_piix4 ata_piix virtio_balloon pcspkr serio_raw i2c_core virtio_blk virtio_console libata floppy nfsd ip_tables rpcsec_gss_krb5 [last unloaded: libcfs] [169542.168711] CPU: 11 PID: 20181 Comm: cat Tainted: P OE ------------ 3.10.0-debug #2 [169542.169825] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [169542.170393] task: ffff8802fd8f8b40 ti: ffff880094500000 task.ti: ffff880094500000 [169542.171505] RIP: 0010:[<ffffffffa047d266>] [<ffffffffa047d266>] lov_attr_get_composite+0x126/0x3d0 [lov] [169542.172686] RSP: 0018:ffff880094503c20 EFLAGS: 00010246 [169542.173210] RAX: 0000000000000240 RBX: 0000000000000000 RCX: 0000000000000003 [169542.174288] RDX: ffff8800a162df48 RSI: ffff88029c0e5e80 RDI: ffff88008e602f68 [169542.175349] RBP: ffff880094503c70 R08: 0000000000000004 R09: 0000000000000001 [169542.176408] R10: 0000000000000000 R11: 0000000000000246 R12: ffff88029c0e5e80 [169542.177494] R13: 0000000000000000 R14: ffff8800a162df48 R15: aaaaaaaaaaaaaaab [169542.178581] FS: 00007fca37e29740(0000) GS:ffff88033e560000(0000) knlGS:0000000000000000 [169542.180166] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [169542.181020] CR2: 0000000000000004 CR3: 00000002fc329000 CR4: 00000000000006e0 [169542.183136] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [169542.184749] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [169542.185986] Stack: [169542.186711] 0000000000000000 0000000000000000 ffff88008e602f68 0000000000000000 [169542.187793] 00000000ec696e4b ffff88029c0e5e80 ffff8802ae257f48 ffff88008e602f68 [169542.188798] ffff8800a162df48 ffff88025752f828 ffff880094503c80 ffffffffa047a03e [169542.189768] Call Trace: [169542.207741] [<ffffffffa047a03e>] lov_attr_get+0x1e/0x60 [lov] [169542.208495] [<ffffffffa05c7a28>] cl_object_attr_get+0x78/0x150 [obdclass] [169542.209243] [<ffffffffa1450c7f>] ll_merge_attr+0x15f/0x420 [lustre] [169542.209945] [<ffffffffa14524e2>] ll_file_write_iter+0x4b2/0x540 [lustre] [169542.210896] [<ffffffffa145264b>] ll_file_aio_write+0xdb/0x180 [lustre] [169542.211610] [<ffffffffa1452794>] ll_file_write+0xa4/0x170 [lustre] [169542.212320] [<ffffffff811ed36d>] vfs_write+0xbd/0x1e0 [169542.212967] [<ffffffff811ede34>] SyS_write+0x84/0xf0 [169542.213651] [<ffffffff8170fc49>] system_call_fastpath+0x16/0x1b [169542.214415] Code: 48 89 d8 48 c7 45 c8 00 00 00 00 48 29 c8 48 c1 f8 06 49 0f af c7 48 39 cb 41 89 c5 0f 82 95 02 00 00 41 39 c0 0f 86 28 02 00 00 <f6> 43 04 01 74 8c 49 8b 84 24 00 01 00 00 49 63 d5 48 8b 44 d0 [169542.217540] RIP [<ffffffffa047d266>] lov_attr_get_composite+0x126/0x3d0 [lov] |
| Comment by Alex Zhuravlev [ 31/Oct/22 ] |
|
not exactly the same backtrace, but very similar, just hit on master: [ 610.918971] Lustre: DEBUG MARKER: == sanity-flr test 200a: racing IO, mirror extend and resync ========================================================== 04:27:05 (1667190425) .. [ 658.602971] LustreError: 36720:0:(lov_obd.c:1219:lov_stripe_unlock()) ASSERTION( md->lsm_lock_owner == get_current()->pid ) failed: [ 658.603285] LustreError: 36720:0:(lov_obd.c:1219:lov_stripe_unlock()) LBUG [ 658.603408] Pid: 36720, comm: cat 4.18.0 #2 SMP Sun Oct 23 17:58:04 UTC 2022 [ 658.603536] Call Trace TBD: [ 658.603599] [<0>] libcfs_call_trace+0x67/0x90 [libcfs] [ 658.603694] [<0>] lbug_with_loc+0x3e/0x80 [libcfs] [ 658.603791] [<0>] lov_stripe_unlock+0x54/0x60 [lov] [ 658.603886] [<0>] lov_attr_get_raid0+0x58/0x80 [lov] [ 658.603980] [<0>] lov_attr_get_composite+0xdd/0x400 [lov] [ 658.604112] [<0>] cl_object_attr_get+0x63/0x160 [obdclass] [ 658.604228] [<0>] ll_file_read_iter+0x274/0x930 [lustre] [ 658.604323] [<0>] new_sync_read+0xf7/0x120 [ 658.604390] [<0>] vfs_read+0xa1/0x150 [ 658.604457] [<0>] ksys_read+0x3d/0xa0 |