Details
Type: Bug
Resolution: Fixed
Priority: Major
Lustre 2.16.0
Description
Starting with the master-next build of Oct 10, this crash suddenly appeared. Since then it has recurred regularly, though not very often.
[338005.158491] Lustre: DEBUG MARKER: == sanity test 273b: DoM: race writeback and object destroy ===================== 18:58:27 (1666652307)
[338005.392174] LustreError: 17734:0:(osc_cache.c:2484:osc_teardown_async_page()) extent ffff8800400af8e8@{[0 -> 255/1023], [2|0|-|cache|wi|ffff8800badddb28], [1073152|256|+|-|ffff8802276bfc00|1024| (null)]} trunc at 0.
[338005.394627] LustreError: 17734:0:(osc_cache.c:2484:osc_teardown_async_page()) ### extent: ffff8800400af8e8 ns: lustre-OST0003-osc-ffff8803228cca88 lock: ffff8802276bfc00/0xd5e49483fedfcc95 lrc: 2/0,0 mode: PW/PW res: [0x51a7a:0x0:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->1048575) gid 0 flags: 0x800020000000000 nid: local remote: 0xd5e49483fedfcc9c expref: -99 pid: 17734 timeout: 0 lvb_type: 1
[338005.397689] LustreError: 17734:0:(osc_page.c:174:osc_page_delete()) page@ffff8802b0a34e18[2 ffff880309e4ec80 5 1 (null)]
[338005.399302] LustreError: 17734:0:(osc_page.c:174:osc_page_delete()) vmpage @ffffea0005eb7f80 2fffff0000083d 3:0 ffff8802b0a34e18 256 lru
[338005.400969] LustreError: 17734:0:(osc_page.c:174:osc_page_delete()) osc-page@ffff8802b0a34e78 0: 1< 2 + - > 2< 0 0 4096 0x0 0x40420 | (null) ffff88028f868710 ffff8800badddb28 > 3< 0 0 > 4< 0 0 8 97349632 - | - - + - > 5< - - + - | 0 - | 256 - ->
[338005.403820] LustreError: 17734:0:(osc_page.c:174:osc_page_delete()) end page@ffff8802b0a34e18
[338005.405381] LustreError: 17734:0:(osc_page.c:174:osc_page_delete()) Trying to teardown failed: -16
[338005.406554] LustreError: 17734:0:(osc_page.c:175:osc_page_delete()) ASSERTION( 0 ) failed:
[338005.407645] LustreError: 17734:0:(osc_page.c:175:osc_page_delete()) LBUG
[338005.408234] Pid: 17734, comm: multiop 3.10.0-7.9-debug #2 SMP Tue Feb 1 18:17:58 EST 2022
[338005.409142] Call Trace:
[338005.409630] [<0>] libcfs_call_trace+0x90/0xf0 [libcfs]
[338005.410130] [<0>] lbug_with_loc+0x4c/0xa0 [libcfs]
[338005.410607] [<0>] osc_page_delete+0x48d/0x500 [osc]
[338005.411107] [<0>] __cl_page_delete+0x82/0x320 [obdclass]
[338005.411593] [<0>] cl_page_delete+0x33/0x110 [obdclass]
[338005.412078] [<0>] ll_invalidatepage+0x7f/0x170 [lustre]
[338005.412568] [<0>] do_invalidatepage_range+0x71/0x80
[338005.413050] [<0>] truncate_inode_page+0x77/0x80
[338005.413652] [<0>] truncate_inode_pages_range+0x1ea/0x7d0
[338005.414215] [<0>] truncate_inode_pages_final+0x4c/0x60
[338005.414791] [<0>] ll_truncate_inode_pages_final+0x21/0xe0 [lustre]
[338005.415385] [<0>] ll_delete_inode+0x38/0x150 [lustre]
[338005.415929] [<0>] evict+0xaf/0x180
[338005.416481] [<0>] iput+0xf5/0x180
[338005.416977] [<0>] __dentry_kill+0x148/0x1b0
[338005.417523] [<0>] dput+0xca/0x1cc
[338005.418054] [<0>] __fput+0x1a0/0x240
[338005.418624] [<0>] ____fput+0xe/0x10
[338005.419155] [<0>] task_work_run+0xb5/0xf0
[338005.419584] [<0>] do_notify_resume+0x92/0xb0
[338005.420091] [<0>] int_signal+0x12/0x17
[338005.420650] [<0>] 0xfffffffffffffffe
[338005.421218] Kernel panic - not syncing: LBUG
[338005.421744] CPU: 1 PID: 17734 Comm: multiop Kdump: loaded Tainted: P W OE ------------ 3.10.0-7.9-debug #2
[338005.422211] Hardware name: Red Hat KVM, BIOS 1.15.0-1.module_el8.6.0+1087+b42c8331 04/01/2014
[338005.422211] Call Trace:
[338005.422211] [<ffffffff817d93f8>] dump_stack+0x19/0x1b
[338005.422211] [<ffffffff817d24d5>] panic+0xe8/0x20d
[338005.422211] [<ffffffff817e324e>] ? _raw_spin_unlock+0xe/0x20
[338005.422211] [<ffffffffa022c56b>] lbug_with_loc+0x9b/0xa0 [libcfs]
[338005.422211] [<ffffffffa08d22dd>] osc_page_delete+0x48d/0x500 [osc]
[338005.422211] [<ffffffffa03e2612>] __cl_page_delete+0x82/0x320 [obdclass]
[338005.422211] [<ffffffffa03e28e3>] cl_page_delete+0x33/0x110 [obdclass]
[338005.422211] [<ffffffffa171b30f>] ll_invalidatepage+0x7f/0x170 [lustre]
[338005.422211] [<ffffffff811c5b41>] do_invalidatepage_range+0x71/0x80
[338005.422211] [<ffffffff811c5be7>] truncate_inode_page+0x77/0x80
[338005.422211] [<ffffffff811c5e1a>] truncate_inode_pages_range+0x1ea/0x7d0
[338005.422211] [<ffffffff811c646c>] truncate_inode_pages_final+0x4c/0x60
[338005.422211] [<ffffffffa16f5121>] ll_truncate_inode_pages_final+0x21/0xe0 [lustre]
[338005.422211] [<ffffffffa16f5648>] ll_delete_inode+0x38/0x150 [lustre]
[338005.422211] [<ffffffff81264a2f>] evict+0xaf/0x180
[338005.422211] [<ffffffff81264e65>] iput+0xf5/0x180
[338005.422211] [<ffffffff8125ff68>] __dentry_kill+0x148/0x1b0
[338005.422211] [<ffffffff8126072a>] dput+0xca/0x1c0
[338005.422211] [<ffffffff81248000>] __fput+0x1a0/0x240
[338005.422211] [<ffffffff8124817e>] ____fput+0xe/0x10
[338005.422211] [<ffffffff810b69b5>] task_work_run+0xb5/0xf0
[338005.422211] [<ffffffff8102ccb2>] do_notify_resume+0x92/0xb0
[338005.422211] [<ffffffff817ee363>] int_signal+0x12/0x17
It is always sanity test 273b.
All such crashes are tracked at https://knox.linuxhacker.ru/crashdb_ui_external.py.cgi?newid=67744
The patches in that master-next build:
a99fcef712 LU-15721 llite: only statfs for projid if PROJINHERIT set
4f3e709b69 LU-16219 tests: syntax error fix
c90e7da475 LU-16198 tests: increase margin for sanity/33hh
fe12cf7af6 LU-16200 tests: test_32[f,g]: specify blocksize explicitly
e0a8f3f60d LU-16180 ptlrpc: reduce lock contention in ptlrpc_free_committed
873276be1c LU-16076 utils: enhance 'lfs check' command
9f0b2c2e23 LU-16044 osd: discard pagecache in truncate's declaration
90307c2721 LU-15451 sec: retry ro mount if read-only flag set
b871286a4e LU-15619 osc: Remove oap lock
b918031786 LU-15014 osc: Fix possible null pointer
98c70df982 LU-13364 utils: fix bad output for lnetctl import --show
1c095e3a80 LU-14165 utils: llog_reader: display changleog_user records
b89c2797eb LU-16139 statahead: avoid to block ptlrpcd interpret context
ff59bd3a4a LU-6142 obdclass: change some foo0() to __foo()
0572fc241a LU-10391 lnet: support IPv6 in lnet_inet_enumerate()
5f5523beb1 LU-16002 ptlrpc: reduce pinger eviction time
a1dfff48db LU-16211 o2iblnd: Avoid NULL md deref
3bb1366fe8 LU-16046 ldlm: group lock fix
f79107cf9a LU-16046 revert: "LU-9964 llite: prevent mulitple group locks"
I tried excluding LU-15619, LU-15014 and LU-16139, but that did not help, and the remaining patches do not appear to be related.