[LU-16263] crash in sanity 273b osc_page_delete LBUG Created: 25/Oct/22 Updated: 13/Sep/23 Resolved: 21/Mar/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.16.0 |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Oleg Drokin | Assignee: | Zhenyu Xu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||
| Severity: | 3 | ||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||
| Description |
|
Starting with the master-next build of Oct 10, this crash suddenly appeared and has recurred regularly, though not very often, since then.
[338005.158491] Lustre: DEBUG MARKER: == sanity test 273b: DoM: race writeback and object destroy ===================== 18:58:27 (1666652307)
[338005.392174] LustreError: 17734:0:(osc_cache.c:2484:osc_teardown_async_page()) extent ffff8800400af8e8@{[0 -> 255/1023], [2|0|-|cache|wi|ffff8800badddb28], [1073152|256|+|-|ffff8802276bfc00|1024| (null)]} trunc at 0.
[338005.394627] LustreError: 17734:0:(osc_cache.c:2484:osc_teardown_async_page()) ### extent: ffff8800400af8e8 ns: lustre-OST0003-osc-ffff8803228cca88 lock: ffff8802276bfc00/0xd5e49483fedfcc95 lrc: 2/0,0 mode: PW/PW res: [0x51a7a:0x0:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->1048575) gid 0 flags: 0x800020000000000 nid: local remote: 0xd5e49483fedfcc9c expref: -99 pid: 17734 timeout: 0 lvb_type: 1
[338005.397689] LustreError: 17734:0:(osc_page.c:174:osc_page_delete()) page@ffff8802b0a34e18[2 ffff880309e4ec80 5 1 (null)]
[338005.399302] LustreError: 17734:0:(osc_page.c:174:osc_page_delete()) vmpage @ffffea0005eb7f80 2fffff0000083d 3:0 ffff8802b0a34e18 256 lru
[338005.400969] LustreError: 17734:0:(osc_page.c:174:osc_page_delete()) osc-page@ffff8802b0a34e78 0: 1< 2 + - > 2< 0 0 4096 0x0 0x40420 | (null) ffff88028f868710 ffff8800badddb28 > 3< 0 0 > 4< 0 0 8 97349632 - | - - + - > 5< - - + - | 0 - | 256 - ->
[338005.403820] LustreError: 17734:0:(osc_page.c:174:osc_page_delete()) end page@ffff8802b0a34e18
[338005.405381] LustreError: 17734:0:(osc_page.c:174:osc_page_delete()) Trying to teardown failed: -16
[338005.406554] LustreError: 17734:0:(osc_page.c:175:osc_page_delete()) ASSERTION( 0 ) failed:
[338005.407645] LustreError: 17734:0:(osc_page.c:175:osc_page_delete()) LBUG
[338005.408234] Pid: 17734, comm: multiop 3.10.0-7.9-debug #2 SMP Tue Feb 1 18:17:58 EST 2022
[338005.409142] Call Trace:
[338005.409630] [<0>] libcfs_call_trace+0x90/0xf0 [libcfs]
[338005.410130] [<0>] lbug_with_loc+0x4c/0xa0 [libcfs]
[338005.410607] [<0>] osc_page_delete+0x48d/0x500 [osc]
[338005.411107] [<0>] __cl_page_delete+0x82/0x320 [obdclass]
[338005.411593] [<0>] cl_page_delete+0x33/0x110 [obdclass]
[338005.412078] [<0>] ll_invalidatepage+0x7f/0x170 [lustre]
[338005.412568] [<0>] do_invalidatepage_range+0x71/0x80
[338005.413050] [<0>] truncate_inode_page+0x77/0x80
[338005.413652] [<0>] truncate_inode_pages_range+0x1ea/0x7d0
[338005.414215] [<0>] truncate_inode_pages_final+0x4c/0x60
[338005.414791] [<0>] ll_truncate_inode_pages_final+0x21/0xe0 [lustre]
[338005.415385] [<0>] ll_delete_inode+0x38/0x150 [lustre]
[338005.415929] [<0>] evict+0xaf/0x180
[338005.416481] [<0>] iput+0xf5/0x180
[338005.416977] [<0>] __dentry_kill+0x148/0x1b0
[338005.417523] [<0>] dput+0xca/0x1c0
[338005.418054] [<0>] __fput+0x1a0/0x240
[338005.418624] [<0>] ____fput+0xe/0x10
[338005.419155] [<0>] task_work_run+0xb5/0xf0
[338005.419584] [<0>] do_notify_resume+0x92/0xb0
[338005.420091] [<0>] int_signal+0x12/0x17
[338005.420650] [<0>] 0xfffffffffffffffe
[338005.421218] Kernel panic - not syncing: LBUG
[338005.421744] CPU: 1 PID: 17734 Comm: multiop Kdump: loaded Tainted: P W OE ------------ 3.10.0-7.9-debug #2
[338005.422211] Hardware name: Red Hat KVM, BIOS 1.15.0-1.module_el8.6.0+1087+b42c8331 04/01/2014
[338005.422211] Call Trace:
[338005.422211] [<ffffffff817d93f8>] dump_stack+0x19/0x1b
[338005.422211] [<ffffffff817d24d5>] panic+0xe8/0x20d
[338005.422211] [<ffffffff817e324e>] ? _raw_spin_unlock+0xe/0x20
[338005.422211] [<ffffffffa022c56b>] lbug_with_loc+0x9b/0xa0 [libcfs]
[338005.422211] [<ffffffffa08d22dd>] osc_page_delete+0x48d/0x500 [osc]
[338005.422211] [<ffffffffa03e2612>] __cl_page_delete+0x82/0x320 [obdclass]
[338005.422211] [<ffffffffa03e28e3>] cl_page_delete+0x33/0x110 [obdclass]
[338005.422211] [<ffffffffa171b30f>] ll_invalidatepage+0x7f/0x170 [lustre]
[338005.422211] [<ffffffff811c5b41>] do_invalidatepage_range+0x71/0x80
[338005.422211] [<ffffffff811c5be7>] truncate_inode_page+0x77/0x80
[338005.422211] [<ffffffff811c5e1a>] truncate_inode_pages_range+0x1ea/0x7d0
[338005.422211] [<ffffffff811c646c>] truncate_inode_pages_final+0x4c/0x60
[338005.422211] [<ffffffffa16f5121>] ll_truncate_inode_pages_final+0x21/0xe0 [lustre]
[338005.422211] [<ffffffffa16f5648>] ll_delete_inode+0x38/0x150 [lustre]
[338005.422211] [<ffffffff81264a2f>] evict+0xaf/0x180
[338005.422211] [<ffffffff81264e65>] iput+0xf5/0x180
[338005.422211] [<ffffffff8125ff68>] __dentry_kill+0x148/0x1b0
[338005.422211] [<ffffffff8126072a>] dput+0xca/0x1c0
[338005.422211] [<ffffffff81248000>] __fput+0x1a0/0x240
[338005.422211] [<ffffffff8124817e>] ____fput+0xe/0x10
[338005.422211] [<ffffffff810b69b5>] task_work_run+0xb5/0xf0
[338005.422211] [<ffffffff8102ccb2>] do_notify_resume+0x92/0xb0
[338005.422211] [<ffffffff817ee363>] int_signal+0x12/0x17
It is always test 273b. https://knox.linuxhacker.ru/crashdb_ui_external.py.cgi?newid=67744 tracks all such crashes.
The list of patches in that master-next release:
a99fcef712 LU-15721 llite: only statfs for projid if PROJINHERIT set
4f3e709b69 LU-16219 tests: syntax error fix
c90e7da475 LU-16198 tests: increase margin for sanity/33hh
fe12cf7af6 LU-16200 tests: test_32[f,g]: specify blocksize explicitly
e0a8f3f60d LU-16180 ptlrpc: reduce lock contention in ptlrpc_free_committed
873276be1c LU-16076 utils: enhance 'lfs check' command
9f0b2c2e23 LU-16044 osd: discard pagecache in truncate's declaration
90307c2721 LU-15451 sec: retry ro mount if read-only flag set
b871286a4e LU-15619 osc: Remove oap lock
b918031786 LU-15014 osc: Fix possible null pointer
98c70df982 LU-13364 utils: fix bad output for lnetctl import --show
1c095e3a80 LU-14165 utils: llog_reader: display changleog_user records
b89c2797eb LU-16139 statahead: avoid to block ptlrpcd interpret context
ff59bd3a4a LU-6142 obdclass: change some foo0() to __foo()
0572fc241a LU-10391 lnet: support IPv6 in lnet_inet_enumerate()
5f5523beb1 LU-16002 ptlrpc: reduce pinger eviction time
a1dfff48db LU-16211 o2iblnd: Avoid NULL md deref
3bb1366fe8 LU-16046 ldlm: group lock fix
f79107cf9a LU-16046 revert: "LU-9964 llite: prevent mulitple group locks"
I tried excluding the |
| Comments |
| Comment by Oleg Drokin [ 25/Oct/22 ] |
|
This seems to be related to the now closed
Alex also reports hitting this on his systems from time to time, |
| Comment by Alex Zhuravlev [ 25/Oct/22 ] |
|
usually hit this in racer using zfs:
[ 1393.723944] LustreError: 202459:0:(osc_cache.c:2484:osc_teardown_async_page()) extent 0000000076978463@{[0 -> 255/255], [2|0|-|cache|wi|0000000017bdddf3], [1703936|176|+|-|000000008bd8d628|256|00000000e98e7788]} trunc at 80.
[ 1393.724211] LustreError: 202459:0:(osc_cache.c:2484:osc_teardown_async_page()) ### extent: 0000000076978463 ns: lustre-OST0001-osc-ffff96b339e42000 lock: 000000008bd8d628/0xacdee9107b049393 lrc: 3/0,0 mode: PW/PW res: [0x2c0000400:0x5aa:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 262144->18446744073709551615) gid 0 flags: 0x800020000020000 nid: local remote: 0xacdee9107b0493a8 expref: -99 pid: 202304 timeout: 0 lvb_type: 1
[ 1393.724384] LustreError: 202459:0:(osc_page.c:174:osc_page_delete()) page@00000000def37c21[2 000000004f3fbbc0 5 1 00000000e98e7788]
[ 1393.724384]
[ 1393.724456] LustreError: 202459:0:(osc_page.c:174:osc_page_delete()) vmpage @0000000057d3a516 4100000001039 3:0 ffff96b32bd52eb8 80 lru
[ 1393.724456]
[ 1393.724529] LustreError: 202459:0:(osc_page.c:174:osc_page_delete()) osc-page@0000000059260e88 80: 1< 2 + - > 2< 327680 0 4096 0x0 0x40420 | 00000000e98e7788 00000000f9a6b8a1 0000000017bdddf3 > 3< 0 0 > 4< 0 0 8 69206016 - | - - + - > 5< - - + - | 0 - | 256 - ->
[ 1393.724529]
[ 1393.724647] LustreError: 202459:0:(osc_page.c:174:osc_page_delete()) end page@00000000def37c21
[ 1393.724647]
[ 1393.724710] LustreError: 202459:0:(osc_page.c:174:osc_page_delete()) Trying to teardown failed: -16
[ 1393.724761] LustreError: 202459:0:(osc_page.c:175:osc_page_delete()) ASSERTION( 0 ) failed:
[ 1393.724811] LustreError: 202459:0:(osc_page.c:175:osc_page_delete()) LBUG
[ 1393.724903] Pid: 202459, comm: rm 4.18.0 #2 SMP Sun Oct 23 17:58:04 UTC 2022
[ 1393.724955] Call Trace TBD:
[ 1393.724995] [<0>] libcfs_call_trace+0x67/0x90 [libcfs]
[ 1393.725156] [<0>] lbug_with_loc+0x3e/0x80 [libcfs]
[ 1393.725266] [<0>] osc_page_delete+0x4b4/0x4c0 [osc]
[ 1393.725396] [<0>] __cl_page_delete+0x7c/0x2f0 [obdclass]
[ 1393.725518] [<0>] cl_page_delete+0x25/0xe0 [obdclass]
[ 1393.725645] [<0>] ll_invalidatepage+0x95/0x180 [lustre]
[ 1393.725749] [<0>] truncate_cleanup_page+0x6a/0xb0
[ 1393.725844] [<0>] truncate_inode_pages_range+0x1c2/0x7a0
[ 1393.725951] [<0>] ll_truncate_inode_pages_final+0x13/0xe0 [lustre]
[ 1393.726084] [<0>] ll_delete_inode+0x33/0x140 [lustre]
[ 1393.726185] [<0>] evict+0xbc/0x180
[ 1393.726260] [<0>] do_unlinkat+0x22c/0x2c0
[ 1393.726335] [<0>] do_syscall_64+0x43/0x120
[ 1393.726409] [<0>] entry_SYSCALL_64_after_hwframe+0x65/0xca
|
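Triage note on the logs above: both traces end with "Trying to teardown failed: -16" just before the LBUG. On Linux, errno 16 is EBUSY, so the teardown call reported the page as busy. The snippet below is only a rough sketch for anyone scanning console logs for the same signature; the helper name `decode_teardown_rc` and the regular expression are illustrative assumptions, not part of Lustre or its test suite.

```python
import errno
import os
import re

# Hypothetical helper (not part of Lustre): extract the return code from a
# "Trying to teardown failed: <rc>" console line and name the errno.
TEARDOWN_RE = re.compile(r"Trying to teardown failed: (-?\d+)")


def decode_teardown_rc(line):
    """Return (rc, errno_name, description), or None if the line does not match."""
    match = TEARDOWN_RE.search(line)
    if match is None:
        return None
    rc = int(match.group(1))
    # Kernel code returns negative errno values, so look up the absolute value.
    name = errno.errorcode.get(abs(rc), "UNKNOWN")
    return rc, name, os.strerror(abs(rc))


line = (
    "[338005.405381] LustreError: 17734:0:(osc_page.c:174:osc_page_delete()) "
    "Trying to teardown failed: -16"
)
print(decode_teardown_rc(line))
```

The errno name lookup uses the host's errno table, so the mapping of 16 to EBUSY holds when this is run on Linux.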
| Comment by Etienne Aujames [ 26/Oct/22 ] |
|
Could this be related to the |
| Comment by Alex Zhuravlev [ 26/Oct/22 ] |
I don't see how - |
| Comment by Xing Huang [ 18/Feb/23 ] |
|
"Bobi Jam <bobijam@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/#/c/fs/lustre-release/+/50005/ |
| Comment by Gerrit Updater [ 21/Mar/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50005/ |
| Comment by Peter Jones [ 21/Mar/23 ] |
|
Landed for 2.16 |