[LU-15122] Lustre ASSERTION( iobuf->dr_rw == 0 ) crash on Arm server end Created: 18/Oct/21 Updated: 20/Nov/21 Resolved: 20/Nov/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Upstream |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Xinliang Liu | Assignee: | Xinliang Liu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | arm, arm-server | ||
| Environment: |
[centos@beegfs-test-01 ~]$ lsb_release -a |
||
| Issue Links: |
|
||||||||
| Epic/Theme: | ldiskfs | ||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
It is easy to enconter bellow crash when running some tests on Arm all-in-one(server and client) test evironment. Tests such as fsx tests: sanityn 16a, 16b etc. [57580.041008] Lustre: DEBUG MARKER: -----============= acceptance-small: sanityn ============----- Tue Oct 12 02:22:12 UTC 2021 [57580.414512] Lustre: DEBUG MARKER: excepting tests: 28 [57580.528887] Lustre: Mounted lustre-client [57580.868328] Lustre: DEBUG MARKER: beegfs-test-01.novalocal: executing check_config_client /mnt/lustre [57581.696557] Lustre: DEBUG MARKER: Using TIMEOUT=20 [57581.854832] Lustre: Modifying parameter general.*.*.lbug_on_grant_miscount in log params [57582.331371] Lustre: DEBUG MARKER: == sanityn test 16b: 12500 iterations of dual-mount fsx at small size ========================================================== 02:22:15 (1634005335) [57585.485613] LustreError: 5594:0:(osd_io.c:538:osd_do_bio()) ASSERTION( iobuf->dr_rw == 0 ) failed: page_idx 1, block_idx 16, i 13,start_blocks: 0, count: 29, npages: 2 [57585.488764] LustreError: 5594:0:(osd_io.c:538:osd_do_bio()) LBUG [57585.490011] Pid: 5594, comm: ll_ost_io03_000 4.18.0-305.7.1.el8_lustre.aarch64 #1 SMP Mon Jul 19 08:24:26 UTC 2021 [57585.492109] Call Trace: [57585.492633] [<0>] libcfs_call_trace+0xb8/0x118 [libcfs] [57585.493719] [<0>] lbug_with_loc+0x60/0xa0 [libcfs] [57585.494721] [<0>] osd_do_bio.constprop.21+0x6c4/0xcf8 [osd_ldiskfs] [57585.496012] [<0>] osd_ldiskfs_map_write.constprop.20+0xb4/0xf0 [osd_ldiskfs] [57585.497464] [<0>] osd_ldiskfs_map_inode_pages+0x66c/0x958 [osd_ldiskfs] [57585.498880] [<0>] osd_write_commit+0x4fc/0xbb8 [osd_ldiskfs] [57585.500047] [<0>] ofd_commitrw_write+0x658/0x1e68 [ofd] [57585.501133] [<0>] ofd_commitrw+0x360/0xa18 [ofd] [57585.502180] [<0>] tgt_brw_write+0x1624/0x2d90 [ptlrpc] [57585.503299] [<0>] tgt_handle_request0+0xd0/0x978 [ptlrpc] [57585.504466] [<0>] tgt_request_handle+0x7c0/0x1a38 [ptlrpc] [57585.505663] [<0>] ptlrpc_server_handle_request+0x3bc/0x11e8 [ptlrpc] [57585.507026] [<0>] ptlrpc_main+0xd28/0x15f0 [ptlrpc] [57585.508034] [<0>] kthread+0x130/0x138 [57585.508861] [<0>] ret_from_fork+0x10/0x18 [57585.509697] Kernel panic - not syncing: LBUG [57585.510570] CPU: 6 PID: 5594 Comm: ll_ost_io03_000 Kdump: loaded Tainted: P OE --------- - - 4.18.0-305.7.1.el8_lustre.aarch64 #1 [57585.513181] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 [57585.514576] Call trace: [57585.515080] dump_backtrace+0x0/0x188 [57585.515826] show_stack+0x24/0x30 [57585.516530] dump_stack+0x9c/0xbc [57585.517189] panic+0x130/0x2f8 [57585.517802] param_set_delay_minmax.isra.1+0x0/0xd0 [libcfs] [57585.518972] osd_do_bio.constprop.21+0x6c4/0xcf8 [osd_ldiskfs] [57585.520204] osd_ldiskfs_map_write.constprop.20+0xb4/0xf0 [osd_ldiskfs] [57585.521607] osd_ldiskfs_map_inode_pages+0x66c/0x958 [osd_ldiskfs] [57585.522910] osd_write_commit+0x4fc/0xbb8 [osd_ldiskfs] [57585.524012] ofd_commitrw_write+0x658/0x1e68 [ofd] [57585.525023] ofd_commitrw+0x360/0xa18 [ofd] [57585.525982] tgt_brw_write+0x1624/0x2d90 [ptlrpc] [57585.527046] tgt_handle_request0+0xd0/0x978 [ptlrpc] [57585.528160] tgt_request_handle+0x7c0/0x1a38 [ptlrpc] [57585.529316] ptlrpc_server_handle_request+0x3bc/0x11e8 [ptlrpc] [57585.530622] ptlrpc_main+0xd28/0x15f0 [ptlrpc] [57585.531557] kthread+0x130/0x138 [57585.532236] ret_from_fork+0x10/0x18 [57585.533308] SMP: stopping secondary CPUs [57585.539225] Starting crashdump kernel... [57585.540042] Bye! |
| Comments |
| Comment by Xinliang Liu [ 18/Oct/21 ] |
|
Add some print msgs in the blocks[] array assignment and grab the kernel log. It seems the blocks[] array access is exceeding the assignment size. Kernel msgs printing patch: --- a/lustre/osd-ldiskfs/osd_io.c +++ b/lustre/osd-ldiskfs/osd_io.c @@ -1143,6 +1150,7 @@ cont_map: static int osd_ldiskfs_map_inode_pages(struct inode *inode, ## Assignment function * want to avoid that as much as possible. */ if (oh->oh_declared_ext <= 0) { + CDEBUG(D_OTHER,"call osd_ldiskfs_map_write here!!\n"); rc = osd_ldiskfs_map_write(inode, iobuf, osd, start_blocks, count, &disk_size, user_size); @@ -1188,6 +1198,18 @@ cont_map: rc = 0; }+ if (create && iobuf->dr_rw) { + CDEBUG(D_OTHER,"Page_index=%lu: rc=%d, (blen=%ld, total=%ld, m_len=%d), blocks:{\n", + fp->index, ret, blen, total, map.m_len); + for (c=0; c < total; c++){ + unsigned long long value = *(blocks + c); + if (value == 0) + CDEBUG(D_OTHER,"blocks[%d]=%llu \n", c, value); + } + CDEBUG(D_OTHER,"}\n"); + } + + --- a/lustre/osd-ldiskfs/osd_io.c +++ b/lustre/osd-ldiskfs/osd_io.c @@ -530,12 +530,17 @@ static int osd_do_bio(struct osd_device *osd, struct inode *inode, nblocks = 1; if (blocks[block_idx + i] == 0) { /* hole */ - LASSERTF(iobuf->dr_rw == 0, - "page_idx %u, block_idx %u, i %u," - "start_blocks: %llu, count: %llu, npages: %d\n", - page_idx, block_idx, i, - (unsigned long long)start_blocks, - (unsigned long long)count, npages); + if (iobuf->dr_rw == 1) { + CDEBUG(D_OTHER,"Write with hole!!: blocks[%d]=0, iobuf_line=%d, iobuf_error=%d," + "page_idx %u, block_idx %u, i %u," + "start_blocks: %llu, count: %llu, npages: %d\n", + block_idx + i, iobuf->dr_init_at, iobuf->dr_error, + page_idx, block_idx, i, + (unsigned long long)start_blocks, + (unsigned long long)count, npages); + rc = -EFBIG; + goto out; + } memset(kmap(page) + page_offset, 0, blocksize); kunmap(page); continue; Kernel log: blocks[] array assignment size is 25(see total var), so blocks[25] exceeds the assignment size.
(osd_io.c:1093:osd_ldiskfs_map_inode_pages()) inode 104: map 2 pages from 22
(osd_io.c:1095:osd_ldiskfs_map_inode_pages()) Enter inode 104: create:1, is_write:1 =======>
(osd_io.c:1203:osd_ldiskfs_map_inode_pages()) Page_index=22: rc=25, (blen=32, total=25, m_len=25), blocks:{
(osd_io.c:1209:osd_ldiskfs_map_inode_pages()) }
(osd_io.c:1153:osd_ldiskfs_map_inode_pages()) call osd_ldiskfs_map_write here!!
(osd_io.c:503:osd_do_bio()) Process entered
(osd_io.c:426:osd_bio_init()) Process entered
(osd_io.c:449:osd_bio_init()) Process leaving (rc=0 : 0 : 0)
(osd_io.c:540:osd_do_bio()) Write with hole!!: blocks[25]=0, iobuf_line=1593, iobuf_error=0,page_idx 1, block_idx 16, i 9,start_blocks: 0, count: 25, npages: 2
|
| Comment by Gerrit Updater [ 19/Oct/21 ] |
|
"xinliang <xinliang.liu@linaro.org>" uploaded a new patch: https://review.whamcloud.com/45288 |
| Comment by Gerrit Updater [ 20/Nov/21 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45288/ |
| Comment by Peter Jones [ 20/Nov/21 ] |
|
Landed for 2.15 |