[LU-15722] IO write gets stuck on some sanityn test cases for 64K PAGE_SIZE Created: 06/Apr/22 Updated: 12/Jan/24 Resolved: 19/Aug/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Xinliang Liu | Assignee: | Xinliang Liu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | arm, arm-server | ||
| Environment: |
Arm CentOS 8 server and client all-in-one test environment. |
||
| Issue Links: |
|
||||||||
| Severity: | 2 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
IO write gets stuck on some test sanityn cases for 64K PAGE_SIZE, such as sanityn 16a, 16b, 71a etc.
Kernel logs: [ 1308.972770] Lustre: DEBUG MARKER: == sanityn test 71a: correct file map just after write operation is finished ========================================================== 01:56:46 (1638755806) [ 1309.958643] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 10000 [ 1310.704658] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 20000 [ 1312.183793] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 40000 [ 1312.186168] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) Skipped 1 previous similar message [ 1314.448308] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 70000 [ 1314.538440] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) Skipped 2 previous similar messages [ 1319.063438] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 120000 [ 1319.065847] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) Skipped 4 previous similar messages [ 1327.150991] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 210000 [ 1327.153390] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) Skipped 8 previous similar messages [ 1327.359397] Lustre: lustre-OST0001-osc-ffffcaf2f9e2e000: disconnect after 20s idle [ 1332.399266] Lustre: lustre-OST0001-osc-ffffcaf2fb2cf000: disconnect after 23s idle [ 1344.045735] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 400000 [ 1344.193155] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) Skipped 18 previous similar messages [ 1349.278936] Lustre: ll_ost05_002: service thread pid 5101 was inactive for 40.209 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [ 1349.278963] Pid: 5126, comm: ll_ost_io00_002 4.18.0-305.7.1.el8_lustre.aarch64 #1 SMP Mon Jul 19 08:24:26 UTC 2021 [ 1349.282879] Lustre: Skipped 1 previous similar message [ 1349.285101] Call Trace: [ 1349.286688] [<0>] __switch_to+0xbc/0x108 [ 1349.287534] [<0>] osd_trans_stop+0x66c/0xc58 [osd_ldiskfs] [ 1349.288692] [<0>] ofd_trans_stop+0x48/0x90 [ofd] [ 1349.289746] [<0>] ofd_commitrw_write+0x9c4/0x1e68 [ofd] [ 1349.290825] [<0>] ofd_commitrw+0x454/0xa88 [ofd] [ 1349.291909] [<0>] tgt_brw_write+0x1654/0x2db8 [ptlrpc] [ 1349.293054] [<0>] tgt_handle_request0+0xd0/0x978 [ptlrpc] [ 1349.294228] [<0>] tgt_request_handle+0x7c0/0x1a38 [ptlrpc] [ 1349.295423] [<0>] ptlrpc_server_handle_request+0x3bc/0x11e8 [ptlrpc] [ 1349.296802] [<0>] ptlrpc_main+0xd28/0x15f0 [ptlrpc] [ 1349.297808] [<0>] kthread+0x130/0x138 [ 1349.298560] [<0>] ret_from_fork+0x10/0x18 [ 1349.299465] Pid: 5101, comm: ll_ost05_002 4.18.0-305.7.1.el8_lustre.aarch64 #1 SMP Mon Jul 19 08:24:26 UTC 2021 [ 1349.301552] Call Trace: [ 1349.302051] [<0>] __switch_to+0xbc/0x108 [ 1349.302934] [<0>] ldlm_completion_ast+0x778/0xdf8 [ptlrpc] [ 1349.304076] [<0>] ldlm_cli_enqueue_local+0x204/0xb68 [ptlrpc] [ 1349.305282] [<0>] tgt_extent_lock+0x108/0x2d0 [ptlrpc] [ 1349.306308] [<0>] ofd_lock_unlock_region+0x74/0x1e8 [ofd] [ 1349.307370] [<0>] ofd_get_info_hdl+0xd30/0x1378 [ofd] [ 1349.308439] [<0>] tgt_handle_request0+0xd0/0x978 [ptlrpc] [ 1349.309637] [<0>] tgt_request_handle+0x7c0/0x1a38 [ptlrpc] [ 1349.310786] [<0>] ptlrpc_server_handle_request+0x3bc/0x11e8 [ptlrpc] [ 1349.312097] [<0>] ptlrpc_main+0xd28/0x15f0 [ptlrpc] [ 1349.313068] [<0>] kthread+0x130/0x138 [ 1349.313788] [<0>] ret_from_fork+0x10/0x18 [ 1376.839778] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 770000 [ 1376.842204] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) Skipped 36 previous similar messages |
| Comments |
| Comment by Xinliang Liu [ 06/Apr/22 ] |
|
Look into the source code and address that it goes into an infinite loop:
[606895.151765] LustreError: 334886:0:(ofd_io.c:1389:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 10000 [606895.207345] LustreError: 334886:0:(ofd_io.c:1389:ofd_commitrw_write()) Skipped 8 previous similar messages ofd_commitrw_write()-> osd_write_commit()->osd_ldiskfs_map_inode_pages()->ldiskfs_map_blocks()->ofd_commitrw_write() The related loop source code part is:
1219 ofd_commitrw_write(const struct lu_env *env, struct obd_export *exp, .... 1288 retry: ... 1342 1343 if (likely(!fake_write)) { ... 1346 rc = osd_write_commit(env, o, lnb, niocount, th, oa->o_size); 1347 if (rc) { 1348 restart = th->th_restart_tran; 1349 GOTO(out_unlock, rc); 1350 } 1351 } 1352 ..... 1384 if (restart) { 1385 retries++; 1386 restart = 0; 1387 if (retries % 10000 == 0) 1388 CERROR("%s: restart IO write too many times: %d\n", 1389 ofd_name(ofd), retries); 1390 CDEBUG(D_INODE, "retry transaction, retries:%d\n", 1391 retries); 1392 goto retry; 1393 } 1079 static int osd_ldiskfs_map_inode_pages(struct inode *inode, ... 1143 cont_map: ... 1150 if (handle && check_credits) { ... 1161 if (oh->oh_declared_ext <= 0) { 1162 rc = osd_ldiskfs_map_write(inode, 1163 iobuf, osd, start_blocks, 1164 count, &disk_size, user_size); 1165 if (rc) 1166 GOTO(cleanup, rc); 1167 thandle->th_restart_tran = 1; 1168 GOTO(cleanup, rc = -EAGAIN); ... 1175 } 1176 rc = ldiskfs_map_blocks(handle, inode, &map, create); ... 1224 if (rc == 0 && total < blen) { ... 1234 goto cont_map; 1235 } |
| Comment by Xinliang Liu [ 06/Apr/22 ] |
|
And the cause is in function ldiskfs_map_blocks(), that for PAGE_SIZE > block_size if a partial page allocation occurs first it will never get a full page(e.g. 64K PAGE_SIZE 16 blocks) allocated later for the page: 498 int ldiskfs_map_blocks(handle_t *handle, struct inode *inode, 499 struct ext4_map_blocks *map, int flags) 500 { ... 525 /* Lookup extent status tree firstly */ 526 if (ext4_es_lookup_extent(inode, map->m_lblk, &es)) { 527 if (ext4_es_is_written(&es) || ext4_es_is_unwritten(&es)) { 528 map->m_pblk = ext4_es_pblock(&es) + 529 map->m_lblk - es.es_lblk; 530 map->m_flags |= ext4_es_is_written(&es) ? 531 EXT4_MAP_MAPPED : EXT4_MAP_UNWRITTEN; 532 retval = es.es_len - (map->m_lblk - es.es_lblk); ... 550 goto found; 551 } 590 found: ... 608 if (retval > 0 && map->m_flags & EXT4_MAP_MAPPED) 614 if (!(flags & EXT4_GET_BLOCKS_CONVERT_UNWRITTEN)) 615 return retval;
Below Lustre log and kernel trace logs is an example for page 1 partial block allocation first(a.k.a 14 of 16 blocks) and then ldiskfs_map_blocks for this page always keep 14 blocks allocated no way up to full page 16 blocks allocation. Because below log keep printing and no change any more. (osd_io.c:1107:osd_ldiskfs_map_inode_pages()) inode 109: map 2 pages from 1 (osd_io.c:1148:osd_ldiskfs_map_inode_pages()) pages[0, 1], clen=2: blen=32 (osd_io.c:507:osd_do_bio()) Process entered (osd_io.c:571:osd_do_bio()) Write page 1 blocks_left_page=14: i=0, page_offset=0, nblocks=14 (osd_io.c:430:osd_bio_init()) Process entered 6) | osd_ldiskfs_map_inode_pages [osd_ldiskfs]() { 6) 0.180 us | ldiskfs_inode_attach_jinode [ldiskfs](); 6) | ldiskfs_map_blocks [ldiskfs]() { 6) | ldiskfs_es_lookup_extent [ldiskfs]() { 6) | /* ldiskfs_es_lookup_extent_enter: dev 253,1 ino 109 lblk 16 */ 6) | /* ldiskfs_es_lookup_extent_exit: dev 253,1 ino 109 found 1 [0/30) 61112 W0x10 */ 6) 1.820 us | } 6) 0.440 us | ldiskfs_inode_block_valid [ldiskfs](); 6) 6.760 us | } 6) + 51.380 us | osd_ldiskfs_map_write.constprop.9 [osd_ldiskfs](); 6) + 75.360 us | }
|
| Comment by Gerrit Updater [ 06/Apr/22 ] |
|
"xinliang <xinliang.liu@linaro.org>" uploaded a new patch: https://review.whamcloud.com/47004 |
| Comment by Gerrit Updater [ 30/May/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47004/ |
| Comment by Gerrit Updater [ 08/Jun/22 ] |
|
"xinliang <xinliang.liu@linaro.org>" uploaded a new patch: https://review.whamcloud.com/47563 |
| Comment by Gerrit Updater [ 19/Aug/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/47563/ |
| Comment by Peter Jones [ 19/Aug/23 ] |
|
Landed for 2.16 |
| Comment by Gerrit Updater [ 12/Jan/24 ] |
|
"xinliang <xinliang.liu@linaro.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53655 |