[LU-15722] IO write gets stuck on some sanityn test cases for 64K PAGE_SIZE Created: 06/Apr/22  Updated: 12/Jan/24  Resolved: 19/Aug/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Bug Priority: Major
Reporter: Xinliang Liu Assignee: Xinliang Liu
Resolution: Fixed Votes: 0
Labels: arm, arm-server
Environment:

Arm CentOS 8 server and client all-in-one test environment.


Issue Links:
Related
is related to LU-10300 Can the Lustre 2.10.x clients support... Resolved
Severity: 2
Rank (Obsolete): 9223372036854775807

 Description   

IO write gets stuck on some sanityn test cases with a 64K PAGE_SIZE, such as sanityn 16a, 16b, 71a, etc.

 

Kernel logs:

[ 1308.972770] Lustre: DEBUG MARKER: == sanityn test 71a: correct file map just after write operation is finished ========================================================== 01:56:46 (1638755806)
[ 1309.958643] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 10000
[ 1310.704658] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 20000
[ 1312.183793] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 40000
[ 1312.186168] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) Skipped 1 previous similar message
[ 1314.448308] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 70000
[ 1314.538440] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) Skipped 2 previous similar messages
[ 1319.063438] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 120000
[ 1319.065847] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) Skipped 4 previous similar messages
[ 1327.150991] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 210000
[ 1327.153390] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) Skipped 8 previous similar messages
[ 1327.359397] Lustre: lustre-OST0001-osc-ffffcaf2f9e2e000: disconnect after 20s idle
[ 1332.399266] Lustre: lustre-OST0001-osc-ffffcaf2fb2cf000: disconnect after 23s idle
[ 1344.045735] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 400000
[ 1344.193155] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) Skipped 18 previous similar messages
[ 1349.278936] Lustre: ll_ost05_002: service thread pid 5101 was inactive for 40.209 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[ 1349.278963] Pid: 5126, comm: ll_ost_io00_002 4.18.0-305.7.1.el8_lustre.aarch64 #1 SMP Mon Jul 19 08:24:26 UTC 2021
[ 1349.282879] Lustre: Skipped 1 previous similar message
[ 1349.285101] Call Trace:
[ 1349.286688] [<0>] __switch_to+0xbc/0x108
[ 1349.287534] [<0>] osd_trans_stop+0x66c/0xc58 [osd_ldiskfs]
[ 1349.288692] [<0>] ofd_trans_stop+0x48/0x90 [ofd]
[ 1349.289746] [<0>] ofd_commitrw_write+0x9c4/0x1e68 [ofd]
[ 1349.290825] [<0>] ofd_commitrw+0x454/0xa88 [ofd]
[ 1349.291909] [<0>] tgt_brw_write+0x1654/0x2db8 [ptlrpc]
[ 1349.293054] [<0>] tgt_handle_request0+0xd0/0x978 [ptlrpc]
[ 1349.294228] [<0>] tgt_request_handle+0x7c0/0x1a38 [ptlrpc]
[ 1349.295423] [<0>] ptlrpc_server_handle_request+0x3bc/0x11e8 [ptlrpc]
[ 1349.296802] [<0>] ptlrpc_main+0xd28/0x15f0 [ptlrpc]
[ 1349.297808] [<0>] kthread+0x130/0x138
[ 1349.298560] [<0>] ret_from_fork+0x10/0x18
[ 1349.299465] Pid: 5101, comm: ll_ost05_002 4.18.0-305.7.1.el8_lustre.aarch64 #1 SMP Mon Jul 19 08:24:26 UTC 2021
[ 1349.301552] Call Trace:
[ 1349.302051] [<0>] __switch_to+0xbc/0x108
[ 1349.302934] [<0>] ldlm_completion_ast+0x778/0xdf8 [ptlrpc]
[ 1349.304076] [<0>] ldlm_cli_enqueue_local+0x204/0xb68 [ptlrpc]
[ 1349.305282] [<0>] tgt_extent_lock+0x108/0x2d0 [ptlrpc]
[ 1349.306308] [<0>] ofd_lock_unlock_region+0x74/0x1e8 [ofd]
[ 1349.307370] [<0>] ofd_get_info_hdl+0xd30/0x1378 [ofd]
[ 1349.308439] [<0>] tgt_handle_request0+0xd0/0x978 [ptlrpc]
[ 1349.309637] [<0>] tgt_request_handle+0x7c0/0x1a38 [ptlrpc]
[ 1349.310786] [<0>] ptlrpc_server_handle_request+0x3bc/0x11e8 [ptlrpc]
[ 1349.312097] [<0>] ptlrpc_main+0xd28/0x15f0 [ptlrpc]
[ 1349.313068] [<0>] kthread+0x130/0x138
[ 1349.313788] [<0>] ret_from_fork+0x10/0x18
[ 1376.839778] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 770000
[ 1376.842204] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) Skipped 36 previous similar messages 


 Comments   
Comment by Xinliang Liu [ 06/Apr/22 ]

Looking into the source code, I found that it goes into an infinite loop:

 

[606895.151765] LustreError: 334886:0:(ofd_io.c:1389:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 10000
[606895.207345] LustreError: 334886:0:(ofd_io.c:1389:ofd_commitrw_write()) Skipped 8 previous similar messages
ofd_commitrw_write() -> osd_write_commit() -> osd_ldiskfs_map_inode_pages() -> ldiskfs_map_blocks() -> back to ofd_commitrw_write()

The relevant part of the loop in the source code is:

 

1219 ofd_commitrw_write(const struct lu_env *env, struct obd_export *exp,
....
1288 retry:
...
1342
1343         if (likely(!fake_write)) {
...
1346                 rc = osd_write_commit(env, o, lnb, niocount, th, oa->o_size);
1347                 if (rc) {
1348                         restart = th->th_restart_tran;
1349                         GOTO(out_unlock, rc);
1350                 }
1351         }
1352
.....
1384         if (restart) {
1385                 retries++;
1386                 restart = 0;
1387                 if (retries % 10000 == 0)
1388                         CERROR("%s: restart IO write too many times: %d\n",
1389                                 ofd_name(ofd), retries);
1390                 CDEBUG(D_INODE, "retry transaction, retries:%d\n",
1391                        retries);
1392                 goto retry;
1393         }

1079 static int osd_ldiskfs_map_inode_pages(struct inode *inode,
...
1143 cont_map:
...
1150                 if (handle && check_credits) {
...
1161                         if (oh->oh_declared_ext <= 0) {
1162                                 rc = osd_ldiskfs_map_write(inode,
1163                                         iobuf, osd, start_blocks,
1164                                         count, &disk_size, user_size);
1165                                 if (rc)
1166                                         GOTO(cleanup, rc);
1167                                 thandle->th_restart_tran = 1;
1168                                 GOTO(cleanup, rc = -EAGAIN);
...
1175                 }
1176                 rc = ldiskfs_map_blocks(handle, inode, &map, create);
...
1224                 if (rc == 0 && total < blen) {
...
1234                         goto cont_map;
1235                 }
Comment by Xinliang Liu [ 06/Apr/22 ]

The cause is in ldiskfs_map_blocks(): when PAGE_SIZE > block_size, if a partial-page allocation happens first, the page never gets a full allocation later (e.g. all 16 blocks of a 64K page):

 498 int ldiskfs_map_blocks(handle_t *handle, struct inode *inode,
 499                     struct ext4_map_blocks *map, int flags)
 500 {
 ...
 525         /* Lookup extent status tree firstly */
 526         if (ext4_es_lookup_extent(inode, map->m_lblk, &es)) {
 527                 if (ext4_es_is_written(&es) || ext4_es_is_unwritten(&es)) {
 528                         map->m_pblk = ext4_es_pblock(&es) +
 529                                         map->m_lblk - es.es_lblk;
 530                         map->m_flags |= ext4_es_is_written(&es) ?
 531                                         EXT4_MAP_MAPPED : EXT4_MAP_UNWRITTEN;
 532                         retval = es.es_len - (map->m_lblk - es.es_lblk);
...
 550                 goto found;
 551         }
590 found:
...
 608         if (retval > 0 && map->m_flags & EXT4_MAP_MAPPED)
 614                 if (!(flags & EXT4_GET_BLOCKS_CONVERT_UNWRITTEN))
 615                         return retval;

 

The Lustre log and kernel trace below show an example: page 1 first gets a partial block allocation (14 of 16 blocks), and from then on ldiskfs_map_blocks() keeps returning the same 14 allocated blocks for that page, never reaching the full 16-block allocation. The log below keeps printing with no change.

(osd_io.c:1107:osd_ldiskfs_map_inode_pages()) inode 109: map 2 pages from 1
(osd_io.c:1148:osd_ldiskfs_map_inode_pages()) pages[0, 1], clen=2: blen=32
(osd_io.c:507:osd_do_bio()) Process entered
(osd_io.c:571:osd_do_bio()) Write page 1 blocks_left_page=14: i=0, page_offset=0, nblocks=14
(osd_io.c:430:osd_bio_init()) Process entered
  6)               |  osd_ldiskfs_map_inode_pages [osd_ldiskfs]() {
  6)   0.180 us    |    ldiskfs_inode_attach_jinode [ldiskfs]();
  6)               |    ldiskfs_map_blocks [ldiskfs]() {
  6)               |      ldiskfs_es_lookup_extent [ldiskfs]() {
  6)               |        /* ldiskfs_es_lookup_extent_enter: dev 253,1 ino 109 lblk 16 */
  6)               |        /* ldiskfs_es_lookup_extent_exit: dev 253,1 ino 109 found 1 [0/30) 61112 W0x10 */
  6)   1.820 us    |      }
  6)   0.440 us    |      ldiskfs_inode_block_valid [ldiskfs]();
  6)   6.760 us    |    }
  6) + 51.380 us   |    osd_ldiskfs_map_write.constprop.9 [osd_ldiskfs]();
  6) + 75.360 us   |  } 

 

Comment by Gerrit Updater [ 06/Apr/22 ]

"xinliang <xinliang.liu@linaro.org>" uploaded a new patch: https://review.whamcloud.com/47004
Subject: LU-15722 osd-ldiskfs: fix IO write gets stuck for 64K PAGE_SIZE
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 40d76c4524e97862eb57b3821852b2383b2f1022

Comment by Gerrit Updater [ 30/May/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47004/
Subject: LU-15722 osd-ldiskfs: fix IO write gets stuck for 64K PAGE_SIZE
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 176ea3a4599ede8b1a0c91506dcd34bc162f2959

Comment by Gerrit Updater [ 08/Jun/22 ]

"xinliang <xinliang.liu@linaro.org>" uploaded a new patch: https://review.whamcloud.com/47563
Subject: LU-15722 osd-ldiskfs: fix write stuck for 64K PAGE_SIZE
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8fc89eaaffe26f407a635fef79131bb0cdb23470

Comment by Gerrit Updater [ 19/Aug/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/47563/
Subject: LU-15722 osd-ldiskfs: fix write stuck for 64K PAGE_SIZE
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 48ccf2db8495344e1d0c39f8bed0969279e29db4

Comment by Peter Jones [ 19/Aug/23 ]

Landed for 2.16

Comment by Gerrit Updater [ 12/Jan/24 ]

"xinliang <xinliang.liu@linaro.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53655
Subject: LU-15722 osd-ldiskfs: fix write stuck for 64K PAGE_SIZE
Project: fs/lustre-release
Branch: b2_15
Current Patch Set: 1
Commit: 0cf9f35f772b7aa3d967f82fdd426621648c933c

Generated at Sat Feb 10 03:20:44 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.