Details
- Bug
- Resolution: Fixed
- Major
- None
- Arm CentOS 8 server and client all-in-one test environment.
- 2
- 9223372036854775807
Description
I/O writes get stuck in some sanityn test cases when the kernel uses a 64K PAGE_SIZE, for example sanityn 16a, 16b, and 71a.
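Because the hang is reported only for 64K PAGE_SIZE, it can help to confirm the page size on the test node before running sanityn. The following is a minimal sketch using only the Python standard library; the helper names `page_size_bytes` and `is_64k_page_kernel` are hypothetical and not part of Lustre or its test suite.

```python
import os


def page_size_bytes() -> int:
    """Return the kernel page size in bytes, as reported by sysconf."""
    return os.sysconf("SC_PAGE_SIZE")


def is_64k_page_kernel() -> bool:
    """True when the running kernel uses 64K pages (the configuration in this report)."""
    return page_size_bytes() == 64 * 1024


if __name__ == "__main__":
    size = page_size_bytes()
    print(f"PAGE_SIZE = {size} bytes ({size // 1024}K)")
    print("64K page kernel:", is_64k_page_kernel())
```

On an x86_64 node this would normally report 4K pages, so the sanityn hangs described here would not be expected to reproduce there.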
Kernel logs:
[ 1308.972770] Lustre: DEBUG MARKER: == sanityn test 71a: correct file map just after write operation is finished ========================================================== 01:56:46 (1638755806)
[ 1309.958643] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 10000
[ 1310.704658] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 20000
[ 1312.183793] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 40000
[ 1312.186168] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) Skipped 1 previous similar message
[ 1314.448308] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 70000
[ 1314.538440] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) Skipped 2 previous similar messages
[ 1319.063438] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 120000
[ 1319.065847] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) Skipped 4 previous similar messages
[ 1327.150991] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 210000
[ 1327.153390] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) Skipped 8 previous similar messages
[ 1327.359397] Lustre: lustre-OST0001-osc-ffffcaf2f9e2e000: disconnect after 20s idle
[ 1332.399266] Lustre: lustre-OST0001-osc-ffffcaf2fb2cf000: disconnect after 23s idle
[ 1344.045735] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 400000
[ 1344.193155] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) Skipped 18 previous similar messages
[ 1349.278936] Lustre: ll_ost05_002: service thread pid 5101 was inactive for 40.209 seconds. The thread might be hung, or it might only be slow and will resume later.
Dumping the stack trace for debugging purposes:
[ 1349.278963] Pid: 5126, comm: ll_ost_io00_002 4.18.0-305.7.1.el8_lustre.aarch64 #1 SMP Mon Jul 19 08:24:26 UTC 2021
[ 1349.282879] Lustre: Skipped 1 previous similar message
[ 1349.285101] Call Trace:
[ 1349.286688] [<0>] __switch_to+0xbc/0x108
[ 1349.287534] [<0>] osd_trans_stop+0x66c/0xc58 [osd_ldiskfs]
[ 1349.288692] [<0>] ofd_trans_stop+0x48/0x90 [ofd]
[ 1349.289746] [<0>] ofd_commitrw_write+0x9c4/0x1e68 [ofd]
[ 1349.290825] [<0>] ofd_commitrw+0x454/0xa88 [ofd]
[ 1349.291909] [<0>] tgt_brw_write+0x1654/0x2db8 [ptlrpc]
[ 1349.293054] [<0>] tgt_handle_request0+0xd0/0x978 [ptlrpc]
[ 1349.294228] [<0>] tgt_request_handle+0x7c0/0x1a38 [ptlrpc]
[ 1349.295423] [<0>] ptlrpc_server_handle_request+0x3bc/0x11e8 [ptlrpc]
[ 1349.296802] [<0>] ptlrpc_main+0xd28/0x15f0 [ptlrpc]
[ 1349.297808] [<0>] kthread+0x130/0x138
[ 1349.298560] [<0>] ret_from_fork+0x10/0x18
[ 1349.299465] Pid: 5101, comm: ll_ost05_002 4.18.0-305.7.1.el8_lustre.aarch64 #1 SMP Mon Jul 19 08:24:26 UTC 2021
[ 1349.301552] Call Trace:
[ 1349.302051] [<0>] __switch_to+0xbc/0x108
[ 1349.302934] [<0>] ldlm_completion_ast+0x778/0xdf8 [ptlrpc]
[ 1349.304076] [<0>] ldlm_cli_enqueue_local+0x204/0xb68 [ptlrpc]
[ 1349.305282] [<0>] tgt_extent_lock+0x108/0x2d0 [ptlrpc]
[ 1349.306308] [<0>] ofd_lock_unlock_region+0x74/0x1e8 [ofd]
[ 1349.307370] [<0>] ofd_get_info_hdl+0xd30/0x1378 [ofd]
[ 1349.308439] [<0>] tgt_handle_request0+0xd0/0x978 [ptlrpc]
[ 1349.309637] [<0>] tgt_request_handle+0x7c0/0x1a38 [ptlrpc]
[ 1349.310786] [<0>] ptlrpc_server_handle_request+0x3bc/0x11e8 [ptlrpc]
[ 1349.312097] [<0>] ptlrpc_main+0xd28/0x15f0 [ptlrpc]
[ 1349.313068] [<0>] kthread+0x130/0x138
[ 1349.313788] [<0>] ret_from_fork+0x10/0x18
[ 1376.839778] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 770000
[ 1376.842204] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) Skipped 36 previous similar messages
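To see how quickly the restart counter grows in logs like the ones above, the "restart IO write too many times" messages can be extracted with a small script. This is a rough sketch, assuming the message format is exactly as it appears in this report; the function name `restart_counts` is hypothetical and the regular expression is not taken from any Lustre tool.

```python
import re

# Matches dmesg entries such as:
# [ 1309.958643] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 10000
# The pattern is anchored to the timestamp so it also works when the log
# has been flattened onto a single line.
RESTART_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+LustreError: \d+:\d+:"
    r"\(ofd_io\.c:\d+:ofd_commitrw_write\(\)\) "
    r"(\S+): restart IO write too many times: (\d+)"
)


def restart_counts(log_text):
    """Return (timestamp_seconds, target, restart_count) for each restart message."""
    return [
        (float(ts), target, int(count))
        for ts, target, count in RESTART_RE.findall(log_text)
    ]


if __name__ == "__main__":
    sample = (
        "[ 1309.958643] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) "
        "lustre-OST0000: restart IO write too many times: 10000\n"
        "[ 1312.186168] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) "
        "Skipped 1 previous similar message\n"
        "[ 1376.839778] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) "
        "lustre-OST0000: restart IO write too many times: 770000\n"
    )
    for ts, target, count in restart_counts(sample):
        print(f"{ts:>12.6f}s  {target}  restarts={count}")
```

In the log above the counter goes from 10000 at 1309.96 s to 770000 at 1376.84 s, which is consistent with the write being retried continuously rather than making progress.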
Attachments
Issue Links
- is related to
- LU-10300 Can the Lustre 2.10.x clients support 64K kernel page?
- Resolved