  Lustre / LU-15722

IO write gets stuck on some sanityn test cases for 64K PAGE_SIZE

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Lustre 2.16.0
    • None
    • Environment: Arm CentOS 8 server and client, all-in-one test environment.

    • 2

    Description

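      IO writes get stuck in some sanityn test cases when PAGE_SIZE is 64K, for example sanityn 16a, 16b and 71a. On the OST, ofd_commitrw_write() restarts the write over and over: the restart counter passes 770000 in about 70 seconds. Meanwhile, service thread ll_ost05_002 (pid 5101) blocks waiting for an extent lock in ofd_get_info_hdl().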

       

      Kernel logs:

      [ 1308.972770] Lustre: DEBUG MARKER: == sanityn test 71a: correct file map just after write operation is finished ========================================================== 01:56:46 (1638755806)
      [ 1309.958643] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 10000
      [ 1310.704658] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 20000
      [ 1312.183793] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 40000
      [ 1312.186168] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) Skipped 1 previous similar message
      [ 1314.448308] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 70000
      [ 1314.538440] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) Skipped 2 previous similar messages
      [ 1319.063438] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 120000
      [ 1319.065847] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) Skipped 4 previous similar messages
      [ 1327.150991] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 210000
      [ 1327.153390] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) Skipped 8 previous similar messages
      [ 1327.359397] Lustre: lustre-OST0001-osc-ffffcaf2f9e2e000: disconnect after 20s idle
      [ 1332.399266] Lustre: lustre-OST0001-osc-ffffcaf2fb2cf000: disconnect after 23s idle
      [ 1344.045735] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 400000
      [ 1344.193155] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) Skipped 18 previous similar messages
      [ 1349.278936] Lustre: ll_ost05_002: service thread pid 5101 was inactive for 40.209 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [ 1349.278963] Pid: 5126, comm: ll_ost_io00_002 4.18.0-305.7.1.el8_lustre.aarch64 #1 SMP Mon Jul 19 08:24:26 UTC 2021
      [ 1349.282879] Lustre: Skipped 1 previous similar message
      [ 1349.285101] Call Trace:
      [ 1349.286688] [<0>] __switch_to+0xbc/0x108
      [ 1349.287534] [<0>] osd_trans_stop+0x66c/0xc58 [osd_ldiskfs]
      [ 1349.288692] [<0>] ofd_trans_stop+0x48/0x90 [ofd]
      [ 1349.289746] [<0>] ofd_commitrw_write+0x9c4/0x1e68 [ofd]
      [ 1349.290825] [<0>] ofd_commitrw+0x454/0xa88 [ofd]
      [ 1349.291909] [<0>] tgt_brw_write+0x1654/0x2db8 [ptlrpc]
      [ 1349.293054] [<0>] tgt_handle_request0+0xd0/0x978 [ptlrpc]
      [ 1349.294228] [<0>] tgt_request_handle+0x7c0/0x1a38 [ptlrpc]
      [ 1349.295423] [<0>] ptlrpc_server_handle_request+0x3bc/0x11e8 [ptlrpc]
      [ 1349.296802] [<0>] ptlrpc_main+0xd28/0x15f0 [ptlrpc]
      [ 1349.297808] [<0>] kthread+0x130/0x138
      [ 1349.298560] [<0>] ret_from_fork+0x10/0x18
      [ 1349.299465] Pid: 5101, comm: ll_ost05_002 4.18.0-305.7.1.el8_lustre.aarch64 #1 SMP Mon Jul 19 08:24:26 UTC 2021
      [ 1349.301552] Call Trace:
      [ 1349.302051] [<0>] __switch_to+0xbc/0x108
      [ 1349.302934] [<0>] ldlm_completion_ast+0x778/0xdf8 [ptlrpc]
      [ 1349.304076] [<0>] ldlm_cli_enqueue_local+0x204/0xb68 [ptlrpc]
      [ 1349.305282] [<0>] tgt_extent_lock+0x108/0x2d0 [ptlrpc]
      [ 1349.306308] [<0>] ofd_lock_unlock_region+0x74/0x1e8 [ofd]
      [ 1349.307370] [<0>] ofd_get_info_hdl+0xd30/0x1378 [ofd]
      [ 1349.308439] [<0>] tgt_handle_request0+0xd0/0x978 [ptlrpc]
      [ 1349.309637] [<0>] tgt_request_handle+0x7c0/0x1a38 [ptlrpc]
      [ 1349.310786] [<0>] ptlrpc_server_handle_request+0x3bc/0x11e8 [ptlrpc]
      [ 1349.312097] [<0>] ptlrpc_main+0xd28/0x15f0 [ptlrpc]
      [ 1349.313068] [<0>] kthread+0x130/0x138
      [ 1349.313788] [<0>] ret_from_fork+0x10/0x18
      [ 1376.839778] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) lustre-OST0000: restart IO write too many times: 770000
      [ 1376.842204] LustreError: 5126:0:(ofd_io.c:1401:ofd_commitrw_write()) Skipped 36 previous similar messages 

            People

              Assignee: Xinliang Liu
              Reporter: Xinliang Liu
              Votes: 0
              Watchers: 3