Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-13358

sanityn test_43j: Timeout occurred after 348 mins, last suite running was sanityn

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.15.0
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      This issue was created by maloo for Yang Sheng <ys@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/febc8b7c-b581-4673-ae0d-0acc45b22ca7

      test_43j failed with the following error:

      Timeout occurred after 348 mins, last suite running was sanityn
      
      ======================================================== 23:32:57 \(1583883177\)
      [17311.109132] Lustre: DEBUG MARKER: == sanityn test 43j: racy mkdir return EEXIST ======================================================== 23:32:57 (1583883177)
      [17311.329173] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17311.517750] LustreError: 29009:0:(libcfs_fail.h:169:cfs_race()) cfs_race id 167 sleeping
      [17311.519410] LustreError: 29007:0:(libcfs_fail.h:174:cfs_race()) cfs_fail_race id 167 waking
      [17311.520591] LustreError: 29009:0:(libcfs_fail.h:172:cfs_race()) cfs_fail_race id 167 awake: rc=0
      [17311.917773] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17312.103328] LustreError: 29009:0:(libcfs_fail.h:169:cfs_race()) cfs_race id 167 sleeping
      [17312.103439] LustreError: 29007:0:(libcfs_fail.h:174:cfs_race()) cfs_fail_race id 167 waking
      [17312.105603] LustreError: 29009:0:(libcfs_fail.h:172:cfs_race()) cfs_fail_race id 167 awake: rc=0
      [17312.286194] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17312.656299] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17313.023848] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17313.211668] LustreError: 29009:0:(libcfs_fail.h:174:cfs_race()) cfs_fail_race id 167 waking
      [17313.211671] LustreError: 29958:0:(libcfs_fail.h:169:cfs_race()) cfs_race id 167 sleeping
      [17313.211701] LustreError: 29958:0:(libcfs_fail.h:169:cfs_race()) Skipped 2 previous similar messages
      [17313.215085] LustreError: 29009:0:(libcfs_fail.h:174:cfs_race()) Skipped 2 previous similar messages
      [17313.216256] LustreError: 29958:0:(libcfs_fail.h:172:cfs_race()) cfs_fail_race id 167 awake: rc=0
      [17313.217446] LustreError: 29958:0:(libcfs_fail.h:172:cfs_race()) Skipped 2 previous similar messages
      [17313.395036] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17313.769611] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17314.135975] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17314.505179] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17314.879328] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17315.250075] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17315.440261] LustreError: 29958:0:(libcfs_fail.h:169:cfs_race()) cfs_race id 167 sleeping
      [17315.441404] LustreError: 29958:0:(libcfs_fail.h:169:cfs_race()) Skipped 5 previous similar messages
      [17315.442663] LustreError: 14962:0:(libcfs_fail.h:174:cfs_race()) cfs_fail_race id 167 waking
      [17315.443955] LustreError: 14962:0:(libcfs_fail.h:174:cfs_race()) Skipped 5 previous similar messages
      [17315.446280] LustreError: 29958:0:(libcfs_fail.h:172:cfs_race()) cfs_fail_race id 167 awake: rc=0
      [17315.447552] LustreError: 29958:0:(libcfs_fail.h:172:cfs_race()) Skipped 5 previous similar messages
      [17315.627940] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17315.999870] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17316.367260] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17316.737816] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17317.116644] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17317.488198] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17317.862823] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17318.247053] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17318.669421] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17319.048331] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17319.414480] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17319.602497] LustreError: 20329:0:(libcfs_fail.h:169:cfs_race()) cfs_race id 167 sleeping
      [17319.602911] LustreError: 14963:0:(libcfs_fail.h:174:cfs_race()) cfs_fail_race id 167 waking
      [17319.602913] LustreError: 14963:0:(libcfs_fail.h:174:cfs_race()) Skipped 10 previous similar messages
      [17319.606183] LustreError: 20329:0:(libcfs_fail.h:169:cfs_race()) Skipped 10 previous similar messages
      [17319.607404] LustreError: 20329:0:(libcfs_fail.h:172:cfs_race()) cfs_fail_race id 167 awake: rc=0
      [17319.608561] LustreError: 20329:0:(libcfs_fail.h:172:cfs_race()) Skipped 10 previous similar messages
      [17319.789546] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17320.160652] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17320.533914] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17320.902690] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17321.271216] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17321.639185] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17322.010093] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17322.381671] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17322.752167] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17323.124004] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17323.488809] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17323.858050] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17324.229766] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17324.596745] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17324.966558] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17325.331734] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17325.698313] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17326.059736] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17326.422889] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17326.791051] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17327.158072] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17327.523903] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17327.714309] LustreError: 29009:0:(libcfs_fail.h:169:cfs_race()) cfs_race id 167 sleeping
      [17327.715587] LustreError: 29009:0:(libcfs_fail.h:169:cfs_race()) Skipped 21 previous similar messages
      [17327.715632] LustreError: 29007:0:(libcfs_fail.h:174:cfs_race()) cfs_fail_race id 167 waking
      [17327.715634] LustreError: 29007:0:(libcfs_fail.h:174:cfs_race()) Skipped 21 previous similar messages
      [17327.719196] LustreError: 29009:0:(libcfs_fail.h:172:cfs_race()) cfs_fail_race id 167 awake: rc=0
      [17327.720335] LustreError: 29009:0:(libcfs_fail.h:172:cfs_race()) Skipped 21 previous similar messages
      [17327.899149] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17328.264351] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17328.632777] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17329.003813] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17329.371423] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17329.740128] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17330.109495] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17330.480179] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17330.851411] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17331.217019] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17331.584481] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17331.953570] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17332.319811] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17332.689562] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17333.071429] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17333.442313] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17333.813061] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17334.182470] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17334.548590] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17334.915354] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17335.281713] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17335.648787] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17336.016108] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17336.379247] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17336.748132] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17337.119766] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17337.486196] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17337.869241] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17338.237004] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17338.630641] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17338.997588] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17339.364136] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17339.730623] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17340.096813] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17340.462864] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17340.831538] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167
      [17403.223365] Lustre: mdt00_002: service thread pid 20331 was inactive for 62.204 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [17403.225741] Pid: 20331, comm: mdt00_002 3.10.0-1062.9.1.el7_lustre.x86_64 #1 SMP Wed Feb 12 09:50:45 UTC 2020
      [17403.226985] Call Trace:
      [17403.227524]  [<ffffffffc150b281>] mdt_create+0xf31/0x1350 [mdt]
      [17403.228564]  [<ffffffffc150b80b>] mdt_reint_create+0x16b/0x360 [mdt]
      [17403.229409]  [<ffffffffc1510d33>] mdt_reint_rec+0x83/0x210 [mdt]
      [17403.230297]  [<ffffffffc14ea970>] mdt_reint_internal+0x7b0/0xba0 [mdt]
      [17403.231291]  [<ffffffffc14f5fe7>] mdt_reint+0x67/0x140 [mdt]
      [17403.232119]  [<ffffffffc1168f9a>] tgt_request_handle+0x95a/0x1630 [ptlrpc]
      [17403.233926]  [<ffffffffc110e0e6>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
      [17403.234967]  [<ffffffffc1112574>] ptlrpc_main+0xbb4/0x1550 [ptlrpc]
      [17403.235890]  [<ffffffffb06c61f1>] kthread+0xd1/0xe0
      [17403.236704]  [<ffffffffb0d8dd37>] ret_from_fork_nospec_end+0x0/0x39
      [17403.237634]  [<ffffffffffffffff>] 0xffffffffffffffff
      [17936.088290] Lustre: 29006:0:(service.c:1440:ptlrpc_at_send_early_reply()) @@@ Could not add any time (5/5), not sending early reply  req@ffff8b1d15c90480 x1660819790875584/t0(0) o36->e483fb96-13aa-4@10.9.3.3@tcp:262/0 lens 512/448 e 23 to 0 dl 1583883807 ref 2 fl Interpret:/0/0 rc 0/0 job:'mkdir.0'
      [17942.007808] Lustre: lustre-MDT0000: Client e483fb96-13aa-4 (at 10.9.3.3@tcp) reconnecting
      [17942.009423] Lustre: lustre-MDT0000: Connection restored to 66d0d404-6645-4 (at 10.9.3.3@tcp)
      [17942.010579] Lustre: Skipped 2 previous similar messages
      [18542.999441] Lustre: lustre-MDT0000: Client e483fb96-13aa-4 (at 10.9.3.3@tcp) reconnecting
      [18543.000625] Lustre: lustre-MDT0000: Connection restored to 66d0d404-6645-4 (at 10.9.3.3@tcp)
      [19143.990549] Lustre: lustre-MDT0000: Client e483fb96-13aa-4 (at 10.9.3.3@tcp) reconnecting
      [19143.991784] Lustre: lustre-MDT0000: Connection restored to 66d0d404-6645-4 (at 10.9.3.3@tcp)
      [19744.981248] Lustre: lustre-MDT0000: Client e483fb96-13aa-4 (at 10.9.3.3@tcp) reconnecting
      [19744.982495] Lustre: lustre-MDT0000: Connection restored to 66d0d404-6645-4 (at 10.9.3.3@tcp)
      [20345.972751] Lustre: lustre-MDT0000: Client e483fb96-13aa-4 (at 10.9.3.3@tcp) reconnecting
      [20345.974154] Lustre: lustre-MDT0000: Connection restored to 66d0d404-6645-4 (at 10.9.3.3@tcp)
      

      Looks like OBD_RACE wakeup was lost cause mdt thread sleep forever.

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanityn test_43j - Timeout occurred after 348 mins, last suite running was sanityn

      Attachments

        Issue Links

          Activity

            People

              ys Yang Sheng
              maloo Maloo
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: