Details
-
Bug
-
Resolution: Fixed
-
Minor
-
None
-
None
-
3
-
9223372036854775807
Description
This issue was created by maloo for Yang Sheng <ys@whamcloud.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/febc8b7c-b581-4673-ae0d-0acc45b22ca7
test_43j failed with the following error:
Timeout occurred after 348 mins, last suite running was sanityn
======================================================== 23:32:57 \(1583883177\) [17311.109132] Lustre: DEBUG MARKER: == sanityn test 43j: racy mkdir return EEXIST ======================================================== 23:32:57 (1583883177) [17311.329173] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17311.517750] LustreError: 29009:0:(libcfs_fail.h:169:cfs_race()) cfs_race id 167 sleeping [17311.519410] LustreError: 29007:0:(libcfs_fail.h:174:cfs_race()) cfs_fail_race id 167 waking [17311.520591] LustreError: 29009:0:(libcfs_fail.h:172:cfs_race()) cfs_fail_race id 167 awake: rc=0 [17311.917773] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17312.103328] LustreError: 29009:0:(libcfs_fail.h:169:cfs_race()) cfs_race id 167 sleeping [17312.103439] LustreError: 29007:0:(libcfs_fail.h:174:cfs_race()) cfs_fail_race id 167 waking [17312.105603] LustreError: 29009:0:(libcfs_fail.h:172:cfs_race()) cfs_fail_race id 167 awake: rc=0 [17312.286194] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17312.656299] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17313.023848] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17313.211668] LustreError: 29009:0:(libcfs_fail.h:174:cfs_race()) cfs_fail_race id 167 waking [17313.211671] LustreError: 29958:0:(libcfs_fail.h:169:cfs_race()) cfs_race id 167 sleeping [17313.211701] LustreError: 29958:0:(libcfs_fail.h:169:cfs_race()) Skipped 2 previous similar messages [17313.215085] LustreError: 29009:0:(libcfs_fail.h:174:cfs_race()) Skipped 2 previous similar messages [17313.216256] LustreError: 29958:0:(libcfs_fail.h:172:cfs_race()) cfs_fail_race id 167 awake: rc=0 [17313.217446] LustreError: 29958:0:(libcfs_fail.h:172:cfs_race()) Skipped 2 previous similar messages [17313.395036] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17313.769611] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17314.135975] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17314.505179] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17314.879328] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17315.250075] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17315.440261] LustreError: 29958:0:(libcfs_fail.h:169:cfs_race()) cfs_race id 167 sleeping [17315.441404] LustreError: 29958:0:(libcfs_fail.h:169:cfs_race()) Skipped 5 previous similar messages [17315.442663] LustreError: 14962:0:(libcfs_fail.h:174:cfs_race()) cfs_fail_race id 167 waking [17315.443955] LustreError: 14962:0:(libcfs_fail.h:174:cfs_race()) Skipped 5 previous similar messages [17315.446280] LustreError: 29958:0:(libcfs_fail.h:172:cfs_race()) cfs_fail_race id 167 awake: rc=0 [17315.447552] LustreError: 29958:0:(libcfs_fail.h:172:cfs_race()) Skipped 5 previous similar messages [17315.627940] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17315.999870] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17316.367260] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17316.737816] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17317.116644] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17317.488198] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17317.862823] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17318.247053] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17318.669421] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17319.048331] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17319.414480] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17319.602497] LustreError: 20329:0:(libcfs_fail.h:169:cfs_race()) cfs_race id 167 sleeping [17319.602911] LustreError: 14963:0:(libcfs_fail.h:174:cfs_race()) cfs_fail_race id 167 waking [17319.602913] LustreError: 14963:0:(libcfs_fail.h:174:cfs_race()) Skipped 10 previous similar messages [17319.606183] LustreError: 20329:0:(libcfs_fail.h:169:cfs_race()) Skipped 10 previous similar messages [17319.607404] LustreError: 20329:0:(libcfs_fail.h:172:cfs_race()) cfs_fail_race id 167 awake: rc=0 [17319.608561] LustreError: 20329:0:(libcfs_fail.h:172:cfs_race()) Skipped 10 previous similar messages [17319.789546] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17320.160652] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17320.533914] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17320.902690] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17321.271216] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17321.639185] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17322.010093] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17322.381671] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17322.752167] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17323.124004] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17323.488809] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17323.858050] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17324.229766] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17324.596745] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17324.966558] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17325.331734] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17325.698313] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17326.059736] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17326.422889] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17326.791051] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17327.158072] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17327.523903] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17327.714309] LustreError: 29009:0:(libcfs_fail.h:169:cfs_race()) cfs_race id 167 sleeping [17327.715587] LustreError: 29009:0:(libcfs_fail.h:169:cfs_race()) Skipped 21 previous similar messages [17327.715632] LustreError: 29007:0:(libcfs_fail.h:174:cfs_race()) cfs_fail_race id 167 waking [17327.715634] LustreError: 29007:0:(libcfs_fail.h:174:cfs_race()) Skipped 21 previous similar messages [17327.719196] LustreError: 29009:0:(libcfs_fail.h:172:cfs_race()) cfs_fail_race id 167 awake: rc=0 [17327.720335] LustreError: 29009:0:(libcfs_fail.h:172:cfs_race()) Skipped 21 previous similar messages [17327.899149] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17328.264351] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17328.632777] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17329.003813] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17329.371423] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17329.740128] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17330.109495] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17330.480179] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17330.851411] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17331.217019] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17331.584481] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17331.953570] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17332.319811] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17332.689562] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17333.071429] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17333.442313] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17333.813061] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17334.182470] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17334.548590] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17334.915354] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17335.281713] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17335.648787] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17336.016108] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17336.379247] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17336.748132] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17337.119766] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17337.486196] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17337.869241] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17338.237004] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17338.630641] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17338.997588] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17339.364136] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17339.730623] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17340.096813] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17340.462864] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17340.831538] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000167 [17403.223365] Lustre: mdt00_002: service thread pid 20331 was inactive for 62.204 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [17403.225741] Pid: 20331, comm: mdt00_002 3.10.0-1062.9.1.el7_lustre.x86_64 #1 SMP Wed Feb 12 09:50:45 UTC 2020 [17403.226985] Call Trace: [17403.227524] [<ffffffffc150b281>] mdt_create+0xf31/0x1350 [mdt] [17403.228564] [<ffffffffc150b80b>] mdt_reint_create+0x16b/0x360 [mdt] [17403.229409] [<ffffffffc1510d33>] mdt_reint_rec+0x83/0x210 [mdt] [17403.230297] [<ffffffffc14ea970>] mdt_reint_internal+0x7b0/0xba0 [mdt] [17403.231291] [<ffffffffc14f5fe7>] mdt_reint+0x67/0x140 [mdt] [17403.232119] [<ffffffffc1168f9a>] tgt_request_handle+0x95a/0x1630 [ptlrpc] [17403.233926] [<ffffffffc110e0e6>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc] [17403.234967] [<ffffffffc1112574>] ptlrpc_main+0xbb4/0x1550 [ptlrpc] [17403.235890] [<ffffffffb06c61f1>] kthread+0xd1/0xe0 [17403.236704] [<ffffffffb0d8dd37>] ret_from_fork_nospec_end+0x0/0x39 [17403.237634] [<ffffffffffffffff>] 0xffffffffffffffff [17936.088290] Lustre: 29006:0:(service.c:1440:ptlrpc_at_send_early_reply()) @@@ Could not add any time (5/5), not sending early reply req@ffff8b1d15c90480 x1660819790875584/t0(0) o36->e483fb96-13aa-4@10.9.3.3@tcp:262/0 lens 512/448 e 23 to 0 dl 1583883807 ref 2 fl Interpret:/0/0 rc 0/0 job:'mkdir.0' [17942.007808] Lustre: lustre-MDT0000: Client e483fb96-13aa-4 (at 10.9.3.3@tcp) reconnecting [17942.009423] Lustre: lustre-MDT0000: Connection restored to 66d0d404-6645-4 (at 10.9.3.3@tcp) [17942.010579] Lustre: Skipped 2 previous similar messages [18542.999441] Lustre: lustre-MDT0000: Client e483fb96-13aa-4 (at 10.9.3.3@tcp) reconnecting [18543.000625] Lustre: lustre-MDT0000: Connection restored to 66d0d404-6645-4 (at 10.9.3.3@tcp) [19143.990549] Lustre: lustre-MDT0000: Client e483fb96-13aa-4 (at 10.9.3.3@tcp) reconnecting [19143.991784] Lustre: lustre-MDT0000: Connection restored to 66d0d404-6645-4 (at 10.9.3.3@tcp) [19744.981248] Lustre: lustre-MDT0000: Client e483fb96-13aa-4 (at 10.9.3.3@tcp) reconnecting [19744.982495] Lustre: lustre-MDT0000: Connection restored to 66d0d404-6645-4 (at 10.9.3.3@tcp) [20345.972751] Lustre: lustre-MDT0000: Client e483fb96-13aa-4 (at 10.9.3.3@tcp) reconnecting [20345.974154] Lustre: lustre-MDT0000: Connection restored to 66d0d404-6645-4 (at 10.9.3.3@tcp)
Looks like OBD_RACE wakeup was lost cause mdt thread sleep forever.
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanityn test_43j - Timeout occurred after 348 mins, last suite running was sanityn