Details
- Bug
- Resolution: Fixed
- Minor
- Lustre 2.12.0
- None
- 3
- 9223372036854775807
Description
This issue was created by maloo for bobijam <bobijam@whamcloud.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/5c95f0b2-8186-11e8-b441-52540065bddc
test_115 failed with the following error:
Timeout occurred after 216 mins, last suite running was replay-single, restarting cluster to continue tests
The MDS dmesg keeps showing the following error messages during several tests, and the tests take far too long to complete.
[ 2545.541360] LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
[ 2545.571570] LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 10.9.5.210@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
[ 2545.618732] LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 10.9.5.212@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
[ 2545.618926] LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 10.9.5.212@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
[ 2545.619112] LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 10.9.5.212@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
...
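To gauge how many peers keep retrying against the missing MDT0000 target, the 137-5 lines can be tallied per NID. The following is a minimal C sketch, not a Lustre tool: the helper names `parse_137_5_nid()` and `tally_nid()`, the `struct nid_count` table and its 64-entry limit are all hypothetical, and it assumes the message text matches the lines quoted above.

```c
#include <stdio.h>
#include <string.h>

#define MAX_NIDS 64 /* hypothetical limit on distinct NIDs tracked */

struct nid_count {
	char nid[64];
	unsigned long count;
};

/* Hypothetical helper: copy the NID out of a LustreError 137-5 line such as
 * "... not available for connect from 10.9.5.212@tcp (no target). ..."
 * Returns 1 on success, 0 if the line does not match or nid is too small. */
static int parse_137_5_nid(const char *line, char *nid, size_t nidlen)
{
	static const char marker[] = "not available for connect from ";
	const char *p, *end;
	size_t len;

	if (strstr(line, "LustreError: 137-5:") == NULL)
		return 0;
	p = strstr(line, marker);
	if (p == NULL)
		return 0;
	p += sizeof(marker) - 1;   /* skip the marker text itself */
	end = strchr(p, ' ');      /* NID ends at the space before "(no target)" */
	if (end == NULL)
		return 0;
	len = (size_t)(end - p);
	if (len == 0 || len >= nidlen)
		return 0;
	memcpy(nid, p, len);
	nid[len] = '\0';
	return 1;
}

/* Count one occurrence of nid in table (n entries in use); returns the new n.
 * New NIDs beyond MAX_NIDS are silently dropped. */
static size_t tally_nid(struct nid_count *table, size_t n, const char *nid)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (strcmp(table[i].nid, nid) == 0) {
			table[i].count++;
			return n;
		}
	}
	if (n < MAX_NIDS) {
		snprintf(table[n].nid, sizeof(table[n].nid), "%s", nid);
		table[n].count = 1;
		n++;
	}
	return n;
}
```

Feed it each line of dmesg output (for example read with `fgets()`), call `tally_nid()` whenever `parse_137_5_nid()` returns 1, and print the table afterwards. In the excerpt above, 10.9.5.212@tcp alone accounts for three of the five messages.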
Another hit occurred at https://testing.whamcloud.com/test_sets/08372d04-8188-11e8-97ff-52540065bddc
test_80c: 'Timeout occurred after 159 mins, last suite running was replay-single, restarting cluster to continue tests'
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
replay-single test_115 - Timeout occurred after 216 mins, last suite running was replay-single, restarting cluster to continue tests
Issue Links
- is duplicated by
  - LU-11126 replay-single test 89: import.c:1668:ptlrpc_disconnect_idle_interpret()) ASSERTION( imp->imp_state == LUSTRE_IMP_CONNECTING ) failed: (Resolved)
  - LU-11183 sanity test 244 hangs with no information in the logs (Resolved)
- is related to
  - LU-11362 sanity test_156: timeout loop in ptlrpc_check_set() (Open)
  - LU-11269 ptlrpc_set_add_req()) ASSERTION( req->rq_import->imp_state != LUSTRE_IMP_IDLE ) failed (Resolved)
  - LU-11405 add a test for idle connection feature (Closed)
- is related to
  - LU-7236 OST connect and disconnect on demand (Resolved)
Andreas, I'm fine with changing the defaults, and yes, one of the reasons to keep the timeout short is to exercise this code more frequently.
A ping reply is not counted:

if (lustre_msg_get_opc(req->rq_reqmsg) != OBD_PING)
        req->rq_import->imp_last_reply_time = ktime_get_real_seconds();
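The effect of skipping ping replies can be illustrated with a simplified model. This is a hypothetical sketch, not Lustre code: `struct import_model`, `model_reply()`, `model_is_idle()` and the opcode enum are invented for illustration, and it assumes an import counts as idle when idle disconnect is enabled, no requests are in flight, and no non-ping reply has arrived for `idle_timeout` seconds. The real Lustre idle check may test additional conditions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical opcodes: only the ping/non-ping distinction matters here. */
enum model_opc { MODEL_OPC_GETATTR, MODEL_OPC_PING };

struct import_model {
	int64_t last_reply_time; /* seconds; plays the role of imp_last_reply_time */
	int64_t idle_timeout;    /* seconds; 0 disables idle disconnect (assumed) */
	int     inflight;        /* requests still waiting for a reply */
};

/* Record a reply that arrived at time now. Ping replies are deliberately
 * ignored, so a connection kept alive only by pings can still go idle. */
static void model_reply(struct import_model *imp, enum model_opc opc,
			int64_t now)
{
	if (opc != MODEL_OPC_PING)
		imp->last_reply_time = now;
}

/* Idle when enabled, nothing is in flight, and no non-ping reply has been
 * seen for at least idle_timeout seconds. */
static bool model_is_idle(const struct import_model *imp, int64_t now)
{
	if (imp->idle_timeout == 0 || imp->inflight > 0)
		return false;
	return now - imp->last_reply_time >= imp->idle_timeout;
}
```

Because pings never refresh `last_reply_time`, a client that only pings its target still ages toward the idle timeout; a shorter default timeout, as discussed above, makes that disconnect and reconnect path run more often in testing.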
Then the import is checked for idleness: