Details
- Bug
- Resolution: Fixed
- Critical
- Lustre 2.6.0, Lustre 2.7.0
- 3
- 15676
Description
It looks to me that many FAIL_ID checks have been lost over time. Take replay-single.sh as an example:
- test_73c() relies on OBD_FAIL_TGT_LAST_REPLAY, but this FAIL_ID has never been checked in Lustre code since the day it was introduced.
- test_73b() relies on OBD_FAIL_LDLM_REPLY, but this FAIL_ID is now checked only in mdt_reint_open(); I think it should be checked for every lock enqueue as well.
- test_73a() relies on OBD_FAIL_LDLM_ENQUEUE_NET, but this FAIL_ID is no longer checked in Lustre code.
- test_80c() relies on OBD_FAIL_UPDATE_OBJ_NET_REP, but this FAIL_ID has been removed from Lustre code.
- test_83a() relies on OBD_FAIL_MDS_FAIL_LOV_LOG_ADD, but this FAIL_ID isn't checked in Lustre code.
...
To make sure the error injection tests work as expected, I think we should go through all the fail IDs and add back all the missing FAIL_ID checks. If a FAIL_ID is already obsolete, we should remove or improve the corresponding test case.
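
As a starting point for that audit, here is a minimal Python sketch of how the mismatches could be listed automatically. It is not part of the Lustre tree. It assumes that a test script names each FAIL_ID somewhere in its text (for example in a comment), and that a FAIL_ID counts as "checked" when it appears anywhere in the C sources other than as the name being introduced by its own `#define` line. The function names `referenced_ids` and `unchecked_fail_ids` are hypothetical.

```python
import re

# Matches identifiers such as OBD_FAIL_LDLM_REPLY.
FAIL_ID_RE = re.compile(r"\bOBD_FAIL_[A-Z0-9_]+\b")

# Matches the start of a line that defines a FAIL_ID, e.g. "#define OBD_FAIL_X".
DEFINE_RE = re.compile(r"^\s*#\s*define\s+(OBD_FAIL_[A-Z0-9_]+)\b")


def referenced_ids(source_text):
    """Return the FAIL_IDs used in C source text.

    The name introduced by a #define line is ignored, because a bare
    definition does not mean the ID is ever checked. Names in the macro
    body are still counted.
    """
    used = set()
    for line in source_text.splitlines():
        match = DEFINE_RE.match(line)
        if match:
            line = line[match.end():]  # keep only the macro body
        used.update(FAIL_ID_RE.findall(line))
    return used


def unchecked_fail_ids(test_text, source_texts):
    """Return, sorted, the FAIL_IDs named in a test script that no source uses."""
    wanted = set(FAIL_ID_RE.findall(test_text))
    used = set()
    for text in source_texts:
        used |= referenced_ids(text)
    return sorted(wanted - used)
```

In practice, `test_text` would be the contents of a script such as replay-single.sh and `source_texts` the contents of the headers and `.c` files, read from disk by the caller. The result is only a list of candidates: a FAIL_ID that is referenced in dead code, or checked in too few places (as with OBD_FAIL_LDLM_REPLY above), still needs manual review.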