Details
- Bug
- Resolution: Unresolved
- Minor
- None
- Lustre 2.1.4, Lustre 2.4.1
- 3
- 5420
Description
Long ago, the patch for bug 23542 that made test 27 time-bound introduced an error that disables the test most of the time and potentially causes unknown side effects in later tests:
@@ -725,12 +725,8 @@ test_27() {
 #define OBD_FAIL_OSC_SHUTDOWN 0x407
 do_facet $SINGLEMDS lctl set_param fail_loc=0x80000407
 # need to wait for reconnect
- echo -n waiting for fail_loc
- while [ $(do_facet $SINGLEMDS lctl get_param -n fail_loc) -eq -214748261
-     sleep 1
-     echo -n .
- done
- do_facet $SINGLEMDS lctl get_param -n fail_loc
+ echo waiting for fail_loc
+ wait_update_facet $SINGLEMDS "lctl get_param -n fail_loc" "-2147482617"
Clearly the wait should be for 3221226503, which is 0xc0000407 (= 0x80000407 + 0x40000000, where 0x40000000 is CFS_FAILED, set once the test has triggered).
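The arithmetic behind the two numbers can be checked with a short shell sketch. The helper name `to_s32` is hypothetical and exists only for this illustration; the constants are the ones quoted in this ticket. It suggests that "-2147482617" is simply 0x80000407 read as a signed 32-bit integer, that is, the value before the failure fires, while 3221226503 is the unsigned value after CFS_FAILED has been added.

```shell
# Constants quoted in the ticket text.
fail_loc_set=$((0x80000407))   # value written by set_param in test 27
cfs_failed=$((0x40000000))     # bit the ticket calls CFS_FAILED

# Hypothetical helper: reinterpret an unsigned 32-bit value as signed.
to_s32() {
    v=$(( $1 & 0xffffffff ))
    if [ "$v" -ge 2147483648 ]; then
        echo $(( v - 4294967296 ))
    else
        echo "$v"
    fi
}

# Value expected once the failure has fired.
fired=$(( fail_loc_set + cfs_failed ))

printf 'set value:    0x%x unsigned=%d signed32=%d\n' \
    "$fail_loc_set" "$fail_loc_set" "$(to_s32 "$fail_loc_set")"
printf 'after firing: 0x%x unsigned=%d signed32=%d\n' \
    "$fired" "$fired" "$(to_s32 "$fired")"
```

The first line printed shows signed32=-2147482617 (the string the patch waits for) and the second shows unsigned=3221226503 (the value the ticket says should be waited for).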
I found this after a bizarre failure of test 27 like this:
14:53:22 (1351623202) network interface is UP
Starting mds1: -o loop /tmp/lustre-mdt1 /mnt/mds1
Started lustre-MDT0000
fail_loc=0x80000407
waiting for fail_loc
Waiting 90 secs for update
Waiting 80 secs for update
Waiting 70 secs for update
Waiting 60 secs for update
Waiting 50 secs for update
Waiting 40 secs for update
Waiting 30 secs for update
Waiting 20 secs for update
Waiting 10 secs for update
Update not seen after 90s: wanted '-2147482617' got '3221226503'
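To illustrate why the wait in the log above can never succeed, here is a minimal sketch of a polling helper. `poll_until` and `read_fail_loc` are hypothetical stand-ins written for this example; they are not the Lustre test-framework implementation of `wait_update_facet`, and the sketch assumes the real helper compares the command output with the expected value as text. Under that assumption, an unsigned reading of 3221226503 never equals the string "-2147482617", so the loop runs until its timeout.

```shell
# Hypothetical stand-in for a "wait until a command prints a value" helper.
# NOT the Lustre test-framework code; for illustration only.
poll_until() {
    cmd=$1; want=$2; max=$3
    waited=0
    while [ "$waited" -lt "$max" ]; do
        got=$(eval "$cmd")
        if [ "$got" = "$want" ]; then      # textual comparison (assumption)
            echo "Updated after ${waited}s: wanted '$want' got '$got'"
            return 0
        fi
        echo "Waiting $((max - waited)) secs for update"
        sleep 1
        waited=$((waited + 1))
    done
    got=$(eval "$cmd")
    echo "Update not seen after ${max}s: wanted '$want' got '$got'"
    return 1
}

# Simulated reading: the unsigned value reported in the log above.
read_fail_loc() { echo 3221226503; }

# Waiting for the signed pre-trigger value times out ...
poll_until read_fail_loc "-2147482617" 2 || echo "timed out, as in the log"
# ... while waiting for the unsigned post-trigger value matches at once.
poll_until read_fail_loc "3221226503" 2 && echo "matched"
```

The short 2-second timeout is chosen only to keep the example quick; the log above shows a 90-second limit.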
Issue Links
- duplicates LU-5965 recovery-small 27 looks works incorrectly (Resolved)
OK, after looking at the test and a bit of the bug's history (bz5949), I must admit I don't fully understand what's going on, but I know how to replicate what's needed.