recovery-small test 27 waits for wrong condition

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • None
    • Affects Version/s: Lustre 2.1.4, Lustre 2.4.1
    • 3
    • 5420

    Description

      Long ago, the patch for bug 23542 that made test 27 time-bound introduced an error that effectively disables the test most of the time and may cause unknown side effects in later tests:

      @@ -725,12 +725,8 @@ test_27() {
       #define OBD_FAIL_OSC_SHUTDOWN            0x407
              do_facet $SINGLEMDS lctl set_param fail_loc=0x80000407
              # need to wait for reconnect
      -       echo -n waiting for fail_loc
      -       while [ $(do_facet $SINGLEMDS lctl get_param -n fail_loc) -eq -214748261
      -           sleep 1
      -           echo -n .
      -       done
      -       do_facet $SINGLEMDS lctl get_param -n fail_loc
      +       echo waiting for fail_loc
      +       wait_update_facet $SINGLEMDS "lctl get_param -n fail_loc" "-2147482617"
      

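      Clearly the wait should be for 3221226503, which is 0xc0000407 (= 0x80000407 + 0x40000000, where 0x40000000 is CFS_FAILED, the bit set once the fail_loc has triggered). The value -2147482617 is just 0x80000407 read as a signed 32-bit integer, i.e. the value that was set and has not triggered yet.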
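      The arithmetic behind these numbers can be checked with a small shell sketch. It uses only the bit values quoted in this report; the variable names and the to_s32 helper are illustrative and not part of the test framework, and it assumes a shell whose arithmetic is wider than 32 bits.

```shell
# Bit values as quoted in the report; names are illustrative only.
site=$(( 0x407 ))                    # OBD_FAIL_OSC_SHUTDOWN
armed=$(( 0x80000000 | site ))       # value written by set_param: 0x80000407
cfs_failed=$(( 0x40000000 ))         # CFS_FAILED, set once the fail_loc fires
triggered=$(( armed | cfs_failed ))  # 0xc0000407

# Reinterpret a 32-bit value as a signed 32-bit integer (hypothetical helper).
to_s32() {
    v=$(( $1 & 0xffffffff ))
    if [ "$v" -ge 2147483648 ]; then
        echo $(( v - 4294967296 ))
    else
        echo "$v"
    fi
}

echo "armed:     unsigned=$armed signed32=$(to_s32 "$armed")"
echo "triggered: unsigned=$triggered signed32=$(to_s32 "$triggered")"
# prints:
# armed:     unsigned=2147484679 signed32=-2147482617
# triggered: unsigned=3221226503 signed32=-1073740793
```

      So "-2147482617" matches only the armed, untriggered state, while the triggered state reads as 3221226503 when printed unsigned.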

      I found this after a bizarre failure of test 27 like this:

      14:53:22 (1351623202) network interface is UP
      Starting mds1:   -o loop /tmp/lustre-mdt1 /mnt/mds1
      Started lustre-MDT0000
      fail_loc=0x80000407
      waiting for fail_loc
      Waiting 90 secs for update
      Waiting 80 secs for update
      Waiting 70 secs for update
      Waiting 60 secs for update
      Waiting 50 secs for update
      Waiting 40 secs for update
      Waiting 30 secs for update
      Waiting 20 secs for update
      Waiting 10 secs for update
      Update not seen after 90s: wanted '-2147482617' got '3221226503'
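
      The log above shows fail_loc printed as an unsigned value, while the test compares against a signed string. One way to avoid depending on the printed representation is to test the CFS_FAILED bit directly. The following is only a sketch of that idea, not the fix proposed in this ticket: get_fail_loc and FAKE_FAIL_LOC are hypothetical stand-ins for "do_facet $SINGLEMDS lctl get_param -n fail_loc", and wait_fail_loc_triggered is an invented name.

```shell
# Hypothetical stand-in for: do_facet $SINGLEMDS lctl get_param -n fail_loc
get_fail_loc() {
    echo "${FAKE_FAIL_LOC:-0}"
}

# Poll once per second for up to $1 seconds.
# Returns 0 once the CFS_FAILED bit (0x40000000) is set, 1 on timeout.
wait_fail_loc_triggered() {
    max_wait=$1
    waited=0
    while [ "$waited" -lt "$max_wait" ]; do
        val=$(get_fail_loc)
        # Mask to 32 bits first so signed and unsigned printouts behave alike.
        if [ $(( (val & 0xffffffff) & 0x40000000 )) -ne 0 ]; then
            return 0
        fi
        sleep 1
        waited=$(( waited + 1 ))
    done
    return 1
}
```

      With FAKE_FAIL_LOC=3221226503 (or its signed form -1073740793) the function returns 0 immediately; with the armed value -2147482617 it times out and returns 1.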
      

          Activity

            [LU-2266] recovery-small test 27 waits for wrong condition
            adilger Andreas Dilger made changes -
            Labels Original: mq313 New: patch
            jlevi Jodi Levi (Inactive) made changes -
            Link New: This issue duplicates LU-5965 [ LU-5965 ]
            jlevi Jodi Levi (Inactive) made changes -
            Affects Version/s New: Lustre 2.4.1 [ 10294 ]
            Affects Version/s Original: Lustre 2.4.0 [ 10154 ]
            pjones Peter Jones made changes -
            Labels Original: LB New: mq313
            pjones Peter Jones made changes -
            Priority Original: Blocker [ 1 ] New: Minor [ 4 ]
            jlevi Jodi Levi (Inactive) made changes -
            Labels New: LB
            adilger Andreas Dilger made changes -
            Assignee Original: WC Triage [ wc-triage ] New: Oleg Drokin [ green ]
            adilger Andreas Dilger made changes -
            Priority Original: Major [ 3 ] New: Blocker [ 1 ]
            green Oleg Drokin created issue -

            People

              Assignee: green Oleg Drokin
              Reporter: green Oleg Drokin