
recovery-small: test_58 failed with 1


    Description

      == recovery-small test 58: Eviction in the middle of open RPC reply processing ======================= 14:48:37 (1343339317)
      -rw-r--r-- 1 root root 0 Jul 26 14:48 /mnt/nbp0-1/f58
      fail_loc=0x80000801
      fail_loc=0
      fail_loc=0x305
      fail_loc=0
      df: `/mnt/nbp0-1': Interrupted system call
      df: no file systems processed
      recovery-small test_58: @@@@@@ FAIL: test_58 failed with 1

      Attached two files:
      recovery-small.test_58.tgz - tarball of the test_log files
      recovery-small.test_58.test_log.service331.log.dbg: output of the test run with shell debugging enabled ("set -x"). The log shows the test passing, but that is a false positive: 'df' failed with 1, yet the subsequent "set +x" reset the return value to 0.
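The false positive described above can be reproduced in isolation. The following is a minimal sketch, not taken from the test script: `failing_df` is a hypothetical stand-in for the real 'df' call, and the variable names are illustrative only. It shows that reading `$?` after "set +x" yields the status of "set +x" itself, while capturing `$?` immediately preserves the failure.

```shell
#!/bin/bash
# Hypothetical stand-in for a 'df' that fails with status 1.
failing_df() { return 1; }

# Pattern that loses the status: 'set +x' succeeds, so $? is 0 afterwards.
set -x
failing_df
set +x
rc_lost=$?

# Safer pattern: save the status before running any other command.
set -x
failing_df
rc_kept=$?
set +x

echo "rc_lost=$rc_lost rc_kept=$rc_kept"
```

The trace lines go to stderr; the final line on stdout reports `rc_lost=0 rc_kept=1`.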


        Activity

          [LU-1688] recovery-small: test_58 failed with 1
          pjones Peter Jones added a comment -

          Landed for 2.1.4 and 2.4


          jaylan Jay Lan (Inactive) added a comment -

          Can we complete the review and land the patch? Thanks!

          jaylan Jay Lan (Inactive) added a comment -

          The new patch looks good to me, and the test passed. Thanks!

          hongchao.zhang Hongchao Zhang added a comment -

          Hi Jay, the patch is updated as per your advice, thanks a lot!

          jaylan Jay Lan (Inactive) added a comment -

          It would be nice if you could check the status of the first 'df' command and run the second 'df' only if the first one fails.
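The suggestion above could be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of the actual patch: `run_with_one_retry` and `flaky` are hypothetical names, and the delay value is arbitrary. The helper runs a command once and repeats it only if the first attempt fails.

```shell
#!/bin/bash
# Hypothetical helper, not taken from the actual patch.
# Usage: run_with_one_retry DELAY_SECONDS COMMAND [ARGS...]
# Runs the command once; only if it fails, waits and runs it a second time.
run_with_one_retry() {
    local delay=$1
    shift
    if "$@"; then
        return 0        # first attempt succeeded, no second call
    fi
    sleep "$delay"
    "$@"                # exit status of the second attempt is returned
}

# Demonstration with a stand-in command that fails once, then succeeds.
attempts=0
flaky() {
    attempts=$((attempts + 1))
    [ "$attempts" -ge 2 ]
}

run_with_one_retry 0 flaky
retry_rc=$?
echo "attempts=$attempts retry_rc=$retry_rc"
```

In the test itself the call might look like `run_with_one_retry 1 df /mnt/nbp0-1`, with the test failing only when both attempts fail.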

          jaylan Jay Lan (Inactive) added a comment -

          == recovery-small test 58: Eviction in the middle of open RPC reply processing ======================= 11:54:50 (1343847290)
          -rw-r--r-- 1 root root 0 Aug 1 11:54 /mnt/nbp0-1/f58
          fail_loc=0x80000801
          fail_loc=0
          fail_loc=0x305
          fail_loc=0
          df: `/mnt/nbp0-1': Interrupted system call
          df: no file systems processed
          Filesystem 1K-blocks Used Available Use% Mounted on
          service337@o2ib:/lustre
          3937056 209208 3527720 6% /mnt/nbp0-1
          Resetting fail_loc on all nodes...done.
          PASS 58 (40s)

          From the above log, you can see that the first 'df' failed and the second 'df' succeeded and printed its output.

          hongchao.zhang Hongchao Zhang added a comment -

          A possible patch is tracked at http://review.whamcloud.com/#change,3506.

          Hi Jay, is this issue reproducible? If so, could you please help test with the patch?

          hongchao.zhang Hongchao Zhang added a comment -

          The eviction of this client is caused by the revalidate request on the root inode issued by "df", which is what triggers this issue.
          The bug should be fixed by waiting some time before calling "df", or by doing something else to trigger the eviction/recovery.
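One way to read the second option in this comment is to issue a throwaway access first, so that it rather than "df" absorbs the eviction, and then wait briefly before the real check. The sketch below is only an interpretation: `settle_then_df` is a hypothetical name, the use of `stat` as the throwaway access is an assumption, and the demonstration runs against `/` instead of a Lustre mount.

```shell
#!/bin/bash
# Hypothetical helper illustrating the idea; not the actual fix.
# Usage: settle_then_df SETTLE_SECONDS MOUNT_POINT
settle_then_df() {
    local settle=$1 mnt=$2
    # Throwaway access intended to trigger eviction/recovery;
    # its exit status is deliberately ignored.
    stat "$mnt" > /dev/null 2>&1 || true
    # Give recovery some time before the check that matters.
    sleep "$settle"
    df "$mnt"
}

# Demonstration against the root file system with no settle time.
settle_then_df 0 / > /dev/null
settle_rc=$?
echo "settle_rc=$settle_rc"
```

Whether a single `stat` is enough to trigger the eviction on a real client, and how long the settle time would need to be, are not established by this ticket.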
          pjones Peter Jones added a comment -

          Hongchao

          Could you please look into this one?

          Thanks

          Peter


          People

            hongchao.zhang Hongchao Zhang
            jaylan Jay Lan (Inactive)