[LU-3701] Failure on test suite posix subtest test_1: fcntl.18/fcntl.35 Unresolved

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Lustre 2.5.0, Lustre 2.4.2
    • Lustre 2.5.0
    • Environment: server and client: lustre-master build: 1952
    • 3
    • 9549

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/261745ba-fb5b-11e2-8c6e-52540035b04c.

      The sub-test test_1 failed with the following error:

      Run POSIX testsuite on /mnt/lustre failed

      test log

      SUCCESS SUMMARY:
      
      News POSIX successes: 1
      
      Test Name                   Baseline   Lustre Report
      read.15                       Failed       Succeeded
      
      
      FAILURE SUMMARY:
      
      POSIX failures: 2
      
      Test Name                   Baseline   Lustre Report
      fcntl.18                   Succeeded      Unresolved
      fcntl.35                   Succeeded      Unresolved
      
      FAILURE DESCRIPTIONS:
      
      ####################################################
      Test Name: fcntl.18 Unresolved
      
      	Test Description:
      For the XNFS specification:
          If the implementation supports file locking for files residing on
          a remote file system: On a call to fcntl(fildes, F_SETLKW, arg)
          when the lock specified by arg can not be set, waits until the
          lock can be set.
      For the XSH specification:
          On a call to fcntl(fildes, F_SETLKW, arg) when the lock specified
          by arg can not be set, waits until the lock can be set.
          Posix Ref: Component FCNTL Assertion 6.5.2.2-23(A)
      
      	Test Information:
      deletion reason: External error - waitsync failed
      deletion reason: External error - waitsync failed
      
      ####################################################
      Test Name: fcntl.35 Unresolved
      
      	Test Description:
      For the XNFS specification:
          If the implementation supports file locking for files residing on
          a remote file system: EINTR in errno and -1 returned by fcntl() if
          the operation is interrupted by a signal.
      For the XSH specification:
          EINTR in errno and -1 returned by fcntl() if the operation is
          interrupted by a signal.
          Posix Ref: Component FCNTL Assertion 6.5.2.4-40(A)
      
      	Test Information:
      child process timed out
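The two assertions above can be reproduced outside the POSIX suite with a short script. The following is a minimal sketch, not the test suite's own code: the helper names (`blocked_lock_waits`, `blocked_lock_is_interruptible`) are hypothetical, and it runs against whatever file system holds the temporary directory, so exercising Lustre would mean pointing `tempfile` at a directory under the Lustre mount. Python's `fcntl.lockf` issues `F_SETLKW` for a blocking request and retries on `EINTR` by itself, so the interruption is observed here as the exception raised by the signal handler rather than as `errno == EINTR`.

```python
import fcntl
import os
import signal
import tempfile
import time


class Interrupted(Exception):
    """Raised by the SIGALRM handler to break out of a blocked lock call."""


def _on_alarm(signum, frame):
    raise Interrupted()


def blocked_lock_waits(hold_seconds=0.5):
    """fcntl.18 analogue: a blocking lock request waits until the holder unlocks."""
    with tempfile.NamedTemporaryFile() as tmp:
        fcntl.lockf(tmp, fcntl.LOCK_EX)              # parent holds the write lock
        read_end, write_end = os.pipe()
        pid = os.fork()
        if pid == 0:                                 # child: contend for the lock
            try:
                start = time.monotonic()
                fcntl.lockf(tmp, fcntl.LOCK_EX)      # blocking request (F_SETLKW)
                os.write(write_end, repr(time.monotonic() - start).encode())
            finally:
                os._exit(0)
        os.close(write_end)
        time.sleep(hold_seconds)
        fcntl.lockf(tmp, fcntl.LOCK_UN)              # release; the child can proceed
        waited = float(os.read(read_end, 64).decode())
        os.close(read_end)
        os.waitpid(pid, 0)
        return waited


def blocked_lock_is_interruptible(alarm_seconds=0.2):
    """fcntl.35 analogue: a signal breaks the process out of a blocked lock call."""
    with tempfile.NamedTemporaryFile() as tmp:
        fcntl.lockf(tmp, fcntl.LOCK_EX)              # parent keeps the lock throughout
        read_end, write_end = os.pipe()
        pid = os.fork()
        if pid == 0:
            outcome = b"locked"
            try:
                signal.signal(signal.SIGALRM, _on_alarm)
                signal.setitimer(signal.ITIMER_REAL, alarm_seconds)
                fcntl.lockf(tmp, fcntl.LOCK_EX)      # blocks until the alarm fires
            except Interrupted:
                outcome = b"interrupted"
            finally:
                os.write(write_end, outcome)
                os._exit(0)
        os.close(write_end)
        outcome = os.read(read_end, 64).decode()
        os.close(read_end)
        os.waitpid(pid, 0)
        return outcome
```

On a file system with working POSIX locks, the first helper should report a wait close to `hold_seconds` and the second should return `"interrupted"`. A hang or timeout in either helper would correspond to the "waitsync failed" and "child process timed out" results in the log.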
      

          Activity

            pjones Peter Jones added a comment -

            Bruno will look into this


            sebastien.buisson Sebastien Buisson (Inactive) added a comment -

            Hi,

            If the patch http://review.whamcloud.com/6415 from LU-2665 introduces a regression, we also need to find a solution for 2.1 and 2.4.
            As I mentioned in LU-2665, the b2_1 patch will be rolled out at CEA in a couple of weeks!

            Thanks,
            Sebastien.

            pjones Peter Jones added a comment -

            Reverted LU-2665 from b2_4 to avoid this problem but we still need to find a solution for 2.5

            pjones Peter Jones added a comment -

            Oleg

            What do you suggest here?

            Peter

            yujian Jian Yu added a comment -

            Hi Oleg,

            LU-2665 mdc: Keep resend FLocks (http://review.whamcloud.com/6415) caused the above regression.

            On master branch, posix test passed on build #1560. However, build #1561 and #1562 were not tested. The test failed on build #1563. Here are the patches in those builds:

            Build #1561: LU-2665 mdc: Keep resend FLocks
            Build #1562: LU-3568 contrib: ignore initial comments
            Build #1563: LU-3478 iokit: fix sgpdd-survey scripts (output and plotting)

            Only "LU-2665 mdc: Keep resend FLocks" is in Lustre b2_4 build #28, so that's the culprit.
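The elimination argument above can be written out as a set intersection. This is only an illustrative sketch of the reasoning: the build-to-patch mapping is copied from the comment, while the variable and function names are hypothetical.

```python
# Patches in the master builds between the last pass (#1560) and the first
# observed failure (#1563), as listed in the comment above.
master_candidates = {
    1561: "LU-2665 mdc: Keep resend FLocks",
    1562: "LU-3568 contrib: ignore initial comments",
    1563: "LU-3478 iokit: fix sgpdd-survey scripts (output and plotting)",
}

# Of those candidates, the ones also present in b2_4 build #28,
# which shows the same failure.
in_b2_4_build_28 = {"LU-2665 mdc: Keep resend FLocks"}


def suspects(candidates, also_failing):
    """Return candidate patches that also appear in the other failing build."""
    return sorted(set(candidates.values()) & also_failing)


print(suspects(master_candidates, in_b2_4_build_28))
```

A patch can explain both failures only if it is present on both branches, which is why the other two candidates drop out.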

            yujian Jian Yu added a comment -

            Hi Oleg,

            On Lustre b2_4 branch, this is a regression issue introduced by the patch in build http://build.whamcloud.com/job/lustre-b2_4/28/ :

            https://maloo.whamcloud.com/test_sets/23814618-02b1-11e3-a4b4-52540035b04c
            https://maloo.whamcloud.com/test_sets/4854f044-0283-11e3-a4b4-52540035b04c
            https://maloo.whamcloud.com/test_sets/c5113ad6-0286-11e3-b384-52540035b04c
            https://maloo.whamcloud.com/test_sets/6dd63b10-0261-11e3-a4b4-52540035b04c
            https://maloo.whamcloud.com/test_sets/4493f2a0-0249-11e3-a4b4-52540035b04c

            FYI, the posix test passed on Lustre b2_4 build #27.

            green Oleg Drokin added a comment -

            in client dmesg:

            Lustre: DEBUG MARKER: Run POSIX test against lustre filesystem
            LustreError: 11-0: lustre-MDT0000-mdc-ffff880331591c00: Communicating with 192.168.4.20@o2ib, operation ldlm_enqueue failed with -11.
            LustreError: 11-0: lustre-MDT0000-mdc-ffff880331591c00: Communicating with 192.168.4.20@o2ib, operation ldlm_enqueue failed with -11.
            LustreError: 11-0: lustre-MDT0000-mdc-ffff880331591c00: Communicating with 192.168.4.20@o2ib, operation ldlm_enqueue failed with -11.
            LustreError: 11-0: lustre-MDT0000-mdc-ffff880331591c00: Communicating with 192.168.4.20@o2ib, operation ldlm_enqueue failed with -11.
            LustreError: Skipped 8 previous similar messages
            LustreError: 11-0: lustre-MDT0000-mdc-ffff880331591c00: Communicating with 192.168.4.20@o2ib, operation ldlm_enqueue failed with -35.
            LustreError: Skipped 2 previous similar messages
            

            But nothing like that on MDS
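For reference, on Linux the return codes in those messages correspond to EAGAIN (-11) and EDEADLK (-35). The snippet below is a small sketch for tallying such lines from a dmesg capture. The regular expression and the helper name `tally_failures` are assumptions made for illustration, the errno table is hard-coded to the Linux values for just these two codes, and "Skipped N previous similar messages" lines are not counted.

```python
import re
from collections import Counter

# Linux errno values for the two codes seen in the client log.
LINUX_ERRNO_NAMES = {11: "EAGAIN", 35: "EDEADLK"}

FAILURE = re.compile(r"operation (\w+) failed with (-?\d+)\.")

# Sample rebuilt from the explicit LustreError lines quoted above.
PREFIX = ("LustreError: 11-0: lustre-MDT0000-mdc-ffff880331591c00: "
          "Communicating with 192.168.4.20@o2ib, operation ldlm_enqueue failed with ")
SAMPLE = "\n".join(PREFIX + rc + "." for rc in ["-11", "-11", "-11", "-11", "-35"])


def tally_failures(dmesg_text):
    """Count (operation, errno name) pairs among the explicit failure lines."""
    counts = Counter()
    for operation, rc in FAILURE.findall(dmesg_text):
        code = abs(int(rc))
        counts[(operation, LINUX_ERRNO_NAMES.get(code, str(code)))] += 1
    return dict(counts)


print(tally_failures(SAMPLE))
```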


            adilger Andreas Dilger added a comment -

            Is the client being mounted with "-o flock"?
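One way to answer that question on a client is to inspect the mount options of each Lustre mount. The sketch below parses text in `/proc/mounts` format. It assumes the client lists `flock` or `localflock` among the options when one of them is in effect, and the sample line is hypothetical, not taken from the failing node.

```python
def lustre_flock_modes(mounts_text):
    """Map each Lustre mount point to 'flock', 'localflock', or 'none'."""
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] != "lustre":
            continue
        options = set(fields[3].split(","))
        if "flock" in options:
            mode = "flock"
        elif "localflock" in options:
            mode = "localflock"
        else:
            mode = "none"
        modes[fields[1]] = mode
    return modes


# Hypothetical sample line for illustration only.
sample = "192.168.4.20@o2ib:/lustre /mnt/lustre lustre rw,flock 0 0\n"
print(lustre_flock_modes(sample))
```

On a live client the input would be the contents of `/proc/mounts`, for example `lustre_flock_modes(open("/proc/mounts").read())`.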

            People

              Assignee: bfaccini Bruno Faccini (Inactive)
              Reporter: maloo Maloo