Project: Lustre
Issue: LU-3701

Failure on test suite posix subtest test_1: fcntl.18/fcntl.35 Unresolved

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.5.0, Lustre 2.4.2
    • Lustre 2.5.0
    • server and client: lustre-master build: 1952
    • 3
    • 9549

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/261745ba-fb5b-11e2-8c6e-52540035b04c.

      The sub-test test_1 failed with the following error:

      Run POSIX testsuite on /mnt/lustre failed

      test log

      SUCCESS SUMMARY:
      
      New POSIX successes: 1
      
      Test Name                   Baseline   Lustre Report
      read.15                       Failed       Succeeded
      
      
      FAILURE SUMMARY:
      
      POSIX failures: 2
      
      Test Name                   Baseline   Lustre Report
      fcntl.18                   Succeeded      Unresolved
      fcntl.35                   Succeeded      Unresolved
      
      FAILURE DESCRIPTIONS:
      
      ####################################################
      Test Name: fcntl.18 Unresolved
      
      	Test Description:
      For the XNFS specification:
          If the implementation supports file locking for files residing on
          a remote file system: On a call to fcntl(fildes, F_SETLKW, arg)
          when the lock specified by arg can not be set, waits until the
          lock can be set.
      For the XSH specification:
          On a call to fcntl(fildes, F_SETLKW, arg) when the lock specified
          by arg can not be set, waits until the lock can be set.
          Posix Ref: Component FCNTL Assertion 6.5.2.2-23(A)
      
      	Test Information:
      deletion reason: External error - waitsync failed
      deletion reason: External error - waitsync failed
      
      ####################################################
      Test Name: fcntl.35 Unresolved
      
      	Test Description:
      For the XNFS specification:
          If the implementation supports file locking for files residing on
          a remote file system: EINTR in errno and -1 returned by fcntl() if
          the operation is interrupted by a signal.
      For the XSH specification:
          EINTR in errno and -1 returned by fcntl() if the operation is
          interrupted by a signal.
          Posix Ref: Component FCNTL Assertion 6.5.2.4-40(A)
      
      	Test Information:
      child process timed out
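      The fcntl.18 assertion can be illustrated with a small C sketch. This is not the LSB-VSX test source. The function name setlkw_waits_for_release(), the lock file path and the "blocked for at least half the hold time" threshold are hypothetical choices made for illustration. One process holds a whole-file write lock. A second process first confirms that a non-blocking F_SETLK is refused (EACCES or EAGAIN), then calls F_SETLKW, which per the assertion should block until the first process releases the lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Apply cmd (F_SETLK / F_SETLKW) with lock type `type` to the whole file. */
static int whole_file_lock(int fd, int cmd, short type)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET; /* l_start = 0, l_len = 0: whole file */
    return fcntl(fd, cmd, &fl);
}

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

/*
 * Hypothetical helper: returns 1 if a second process stayed blocked in
 * F_SETLKW until the holder released its lock after hold_seconds,
 * 0 if it did not, -1 on setup error.
 */
int setlkw_waits_for_release(const char *path, unsigned hold_seconds)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    if (whole_file_lock(fd, F_SETLK, F_WRLCK) < 0) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Record locks are per process and not inherited across fork(),
         * so the child contends with the parent's lock. */
        int cfd = open(path, O_RDWR);
        if (cfd < 0)
            _exit(2);
        /* A non-blocking request must be refused while the parent holds the lock. */
        if (whole_file_lock(cfd, F_SETLK, F_WRLCK) == 0 ||
            (errno != EACCES && errno != EAGAIN))
            _exit(3);
        double start = now_seconds();
        if (whole_file_lock(cfd, F_SETLKW, F_WRLCK) < 0)
            _exit(4);
        /* Reached only after the parent released the lock: did we really wait? */
        _exit(now_seconds() - start >= hold_seconds / 2.0 ? 0 : 5);
    }

    sleep(hold_seconds);                      /* keep the conflicting lock */
    whole_file_lock(fd, F_SETLK, F_UNLCK);    /* let the waiter proceed */

    int status = 0;
    pid_t w = waitpid(pid, &status, 0);
    close(fd);
    if (w < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

      Pointing path at a file under /mnt/lustre rather than a local directory would run the same sequence through the Lustre client instead of the baseline file system.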
      

          Activity

            yujian Jian Yu added a comment - edited

            Patch http://review.whamcloud.com/7453 was cherry-picked to the Lustre b2_4 branch.

            bfaccini Bruno Faccini (Inactive) added a comment -

            The b2_1 patch version is at http://review.whamcloud.com/7586. Patch-less client kernel integration will occur automatically now that the master patch (http://review.whamcloud.com/7453) has landed.

            pjones Peter Jones added a comment -

            Landed to 2.5.


            bfaccini Bruno Faccini (Inactive) added a comment -

            In fact, the default auto-test set did eventually run against the build/patch.

            I also verified that the new change http://review.whamcloud.com/7453 preserves the correct behavior for the LU-2665 scenario.

            I will ask for reviews now and, if they are OK, I need to provide at least a b2_1 version and also push it for patch-less client kernel integration (in addition to the LU-2665 patch already pushed!).

            bfaccini Bruno Faccini (Inactive) added a comment -

            I did, but it seems that only the "posix" test ran. Is this expected behavior? I thought that Test-Parameters would run tests in addition to the default set unless "fortestonly" is specified...

            On the other hand, the "posix" test was successful, so now I need to check that the LU-2665 problem is still fixed too.

            yujian Jian Yu added a comment -

            Quoting Bruno: "On the other hand, I think that the original test/patch for LU-2665 could be refined to make both the LU-2665 bug and the POSIX test suite happy. New change attempt pushed on Gerrit at http://review.whamcloud.com/7453."

            Please add the following test parameter to the commit message to see whether the posix test suite passes:

            Test-Parameters: testlist=posix

            bfaccini Bruno Faccini (Inactive) added a comment -

            Hello Jian,
            Thanks for the link and hint!
            Unfortunately, when I run lustre/tests/posix.sh on a fresh, recent master install, it fails in build-posix.exp with the following messages/logs:

            Enter the root password:
            Password:
            losetup: /dev/loop0: device is busy
            Aborting installation
            mv: cannot stat `/usr/src/posix/ext4/tet/test_sets/results/0002e': No such file or directory
            child process exited abnormally
                while executing
            "system "mv $results_dir/0002e $results_dir/lustre_baseline""
                (file "build-posix.exp" line 161)
            failed to build POSIX test suite.
             posix test_1: @@@@@@ FAIL: Setup POSIX test suite failed
              Trace dump:
              = /usr/lib64/lustre/tests/test-framework.sh:4200:error_noexit()
              = /usr/lib64/lustre/tests/test-framework.sh:4227:error()
              = ./posix.sh:106:test_1()
              = /usr/lib64/lustre/tests/test-framework.sh:4466:run_one()
              = /usr/lib64/lustre/tests/test-framework.sh:4499:run_one_logged()
              = /usr/lib64/lustre/tests/test-framework.sh:4369:run_test()
              = ./posix.sh:118:main()
            Dumping lctl log to /tmp/test_logs/1377521275/posix.test_1.*.1377521325.log
            Dumping logs only on local client.

            This looks like an odd loop-device configuration issue. I am trying to debug and fix it, but any other help or hints are welcome.

            On the other hand, I think that the original test/patch for LU-2665 could be refined to make both the LU-2665 bug and the POSIX test suite happy. A new change attempt has been pushed to Gerrit at http://review.whamcloud.com/7453.

            yujian Jian Yu added a comment -

            Quoting Bruno: "BTW, where can I find the POSIX test suite? It does not appear to be part of lustre-tests."

            It is available as an RPM:

            http://build.whamcloud.com/job/toolkit/arch=x86_64,distro=el6/lastSuccessfulBuild/artifact/_topdir/RPMS/x86_64/posix-1.0-wc1.x86_64.rpm

            After installing this package on the test node, we can run lustre/tests/posix.sh to install, build, and run the LSB-VSX POSIX test suite on $BASELINE_FS and on Lustre, and then compare the test results.

            bfaccini Bruno Faccini (Inactive) added a comment -

            Based on the descriptions of the failing POSIX tests and the content of the LU-2665 patch, we can suspect a possible regression for fcntl.35 (EINTR handling), but it is less obvious for fcntl.18 (the patch should not affect the forced wait).

            BTW, where can I find the POSIX test suite? It does not appear to be part of lustre-tests.

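            The fcntl.35 EINTR behavior discussed in this comment can be checked in isolation with a small C sketch. It is hypothetical and not derived from the LSB-VSX suite or from the LU-2665/7453 patches; setlkw_returns_eintr(), its parameters and the lock path are illustrative names. A child blocks in F_SETLKW on a lock held by its parent. A SIGALRM handler installed with sigaction() should then make the call fail with -1 and errno set to EINTR. SA_RESTART is deliberately left out because, on Linux, an F_SETLKW interrupted by a handler installed with SA_RESTART is restarted automatically instead of failing.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The handler does nothing: its only purpose is to interrupt F_SETLKW. */
static void on_alarm(int sig) { (void)sig; }

static int lock_whole_file(int fd, int cmd, short type)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET; /* l_start = 0, l_len = 0: whole file */
    return fcntl(fd, cmd, &fl);
}

/*
 * Hypothetical helper: returns 1 if a blocked F_SETLKW fails with -1/EINTR
 * when SIGALRM fires after alarm_seconds, 0 otherwise, -1 on setup error.
 * The parent keeps its lock for hold_seconds (must exceed alarm_seconds).
 */
int setlkw_returns_eintr(const char *path, unsigned alarm_seconds,
                         unsigned hold_seconds)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    if (lock_whole_file(fd, F_SETLK, F_WRLCK) < 0) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_alarm;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;          /* no SA_RESTART: the call must not be restarted */
        if (sigaction(SIGALRM, &sa, NULL) < 0)
            _exit(2);
        int cfd = open(path, O_RDWR);
        if (cfd < 0)
            _exit(2);
        alarm(alarm_seconds);
        int rc = lock_whole_file(cfd, F_SETLKW, F_WRLCK);
        /* Expected: interrupted before the parent releases its lock. */
        _exit(rc == -1 && errno == EINTR ? 0 : 3);
    }

    sleep(hold_seconds);                      /* hold the lock past the alarm */
    lock_whole_file(fd, F_SETLK, F_UNLCK);    /* unblocks the child if EINTR never came */

    int status = 0;
    pid_t w = waitpid(pid, &status, 0);
    close(fd);
    if (w < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

            Because the parent releases its lock after hold_seconds regardless of the outcome, a client that never returns EINTR makes the function return 0 instead of hanging. As with fcntl.18, the path would have to be on a Lustre mount to exercise the Lustre client.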
            pjones Peter Jones added a comment -

            Bruno will look into this


            People

              Assignee: bfaccini Bruno Faccini (Inactive)
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 10