Interop 2.7.0<->2.8 sanity-hsm test_500: One llapi HSM test failed

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.7.0
    • None
    • 3
    • 9223372036854775807

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/61061480-df8e-11e4-b5b0-5254006e85c2.

      The sub-test test_500 failed with the following error:

      One llapi HSM test failed
      
      == sanity-hsm test 500: various LLAPI HSM tests == 13:56:09 (1428612969)
      CMD: onyx-38vm3 /usr/sbin/lctl get_param -n version
      CMD: onyx-38vm5 pkill -INT -x lhsmtool_posix
      Starting test test1 at 1428612970
      Finishing test test1 at 1428612972
      Starting test test2 at 1428612972
      Finishing test test2 at 1428612972
      Starting test test3 at 1428612972
      NULL archive numbers
      NULL archive numbers
      maximum of 32 archives supported
      maximum of 32 archives supported
      Finishing test test3 at 1428612972
      Starting test test4 at 1428612972
      Finishing test test4 at 1428612972
      Starting test test5 at 1428612972
      Finishing test test5 at 1428612972
      Starting test test6 at 1428612972
      Finishing test test6 at 1428612972
      Starting test test7 at 1428612972
      Finishing test test7 at 1428612972
      Starting test test50 at 1428612972
      Finishing test test50 at 1428612972
      Starting test test51 at 1428612972
      llapi_hsm_test: llapi_hsm_test.c:387: test51: assertion 'rc == 0' failed: llapi_hsm_state_set_fd failed: Invalid argument
       sanity-hsm test_500: @@@@@@ FAIL: One llapi HSM test failed 
      

          Activity

            [LU-6453] Interop 2.7.0<->2.8 sanity-hsm test_500: One llapi HSM test failed
            sarah Sarah Liu added a comment - interop of b2_10 https://testing.whamcloud.com/test_sets/62641a4e-432a-11e9-92fe-52540065bddc
            bogl Bob Glossman (Inactive) added a comment - another on master: https://testing.hpdd.intel.com/test_sets/9320b6c8-b303-11e7-a282-5254006e85c2
            standan Saurabh Tandan (Inactive) added a comment - - edited

            Another instance found for interop - EL7 Server/2.7.1 Client, tag 2.7.90.
            https://testing.hpdd.intel.com/test_sessions/495aabae-d306-11e5-be5c-5254006e85c2
            Another instance found for interop - EL6.7 Server/2.7.1 Client, tag 2.7.90.
            https://testing.hpdd.intel.com/test_sessions/42ace612-d560-11e5-9cc2-5254006e85c2

            standan Saurabh Tandan (Inactive) added a comment -

            Another instance found for interop tag 2.7.66 - EL7 Server/2.7.1 Client, build# 3316
            https://testing.hpdd.intel.com/test_sets/be4c447a-cc91-11e5-b80c-5254006e85c2

            standan Saurabh Tandan (Inactive) added a comment -

            Another instance for interop: EL7 Server/2.7.1 Client
            Server: master, build# 3303, RHEL 7
            Client: 2.7.1, b2_7_fe/34
            https://testing.hpdd.intel.com/test_sets/64407d5e-bac4-11e5-9137-5254006e85c2

            Occurred around 15 times in the past 30 days.

            standan Saurabh Tandan (Inactive) added a comment -

            Another instance found for:
            Server: master, build# 3276
            Client: b2_7_fe/34
            https://testing.hpdd.intel.com/test_sets/147848da-a5a5-11e5-a14a-5254006e85c2

            parinay parinay v kondekar (Inactive) added a comment - - edited

            • A similar issue is seen during master <-> 2.5.1 interop.
            • Configuration
              Configuration: 4-node setup - 1 MDS / 1 OSS / 2 clients.
              Release
              server - 3.10.0_229.20.1.el7.x86_64_ga7eface
              clients - 2.6.32_431.17.1.el6.x86_64
              
              Server 2.7.63
              Client 2.5.1
              interop
              
              stdout.log
              == sanity-hsm test 500: various LLAPI HSM tests == 13:22:21 (1448457741)
              Copytool is stopped on fre1112
              Waiting 20 secs for update
              Updated after 6s: wanted 'stopped' got 'stopped'
              Starting test test1 at 1448457750
              Finishing test test1 at 1448457756
              Starting test test2 at 1448457756
              Finishing test test2 at 1448457756
              Starting test test3 at 1448457756
              NULL archive numbers
              NULL archive numbers
              maximum of 32 archives supported
              maximum of 32 archives supported
              Finishing test test3 at 1448457756
              Starting test test4 at 1448457756
              Finishing test test4 at 1448457756
              Starting test test5 at 1448457756
              Finishing test test5 at 1448457756
              Starting test test6 at 1448457756
              Finishing test test6 at 1448457756
              Starting test test7 at 1448457756
              Finishing test test7 at 1448457756
              Starting test test50 at 1448457756
              Finishing test test50 at 1448457756
              Starting test test51 at 1448457756
              Starting test test51 at 1448457756
              llapi_hsm_test: llapi_hsm_test.c:387: test51: assertion 'rc == 0' failed: llapi_hsm_state_set_fd failed: Invalid argument
               sanity-hsm test_500: @@@@@@ FAIL: One llapi HSM test failed 
                Trace dump:
                = /usr/lib64/lustre/tests/test-framework.sh:4672:error()
                = /usr/lib64/lustre/tests/sanity-hsm.sh:4234:test_500()
                = /usr/lib64/lustre/tests/test-framework.sh:4932:run_one()
                = /usr/lib64/lustre/tests/test-framework.sh:4968:run_one_logged()
                = /usr/lib64/lustre/tests/test-framework.sh:4774:run_test()
                = /usr/lib64/lustre/tests/sanity-hsm.sh:4236:main()
              Dumping lctl log to /tmp/test_logs/1448457713/sanity-hsm.test_500.*.1448457757.log
              Resetting fail_loc and fail_val on all nodes...done.
              FAIL 500 (17s)
              == sanity-hsm test complete, duration 45 sec == 13:22:38 (1448457758)
              sanity-hsm: FAIL: test_500 One llapi HSM test failed
              Stopping clients: fre1111,fre1112 /mnt/lustre2 (opts:)
              Stopping client fre1112 /mnt/lustre2 opts:
              Stopping client fre1111 /mnt/lustre2 opts:
              
              stderr.log
              running as uid/gid/euid/egid 500/500/500/500, groups:
               [touch] [/mnt/lustre/d0_runas_test/f2433]
              pdsh@fre1111: fre1112: ssh exited with exit code 1

            bfaccini Bruno Faccini (Inactive) added a comment -

            This occurred because the test was run with the MDS node on 2.7.52-PRISTINE-2.6.32-504.12.2.el6_lustre.x86_64 (i.e., with commit 32bd5051 for LU-5757) and the client node on 2.7.0-RC4-PRISTINE-2.6.32-504.8.1.el6.x86_64 (i.e., without commit 32bd5051 for LU-5757).

            Maybe we should prevent execution of sanity-hsm/test_500 when the MDS and client versions differ, to ensure the API and server code are compatible?
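
The suggestion in this comment, skipping test_500 when the MDS and client versions differ, could be sketched roughly as below. This is only an illustration under stated assumptions: `version_to_code` and `llapi_versions_differ` are hypothetical helper names written for this example, not functions taken from test-framework.sh, and the version strings are the two quoted in the comment. A real fix would more likely use the test framework's own version helpers and skip mechanism.

```bash
# Hypothetical helper: pack "major.minor.patch[.fix]" into a single integer
# so two versions can be compared numerically. Assumes every dotted field is
# a plain decimal number once the "-RC4-PRISTINE-..." style suffix is dropped.
version_to_code() {
	local v=${1%%-*}	# keep only the part before the first '-'
	local major minor patch fix _rest
	IFS=. read -r major minor patch fix _rest <<< "$v"
	echo $(( (10#${major:-0} << 24) | (10#${minor:-0} << 16) |
		 (10#${patch:-0} << 8) | 10#${fix:-0} ))
}

# Hypothetical check: succeeds (returns 0) when the two versions differ,
# i.e. when the llapi tests should be skipped for an interop run.
llapi_versions_differ() {
	local server_version=$1 client_version=$2
	[ "$(version_to_code "$server_version")" -ne \
	  "$(version_to_code "$client_version")" ]
}

# Example using the MDS and client versions reported in this ticket.
mds_version="2.7.52-PRISTINE-2.6.32-504.12.2.el6_lustre.x86_64"
client_version="2.7.0-RC4-PRISTINE-2.6.32-504.8.1.el6.x86_64"

if llapi_versions_differ "$mds_version" "$client_version"; then
	echo "SKIP: test_500 needs matching MDS and client versions"
fi
```

Comparing for exact equality is the most conservative reading of the suggestion. A narrower check, such as skipping only when one side predates the LU-5757 change, would keep more interop coverage but needs the exact version in which that change landed, which this ticket does not state.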

            People

              wc-triage WC Triage
              maloo Maloo