[LU-6453] Interop 2.7.0<->2.8 sanity-hsm test_500: One llapi HSM test failed
Created: 10/Apr/15  Updated: 19/Mar/19

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: WC Triage
Resolution: Unresolved
Votes: 0
Labels: None

Issue Links:
Related
is related to LU-5757 hsm: userspace can set about any HSM ... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/61061480-df8e-11e4-b5b0-5254006e85c2.

The sub-test test_500 failed with the following error:

One llapi HSM test failed
== sanity-hsm test 500: various LLAPI HSM tests == 13:56:09 (1428612969)
CMD: onyx-38vm3 /usr/sbin/lctl get_param -n version
CMD: onyx-38vm5 pkill -INT -x lhsmtool_posix
Starting test test1 at 1428612970
Finishing test test1 at 1428612972
Starting test test2 at 1428612972
Finishing test test2 at 1428612972
Starting test test3 at 1428612972
NULL archive numbers
NULL archive numbers
maximum of 32 archives supported
maximum of 32 archives supported
Finishing test test3 at 1428612972
Starting test test4 at 1428612972
Finishing test test4 at 1428612972
Starting test test5 at 1428612972
Finishing test test5 at 1428612972
Starting test test6 at 1428612972
Finishing test test6 at 1428612972
Starting test test7 at 1428612972
Finishing test test7 at 1428612972
Starting test test50 at 1428612972
Finishing test test50 at 1428612972
Starting test test51 at 1428612972
llapi_hsm_test: llapi_hsm_test.c:387: test51: assertion 'rc == 0' failed: llapi_hsm_state_set_fd failed: Invalid argument
 sanity-hsm test_500: @@@@@@ FAIL: One llapi HSM test failed 


 Comments   
Comment by Bruno Faccini (Inactive) [ 11/Jun/15 ]

This occurred because the test was run with the MDS node on 2.7.52-PRISTINE-2.6.32-504.12.2.el6_lustre.x86_64 (i.e., with commit 32bd5051 for LU-5757) and the client node on 2.7.0-RC4-PRISTINE-2.6.32-504.8.1.el6.x86_64 (i.e., without commit 32bd5051 for LU-5757).

Maybe we should skip sanity-hsm/test_500 when the MDS and client versions differ, to ensure that the API and server code are compatible?
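
A minimal sketch of the kind of guard suggested above, assuming a plain string comparison of the leading version number is enough. The helpers base_version and same_base_version are hypothetical names for illustration and are not part of test-framework.sh; the two version strings are the ones quoted in this comment.

```shell
# Hypothetical helper: keep only the part of a version string before the
# first '-', e.g. "2.7.0-RC4-PRISTINE-..." becomes "2.7.0".
base_version() {
    printf '%s\n' "${1%%-*}"
}

# Hypothetical helper: succeed only when both version strings reduce to
# the same leading version number.
same_base_version() {
    [ "$(base_version "$1")" = "$(base_version "$2")" ]
}

# Version strings taken from the failing run described above.
mds_version="2.7.52-PRISTINE-2.6.32-504.12.2.el6_lustre.x86_64"
client_version="2.7.0-RC4-PRISTINE-2.6.32-504.8.1.el6.x86_64"

if same_base_version "$mds_version" "$client_version"; then
    echo "versions match: run test_500"
else
    echo "SKIP: MDS $(base_version "$mds_version") and client $(base_version "$client_version") differ"
fi
```

Strict equality would skip test_500 in every interop run. A narrower guard could instead check whether both sides carry the LU-5757 change, but that depends on version thresholds not stated in this ticket.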

Comment by parinay v kondekar (Inactive) [ 27/Nov/15 ]
  • A similar issue is seen during master <-> 2.5.1 interop.
  • Configuration
    Configuration: 4-node setup - 1 MDS / 1 OSS / 2 clients.
    Release
    server - 3.10.0_229.20.1.el7.x86_64_ga7eface
    clients - 2.6.32_431.17.1.el6.x86_64
    
    Server 2.7.63
    Client 2.5.1
    interop
    
    stdout.log
    == sanity-hsm test 500: various LLAPI HSM tests == 13:22:21 (1448457741)
    Copytool is stopped on fre1112
    Waiting 20 secs for update
    Updated after 6s: wanted 'stopped' got 'stopped'
    Starting test test1 at 1448457750
    Finishing test test1 at 1448457756
    Starting test test2 at 1448457756
    Finishing test test2 at 1448457756
    Starting test test3 at 1448457756
    NULL archive numbers
    NULL archive numbers
    maximum of 32 archives supported
    maximum of 32 archives supported
    Finishing test test3 at 1448457756
    Starting test test4 at 1448457756
    Finishing test test4 at 1448457756
    Starting test test5 at 1448457756
    Finishing test test5 at 1448457756
    Starting test test6 at 1448457756
    Finishing test test6 at 1448457756
    Starting test test7 at 1448457756
    Finishing test test7 at 1448457756
    Starting test test50 at 1448457756
    Finishing test test50 at 1448457756
    Starting test test51 at 1448457756
    Starting test test51 at 1448457756
    llapi_hsm_test: llapi_hsm_test.c:387: test51: assertion 'rc == 0' failed: llapi_hsm_state_set_fd failed: Invalid argument
     sanity-hsm test_500: @@@@@@ FAIL: One llapi HSM test failed 
      Trace dump:
      = /usr/lib64/lustre/tests/test-framework.sh:4672:error()
      = /usr/lib64/lustre/tests/sanity-hsm.sh:4234:test_500()
      = /usr/lib64/lustre/tests/test-framework.sh:4932:run_one()
      = /usr/lib64/lustre/tests/test-framework.sh:4968:run_one_logged()
      = /usr/lib64/lustre/tests/test-framework.sh:4774:run_test()
      = /usr/lib64/lustre/tests/sanity-hsm.sh:4236:main()
    Dumping lctl log to /tmp/test_logs/1448457713/sanity-hsm.test_500.*.1448457757.log
    Resetting fail_loc and fail_val on all nodes...done.
    FAIL 500 (17s)
    == sanity-hsm test complete, duration 45 sec == 13:22:38 (1448457758)
    sanity-hsm: FAIL: test_500 One llapi HSM test failed
    Stopping clients: fre1111,fre1112 /mnt/lustre2 (opts:)
    Stopping client fre1112 /mnt/lustre2 opts:
    Stopping client fre1111 /mnt/lustre2 opts:
    
    
    
    stderr.log
    running as uid/gid/euid/egid 500/500/500/500, groups:
     [touch] [/mnt/lustre/d0_runas_test/f2433]
    pdsh@fre1111: fre1112: ssh exited with exit code 1
    
    
Comment by Saurabh Tandan (Inactive) [ 23/Dec/15 ]

Another instance found for:
Server: master, build# 3276
Client: b2_7_fe/34
https://testing.hpdd.intel.com/test_sets/147848da-a5a5-11e5-a14a-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 19/Jan/16 ]

Another instance for interop: EL7 Server/2.7.1 Client
Server: master, build# 3303, RHEL 7
Client: 2.7.1, b2_7_fe/34
https://testing.hpdd.intel.com/test_sets/64407d5e-bac4-11e5-9137-5254006e85c2

Occurred around 15 times in the past 30 days.

Comment by Saurabh Tandan (Inactive) [ 10/Feb/16 ]

Another instance found for interop tag 2.7.66 - EL7 Server/2.7.1 Client, build# 3316
https://testing.hpdd.intel.com/test_sets/be4c447a-cc91-11e5-b80c-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 24/Feb/16 ]

Another instance found for interop - EL7 Server/2.7.1 Client, tag 2.7.90.
https://testing.hpdd.intel.com/test_sessions/495aabae-d306-11e5-be5c-5254006e85c2
Another instance found for interop - EL6.7 Server/2.7.1 Client, tag 2.7.90.
https://testing.hpdd.intel.com/test_sessions/42ace612-d560-11e5-9cc2-5254006e85c2

Comment by Bob Glossman (Inactive) [ 17/Oct/17 ]

Another instance on master:
https://testing.hpdd.intel.com/test_sets/9320b6c8-b303-11e7-a282-5254006e85c2

Comment by Sarah Liu [ 18/Mar/19 ]

interop of b2_10
https://testing.whamcloud.com/test_sets/62641a4e-432a-11e9-92fe-52540065bddc

Generated at Sat Feb 10 02:00:22 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.