[LU-6669] Hard Failover recovery-mds-scale test_failover_mds: test_failover_mds returned 3 Created: 01/Jun/15  Updated: 14/Dec/21  Resolved: 14/Dec/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: WC Triage
Resolution: Cannot Reproduce
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/f446d474-007b-11e5-9650-5254006e85c2.

The sub-test test_failover_mds failed with the following error:

test_failover_mds returned 3

test log

CMD: shadow-21vm1,shadow-21vm2 cat /tmp/client-load.pid
shadow-21vm2: cat: /tmp/client-load.pid: No such file or directory
shadow-21vm1: cat: /tmp/client-load.pid: No such file or directory


 Comments   
Comment by Sarah Liu [ 12/Jun/15 ]

The failure is caused by a failed write of the PID to LOAD_PID_FILE in run_*.sh:

# recovery-*-scale scripts use this to signal the client loads to die
echo $$ >$LOAD_PID_FILE
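
For illustration only, a minimal sketch of how the PID write could be checked at the point where it happens, so that a failed write is reported immediately instead of surfacing later as a missing /tmp/client-load.pid. The function name write_load_pid is hypothetical and is not part of the Lustre test scripts.

```shell
# Hypothetical helper, not taken from the Lustre test framework.
# Writes the current shell's PID to the given file and confirms it landed.
write_load_pid() {
    pid_file=$1
    # Fail early if the parent directory has disappeared.
    [ -d "$(dirname "$pid_file")" ] || {
        echo "missing directory for $pid_file" >&2
        return 1
    }
    echo $$ > "$pid_file" || return 1
    # Read the value back so a silent write failure is caught here,
    # rather than later when another script runs cat on the file.
    [ "$(cat "$pid_file")" = "$$" ]
}
```

A caller could then run something like `write_load_pid "$LOAD_PID_FILE" || exit 1`. This would not explain why the file goes missing, but it would separate "never written" from "written and later removed".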
Comment by Oleg Drokin [ 12/Jun/15 ]

To me it looks like something wiped all or most of /tmp.
This is supported by the lack of various log files that would otherwise have been there.

This could be some external force or a bug in one of the test scripts.
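
One possible way to test the "something wiped /tmp" hypothesis is to plant a marker file before the run and check for it afterwards. The sketch below assumes nothing about the real test scripts; the marker name and both function names are made up for illustration.

```shell
# Hypothetical sketch: detect whether a directory was wiped during a run.
# The marker name is an assumption, not part of the Lustre test scripts.
MARKER_NAME=.wipe-canary

plant_marker() {
    # $1: directory to watch, e.g. /tmp
    date +%s > "$1/$MARKER_NAME"
}

marker_survived() {
    # Returns 0 if the marker is still present, 1 if it vanished.
    [ -f "$1/$MARKER_NAME" ]
}
```

Calling `plant_marker /tmp` on each client before the loads start and `marker_survived /tmp` after a failure would indicate whether the directory contents were removed while the test ran.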

Comment by Sarah Liu [ 30/Jun/15 ]

All recovery-*-scale.sh scripts are affected by this failure.

Comment by Sarah Liu [ 12/Aug/15 ]

lustre-master build 3118

https://testing.hpdd.intel.com/test_sets/057df5c4-35c4-11e5-8c30-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 10/Dec/15 ]

master, build# 3264, 2.7.64 tag
Hard Failover: EL6.7 Server/Client
https://testing.hpdd.intel.com/test_sets/7b412132-9edd-11e5-87a9-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 10/Dec/15 ]

This test has failed with the same error around 14 times in the past month.
https://testing.hpdd.intel.com/sub_tests/5c07fe72-9e2d-11e5-87a9-5254006e85c2
https://testing.hpdd.intel.com/sub_tests/cf49392e-9e9a-11e5-b163-5254006e85c2
https://testing.hpdd.intel.com/sub_tests/7b4609e0-9edd-11e5-87a9-5254006e85c2
https://testing.hpdd.intel.com/sub_tests/2c618474-9ebc-11e5-98a4-5254006e85c2
https://testing.hpdd.intel.com/sub_tests/0c64a484-9d37-11e5-8e88-5254006e85c2
https://testing.hpdd.intel.com/sub_tests/3b5e3922-9c64-11e5-9866-5254006e85c2
https://testing.hpdd.intel.com/sub_tests/ab3c95d2-9b7d-11e5-9930-5254006e85c2
https://testing.hpdd.intel.com/sub_tests/c1059ce0-9a8a-11e5-8b28-5254006e85c2
https://testing.hpdd.intel.com/sub_tests/d5af25c2-99b2-11e5-9bd2-5254006e85c2
https://testing.hpdd.intel.com/sub_tests/fae19aac-9938-11e5-802b-5254006e85c2
https://testing.hpdd.intel.com/sub_tests/b2f329fe-98fc-11e5-8079-5254006e85c2
https://testing.hpdd.intel.com/sub_tests/7d5e3812-9909-11e5-aeec-5254006e85c2
https://testing.hpdd.intel.com/sub_tests/e1399bd6-95f0-11e5-a6d2-5254006e85c2
https://testing.hpdd.intel.com/sub_tests/f7c7dc36-8a70-11e5-ba42-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 10/Dec/15 ]

master, build# 3264, 2.7.64 tag
Hard Failover: EL6.7 Server/Client - ZFS
https://testing.hpdd.intel.com/test_sets/2c5961c2-9ebc-11e5-98a4-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 10/Dec/15 ]

master, build# 3264, 2.7.64 tag
Hard Failover: EL6.7 Server/Client - ZFS
recovery-double-scale test_pairwise_fail failed with the same issue.
https://testing.hpdd.intel.com/test_sets/2ce07428-9ebc-11e5-98a4-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 10/Dec/15 ]

master, build# 3264, 2.7.64 tag
Hard Failover: EL7 Server/Client
https://testing.hpdd.intel.com/test_sets/cf43bf1c-9e9a-11e5-b163-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 11/Dec/15 ]

master, build# 3264, 2.7.64 tag
Hard Failover: EL7 Server/Client - ZFS
https://testing.hpdd.intel.com/test_sets/572d25ba-9e20-11e5-91b0-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 15/Dec/15 ]

master, build# 3266, 2.7.64 tag
Hard Failover: EL6.7 Server/SLES11 SP3 Clients
https://testing.hpdd.intel.com/test_sets/b71e8f10-a080-11e5-85ed-5254006e85c2
Hard Failover: EL7 Server/SLES11 SP3 Client
https://testing.hpdd.intel.com/test_sets/a39034e8-a077-11e5-8d69-5254006e85c2
Hard Failover: EL7 Server/SLES11 SP3 Client
https://testing.hpdd.intel.com/test_sets/a403d43e-a077-11e5-8d69-5254006e85c2

Also, recovery-mds-scale test_failover_ost failed with the same error.
https://testing.hpdd.intel.com/test_sets/b6b0f202-a080-11e5-85ed-5254006e85c2

recovery-double-scale test_pairwise_fail is failing with the same issue.
Hard Failover: EL7 Server/SLES11 SP3 Client
https://testing.hpdd.intel.com/test_sets/a428828e-a077-11e5-8d69-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 20/Jan/16 ]

Another instance found for Hard Failover: EL6.7 Server/SLES11 SP3 Clients
https://testing.hpdd.intel.com/test_sets/762762d0-ba4c-11e5-9a07-5254006e85c2

Generated at Sat Feb 10 02:02:13 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.