[LU-9600] recovery-random-scale test_fail_client_mds: test_fail_client_mds returned 7 Created: 05/Jun/17  Updated: 07/Feb/18

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0, Lustre 2.11.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: James Casper Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None
Environment:

trevis, failover
clients: SLES12, master branch, v2.9.58, b3591
servers: EL7, ldiskfs, master branch, v2.9.58, b3591


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

https://testing.hpdd.intel.com/test_sessions/e6b87235-1ff0-4e96-a53f-ca46ffe5ed7e

From suite_log:

Starting client: trevis-38vm1:  -o user_xattr,flock trevis-38vm7:trevis-38vm3:/lustre /mnt/lustre
CMD: trevis-38vm1 mkdir -p /mnt/lustre
CMD: trevis-38vm1 mount -t lustre -o user_xattr,flock trevis-38vm7:trevis-38vm3:/lustre /mnt/lustre
pdsh@trevis-38vm1: trevis-38vm6: mcmd: connect failed: No route to host
pdsh@trevis-38vm1: trevis-38vm5: mcmd: connect failed: No route to host
 recovery-random-scale : @@@@@@ FAIL: start client on trevis-38vm1 failed 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:4952:error()
  = /usr/lib64/lustre/tests/recovery-random-scale.sh:264:main()
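
As a side note, the failing client mount can be retried by hand to separate a genuine Lustre mount problem from the pdsh connectivity errors. This is only a sketch that reuses the exact command and mount point shown in the suite_log above; nothing here is verified against the test nodes:

# on trevis-38vm1, repeat the client mount the framework attempted
mkdir -p /mnt/lustre
mount -t lustre -o user_xattr,flock trevis-38vm7:trevis-38vm3:/lustre /mnt/lustre
echo "mount rc: $?"
# a successful mount should then appear in the list of lustre mounts
mount -t lustre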

and, later in the same test session:

pdsh@trevis-38vm1: trevis-38vm6: mcmd: connect failed: No route to host
pdsh@trevis-38vm1: trevis-38vm5: mcmd: connect failed: No route to host
trevis-38vm1: CMD: trevis-38vm1 lctl get_param -n at_max
trevis-38vm1: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
 auster : @@@@@@ FAIL: import is not in FULL state 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:4952:error()
  = /usr/lib64/lustre/tests/test-framework.sh:6288:wait_clients_import_state()
  = /usr/lib64/lustre/tests/test-framework.sh:2737:fail()
  = /usr/lib64/lustre/tests/test-framework.sh:3463:stopall()
  = auster:113:reset_lustre()
  = auster:217:run_suite()
  = auster:234:run_suite_logged()
  = auster:298:run_suites()
  = auster:334:main()
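
The second failure is wait_clients_import_state() giving up before the client's MDC import reaches FULL. The same state can be watched by hand by polling the import file with lctl. This is a minimal sketch, assuming the standard mdc.*.import parameter layout on the client; it is not part of the test framework:

# poll the MDT0000 import on the client until it reports FULL
while :; do
    state=$(lctl get_param -n "mdc.lustre-MDT0000-mdc-*.import" 2>/dev/null |
            awk '/^ *state:/ { print $2; exit }')
    echo "import state: ${state:-unknown}"
    [ "$state" = FULL ] && break
    sleep 5
done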


 Comments   
Comment by Sarah Liu [ 08/Jun/17 ]

I think this is caused by the PDSH problem in DCO-7216.
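
If pdsh is indeed the root cause, the "No route to host" symptom can be checked directly from the node driving the test. A quick sketch, with the hostnames copied from the logs above:

# check whether pdsh can still reach the two nodes that refused connections
pdsh -w trevis-38vm5,trevis-38vm6 hostname
# plain ping, to tell a routing problem from an rcmd/mcmd problem
for n in trevis-38vm5 trevis-38vm6; do ping -c 1 -W 2 "$n"; done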
