[LU-7775] replay-single test_70d: cannot touch /mnt/lustre/d70d.replay-single/test1/a: No such file or directory
Created: 12/Feb/16
Updated: 29/Nov/18

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.9.0, Lustre 2.10.0, Lustre 2.11.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: WC Triage
Resolution: Unresolved
Votes: 0
Labels: None

Issue Links:
Related
is related to LU-7117 replay-single test_70d: timeout and m... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

Description

This issue was created by maloo for Joe Gmitter <joseph.gmitter@intel.com>

The sub-test test_70d failed with the following error:

/usr/lib64/lustre/tests/replay-single.sh: line 2172: kill: (25681) - No such process

suite_log excerpt:

onyx-61vm2: CMD: onyx-61vm2.onyx.hpdd.intel.com lctl get_param -n at_max
onyx-61vm1: CMD: onyx-61vm1.onyx.hpdd.intel.com lctl get_param -n at_max
touch: cannot touch `/mnt/lustre/d70d.replay-single/test1/a': No such file or directory
touch fails
onyx-61vm2: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 5 sec
onyx-61vm1: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 5 sec
/usr/lib64/lustre/tests/replay-single.sh: line 2114: kill: (25681) - No such process
 replay-single test_70d: @@@@@@ FAIL: 25681 stopped 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:4670:error_noexit()
  = /usr/lib64/lustre/tests/test-framework.sh:4704:error()
  = /usr/lib64/lustre/tests/replay-single.sh:2115:random_fail_mdt()
  = /usr/lib64/lustre/tests/replay-single.sh:2235:test_70d()
  = /usr/lib64/lustre/tests/test-framework.sh:4951:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:4988:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:4853:run_test()
  = /usr/lib64/lustre/tests/replay-single.sh:2241:main()
Dumping lctl log to /logdir/test_logs/2016-02-11/lustre-reviews-el6_7-x86_64--review-dne-part-2--1_7_1__37315__-70264984240460-122237/replay-single.test_70d.*.1455227814.log
CMD: onyx-61vm1.onyx.hpdd.intel.com,onyx-61vm2,onyx-61vm3,onyx-61vm7,onyx-61vm8 /usr/sbin/lctl dk > /logdir/test_logs/2016-02-11/lustre-reviews-el6_7-x86_64--review-dne-part-2--1_7_1__37315__-70264984240460-122237/replay-single.test_70d.debug_log.\$(hostname -s).1455227814.log;
         dmesg > /logdir/test_logs/2016-02-11/lustre-reviews-el6_7-x86_64--review-dne-part-2--1_7_1__37315__-70264984240460-122237/replay-single.test_70d.dmesg.\$(hostname -s).1455227814.log
Resetting fail_loc on all nodes...CMD: onyx-61vm1.onyx.hpdd.intel.com,onyx-61vm2,onyx-61vm3,onyx-61vm7,onyx-61vm8 lctl set_param -n fail_loc=0 	    fail_val=0 2>/dev/null || true
done.
/usr/lib64/lustre/tests/replay-single.sh: line 2172: kill: (25681) - No such process
FAIL 70d (160s)

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/a8e09b5e-d146-11e5-a17d-5254006e85c2.



Comments
Comment by Sebastien Buisson (Inactive) [ 07/Mar/16 ]

Hit here:
https://testing.hpdd.intel.com/test_sets/c4249be6-e26b-11e5-abf3-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 17/May/16 ]

Another instance: https://testing.hpdd.intel.com/test_sets/be6215a4-1b9a-11e6-9e5d-5254006e85c2

Comment by James Casper [ 04/May/17 ]

https://testing.hpdd.intel.com/test_sessions/30cc75b6-594f-4255-accf-24fe11bdd565

Very similar failure in 2.9.56 (b3565): replay-single, test_70d: 17427 stopped

The background process starts before the failure, but it cannot proceed:

Started  17427
rm: cannot remove '/mnt/lustre/d70d.replay-single/test1': Directory not empty
rmdir fails

Later, when the test tries to kill the process, it no longer exists:

/usr/lib64/lustre/tests/replay-single.sh: line 2108: kill: (17427) - No such process
 replay-single test_70d: @@@@@@ FAIL: 17427 stopped 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:4931:error()
  = /usr/lib64/lustre/tests/replay-single.sh:2109:random_fail_mdt()
  = /usr/lib64/lustre/tests/replay-single.sh:2232:test_70d()
  = /usr/lib64/lustre/tests/test-framework.sh:5207:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:5246:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:5093:run_test()
  = /usr/lib64/lustre/tests/replay-single.sh:2238:main()
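
The failure pattern in these logs can be sketched in isolation. The script below is a hypothetical stand-alone illustration, not the code from replay-single.sh: the `worker` function, the temporary directory, and the variable names are all assumptions. It assumes the background loop exits on its first failed operation, which the "touch fails" and "rmdir fails" messages suggest. Once the worker has exited, a liveness check with `kill -0` fails, which would match the "No such process" and "stopped" messages.

```shell
#!/bin/bash
# Hypothetical stand-in for the test's background worker: it loops over
# create/remove operations and stops at the first failure.
workdir=$(mktemp -d)

worker() {
	local dir=$1
	while true; do
		# test1 is never created here, so this touch fails immediately,
		# similar to the "cannot touch ... No such file or directory" log line.
		touch "$dir/test1/a" 2>/dev/null || { echo "touch fails"; return 1; }
		rm -f "$dir/test1/a"
	done
}

worker "$workdir" &
pid=$!

# Collect the worker's exit status; it has already stopped on its own.
wait "$pid"
rc=$?

# kill -0 sends no signal; it only checks whether the PID still exists.
if kill -0 "$pid" 2>/dev/null; then
	status="running"
	kill "$pid"
else
	status="stopped"
fi

echo "worker $pid $status (exit code $rc)"
rmdir "$workdir"
```

In this sketch the worker dies because its target directory is missing, so the later `kill` has nothing to signal. If the real test behaves the same way, the "stopped" error is a symptom, and the earlier touch or rmdir failure is the event to investigate.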