[LU-6935] replay-single test_70b FAIL: import is not in FULL state Created: 31/Jul/15  Updated: 10/Oct/21  Resolved: 10/Oct/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

review-dne-part-2 in autotest


Issue Links:
Related
is related to LU-6919 replay-single test_70b: "Cannot send ... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

replay-single test_70b fails; more precisely, it hangs. There are several problems here:
1. The test fails complaining that "import is not in FULL state":

07:07:46:shadow-9vm9:  rpc : @@@@@@ FAIL: can't put import for mdc.lustre-MDT0002-mdc-*.mds_server_uuid into FULL state after 1475 sec, have REPLAY 
07:07:46:shadow-9vm6:    1  cleanup 2741 sec
07:07:46:shadow-9vm9:    1  cleanup 2741 sec
07:07:47:shadow-9vm9:   Trace dump:
07:07:47:shadow-9vm9:   = /usr/lib64/lustre/tests/test-framework.sh:4727:error_noexit()
07:07:47:shadow-9vm9:   = /usr/lib64/lustre/tests/test-framework.sh:4758:error()
07:07:47:shadow-9vm9:   = /usr/lib64/lustre/tests/test-framework.sh:5830:_wait_import_state()
07:07:47:shadow-9vm9:   = /usr/lib64/lustre/tests/test-framework.sh:5849:wait_import_state()
07:07:47:shadow-9vm9:   = /usr/lib64/lustre/tests/test-framework.sh:5858:wait_import_state_mount()
07:07:47:shadow-9vm9:   = rpc.sh:20:main()
07:07:47:shadow-9vm9: CMD: shadow-9vm4,shadow-9vm6,shadow-9vm7,shadow-9vm8,shadow-9vm9.shadow.whamcloud.com PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:./../utils:/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/bin:/usr/bin:/bin:/usr/sbin:/sbin::/sbin:/bin:/usr/sbin::/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh check_logdir /tmp/test_logs/1435992181 
07:07:47:shadow-9vm6:  rpc : @@@@@@ FAIL: can't put import for mdc.lustre-MDT0002-mdc-*.mds_server_uuid into FULL state after 1475 sec, have REPLAY 
07:07:47:shadow-9vm6:    1  cleanup 2742 sec
07:07:47:shadow-9vm9:    1  cleanup 2742 sec
07:07:47:shadow-9vm6:   Trace dump:
07:07:47:shadow-9vm6:   = /usr/lib64/lustre/tests/test-framework.sh:4727:error_noexit()
07:07:47:shadow-9vm6:   = /usr/lib64/lustre/tests/test-framework.sh:4758:error()
07:07:47:shadow-9vm6:   = /usr/lib64/lustre/tests/test-framework.sh:5830:_wait_import_state()
07:07:47:shadow-9vm6:   = /usr/lib64/lustre/tests/test-framework.sh:5849:wait_import_state()
07:07:47:shadow-9vm6:   = /usr/lib64/lustre/tests/test-framework.sh:5858:wait_import_state_mount()
07:07:47:shadow-9vm6:   = rpc.sh:20:main()
07:07:47:shadow-9vm9: CMD: shadow-9vm4 uname -n
07:07:47:shadow-9vm6: CMD: shadow-9vm4,shadow-9vm6,shadow-9vm6.shadow.whamcloud.com,shadow-9vm7,shadow-9vm8 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:./../utils:/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/bin:/usr/bin:/bin:/usr/sbin:/sbin::/sbin:/bin:/usr/sbin::/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh check_logdir /tmp/test_logs/1435992181 
07:07:47:shadow-9vm9: Dumping lctl log to /tmp/test_logs/1435992181/rpc..*.1435993666.log
07:07:47:shadow-9vm9: CMD: shadow-9vm4,shadow-9vm6,shadow-9vm7,shadow-9vm8,shadow-9vm9.shadow.whamcloud.com /usr/sbin/lctl dk > /tmp/test_logs/1435992181/rpc..debug_log.\$(hostname -s).1435993666.log;
07:07:47:shadow-9vm9:          dmesg > /tmp/test_logs/1435992181/rpc..dmesg.\$(hostname -s).1435993666.log
07:07:48:shadow-9vm6: CMD: shadow-9vm4 uname -n
07:07:48:shadow-9vm6: Dumping lctl log to /tmp/test_logs/1435992181/rpc..*.1435993667.log
07:07:48:shadow-9vm6: CMD: shadow-9vm4,shadow-9vm6,shadow-9vm6.shadow.whamcloud.com,shadow-9vm7,shadow-9vm8 /usr/sbin/lctl dk > /tmp/test_logs/1435992181/rpc..debug_log.\$(hostname -s).1435993667.log;
07:07:48:shadow-9vm6:          dmesg > /tmp/test_logs/1435992181/rpc..dmesg.\$(hostname -s).1435993667.log
07:07:48:shadow-9vm6:    1  cleanup 2743 sec
07:07:48:shadow-9vm9:    1  cleanup 2743 sec
07:07:48:shadow-9vm9: CMD: shadow-9vm4,shadow-9vm6,shadow-9vm7,shadow-9vm8,shadow-9vm9.shadow.whamcloud.com rsync -az /tmp/test_logs/1435992181/rpc..*.1435993666.log shadow-9vm9.shadow.whamcloud.com:/tmp/test_logs/1435992181
07:07:48:shadow-9vm6: CMD: shadow-9vm4,shadow-9vm6,shadow-9vm6.shadow.whamcloud.com,shadow-9vm7,shadow-9vm8 rsync -az /tmp/test_logs/1435992181/rpc..*.1435993667.log shadow-9vm6.shadow.whamcloud.com:/tmp/test_logs/1435992181
07:07:49:shadow-9vm6:    1  cleanup 2744 sec
07:07:49:shadow-9vm9:    1  cleanup 2744 sec
07:07:49: replay-single test_70b: @@@@@@ FAIL: import is not in FULL state 
07:07:49:  Trace dump:
07:07:49:  = /usr/lib64/lustre/tests/test-framework.sh:4727:error_noexit()
07:07:49:  = /usr/lib64/lustre/tests/test-framework.sh:4758:error()
07:07:49:  = /usr/lib64/lustre/tests/test-framework.sh:6004:wait_clients_import_state()
07:07:49:  = /usr/lib64/lustre/tests/test-framework.sh:2574:fail()
07:07:49:  = /usr/lib64/lustre/tests/replay-single.sh:2091:test_70b()
07:07:49:  = /usr/lib64/lustre/tests/test-framework.sh:5020:run_one()
07:07:49:  = /usr/lib64/lustre/tests/test-framework.sh:5057:run_one_logged()
07:07:49:  = /usr/lib64/lustre/tests/test-framework.sh:4907:run_test()
07:07:49:  = /usr/lib64/lustre/tests/replay-single.sh:2102:main()

2. The test does not fail in a way that Maloo can recognize, so autotest times the test out. In the test reports below, it looks as if test 70b never ran and only the test suite failed. The Maloo report shows 93/93 tests passing, but clearly not all of the replay-single tests were run, and the suite_stdout contains the error message above.

3. No logs are collected to analyze this failure.
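
Since Maloo does not flag this failure, one workaround is to scan the suite_stdout text directly for the "@@@@@@ FAIL:" marker seen in the log above. The following is only a minimal sketch of that idea, not part of autotest or Maloo; the function name find_failures and the dictionary keys are hypothetical, and the patterns assume the message wording shown in the excerpt above.

```python
import re

# Any test-framework failure line contains this marker, e.g.
#   07:07:49: replay-single test_70b: @@@@@@ FAIL: import is not in FULL state
FAIL_RE = re.compile(r"@@@@@@ FAIL: (?P<message>.*\S)")

# Assumed wording of the import-state failure, taken from the log excerpt:
#   can't put import for <param> into FULL state after 1475 sec, have REPLAY
IMPORT_RE = re.compile(
    r"can't put import for (?P<param>\S+) into (?P<want>\w+) state "
    r"after (?P<secs>\d+) sec, have (?P<have>\w+)"
)


def find_failures(lines):
    """Return one dict per line that carries the '@@@@@@ FAIL:' marker."""
    failures = []
    for line in lines:
        match = FAIL_RE.search(line)
        if match is None:
            continue
        entry = {"message": match.group("message")}
        detail = IMPORT_RE.search(entry["message"])
        if detail is not None:
            # Extra fields are only present for import-state failures.
            entry["param"] = detail.group("param")
            entry["want"] = detail.group("want")
            entry["waited_sec"] = int(detail.group("secs"))
            entry["have"] = detail.group("have")
        failures.append(entry)
    return failures


if __name__ == "__main__":
    import sys

    # Usage: python find_failures.py < suite_stdout
    for failure in find_failures(sys.stdin):
        print(failure)
```

A non-empty result for a suite that Maloo reports as fully passing would indicate the mismatch described in item 2.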

This test has failed in this way five times this month:
2015-07-10 08:11:47 - https://testing.hpdd.intel.com/test_sets/a941c760-2725-11e5-bc86-5254006e85c2
2015-07-11 21:00:38 - https://testing.hpdd.intel.com/test_sets/ee8a4f52-2858-11e5-ba19-5254006e85c2
2015-07-20 14:23:34 - https://testing.hpdd.intel.com/test_sets/cff07ed6-2f33-11e5-92dd-5254006e85c2
2015-07-21 16:12:58 - https://testing.hpdd.intel.com/test_sets/3fadf97a-300f-11e5-97d6-5254006e85c2
2015-07-31 06:19:18 - https://testing.hpdd.intel.com/test_sets/9635fe7a-3797-11e5-9d53-5254006e85c2



 Comments   
Comment by James Nunez (Inactive) [ 18/Aug/15 ]

More instances:
2015-08-17 14:31:52 - https://testing.hpdd.intel.com/test_sets/5ef8538a-4530-11e5-a64b-5254006e85c2
2015-08-24 04:34:26 - https://testing.hpdd.intel.com/test_sets/d5b1a28c-4a5e-11e5-aa52-5254006e85c2
2015-10-09 05:22:26 - https://testing.hpdd.intel.com/test_sets/31cb92a0-6e89-11e5-8442-5254006e85c2
2015-10-10 04:37:06 - https://testing.hpdd.intel.com/test_sets/11d0fbc0-6f4c-11e5-83a9-5254006e85c2
2015-10-10 20:27:48 - https://testing.hpdd.intel.com/test_sets/00459074-6fd7-11e5-a914-5254006e85c2
2015-10-12 08:21:18 - https://testing.hpdd.intel.com/test_sets/29c7ee14-7104-11e5-88e8-5254006e85c2
2015-10-13 07:03:22 - https://testing.hpdd.intel.com/test_sets/5102634c-71c2-11e5-bffb-5254006e85c2
2015-10-17 12:53:18 - https://testing.hpdd.intel.com/test_sets/3f5e1392-751a-11e5-812b-5254006e85c2
2015-10-19 09:26:21 - https://testing.hpdd.intel.com/test_sets/2f0ad1d8-768c-11e5-ad25-5254006e85c2
2015-11-09 17:34:18 - https://testing.hpdd.intel.com/test_sets/119eb292-8751-11e5-bf92-5254006e85c2
2015-11-13 13:28:35 - https://testing.hpdd.intel.com/test_sets/8cdd0490-8a53-11e5-935c-5254006e85c2

Comment by Andreas Dilger [ 10/Oct/21 ]

There are failures in LU-10616, but they have a different symptom.
