Details
- Type: Bug
- Resolution: Cannot Reproduce
- Priority: Minor
- Fix Version/s: None
- Affects Version/s: Lustre 2.8.0
- Labels: None
- Environment: autotest review-dne-part-2
- Severity: 3
- Rank: 9223372036854775807
Description
Test 101 of replay-single times out while mounting mds1 with the abort_recovery option. The last output in the test_log is:
01:57:12 (1452765432) waiting for onyx-34vm7 network 900 secs ...
01:57:12 (1452765432) network interface is UP
CMD: onyx-34vm7 hostname
CMD: onyx-34vm7 test -b /dev/lvm-Role_MDS/P1
Starting mds1: -o abort_recovery /dev/lvm-Role_MDS/P1 /mnt/mds1
CMD: onyx-34vm7 mkdir -p /mnt/mds1; mount -t lustre -o abort_recovery /dev/lvm-Role_MDS/P1 /mnt/mds1
From the MDS1 console, we see:
01:57:22:LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts:
01:57:22:LustreError: 14301:0:(mdt_handler.c:5605:mdt_iocontrol()) lustre-MDT0000: Aborting recovery for device
01:57:44:LustreError: 14301:0:(ldlm_lib.c:2479:target_stop_recovery_thread()) lustre-MDT0000: Aborting recovery
01:57:44:Lustre: 14377:0:(ldlm_lib.c:1945:target_recovery_overseer()) recovery is aborted, evict exports in recovery
01:57:44:Lustre: 14377:0:(ldlm_lib.c:1945:target_recovery_overseer()) Skipped 2 previous similar messages
01:57:44:Lustre: lustre-MDT0000: disconnecting 5 stale clients
01:57:44:LustreError: 14377:0:(update_records.c:72:update_records_dump()) master transno = 382252089401 batchid = 373662154835 flags = 0 ops = 19 params = 9
01:57:44:LustreError: 14377:0:(update_records.c:72:update_records_dump()) master transno = 382252089401 batchid = 373662154836 flags = 0 ops = 28 params = 24
01:57:44:LustreError: 14377:0:(update_records.c:72:update_records_dump()) master transno = 382252089401 batchid = 377957122268 flags = 0 ops = 19 params = 9
01:57:44: Press any key to continue.
01:57:44: Press any key to continue.
01:57:44: Press any key to continue.
01:57:44: Press any key to continue.
01:57:44: Press any key to continue.
01:57:44: [H [J
01:57:44: GNU GRUB version 0.97 (631K lower / 2096116K upper memory)
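When comparing this console output across the failing runs, it may help to pull the numeric fields out of the update_records_dump() messages. The following is a minimal Python sketch, not part of the Lustre test framework; the function names parse_update_dumps and split_hi_lo are hypothetical. Splitting each value into upper and lower 32 bits is only an assumption made here for readability (on the guess that the upper half acts as an epoch counter); the report itself does not say how these values are encoded.

```python
import re

# Matches the update_records_dump() console messages quoted in this report.
DUMP_RE = re.compile(
    r"update_records_dump\(\)\) master transno = (\d+) batchid = (\d+) "
    r"flags = (\d+) ops = (\d+) params = (\d+)"
)


def parse_update_dumps(console_text):
    """Return one dict per update_records_dump() message in console_text."""
    records = []
    for match in DUMP_RE.finditer(console_text):
        transno, batchid, flags, ops, params = (int(g) for g in match.groups())
        records.append({
            "transno": transno,
            "batchid": batchid,
            "flags": flags,
            "ops": ops,
            "params": params,
        })
    return records


def split_hi_lo(value):
    """Split a 64-bit value into (upper 32 bits, lower 32 bits).

    Assumption for illustration only: the upper half is treated as an
    epoch-like counter. The report does not confirm this layout.
    """
    return value >> 32, value & 0xFFFFFFFF


if __name__ == "__main__":
    line = (
        "01:57:44:LustreError: 14377:0:(update_records.c:72:"
        "update_records_dump()) master transno = 382252089401 "
        "batchid = 373662154835 flags = 0 ops = 19 params = 9"
    )
    for rec in parse_update_dumps(line):
        print(rec, split_hi_lo(rec["transno"]), split_hi_lo(rec["batchid"]))
```

Under that split, the master transno in all three messages is (89, 57), while the batchids are (87, 83), (87, 84) and (88, 220).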
We’ve seen this error four times in the past two months during review-dne-part-2 testing. Logs are at:
2015-11-27 03:10:27 - https://testing.hpdd.intel.com/test_sets/874faa9a-9503-11e5-bdeb-5254006e85c2
2015-12-12 02:31:59 - https://testing.hpdd.intel.com/test_sets/77362cfc-a0e2-11e5-9d88-5254006e85c2
2016-01-02 08:22:17 - https://testing.hpdd.intel.com/test_sets/102b7ef4-b177-11e5-bf32-5254006e85c2
2016-01-14 08:30:36 - https://testing.hpdd.intel.com/test_sets/4723f9d4-bae8-11e5-87b4-5254006e85c2
Issue Links
- is related to: LU-8753 Recovery already passed deadline with DNE (Resolved)