Details
- Bug
- Resolution: Unresolved
- Minor
- None
- Lustre 2.12.4
- None
- DNE/ZFS
- 3
- 9223372036854775807
Description
conf-sanity test_93 fails with 'mds2: import is not in FULL state after 40'. Looking at the console log for MDS 2/4 (vm5) for the failure at https://testing.whamcloud.com/test_sets/1e5610e6-43ab-11ea-8072-52540065bddc, we see
trevis-42vm5: trevis-42vm5.trevis.whamcloud.com: executing wait_import_state FULL os[cp].lustre-OST0000-osc-MDT0001.ost_server_uuid 40
CMD: trevis-42vm4 zfs get -H -o value lustre:svname lustre-mdt3/mdt3
Starting mds3: lustre-mdt3/mdt3 /mnt/lustre-mds3
CMD: trevis-42vm4 mkdir -p /mnt/lustre-mds3; mount -t lustre lustre-mdt3/mdt3 /mnt/lustre-mds3
CMD: trevis-42vm5 zfs get -H -o value lustre:svname lustre-mdt4/mdt4
Starting mds4: lustre-mdt4/mdt4 /mnt/lustre-mds4
CMD: trevis-42vm5 mkdir -p /mnt/lustre-mds4; mount -t lustre lustre-mdt4/mdt4 /mnt/lustre-mds4
trevis-42vm5: rpc : @@@@@@ FAIL: can't put import for os[cp].lustre-OST0000-osc-MDT0001.ost_server_uuid into FULL state after 40 sec, have
trevis-42vm5: Trace dump:
trevis-42vm5: = /usr/lib64/lustre/tests/test-framework.sh:5900:error()
trevis-42vm5: = /usr/lib64/lustre/tests/test-framework.sh:7027:_wait_import_state()
trevis-42vm5: = /usr/lib64/lustre/tests/test-framework.sh:7049:wait_import_state()
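For readers unfamiliar with this kind of check, the following is a minimal Python sketch of a poll-until-state loop with a timeout. It is not the test-framework.sh code: wait_for_state, read_state, and the fake state sequences are hypothetical names made up for illustration. It is only meant to show one plausible reading of the failure message, namely that a message ending in "have" with nothing after it would result if the last state read back was empty.

```python
import time


def wait_for_state(read_state, expected, max_wait, interval=1.0, sleep=time.sleep):
    """Poll read_state() until it returns `expected` or `max_wait` seconds pass.

    Hypothetical helper, not the Lustre test framework's implementation.
    Returns (ok, last_state_seen).
    """
    waited = 0.0
    state = read_state()
    while state != expected:
        if waited >= max_wait:
            return False, state
        sleep(interval)
        waited += interval
        state = read_state()
    return True, state


# Case 1: the import eventually reaches FULL (fake state sequence).
states = iter(["", "CONNECTING", "FULL"])
ok, last = wait_for_state(lambda: next(states), "FULL", max_wait=40,
                          sleep=lambda s: None)  # no real sleeping in the demo

# Case 2: the state read is always empty, so the wait times out.
ok2, last2 = wait_for_state(lambda: "", "FULL", max_wait=40,
                            sleep=lambda s: None)
if not ok2:
    message = f"can't put import into FULL state after 40 sec, have {last2}"
    print(message)
```

In the second case the formatted message ends right after "have", which matches the shape of the failure line in the log above. Whether the real state read was actually empty in this run is an assumption, not something the logs confirm.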
In all the console logs, we only get confirmation that MDS1 can’t connect within the allotted time limit.
Looking at the console log for the MDS1/3 (vm4), we see
[83050.530881] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-42vm5.trevis.whamcloud.com: executing wait_import_state FULL os[cp].lustre-OST0000-osc-MDT0001.ost_server_uuid 40
[83050.717460] Lustre: DEBUG MARKER: zfs get -H -o value lustre:svname lustre-mdt3/mdt3
[83050.753501] Lustre: DEBUG MARKER: trevis-42vm5.trevis.whamcloud.com: executing wait_import_state FULL os[cp].lustre-OST0000-osc-MDT0001.ost_server_uuid 40
[83051.092507] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds3; mount -t lustre lustre-mdt3/mdt3 /mnt/lustre-mds3
[83064.412617] LustreError: 3861:0:(fail.c:129:__cfs_fail_timeout_set()) cfs_fail_timeout id 90e sleeping for 10000ms
[83064.414535] LustreError: 3861:0:(fail.c:129:__cfs_fail_timeout_set()) Skipped 76 previous similar messages
[83074.420718] LustreError: 3861:0:(fail.c:133:__cfs_fail_timeout_set()) cfs_fail_timeout id 90e awake
[83074.422537] LustreError: 3861:0:(fail.c:133:__cfs_fail_timeout_set()) Skipped 76 previous similar messages
[83089.482356] Lustre: lustre-MDT0002: Imperative Recovery not enabled, recovery window 60-180
[83089.484385] Lustre: Skipped 6 previous similar messages
[83090.262090] Lustre: cli-ctl-lustre-MDT0002: Allocated super-sequence [0x0000000280000400-0x00000002c0000400]:2:mdt]
[83091.385126] Lustre: DEBUG MARKER: /usr/sbin/lctl mark rpc : @@@@@@ FAIL: can\'t put import for os[cp].lustre-OST0000-osc-MDT0001.ost_server_uuid into FULL state after 40 sec, have
[83091.610883] Lustre: DEBUG MARKER: rpc : @@@@@@ FAIL: can't put import for os[cp].lustre-OST0000-osc-MDT0001.ost_server_uuid into FULL state after 40 sec, have
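To put the timestamps in the vm4 console log in perspective, here is a small Python sketch that computes the elapsed time between selected markers. The four log lines are copied from the excerpt above. The helper name first_timestamp is made up for this example, and the intervals are only gaps between console messages on one node, not an exact measurement of the wait itself.

```python
import re

# Lines copied verbatim from the MDS1/3 (vm4) console log excerpt.
LOG = r"""
[83050.753501] Lustre: DEBUG MARKER: trevis-42vm5.trevis.whamcloud.com: executing wait_import_state FULL os[cp].lustre-OST0000-osc-MDT0001.ost_server_uuid 40
[83064.412617] LustreError: 3861:0:(fail.c:129:__cfs_fail_timeout_set()) cfs_fail_timeout id 90e sleeping for 10000ms
[83074.420718] LustreError: 3861:0:(fail.c:133:__cfs_fail_timeout_set()) cfs_fail_timeout id 90e awake
[83091.385126] Lustre: DEBUG MARKER: /usr/sbin/lctl mark rpc : @@@@@@ FAIL: can\'t put import for os[cp].lustre-OST0000-osc-MDT0001.ost_server_uuid into FULL state after 40 sec, have
"""

# Kernel console lines start with "[seconds.microseconds] ".
TS = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def first_timestamp(log, needle):
    """Return the timestamp (in seconds) of the first line containing `needle`."""
    for line in log.splitlines():
        m = TS.match(line)
        if m and needle in m.group(2):
            return float(m.group(1))
    raise ValueError(f"no line contains {needle!r}")


wait_start = first_timestamp(LOG, "executing wait_import_state")
wait_fail = first_timestamp(LOG, "FAIL: can")
sleep_start = first_timestamp(LOG, "sleeping for 10000ms")
sleep_end = first_timestamp(LOG, "awake")

wait_elapsed = wait_fail - wait_start
sleep_elapsed = sleep_end - sleep_start

print(f"wait_import_state marker to FAIL marker: {wait_elapsed:.2f} s")
print(f"cfs_fail_timeout id 90e sleep:           {sleep_elapsed:.2f} s")
```

With these four lines, the gap between the wait_import_state marker and the FAIL marker is about 40.6 seconds, and the cfs_fail_timeout sleep accounts for about 10 seconds of that window.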
Looking at the console log for the OSS (vm3), we see
[83053.780999] Lustre: DEBUG MARKER: == rpc test complete, duration -o sec ================================================================ 00:08:41 (1580342921)
[83054.152945] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-42vm5.trevis.whamcloud.com: executing wait_import_state FULL os[cp].lustre-OST0000-osc-MDT0001.ost_server_uuid 40
[83054.372472] Lustre: DEBUG MARKER: trevis-42vm5.trevis.whamcloud.com: executing wait_import_state FULL os[cp].lustre-OST0000-osc-MDT0001.ost_server_uuid 40
[83092.153168] Lustre: cli-lustre-OST0000-super: Allocated super-sequence [0x0000000240000400-0x0000000280000400]:0:ost]
[83092.155157] Lustre: Skipped 2 previous similar messages
[83094.995539] Lustre: DEBUG MARKER: /usr/sbin/lctl mark rpc : @@@@@@ FAIL: can\'t put import for os[cp].lustre-OST0000-osc-MDT0001.ost_server_uuid into FULL state after 40 sec, have
[83095.232157] Lustre: DEBUG MARKER: rpc : @@@@@@ FAIL: can't put import for os[cp].lustre-OST0000-osc-MDT0001.ost_server_uuid into FULL state after 40 sec, have
On the client1 (vm1) console log, we see
[83271.701328] Lustre: 17215:0:(lmv_obd.c:269:lmv_init_ea_size()) lustre-clilmv-ffff97f8f9748800: NULL export for 1
[83292.919916] Lustre: DEBUG MARKER: /usr/sbin/lctl mark conf-sanity test_93: @@@@@@ FAIL: mds2: import is not in FULL state after 40
[83293.136689] Lustre: DEBUG MARKER: conf-sanity test_93: @@@@@@ FAIL: mds2: import is not in FULL state after 40
This is the first time we’ve seen this issue in branch testing: 29 JAN 2020, for 2.12.3.109 DNE/ZFS.
In the past year, we’ve seen a similar issue twice when running tests for patches, but the patch being tested may have caused the failure.