[LU-13179] conf-sanity test 93 fails with "mds2: import is not in FULL state after 40" Created: 31/Jan/20 Updated: 31/Jan/20
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Environment: | DNE/ZFS |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
conf-sanity test_93 fails with "mds2: import is not in FULL state after 40". Looking at the console log for MDS 2/4 (vm5) for the failure at https://testing.whamcloud.com/test_sets/1e5610e6-43ab-11ea-8072-52540065bddc, we see

trevis-42vm5: trevis-42vm5.trevis.whamcloud.com: executing wait_import_state FULL os[cp].lustre-OST0000-osc-MDT0001.ost_server_uuid 40
CMD: trevis-42vm4 zfs get -H -o value lustre:svname lustre-mdt3/mdt3
Starting mds3: lustre-mdt3/mdt3 /mnt/lustre-mds3
CMD: trevis-42vm4 mkdir -p /mnt/lustre-mds3; mount -t lustre lustre-mdt3/mdt3 /mnt/lustre-mds3
CMD: trevis-42vm5 zfs get -H -o value lustre:svname lustre-mdt4/mdt4
Starting mds4: lustre-mdt4/mdt4 /mnt/lustre-mds4
CMD: trevis-42vm5 mkdir -p /mnt/lustre-mds4; mount -t lustre lustre-mdt4/mdt4 /mnt/lustre-mds4
trevis-42vm5: rpc : @@@@@@ FAIL: can't put import for os[cp].lustre-OST0000-osc-MDT0001.ost_server_uuid into FULL state after 40 sec, have
trevis-42vm5: Trace dump:
trevis-42vm5: = /usr/lib64/lustre/tests/test-framework.sh:5900:error()
trevis-42vm5: = /usr/lib64/lustre/tests/test-framework.sh:7027:_wait_import_state()
trevis-42vm5: = /usr/lib64/lustre/tests/test-framework.sh:7049:wait_import_state()

In all the console logs, we only get confirmation that the mds2 (MDT0001) import cannot reach FULL state within the allotted time. Looking at the console log for MDS 1/3 (vm4), we see

[83050.530881] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-42vm5.trevis.whamcloud.com: executing wait_import_state FULL os[cp].lustre-OST0000-osc-MDT0001.ost_server_uuid 40
[83050.717460] Lustre: DEBUG MARKER: zfs get -H -o value lustre:svname lustre-mdt3/mdt3
[83050.753501] Lustre: DEBUG MARKER: trevis-42vm5.trevis.whamcloud.com: executing wait_import_state FULL os[cp].lustre-OST0000-osc-MDT0001.ost_server_uuid 40
[83051.092507] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds3; mount -t lustre lustre-mdt3/mdt3 /mnt/lustre-mds3
[83064.412617] LustreError: 3861:0:(fail.c:129:__cfs_fail_timeout_set()) cfs_fail_timeout id 90e sleeping for 10000ms
[83064.414535] LustreError: 3861:0:(fail.c:129:__cfs_fail_timeout_set()) Skipped 76 previous similar messages
[83074.420718] LustreError: 3861:0:(fail.c:133:__cfs_fail_timeout_set()) cfs_fail_timeout id 90e awake
[83074.422537] LustreError: 3861:0:(fail.c:133:__cfs_fail_timeout_set()) Skipped 76 previous similar messages
[83089.482356] Lustre: lustre-MDT0002: Imperative Recovery not enabled, recovery window 60-180
[83089.484385] Lustre: Skipped 6 previous similar messages
[83090.262090] Lustre: cli-ctl-lustre-MDT0002: Allocated super-sequence [0x0000000280000400-0x00000002c0000400]:2:mdt]
[83091.385126] Lustre: DEBUG MARKER: /usr/sbin/lctl mark rpc : @@@@@@ FAIL: can\'t put import for os[cp].lustre-OST0000-osc-MDT0001.ost_server_uuid into FULL state after 40 sec, have
[83091.610883] Lustre: DEBUG MARKER: rpc : @@@@@@ FAIL: can't put import for os[cp].lustre-OST0000-osc-MDT0001.ost_server_uuid into FULL state after 40 sec, have

Looking at the console log for the OSS (vm3), we see

[83053.780999] Lustre: DEBUG MARKER: == rpc test complete, duration -o sec ================================================================ 00:08:41 (1580342921)
[83054.152945] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-42vm5.trevis.whamcloud.com: executing wait_import_state FULL os[cp].lustre-OST0000-osc-MDT0001.ost_server_uuid 40
[83054.372472] Lustre: DEBUG MARKER: trevis-42vm5.trevis.whamcloud.com: executing wait_import_state FULL os[cp].lustre-OST0000-osc-MDT0001.ost_server_uuid 40
[83092.153168] Lustre: cli-lustre-OST0000-super: Allocated super-sequence [0x0000000240000400-0x0000000280000400]:0:ost]
[83092.155157] Lustre: Skipped 2 previous similar messages
[83094.995539] Lustre: DEBUG MARKER: /usr/sbin/lctl mark rpc : @@@@@@ FAIL: can\'t put import for os[cp].lustre-OST0000-osc-MDT0001.ost_server_uuid into FULL state after 40 sec, have
[83095.232157] Lustre: DEBUG MARKER: rpc : @@@@@@ FAIL: can't put import for os[cp].lustre-OST0000-osc-MDT0001.ost_server_uuid into FULL state after 40 sec, have

On the client1 (vm1) console log, we see

[83271.701328] Lustre: 17215:0:(lmv_obd.c:269:lmv_init_ea_size()) lustre-clilmv-ffff97f8f9748800: NULL export for 1
[83292.919916] Lustre: DEBUG MARKER: /usr/sbin/lctl mark conf-sanity test_93: @@@@@@ FAIL: mds2: import is not in FULL state after 40
[83293.136689] Lustre: DEBUG MARKER: conf-sanity test_93: @@@@@@ FAIL: mds2: import is not in FULL state after 40

This is the first time we've seen this issue in branch testing: 29 JAN 2020 for 2.12.3.109 DNE/ZFS. In the past year, we've seen a similar issue twice when running patch testing, but the patch being tested may have caused the failure.
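For reference, the wait_import_state() check in test-framework.sh comes down to polling the named import parameter with lctl get_param on the affected facet until it reports FULL. Below is a minimal hand-run sketch of the same check; the parameter name is taken from the log excerpts above, while the loop itself is an illustration rather than the exact test-framework code:

# Run on the mds2/mds4 node (trevis-42vm5): poll MDT0001's import of OST0000
# until it reports FULL or the 40-second budget used by the test is exhausted.
state=""
for i in $(seq 1 40); do
    # ost_server_uuid prints "<target UUID> <import state>", e.g. "lustre-OST0000_UUID FULL"
    state=$(lctl get_param -n 'os[cp].lustre-OST0000-osc-MDT0001.ost_server_uuid' 2>/dev/null |
            awk '{print $2}')
    [ "$state" = "FULL" ] && break
    sleep 1
done
echo "lustre-OST0000-osc-MDT0001 import state: ${state:-<none>}"

Note that the FAIL messages quoted above end with "have" followed by nothing, which suggests the parameter returned no state at all during the 40-second window, rather than a non-FULL state.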