[LU-11206] recovery-mds-scale test failover_ost fails with 'import is not in FULL state' Created: 03/Aug/18 Updated: 01/Apr/19 Resolved: 27/Feb/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0 |
| Fix Version/s: | Lustre 2.13.0, Lustre 2.12.1 |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | Patrick Farrell (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
recovery-mds-scale test_failover_ost fails with an OST import in IDLE state when it should be in FULL state. From the test_log we can see that we successfully fail over OSTs 55 times and, on the 56th failover, one OST import is in the IDLE state:

trevis-4vm8: trevis-4vm8.trevis.whamcloud.com: executing wait_import_state_mount FULL osc.lustre-OST0000-osc-ffff*.ost_server_uuid,osc.lustre-OST0001-osc-ffff*.ost_server_uuid,osc.lustre-OST0002-osc-ffff*.ost_server_uuid,osc.lustre-OST0003-osc-ffff*.ost_server_uuid,osc.lustre-OST0004-osc-ffff*.ost_server_uuid,osc.lustre-OST0005-osc-ffff*.ost_server_uuid,osc.lustre-OST0006-osc-ffff*.ost_server_uuid
trevis-4vm7: trevis-4vm7.trevis.whamcloud.com: executing wait_import_state_mount FULL osc.lustre-OST0000-osc-ffff*.ost_server_uuid,osc.lustre-OST0001-osc-ffff*.ost_server_uuid,osc.lustre-OST0002-osc-ffff*.ost_server_uuid,osc.lustre-OST0003-osc-ffff*.ost_server_uuid,osc.lustre-OST0004-osc-ffff*.ost_server_uuid,osc.lustre-OST0005-osc-ffff*.ost_server_uuid,osc.lustre-OST0006-osc-ffff*.ost_server_uuid
trevis-4vm8: CMD: trevis-4vm8.trevis.whamcloud.com lctl get_param -n at_max
trevis-4vm7: CMD: trevis-4vm7.trevis.whamcloud.com lctl get_param -n at_max
trevis-4vm8: osc.lustre-OST0000-osc-ffff*.ost_server_uuid in FULL state after 0 sec
trevis-4vm7: osc.lustre-OST0000-osc-ffff*.ost_server_uuid in FULL state after 0 sec
trevis-4vm8: osc.lustre-OST0001-osc-ffff*.ost_server_uuid in FULL state after 0 sec
trevis-4vm7: osc.lustre-OST0001-osc-ffff*.ost_server_uuid in FULL state after 0 sec
trevis-4vm8: osc.lustre-OST0002-osc-ffff*.ost_server_uuid in FULL state after 0 sec
trevis-4vm7: osc.lustre-OST0002-osc-ffff*.ost_server_uuid in FULL state after 0 sec
trevis-4vm8: osc.lustre-OST0003-osc-ffff*.ost_server_uuid in FULL state after 0 sec
trevis-4vm7: osc.lustre-OST0003-osc-ffff*.ost_server_uuid in FULL state after 0 sec
trevis-4vm7: osc.lustre-OST0004-osc-ffff*.ost_server_uuid in FULL state after 0 sec
trevis-4vm7: osc.lustre-OST0005-osc-ffff*.ost_server_uuid in FULL state after 0 sec
trevis-4vm7: osc.lustre-OST0006-osc-ffff*.ost_server_uuid in FULL state after 0 sec
trevis-4vm8: rpc : @@@@@@ FAIL: can't put import for osc.lustre-OST0004-osc-ffff*.ost_server_uuid into FULL state after 1475 sec, have IDLE
trevis-4vm8: Trace dump:
trevis-4vm8: = /usr/lib64/lustre/tests/test-framework.sh:5742:error()
trevis-4vm8: = /usr/lib64/lustre/tests/test-framework.sh:6846:_wait_import_state()
trevis-4vm8: = /usr/lib64/lustre/tests/test-framework.sh:6868:wait_import_state()
trevis-4vm8: = /usr/lib64/lustre/tests/test-framework.sh:6877:wait_import_state_mount()
trevis-4vm8: = rpc.sh:21:main()

I can't find any stack traces or any Lustre errors that would explain the issue here. So far, we've only seen this for failover testing and only with ZFS.

Logs for this failure are at

There are two older test sessions that fail in a similar way, but there are no successful OST failovers when they fail with this error: |
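|
For reference, below is a minimal sketch of the kind of import-state wait loop the test framework runs here. It is illustrative only and is not the actual _wait_import_state() code from test-framework.sh; the parameter name and the 1475-second limit are taken from the log above, and it assumes the ost_server_uuid parameter reports the import state as its last field.

#!/bin/bash
# Sketch: poll one OSC import until it reaches the expected state.
# Assumptions: lctl get_param -n expands the wildcard itself, and the
# import state is the last whitespace-separated field of the output.
param="osc.lustre-OST0004-osc-ffff*.ost_server_uuid"   # parameter from the log above
expected="FULL"                                        # state the test requires
maxtime=${MAXTIME:-1475}                               # wait limit, as in the failure message

for ((i = 0; i < maxtime; i++)); do
    state=$(lctl get_param -n "$param" 2>/dev/null | awk '{print $NF}' | uniq)
    if [ "$state" = "$expected" ]; then
        echo "$param in $expected state after $i sec"
        exit 0
    fi
    sleep 1
done
echo "FAIL: can't put import for $param into $expected state after $maxtime sec, have $state"
exit 1

As a side note (an assumption about the 2.12 idle-connection feature, not something stated in this ticket), the idle disconnect behaviour on clients is governed by the osc.*.idle_timeout tunable, and setting it to 0 should keep imports from dropping to IDLE, which may be useful when reproducing or ruling out this failure. |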
| Comments |
| Comment by Jian Yu [ 25/Sep/18 ] |
|
conf-sanity test 93 failed with the same issue in DNE ZFS test sessions on master branch: https://testing.whamcloud.com/test_sets/fa748b22-c0f6-11e8-b143-52540065bddc It's affecting patch testing. |
| Comment by Patrick Farrell (Inactive) [ 11/Feb/19 ] |
|
Master: |
| Comment by Gerrit Updater [ 11/Feb/19 ] |
|
Patrick Farrell (pfarrell@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34225 |
| Comment by Gerrit Updater [ 27/Feb/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34225/ |
| Comment by Peter Jones [ 27/Feb/19 ] |
|
Landed for 2.13 |
| Comment by Gerrit Updater [ 20/Mar/19 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34471 |
| Comment by Gerrit Updater [ 01/Apr/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34471/ |