[LU-10623] recovery-mds-scale test_failover_mds: test_failover_mds returned 1
| Created: | 07/Feb/18 | Updated: | 27/Sep/18 |
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Sarah Liu | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
recovery-mds-scale test_failover_mds - test_failover_mds returned 1

This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

This issue relates to the following test suite run:

test_failover_mds failed with the following error:

test_failover_mds returned 1

OSS dmesg:

[ 1605.917547] Lustre: DEBUG MARKER: ==== Checking the clients loads BEFORE failover -- failure NOT OK ELAPSED=79 DURATION=86400 PERIOD=1200
[ 1611.495999] Lustre: DEBUG MARKER: /usr/sbin/lctl mark Wait mds1 recovery complete before doing next failover...
[ 1611.675012] Lustre: DEBUG MARKER: Wait mds1 recovery complete before doing next failover...
[ 1612.443877] Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-41vm8.onyx.hpdd.intel.com: executing _wait_recovery_complete *.lustre-MDT0000.recovery_status 1475
[ 1612.612480] Lustre: DEBUG MARKER: onyx-41vm8.onyx.hpdd.intel.com: executing _wait_recovery_complete *.lustre-MDT0000.recovery_status 1475
[ 1612.844001] Lustre: DEBUG MARKER: /usr/sbin/lctl mark Checking clients are in FULL state before doing next failover...
[ 1613.319217] Lustre: DEBUG MARKER: Checking clients are in FULL state before doing next failover...
[ 1614.125969] Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-41vm4.onyx.hpdd.intel.com: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
[ 1614.512767] Lustre: DEBUG MARKER: onyx-41vm4.onyx.hpdd.intel.com: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
[ 1614.738444] Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
[ 1615.846397] Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
[ 1618.441045] Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-41vm3.onyx.hpdd.intel.com: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
[ 1626.408858] Lustre: DEBUG MARKER: onyx-41vm3.onyx.hpdd.intel.com: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
[ 1641.069651] Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
[ 1648.149579] Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
[ 1648.497647] LNet: Service thread pid 31239 was inactive for 42.01s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[ 1648.499403] Pid: 31239, comm: ll_ost_io00_025
[ 1648.499841] Call Trace:
[ 1648.500246] [<ffffffff816ab6b9>] schedule+0x29/0x70
[ 1648.500869] [<ffffffff816a9004>] schedule_timeout+0x174/0x2c0
[ 1648.501498] [<ffffffff811de9fe>] ? kmalloc_order_trace+0x2e/0xa0
[ 1648.502102] [<ffffffff8109a6c0>] ? process_timeout+0x0/0x10
[ 1648.502788] [<ffffffffc0a4e4de>] target_bulk_io+0x4ae/0xab0 [ptlrpc]
[ 1648.503443] [<ffffffff810c6440>] ? default_wake_function+0x0/0x20
[ 1648.504086] [<ffffffffc0af8326>] tgt_brw_write+0x1866/0x1d50 [ptlrpc]
[ 1648.504830] [<ffffffffc0a4bec0>] ? target_bulk_timeout+0x0/0xb0 [ptlrpc]
[ 1648.505518] [<ffffffffc0afa965>] tgt_request_handle+0x925/0x13b0 [ptlrpc]
[ 1648.506221] [<ffffffffc0a9ec7e>] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc]
[ 1648.507062] [<ffffffffc0aa2422>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
[ 1648.507719] [<ffffffffc0aa1990>] ? ptlrpc_main+0x0/0x1e40 [ptlrpc]
[ 1648.508390] [<ffffffff810b252f>] kthread+0xcf/0xe0
[ 1648.508898] [<ffffffff810b2460>] ? kthread+0x0/0xe0
[ 1648.509447] [<ffffffff816b8798>] ret_from_fork+0x58/0x90
[ 1648.509983] [<ffffffff810b2460>] ? kthread+0x0/0xe0
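
For reference, a minimal sketch (not part of the original report) of what the _wait_recovery_complete and wait_import_state_mount steps above are polling for, using standard lctl parameters; the exact loop and timeout handling live in the test framework:

# On the MDS facet: MDT recovery status, reports "status: COMPLETE" once recovery finishes
lctl get_param mdt.lustre-MDT0000.recovery_status

# On each client: MDC import state, reports "state: FULL" once the client has reconnected
lctl get_param mdc.lustre-MDT0000-mdc-*.import | grep state: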