Details
Bug
Resolution: Unresolved
Minor
None
Lustre 2.11.0
None
3
9223372036854775807
Description
recovery-mds-scale test_failover_mds - test_failover_mds returned 1
^^^^^^^^^^^^^ DO NOT REMOVE LINE ABOVE ^^^^^^^^^^^^^
This issue was created by maloo for sarah_lw <wei3.liu@intel.com>
This issue relates to the following test suite run:
https://testing.hpdd.intel.com/test_sets/4929a310-fded-11e7-bd00-52540065bddc
test_failover_mds failed with the following error:
test_failover_mds returned 1
OSS dmesg
[ 1605.917547] Lustre: DEBUG MARKER: ==== Checking the clients loads BEFORE failover -- failure NOT OK ELAPSED=79 DURATION=86400 PERIOD=1200
[ 1611.495999] Lustre: DEBUG MARKER: /usr/sbin/lctl mark Wait mds1 recovery complete before doing next failover...
[ 1611.675012] Lustre: DEBUG MARKER: Wait mds1 recovery complete before doing next failover...
[ 1612.443877] Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-41vm8.onyx.hpdd.intel.com: executing _wait_recovery_complete *.lustre-MDT0000.recovery_status 1475
[ 1612.612480] Lustre: DEBUG MARKER: onyx-41vm8.onyx.hpdd.intel.com: executing _wait_recovery_complete *.lustre-MDT0000.recovery_status 1475
[ 1612.844001] Lustre: DEBUG MARKER: /usr/sbin/lctl mark Checking clients are in FULL state before doing next failover...
[ 1613.319217] Lustre: DEBUG MARKER: Checking clients are in FULL state before doing next failover...
[ 1614.125969] Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-41vm4.onyx.hpdd.intel.com: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
[ 1614.512767] Lustre: DEBUG MARKER: onyx-41vm4.onyx.hpdd.intel.com: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
[ 1614.738444] Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
[ 1615.846397] Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
[ 1618.441045] Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-41vm3.onyx.hpdd.intel.com: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
[ 1626.408858] Lustre: DEBUG MARKER: onyx-41vm3.onyx.hpdd.intel.com: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
[ 1641.069651] Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
[ 1648.149579] Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
[ 1648.497647] LNet: Service thread pid 31239 was inactive for 42.01s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[ 1648.499403] Pid: 31239, comm: ll_ost_io00_025
[ 1648.499841] Call Trace:
[ 1648.500246] [<ffffffff816ab6b9>] schedule+0x29/0x70
[ 1648.500869] [<ffffffff816a9004>] schedule_timeout+0x174/0x2c0
[ 1648.501498] [<ffffffff811de9fe>] ? kmalloc_order_trace+0x2e/0xa0
[ 1648.502102] [<ffffffff8109a6c0>] ? process_timeout+0x0/0x10
[ 1648.502788] [<ffffffffc0a4e4de>] target_bulk_io+0x4ae/0xab0 [ptlrpc]
[ 1648.503443] [<ffffffff810c6440>] ? default_wake_function+0x0/0x20
[ 1648.504086] [<ffffffffc0af8326>] tgt_brw_write+0x1866/0x1d50 [ptlrpc]
[ 1648.504830] [<ffffffffc0a4bec0>] ? target_bulk_timeout+0x0/0xb0 [ptlrpc]
[ 1648.505518] [<ffffffffc0afa965>] tgt_request_handle+0x925/0x13b0 [ptlrpc]
[ 1648.506221] [<ffffffffc0a9ec7e>] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc]
[ 1648.507062] [<ffffffffc0aa2422>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
[ 1648.507719] [<ffffffffc0aa1990>] ? ptlrpc_main+0x0/0x1e40 [ptlrpc]
[ 1648.508390] [<ffffffff810b252f>] kthread+0xcf/0xe0
[ 1648.508898] [<ffffffff810b2460>] ? kthread+0x0/0xe0
[ 1648.509447] [<ffffffff816b8798>] ret_from_fork+0x58/0x90
[ 1648.509983] [<ffffffff810b2460>] ? kthread+0x0/0xe0
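For triage, the watchdog report in the OSS dmesg can be reduced to the hung thread's pid, its inactive time, and the reliable frames of its call trace. The following is a minimal sketch, not part of the test framework: the function name `summarize_watchdog` and the regular expressions are assumptions made for illustration, and `SAMPLE` is a trimmed copy of lines from the dmesg above. Frames the kernel marks with `?` (not reliably on the stack) are skipped.

```python
import re

# Patterns are illustrative assumptions based on the log format shown in this ticket.
WATCHDOG_RE = re.compile(r"Service\s+thread pid (\d+) was inactive for ([\d.]+)s")
COMM_RE = re.compile(r"Pid: (\d+), comm: (\S+)")
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\] (\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)


def summarize_watchdog(dmesg):
    """Return pid, inactive time, comm, and reliable frames for the first
    'Service thread ... was inactive' report in dmesg, or None if absent."""
    hit = WATCHDOG_RE.search(dmesg)
    if hit is None:
        return None
    rest = dmesg[hit.end():]
    comm = COMM_RE.search(rest)
    frames = [
        (name, module or None)
        for unreliable, name, module in FRAME_RE.findall(rest)
        if not unreliable  # skip frames printed with a leading '?'
    ]
    return {
        "pid": int(hit.group(1)),
        "inactive_s": float(hit.group(2)),
        "comm": comm.group(2) if comm else None,
        "frames": frames,
    }


# Trimmed excerpt of the OSS dmesg from this ticket (works flattened or line-split).
SAMPLE = (
    "[ 1648.497647] LNet: Service thread pid 31239 was inactive for 42.01s. "
    "The thread might be hung, or it might only be slow and will resume later. "
    "[ 1648.499403] Pid: 31239, comm: ll_ost_io00_025 "
    "[ 1648.499841] Call Trace: "
    "[ 1648.500246] [<ffffffff816ab6b9>] schedule+0x29/0x70 "
    "[ 1648.500869] [<ffffffff816a9004>] schedule_timeout+0x174/0x2c0 "
    "[ 1648.501498] [<ffffffff811de9fe>] ? kmalloc_order_trace+0x2e/0xa0 "
    "[ 1648.502788] [<ffffffffc0a4e4de>] target_bulk_io+0x4ae/0xab0 [ptlrpc] "
    "[ 1648.504086] [<ffffffffc0af8326>] tgt_brw_write+0x1866/0x1d50 [ptlrpc] "
)

summary = summarize_watchdog(SAMPLE)
print(summary["pid"], summary["inactive_s"], summary["comm"])
for name, module in summary["frames"]:
    print(name, module or "kernel")
```

Read this way, the reliable frames show the ll_ost_io thread sleeping in schedule_timeout under target_bulk_io, called from tgt_brw_write, which suggests it was waiting on a bulk write transfer when the watchdog fired.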