[LU-10623] recovery-mds-scale test_failover_mds: test_failover_mds returned 1 Created: 07/Feb/18  Updated: 27/Sep/18

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Sarah Liu Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

Description

recovery-mds-scale test_failover_mds - test_failover_mds returned 1
^^^^^^^^^^^^^ DO NOT REMOVE LINE ABOVE ^^^^^^^^^^^^^

This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

This issue relates to the following test suite run:
https://testing.hpdd.intel.com/test_sets/4929a310-fded-11e7-bd00-52540065bddc

test_failover_mds failed with the following error:

test_failover_mds returned 1

OSS dmesg

[ 1605.917547] Lustre: DEBUG MARKER: ==== Checking the clients loads BEFORE failover -- failure NOT OK ELAPSED=79 DURATION=86400 PERIOD=1200
[ 1611.495999] Lustre: DEBUG MARKER: /usr/sbin/lctl mark Wait mds1 recovery complete before doing next failover...
[ 1611.675012] Lustre: DEBUG MARKER: Wait mds1 recovery complete before doing next failover...
[ 1612.443877] Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-41vm8.onyx.hpdd.intel.com: executing _wait_recovery_complete *.lustre-MDT0000.recovery_status 1475
[ 1612.612480] Lustre: DEBUG MARKER: onyx-41vm8.onyx.hpdd.intel.com: executing _wait_recovery_complete *.lustre-MDT0000.recovery_status 1475
[ 1612.844001] Lustre: DEBUG MARKER: /usr/sbin/lctl mark Checking clients are in FULL state before doing next failover...
[ 1613.319217] Lustre: DEBUG MARKER: Checking clients are in FULL state before doing next failover...
[ 1614.125969] Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-41vm4.onyx.hpdd.intel.com: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
[ 1614.512767] Lustre: DEBUG MARKER: onyx-41vm4.onyx.hpdd.intel.com: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
[ 1614.738444] Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
[ 1615.846397] Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
[ 1618.441045] Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-41vm3.onyx.hpdd.intel.com: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
[ 1626.408858] Lustre: DEBUG MARKER: onyx-41vm3.onyx.hpdd.intel.com: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
[ 1641.069651] Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
[ 1648.149579] Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
[ 1648.497647] LNet: Service thread pid 31239 was inactive for 42.01s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[ 1648.499403] Pid: 31239, comm: ll_ost_io00_025
[ 1648.499841] 
Call Trace:
[ 1648.500246]  [<ffffffff816ab6b9>] schedule+0x29/0x70
[ 1648.500869]  [<ffffffff816a9004>] schedule_timeout+0x174/0x2c0
[ 1648.501498]  [<ffffffff811de9fe>] ? kmalloc_order_trace+0x2e/0xa0
[ 1648.502102]  [<ffffffff8109a6c0>] ? process_timeout+0x0/0x10
[ 1648.502788]  [<ffffffffc0a4e4de>] target_bulk_io+0x4ae/0xab0 [ptlrpc]
[ 1648.503443]  [<ffffffff810c6440>] ? default_wake_function+0x0/0x20
[ 1648.504086]  [<ffffffffc0af8326>] tgt_brw_write+0x1866/0x1d50 [ptlrpc]
[ 1648.504830]  [<ffffffffc0a4bec0>] ? target_bulk_timeout+0x0/0xb0 [ptlrpc]
[ 1648.505518]  [<ffffffffc0afa965>] tgt_request_handle+0x925/0x13b0 [ptlrpc]
[ 1648.506221]  [<ffffffffc0a9ec7e>] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc]
[ 1648.507062]  [<ffffffffc0aa2422>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
[ 1648.507719]  [<ffffffffc0aa1990>] ? ptlrpc_main+0x0/0x1e40 [ptlrpc]
[ 1648.508390]  [<ffffffff810b252f>] kthread+0xcf/0xe0
[ 1648.508898]  [<ffffffff810b2460>] ? kthread+0x0/0xe0
[ 1648.509447]  [<ffffffff816b8798>] ret_from_fork+0x58/0x90
[ 1648.509983]  [<ffffffff810b2460>] ? kthread+0x0/0xe0
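
For reference while triaging: the "Wait mds1 recovery complete" and "FULL state" checks in the DEBUG MARKER lines above boil down to polling two lctl parameters. A rough sketch only, with the parameter paths copied from the markers (not from the test script) and quoted so the shell does not expand the wildcards:

# On the restarted MDS: recovery status of lustre-MDT0000
lctl get_param "*.lustre-MDT0000.recovery_status"

# On each client: MDC import to MDT0000 (the harness waits for it to report FULL)
lctl get_param "mdc.lustre-MDT0000-mdc-*.mds_server_uuid"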
