[LU-11386] insanity test 0 hangs on failing over MDT Created: 17/Sep/18  Updated: 17/Sep/18

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: DNE
Environment:

DNE


Severity: 3

Description

insanity test 0 started hanging on July 19, 2018 in review-dne-part-4, beginning with https://testing.whamcloud.com/test_sets/7b3d1b66-8af1-11e8-9028-52540065bddc

insanity test 0 fails over all MDTs and all OSTs. For each successful failover, the log shows the target being failed, rebooted, and started again, for example:

Failing mds1 on onyx-41vm9
…
reboot facets: mds1
Failover mds1 to onyx-41vm9
…
mount facets: mds1
…
Starting mds1:   /dev/mapper/mds1_flakey /mnt/lustre-mds1
…
Started lustre-MDT0000

We should see this sequence for each MDT and for each OST. Yet in the runs that hang, for example https://testing.whamcloud.com/test_sets/3f2ce6e0-ba91-11e8-8c12-52540065bddc, we see the first three MDTs fail, reboot, and mount, but not the fourth MDT. The last thing we see in the suite_log is the third MDT starting:

CMD: trevis-37vm4 e2label /dev/mapper/mds3_flakey 				2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
CMD: trevis-37vm4 e2label /dev/mapper/mds3_flakey 				2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
CMD: trevis-37vm4 e2label /dev/mapper/mds3_flakey 2>/dev/null
Started lustre-MDT0002
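
The check described above (did every target get failed, started, and report "Started"?) can be sketched as a small log scan. This is only an illustration: the line formats are assumed from the excerpts quoted in this ticket, and the function name failover_progress and the facet names passed to it are hypothetical, not part of the Lustre test framework.

```python
import re

# Line formats assumed from the suite_log excerpts quoted in this ticket.
FAILING = re.compile(r"^Failing (\S+) on \S+")
STARTING = re.compile(r"^Starting (\S+):")
STARTED = re.compile(r"^Started \S+")


def failover_progress(log_text, expected_facets):
    """Return {facet: stage}, the furthest failover stage seen per facet.

    Stages advance "not failed" -> "failing" -> "starting" -> "started".
    "Starting"/"Started" lines seen before a facet was failed (for example
    during initial setup) are ignored.
    """
    progress = {facet: "not failed" for facet in expected_facets}
    pending = None  # facet whose restart is waiting for a "Started" line
    for raw in log_text.splitlines():
        line = raw.strip()
        m = FAILING.match(line)
        if m and m.group(1) in progress:
            progress[m.group(1)] = "failing"
            continue
        m = STARTING.match(line)
        if m and progress.get(m.group(1)) == "failing":
            pending = m.group(1)
            progress[pending] = "starting"
            continue
        if STARTED.match(line) and pending is not None:
            progress[pending] = "started"
            pending = None
    return progress


if __name__ == "__main__":
    sample = """\
Failing mds1 on onyx-41vm9
reboot facets: mds1
Failover mds1 to onyx-41vm9
mount facets: mds1
Starting mds1:   /dev/mapper/mds1_flakey /mnt/lustre-mds1
Started lustre-MDT0000
"""
    for facet, stage in failover_progress(sample, ["mds1", "mds2"]).items():
        print(facet, stage)
```

Any facet left at a stage other than "started" marks where the run stopped making progress. In the hanging runs described here, the fourth MDT would be expected to show up as "not failed".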

Looking at the console log of the fourth MDT (vm5), we don’t see anything that indicates there is a problem with MDT0003, the fourth MDT. In fact, nothing is obviously wrong with any of the nodes, judging by their console logs.

In all of these failures, the failover of one of the MDTs hangs. Here are links to logs for more runs where insanity test 0 hangs:
https://testing.whamcloud.com/test_sets/f7b26fc6-b9b7-11e8-8c12-52540065bddc
https://testing.whamcloud.com/test_sets/3dc54fa2-adfa-11e8-bbd1-52540065bddc
https://testing.whamcloud.com/test_sets/e0b3ea6e-8f5c-11e8-b0aa-52540065bddc

