[LU-14330] Interop: recovery-small test 143 fails with 'MDD orphan cleanup thread not quit' Created: 13/Jan/21  Updated: 22/Jan/21

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.14.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: interop, tests

Issue Links:
Related
is related to LU-12747 sanity: test 811 fail with "MDD orpha... Resolved
Severity: 3

Description

recovery-small test_143 has been failing in interop testing since 19 April 2020 when the Lustre server version is < 2.13.53.62 and the Lustre client version is >= 2.13.53.62. The failure does not occur with 2.12.5 or 2.12.6 servers, but it does occur with 2.13.0 servers.
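
The version condition above can be sketched as a small bash check. This is only an illustration: `version_to_int` and `matches_reported_condition` are hypothetical helpers, not functions from test-framework.sh, and the check encodes only the "server < 2.13.53.62, client >= 2.13.53.62" rule. It does not capture the observation that 2.12.5 and 2.12.6 servers pass.

```shell
#!/bin/bash
# Hypothetical helper: pack "a.b.c.d" into one integer so that versions can be
# compared numerically. Missing fields count as 0; each field is assumed < 256.
version_to_int() {
    local -a f
    IFS=. read -r -a f <<< "$1"
    echo $(( (${f[0]:-0} << 24) | (${f[1]:-0} << 16) | (${f[2]:-0} << 8) | ${f[3]:-0} ))
}

# Hypothetical helper: return 0 when a server/client pair satisfies the
# condition stated in this ticket (server older than the cutoff, client at or
# past it). Observed exceptions such as 2.12.5 servers are not modelled.
matches_reported_condition() {
    local server=$1 client=$2
    local cutoff
    cutoff=$(version_to_int 2.13.53.62)
    (( $(version_to_int "$server") < cutoff && $(version_to_int "$client") >= cutoff ))
}
```

For example, `matches_reported_condition 2.13.0 2.14.0` succeeds, while `matches_reported_condition 2.14.0 2.14.0` does not.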

Looking at the suite_log for the latest failure at https://testing.whamcloud.com/test_sets/8adef6a4-82c3-4286-811b-c3600c371395, we can see that an MDD orphan cleanup thread is still running 90 seconds after recovery completes:

trevis-17vm4: trevis-17vm4.trevis.whamcloud.com: executing _wait_recovery_complete *.lustre-MDT0000.recovery_status 1475
trevis-17vm4: *.lustre-MDT0000.recovery_status status: COMPLETE
CMD: trevis-17vm4 pgrep orph_.*-MDD | wc -l
Waiting 90s for '0'
CMD: trevis-17vm4 pgrep orph_.*-MDD | wc -l
…
CMD: trevis-17vm4 pgrep orph_.*-MDD | wc -l
Update not seen after 90s: want '0' got '1'
 recovery-small test_143: @@@@@@ FAIL: MDD orphan cleanup thread not quit 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:6273:error()
  = /usr/lib64/lustre/tests/recovery-small.sh:3030:test_143()
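
The polling visible in the log above (repeat `pgrep orph_.*-MDD | wc -l` until it reports 0, give up after 90s) can be approximated with the sketch below when reproducing the check by hand on the MDS. `wait_for_no_process` is a hypothetical helper written for this ticket, not the test-framework.sh implementation, and the one-second default polling interval is an assumption.

```shell
#!/bin/bash
# Hypothetical helper: poll until no process name matches the pattern, or give
# up after max_wait seconds. Returns 0 on success and 1 on timeout.
wait_for_no_process() {
    local pattern=$1 max_wait=${2:-90} interval=${3:-1}
    local waited=0 count

    while :; do
        # pgrep prints one PID per line; with no match, wc -l reports 0
        count=$(pgrep "$pattern" | wc -l)
        (( count == 0 )) && return 0
        (( waited >= max_wait )) && break
        sleep "$interval"
        (( waited += interval ))
    done

    echo "Update not seen after ${max_wait}s: want '0' got '$count'" >&2
    return 1
}
```

Run on the MDS node, `wait_for_no_process 'orph_.*-MDD' 90 || echo "MDD orphan cleanup thread not quit"` mirrors the check that fails here.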

Generated at Sat Feb 10 03:08:48 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.