[LU-16478] faulty MDT connection can leak a reference to export Created: 15/Jan/23  Updated: 15/May/23  Resolved: 25/Mar/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Bug Priority: Minor
Reporter: Alex Zhuravlev Assignee: Alex Zhuravlev
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-11388 replay-single test_131b: test timeout Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

If target_handle_connect() races with an export eviction, the following scenario can happen:

  • mdt_obd_reconnect() -> .. nodemap_add_member() grabs a reference to the export
  • then target_handle_connect() finds the export invalid (exp_disconnected is set) and exits with -ENODEV:
    	if (export->exp_disconnected) {
    		spin_unlock(&export->exp_lock);
    		GOTO(out, rc = -ENODEV);
    	}
    

    After this, the reference taken by nodemap_add_member() is never released, so umount cannot complete and reports the following symptoms:

    00000020:02000400:1.0:1673726953.882508:0:8583:0:(genops.c:1792:obd_exports_barrier()) lustre-MDT0000 is waiting for obd_unlinked_exports more than 7 seconds. The obd refcount = 4. Is it stuck?
    00000020:02000400:1.0:1673726953.889142:0:8583:0:(genops.c:1758:print_export_data()) lustre-MDT0000: UNLINKED 000000002760c5c2 5a4bd497-6ace-43e0-8e46-c6b0e7dc84ba 0@lo 1 (0 0 0) 1 0 1 0: 00000000e9920e55  4294967301 stale:0
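
The scenario above can be illustrated with a small standalone model. This is only a sketch and not Lustre code: struct toy_export and every toy_* function are hypothetical names invented for the illustration, the model is single-threaded, and the eviction is simulated by setting the disconnected flag before the connect call. It compares the order described above (reconnect first, check second) with two ways of keeping the reference count balanced. Checking the flag before reconnecting is only a guess based on the patch subject "don't reconnect disconnected exports"; the actual patch may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for an export: a reference count and a "disconnected" flag. */
struct toy_export {
	int  refcount;
	bool disconnected;
};

void toy_export_get(struct toy_export *exp)
{
	exp->refcount++;
}

void toy_export_put(struct toy_export *exp)
{
	assert(exp->refcount > 0);
	exp->refcount--;
}

/* Models the reconnect step that pins the export, as nodemap_add_member()
 * is described to do in the scenario above. */
void toy_reconnect(struct toy_export *exp)
{
	toy_export_get(exp);
}

/* Order from the report: reconnect first, then notice the eviction.
 * The error path returns without dropping the reference taken above. */
int toy_connect_leaky(struct toy_export *exp)
{
	toy_reconnect(exp);
	if (exp->disconnected)
		return -ENODEV;		/* reference is leaked here */
	return 0;
}

/* Variant 1 (assumed): refuse to reconnect an already disconnected export. */
int toy_connect_checked(struct toy_export *exp)
{
	if (exp->disconnected)
		return -ENODEV;		/* nothing was taken, nothing to drop */
	toy_reconnect(exp);
	return 0;
}

/* Variant 2 (assumed): keep the original order but undo the reconnect
 * reference on the error path. */
int toy_connect_balanced(struct toy_export *exp)
{
	toy_reconnect(exp);
	if (exp->disconnected) {
		toy_export_put(exp);
		return -ENODEV;
	}
	return 0;
}
```

Calling toy_connect_leaky() on an export with refcount 1 and disconnected set returns -ENODEV and leaves refcount at 2, which is the toy equivalent of the UNLINKED export that obd_exports_barrier() keeps waiting for. Both other variants return -ENODEV with refcount still at 1. In the real code the flag check and the reconnect would also have to be serialized against eviction, which this model does not attempt to show.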
    


 Comments   
Comment by Gerrit Updater [ 16/Jan/23 ]

"Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49642
Subject: LU-16478 ldlm: don't reconnect disconnected exports
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 03a4fee9aa9d4a4430655becd796622d79343363

Comment by Gerrit Updater [ 17/Feb/23 ]

"Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50041
Subject: LU-16478 tests: a reproducer
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 95c65f97cdc9cc28935518a3f5d755d4bc416d56

Comment by Gerrit Updater [ 21/Mar/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50041/
Subject: LU-16478 target: disconnected export
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 654d5f3fa4df2a0f7275a6da0f050a18881f4f75

Comment by Peter Jones [ 25/Mar/23 ]

Landed for 2.16
