[LU-12559] Reconnecting from idle exposes import in IMP_NEW state, resulting in EIO
Created: 16/Jul/19  Updated: 04/Oct/19  Resolved: 17/Sep/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.13.0, Lustre 2.12.3

Type: Bug
Priority: Major
Reporter: Patrick Farrell (Inactive)
Assignee: Patrick Farrell (Inactive)
Resolution: Fixed
Votes: 0
Labels: None

 Description   

When reconnecting an idle import (in ptlrpc_request_alloc_internal and ptlrpc_disconnect_idle_interpret), we set the import state to IMP_NEW and then release the import lock. We then immediately call ptlrpc_connect_import, which takes the import lock again.

However, between releasing the lock and taking it again there is a small gap in which other threads can see the import in the IMP_NEW state.

This can cause requests sent in this interval to fail with -EIO, like this:

ptlrpc_import_delay_req()) @@@ Uninitialized import. req@[...]x1638589897143744/t0(0) o101->[....]-OST0000-osc-[...]@o2ib:28/4 lens 328/400 e 0 to 0 dl 0 ref 2 fl Rpc:/0/ffffffff rc 0/-1

The message and the error come from this check:

        } else if (imp->imp_state == LUSTRE_IMP_NEW) {
                DEBUG_REQ(D_ERROR, req, "Uninitialized import.");
                *status = -EIO;

The solution is to keep holding the import lock from the state change through the connect, so this gap does not exist.
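
The gap and the fix can be illustrated with a small single-threaded model. This is only a sketch and not the Lustre code: every toy_* name is hypothetical, the lock is a plain flag rather than the real import spinlock, and the model assumes a sender checks the import state only when it can get the lock. The "concurrent sender" is simulated by calling the check at the exact point where a second thread could run.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the import states. */
enum toy_imp_state { TOY_IMP_IDLE, TOY_IMP_NEW, TOY_IMP_CONNECTING };

struct toy_import {
        bool locked;               /* models the import lock as a flag */
        enum toy_imp_state state;
};

/*
 * Models a sender looking at the import. If the lock is held, the sender
 * would wait (-EAGAIN here); if it sees the NEW state, it fails with -EIO,
 * mirroring the "Uninitialized import" check quoted above.
 */
static int toy_sender_check(const struct toy_import *imp)
{
        if (imp->locked)
                return -EAGAIN;
        return imp->state == TOY_IMP_NEW ? -EIO : 0;
}

/* Old behaviour: drop the lock between the state change and the connect. */
static int toy_reconnect_racy(struct toy_import *imp)
{
        int seen;

        assert(!imp->locked);
        imp->locked = true;
        imp->state = TOY_IMP_NEW;
        imp->locked = false;              /* lock released: gap opens */

        seen = toy_sender_check(imp);     /* simulated concurrent sender */

        imp->locked = true;               /* connect takes the lock again */
        imp->state = TOY_IMP_CONNECTING;
        imp->locked = false;
        return seen;
}

/* Fixed behaviour: hold the lock across the state change and the connect. */
static int toy_reconnect_fixed(struct toy_import *imp)
{
        int seen;

        assert(!imp->locked);
        imp->locked = true;
        imp->state = TOY_IMP_NEW;

        seen = toy_sender_check(imp);     /* same moment, lock still held */

        imp->state = TOY_IMP_CONNECTING;
        imp->locked = false;
        return seen;
}
```

In this model, a sender arriving in the gap of toy_reconnect_racy gets -EIO, while the same sender during toy_reconnect_fixed only sees a held lock and, once the lock is released, no longer finds the import in the NEW state.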



 Comments   
Comment by Gerrit Updater [ 16/Jul/19 ]

Patrick Farrell (pfarrell@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35530
Subject: LU-12559 ptlrpc: Hold imp lock for idle reconnect
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f31f64e77d5664ade4b40859263afc062861283c

Comment by Gerrit Updater [ 17/Sep/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35530/
Subject: LU-12559 ptlrpc: Hold imp lock for idle reconnect
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e9472c54ac820c3a0db2318a6ef894c3971e6e0b

Comment by Peter Jones [ 17/Sep/19 ]

Landed for 2.13

Comment by Gerrit Updater [ 17/Sep/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36215
Subject: LU-12559 ptlrpc: Hold imp lock for idle reconnect
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: ae0990a1183993e7e05c864a033ff771b72cc523

Comment by Gerrit Updater [ 04/Oct/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36215/
Subject: LU-12559 ptlrpc: Hold imp lock for idle reconnect
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: df5d7a7816e8472397e5f99fd2d44d4cd2a4754d
