[LU-7185] restore flags on ptlrpc_connect_import failure to prevent LBUG Created: 18/Sep/15  Updated: 19/Aug/16  Resolved: 02/Jun/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.9.0
Fix Version/s: Lustre 2.9.0

Type: Bug Priority: Major
Reporter: Jeremy Filizetti Assignee: John Hammond
Resolution: Fixed Votes: 0
Labels: SSK, kerberos, patch

Issue Links:
Duplicate
is duplicated by LU-7452 (obd_class.h:815:obd_connect()) ASSER... Resolved
Related
is related to LU-3289 IU Shared Secret Key authentication a... Resolved
is related to LU-5319 Support multiple slots per client in ... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Commit 1fc013f90175d1e50d7a22b404ad6abd31a43e38 in LU-5319 failed to restore the original flags on ptlrpc_connect_import failure. Handle this case gracefully so the system doesn't LBUG.

I'll submit a patch to Gerrit when I get a chance to fix my keys. In the meantime, the diff is here for reference.

diff --git a/lustre/ldlm/ldlm_lib.c b/lustre/ldlm/ldlm_lib.c
index 2203acb..b68d331 100644
--- a/lustre/ldlm/ldlm_lib.c
+++ b/lustre/ldlm/ldlm_lib.c
@@ -570,6 +570,8 @@ int client_connect_import(const struct lu_env *env,
 
         rc = ptlrpc_connect_import(imp);
         if (rc != 0) {
+               if (data && is_mdc)
+                       data->ocd_connect_flags &= ~OBD_CONNECT_MULTIMODRPCS;
                 LASSERT (imp->imp_state == LUSTRE_IMP_DISCON);
                 GOTO(out_ldlm, rc);
         }


 Comments   
Comment by Oleg Drokin [ 18/Sep/15 ]

What is the assertion being triggered?

Comment by Jeremy Filizetti [ 18/Sep/15 ]

I believe it was in obd_connect:

        LASSERT(ergo(data != NULL, (data->ocd_connect_flags & ocf) ==
                                    data->ocd_connect_flags));

I'll try to see if I still have the vmcore from a couple months ago when I fixed it.
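
For reference, here is a minimal stand-alone sketch of why a leftover flag could trip a check of that shape. It is not Lustre code: the DEMO_* bit values, the demo_* names, and the simplified connect function are hypothetical, and it assumes (as the description implies) that the extra flag is set before the connect attempt. demo_ergo(a, b) models logical implication, "a implies b".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values for illustration; not the real Lustre constants. */
#define DEMO_CONNECT_VERSION      0x1ULL
#define DEMO_CONNECT_MULTIMODRPCS 0x2ULL

/* Logical implication: true unless a holds and b does not. */
#define demo_ergo(a, b) (!(a) || (b))

struct demo_connect_data {
	uint64_t ocd_connect_flags;
};

/* Same shape as the quoted check: every flag in data must lie within ocf. */
static bool demo_flags_within(const struct demo_connect_data *data,
			      uint64_t ocf)
{
	return demo_ergo(data != NULL,
			 (data->ocd_connect_flags & ocf) ==
			 data->ocd_connect_flags);
}

/*
 * Simplified stand-in for the connect path: set an extra flag, then
 * attempt the connect. restore_on_failure models the two added lines
 * of the diff; connect_fails simulates a failed connect attempt.
 */
static int demo_client_connect(struct demo_connect_data *data, bool is_mdc,
			       bool connect_fails, bool restore_on_failure)
{
	if (data != NULL && is_mdc)
		data->ocd_connect_flags |= DEMO_CONNECT_MULTIMODRPCS;

	if (connect_fails) {
		if (restore_on_failure && data != NULL && is_mdc)
			data->ocd_connect_flags &= ~DEMO_CONNECT_MULTIMODRPCS;
		return -1;
	}
	return 0;
}

/* Usage: compare the failure path without and with the restore. */
static void demo_run(void)
{
	struct demo_connect_data d = { DEMO_CONNECT_VERSION };

	/* Without the restore, the extra flag survives the failed connect. */
	demo_client_connect(&d, true, true, false);
	assert(!demo_flags_within(&d, DEMO_CONNECT_VERSION));

	/* With the restore, the caller's flags fit the original mask again. */
	d.ocd_connect_flags = DEMO_CONNECT_VERSION;
	demo_client_connect(&d, true, true, true);
	assert(demo_flags_within(&d, DEMO_CONNECT_VERSION));
}
```

In this model, a caller that later validates the same data against its original flag mask fails the subset check unless the failure path clears the extra bit, which is what the two added lines in the diff do.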

Comment by Gerrit Updater [ 26/Oct/15 ]

Jeremy Filizetti (jeremy.filizetti@gmail.com) uploaded a new patch: http://review.whamcloud.com/16950
Subject: LU-7185 ldlm: Restore connect flags on failure
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 196ce70d791088adeb0d3f07c1c6f7659221ff5f

Comment by Gerrit Updater [ 02/Jun/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16950/
Subject: LU-7185 ldlm: Restore connect flags on failure
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 9832fed2569ced118800f33222b02d1ac65312cb

Comment by Peter Jones [ 02/Jun/16 ]

Landed for 2.9

Generated at Sat Feb 10 02:06:43 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.