[LU-3353] import_sec_validate_get() import ffff88061d4a7000 (FULL) with no sec Created: 15/May/13  Updated: 14/Jun/18  Resolved: 11/Feb/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.4
Fix Version/s: Lustre 2.7.0, Lustre 2.5.5

Type: Bug
Priority: Critical
Reporter: Ned Bass
Assignee: Niu Yawei (Inactive)
Resolution: Fixed
Votes: 0
Labels: llnl

Issue Links:
Related
is related to LU-7030 import_sec_validate_get()) import fff... Closed
Severity: 3
Rank (Obsolete): 8288

 Description   

We started getting many of these messages on our 2.1.4 MDS. The timing corresponded with client reconnection problems, i.e. LU-1934.

2013-05-15 15:54:44 Lustre: lsd-MDT0000: Client a22a8a8a-754a-c1ab-859d-eda3c476a27d (at 192.168.112.1@o2ib6) reconnecting
2013-05-15 15:54:44 Lustre: Skipped 143 previous similar messages
2013-05-15 15:54:44 Lustre: lsd-MDT0000: Client a22a8a8a-754a-c1ab-859d-eda3c476a27d (at 192.168.112.1@o2ib6) refused reconnection, still busy with 1 active RPCs
2013-05-15 15:54:44 Lustre: Skipped 143 previous similar messages
2013-05-15 15:56:31 LustreError: 5204:0:(sec.c:385:import_sec_validate_get()) import ffff88061d4a7000 (FULL) with no sec
2013-05-15 15:56:31 LustreError: 5204:0:(sec.c:385:import_sec_validate_get()) Skipped 2399 previous similar messages

The MDS was just rebooted without a crash dump so, unless this reproduces, we won't be able to tell much about the imports in question.



 Comments   
Comment by Peter Jones [ 16/May/13 ]

Niu

Could you please comment on this one?

Thanks

Peter

Comment by Niu Yawei (Inactive) [ 16/May/13 ]

I'm not familiar with lustre security. Ned, do you know what kind of security flavor was specified for the cluster? Thanks.

Comment by Ned Bass [ 16/May/13 ]

I don't think we specify any security flavor, so it should be null.

Comment by Niu Yawei (Inactive) [ 17/May/13 ]

This looks like a race: when the export is being destroyed, it kills the imp_sec on its reverse import while some requests are still in flight on that reverse import. Those requests trigger the error message "import_sec_validate_get() import ... with no sec". From what I can see so far, I don't think it can cause any serious damage.

Comment by Alexey Lyashkov [ 08/Jan/14 ]

This is a bug in the reconnect logic: we disconnect the reverse import but do not invalidate the list of requests on that import.
In that case a lock cancel callback may still sit in the sending queue, waiting on the sec context, and never get killed. It also needs some network flap to trigger.
I have a crash dump showing this situation.

Comment by Amir Shehata (Inactive) [ 02/May/14 ]

The core cause of the race condition, in which a reverse import is destroyed while client-bound requests are still in flight, is a client reconnect. The place in the code where I could identify it is:

target_handle_connect()
{
...
 spin_lock(&export->exp_lock);
 if (export->exp_imp_reverse != NULL)
  /* destroyed import can be still referenced in ctxt */
  tmp_imp = export->exp_imp_reverse;
 export->exp_imp_reverse = revimp;
 spin_unlock(&export->exp_lock);
...
}

Later on, tmp_imp is destroyed.

While it is being destroyed, import_sec_validate_get() can still be called, and the security may already have been cleared.
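
To make the ordering concrete, here is a small single-threaded user-space model of the sequence described above. It is only a sketch: every name in it (sketch_export, sketch_rev_import, sketch_request and the helper functions) is hypothetical and does not correspond to the real Lustre structures or to the locking in target_handle_connect(). It shows that a request still holding the old reverse import finds no sec after the swap and teardown.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not the Lustre types. */
struct sketch_rev_import {
	const char *sec;		/* NULL once the security has been cleared */
};

struct sketch_export {
	struct sketch_rev_import *rev;	/* current reverse import */
};

struct sketch_request {
	struct sketch_rev_import *imp;	/* import the request was queued on */
};

/* Install a fresh reverse import and hand back the old one (may be NULL). */
struct sketch_rev_import *
sketch_swap_reverse(struct sketch_export *exp, struct sketch_rev_import *fresh)
{
	struct sketch_rev_import *old = exp->rev;

	exp->rev = fresh;
	return old;
}

/* First step of tearing down the old import: its security is cleared. */
void sketch_kill_sec(struct sketch_rev_import *imp)
{
	imp->sec = NULL;
}

/* What an in-flight request sees when it looks up the sec on its import. */
int sketch_request_has_sec(const struct sketch_request *req)
{
	return req->imp->sec != NULL;
}
```

In this model the request keeps pointing at the old import, so the lookup fails even though the export already carries a valid new reverse import.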

The suggested solution is to suppress the error message in import_sec_validate_get() if the import is being destroyed.
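
A rough user-space sketch of that idea follows. It does not reproduce the patch that was eventually merged; the structure, the destroying flag, and the function names are all assumptions made up for illustration. The lookup still fails when the sec is gone, but the error is only logged when the import is not known to be going away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration only; not the Lustre types. */
struct model_sec {
	int refcount;
};

struct model_import {
	struct model_sec *sec;	/* NULL once the security has been freed */
	bool destroying;	/* assumed flag: teardown of the import has begun */
	const char *state;	/* e.g. "FULL" */
};

/* Counts the error messages that were actually emitted. */
int model_error_count;

/*
 * Returns 0 and takes a reference on the sec when one is present.
 * Returns -1 when there is no sec; the error is logged only if the
 * import is not being destroyed, since a missing sec is expected then.
 */
int model_sec_validate_get(struct model_import *imp, struct model_sec **out)
{
	assert(imp != NULL && out != NULL);

	*out = NULL;
	if (imp->sec == NULL) {
		if (!imp->destroying) {
			model_error_count++;
			fprintf(stderr, "import %p (%s) with no sec\n",
				(void *)imp, imp->state);
		}
		return -1;
	}

	imp->sec->refcount++;
	*out = imp->sec;
	return 0;
}
```

The caller still gets a failure in both cases, so request handling is unchanged; only the log noise from the expected teardown case goes away.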

Comment by Gerrit Updater [ 10/Dec/14 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/10200/
Subject: LU-3353 ptlrpc: Suppress error message when imp_sec is freed
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: edf6116663724467207422dcc0c6120320055cac

Comment by D. Marc Stearman (Inactive) [ 06/Jan/15 ]

Can a patch be landed for 2.5? We still see this on our production clusters, and it would be nice to reduce unneeded log messages.

Comment by Gerrit Updater [ 06/Jan/15 ]

James Simmons (uja.ornl@gmail.com) uploaded a new patch: http://review.whamcloud.com/13254
Subject: LU-3353 ptlrpc: Suppress error message when imp_sec is freed
Project: fs/lustre-release
Branch: b2_5
Current Patch Set: 1
Commit: 2d6469ec1a1ffc820cb7d5a6e18006a4e41bad19

Comment by Jodi Levi (Inactive) [ 11/Feb/15 ]

Patch landed to Master. Patch for other branches will be tracked outside of this ticket.
