[LU-3353] import_sec_validate_get() import ffff88061d4a7000 (FULL) with no sec Created: 15/May/13 Updated: 14/Jun/18 Resolved: 11/Feb/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.4 |
| Fix Version/s: | Lustre 2.7.0, Lustre 2.5.5 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Ned Bass | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | llnl | ||
| Issue Links: |
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 8288 | ||||||||
| Description |
|
We started getting many of these messages on our 2.1.4 MDS. The timing corresponded with client reconnection problems, i.e.

2013-05-15 15:54:44 Lustre: lsd-MDT0000: Client a22a8a8a-754a-c1ab-859d-eda3c476a27d (at 192.168.112.1@o2ib6) reconnecting
2013-05-15 15:54:44 Lustre: Skipped 143 previous similar messages
2013-05-15 15:54:44 Lustre: lsd-MDT0000: Client a22a8a8a-754a-c1ab-859d-eda3c476a27d (at 192.168.112.1@o2ib6) refused reconnection, still busy with 1 active RPCs
2013-05-15 15:54:44 Lustre: Skipped 143 previous similar messages
2013-05-15 15:56:31 LustreError: 5204:0:(sec.c:385:import_sec_validate_get()) import ffff88061d4a7000 (FULL) with no sec
2013-05-15 15:56:31 LustreError: 5204:0:(sec.c:385:import_sec_validate_get()) Skipped 2399 previous similar messages

The MDS was just rebooted without a crash dump so, unless this reproduces, we won't be able to tell much about the imports in question. |
| Comments |
| Comment by Peter Jones [ 16/May/13 ] |
|
Niu, could you please comment on this one? Thanks, Peter |
| Comment by Niu Yawei (Inactive) [ 16/May/13 ] |
|
I'm not familiar with lustre security. Ned, do you know what kind of security flavor was specified for the cluster? Thanks. |
| Comment by Ned Bass [ 16/May/13 ] |
|
I don't think we specify any security flavor, so it should be null. |
| Comment by Niu Yawei (Inactive) [ 17/May/13 ] |
|
This looks like a race: when the export is being destroyed, it kills the imp_sec on its reverse import while there are still some in-flight requests on that reverse import, which triggers the error message "import_sec_validate_get() import ... with no sec". I don't think it causes any serious damage so far. |
| Comment by Alexey Lyashkov [ 08/Jan/14 ] |
|
It's a bug in the reconnect logic: we disconnect a reverse import but fail to invalidate the list of requests on that import. |
| Comment by Amir Shehata (Inactive) [ 02/May/14 ] |
|
The core reason for the race condition, where a reverse import is destroyed while client-bound requests are still in flight, is a client reconnect. The place in the code that I could identify is target_handle_connect():

target_handle_connect()
{
        ...
        spin_lock(&export->exp_lock);
        if (export->exp_imp_reverse != NULL)
                /* destroyed import can be still referenced in ctxt */
                tmp_imp = export->exp_imp_reverse;
        export->exp_imp_reverse = revimp;
        spin_unlock(&export->exp_lock);
        ...
}

Later on, tmp_imp is destroyed. While it is being destroyed, import_sec_validate_get() may still be called, and the security (imp_sec) could already have been cleared. The suggested solution is to suppress the error message in import_sec_validate_get() if the import is being destroyed. |
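For illustration only, the suggested suppression could look roughly like the user-space sketch below. It is not the merged patch and not the real Lustre code: struct mock_import, struct mock_sec, the being_destroyed flag, and the error_messages_logged counter are all hypothetical stand-ins, and the real import tracks its teardown state differently. The sketch is single-threaded and only shows the decision of when to log.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the Lustre structures; not the real definitions. */
struct mock_sec {
	int refcount;
};

struct mock_import {
	struct mock_sec *imp_sec;  /* NULL once the security has been cleared */
	bool being_destroyed;      /* assumed flag marking an import in teardown */
};

/* Counts how many "with no sec" errors were actually emitted. */
static int error_messages_logged;

/*
 * Take a reference on the import's sec, or fail with -EACCES.
 * The failure is always returned, but the error message is printed only
 * when the import is NOT being torn down, since a missing sec is expected
 * in that case.
 */
static int mock_import_sec_validate_get(struct mock_import *imp,
					struct mock_sec **sec_out)
{
	assert(imp != NULL && sec_out != NULL);

	if (imp->imp_sec == NULL) {
		if (!imp->being_destroyed) {
			fprintf(stderr, "import %p with no sec\n", (void *)imp);
			error_messages_logged++;
		}
		*sec_out = NULL;
		return -EACCES;
	}

	imp->imp_sec->refcount++;
	*sec_out = imp->imp_sec;
	return 0;
}
```

In this sketch the caller still gets -EACCES for a request that races with teardown of the old reverse import; only the log noise goes away, which matches the intent of the suggestion rather than fixing the underlying race.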
| Comment by Gerrit Updater [ 10/Dec/14 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/10200/ |
| Comment by D. Marc Stearman (Inactive) [ 06/Jan/15 ] |
|
Can a patch be landed for 2.5? We still see this on our production clusters, and it would be nice to reduce unneeded log messages. |
| Comment by Gerrit Updater [ 06/Jan/15 ] |
|
James Simmons (uja.ornl@gmail.com) uploaded a new patch: http://review.whamcloud.com/13254 |
| Comment by Jodi Levi (Inactive) [ 11/Feb/15 ] |
|
Patch landed to Master. Patch for other branches will be tracked outside of this ticket. |