[LU-8553] incorrect fix in LU-7558 Created: 26/Aug/16 Updated: 22/Jul/18 Resolved: 22/Jul/18 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | Alexey Lyashkov | Assignee: | Mikhail Pershin |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | ptlrpc | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Comments |
| Comment by Alexey Lyashkov [ 26/Aug/16 ] |
|
Grr. Look at this check:

    if (imp->imp_state == LUSTRE_IMP_CLOSED) {
            spin_unlock(&imp->imp_lock);
            CERROR("can't connect to a closed import\n");
            RETURN(-EINVAL);
    } else if (imp->imp_state == LUSTRE_IMP_FULL) {
            spin_unlock(&imp->imp_lock);
            CERROR("already connected\n");
            RETURN(0);
    } else if (imp->imp_state == LUSTRE_IMP_CONNECTING) {
            spin_unlock(&imp->imp_lock);
            CERROR("already connecting\n");
            RETURN(-EALREADY);
    }

This leaves a race window: a new connect can be sent while the import has already switched to recovery. Changing the last condition to something like "imp_state != LUSTRE_IMP_DISCON => return -EALREADY" would fix the same bug without introducing a new flag.

But that patch does not solve the second problem anyway. The second problem was introduced when multiple ptlrpcd threads were added as part of the SMP improvement work. The interpret callback for the first connect is scheduled on ptlrpcd thread X, but that thread may be blocked for some time while processing other interpret callbacks (for example, I/O completions), so the client can end up sending a second connect while the first one is still in flight.

This should be easy to reproduce, either by reconnecting with the lctl tool, which sends a parallel connect request, or through a recovery vs. pinger race: the first connect is sent when a replay request fails, and the second is sent from the pinger. In that case the first thread may be blocked cancelling unused locks. |
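To make the first problem concrete, here is a minimal userspace sketch of the quoted check next to the proposed "!= DISCON" variant. It is a model, not Lustre code: the IMP_* enum is a simplified, hypothetical stand-in for the LUSTRE_IMP_* states, a pthread mutex replaces imp_lock, and check_orig()/check_proposed() are hypothetical names. It also assumes that a caller that passes the check moves the import to CONNECTING and sends the connect.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the import states discussed above;
 * the real Lustre state enum has more members and different values. */
enum imp_state {
        IMP_CLOSED,
        IMP_DISCON,
        IMP_CONNECTING,
        IMP_REPLAY,
        IMP_RECOVER,
        IMP_FULL,
};

/* Hypothetical stand-in for the import: a lock and a state. */
struct import_model {
        pthread_mutex_t lock;
        enum imp_state  state;
};

/* Mirrors the quoted check: only CONNECTING counts as "busy", so an import
 * in a recovery state (REPLAY, RECOVER) falls through, is moved to
 * CONNECTING, and the caller is told to send a new connect (*send = true). */
static int check_orig(struct import_model *imp, bool *send)
{
        int rc = 0;

        assert(imp != NULL && send != NULL);
        *send = false;
        pthread_mutex_lock(&imp->lock);
        if (imp->state == IMP_CLOSED)
                rc = -EINVAL;           /* can't connect to a closed import */
        else if (imp->state == IMP_FULL)
                rc = 0;                 /* already connected, nothing to send */
        else if (imp->state == IMP_CONNECTING)
                rc = -EALREADY;         /* already connecting */
        else {
                imp->state = IMP_CONNECTING;
                *send = true;
        }
        pthread_mutex_unlock(&imp->lock);
        return rc;
}

/* Proposed variant: any state other than DISCON is treated as "busy", so a
 * connect is only started from a disconnected import. */
static int check_proposed(struct import_model *imp, bool *send)
{
        int rc = 0;

        assert(imp != NULL && send != NULL);
        *send = false;
        pthread_mutex_lock(&imp->lock);
        if (imp->state == IMP_CLOSED)
                rc = -EINVAL;
        else if (imp->state == IMP_FULL)
                rc = 0;
        else if (imp->state != IMP_DISCON)
                rc = -EALREADY;         /* connecting or in recovery */
        else {
                imp->state = IMP_CONNECTING;
                *send = true;
        }
        pthread_mutex_unlock(&imp->lock);
        return rc;
}
```

Starting from an import in IMP_REPLAY, check_orig() tells the caller to send a connect and overwrites the recovery state, while check_proposed() returns -EALREADY and leaves the import untouched. The sketch only covers the first problem; it does not model the second one (a connect interpret delayed on a busy ptlrpcd thread).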
| Comment by Mikhail Pershin [ 22/Jul/18 ] |
|
Closing this issue as outdated. Alexey, feel free to reopen it if you think the problem still exists. |