[LU-5569] Recreating a reverse import produces various failures. Created: 02/Sep/14 Updated: 15/Mar/19 Resolved: 19/Sep/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Alexey Lyashkov | Assignee: | Jian Yu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | HB, patch, ptlrpc | ||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 15532 |
| Description |
|
Don't reallocate a new reverse import for each client reconnect. Recreating it on every reconnect causes several problems:

1. A race between sending an RPC and class_destroy_import(): a request can be in flight on a reverse import that is being destroyed by a reconnect.
2. target_handle_connect() stops updating the connection information.
3. Connection flags are not updated atomically.
4. A client reconnecting after a network flap leaves requests queued on the old reverse import, so the server must wait for them to time out before it can resend.

Some examples:

00000100:00100000:1.0:1407845348.937766:0:62024:0:(service.c:1929:ptlrpc_server_handle_request()) Handled RPC pname:cluuid+ref:pid:xid:nid:opc ll_ost_419:4960df0f-75ed-07a2-cee7-063090dc59cd+4:19257:x1475700821793316:12345-1748@gni1:8 Request procesed in 55us (106us total) trans 0 rc 0/0

00000100:00020000:1.0:1407845393.600747:0:81897:0:(client.c:1115:ptlrpc_import_delay_req()) @@@ req wrong generation: req@ffff880304e39800 x1475078782385806/t0(0) o105->snx11063-OST0070@1748@gni1:15/16 lens 344/192 e 0 to 1 dl 1407845389 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 |
| Comments |
| Comment by Alexey Lyashkov [ 02/Sep/14 ] |
|
A patch adding tests for these bugs. |
| Comment by Johann Lombardi (Inactive) [ 02/Sep/14 ] |
|
Alexey, in gerrit 9335, you mentioned a "data corruption" problem. Could you please elaborate and explain why this issue only shows up with the AST resend patch? Thanks in advance |
| Comment by Johann Lombardi (Inactive) [ 03/Sep/14 ] |
|
I talked to Alexey on Skype to get an answer to the question above. The problem is that ldlm_handle_ast_error() doesn't evict the client in some error cases (he mentioned EIO & EPROTO) where the AST wasn't delivered or properly processed by the client. The server then cancels the lock locally and grants the conflicting lock while the client still thinks it owns a valid lock and might continue writing to the file. |
| Comment by Alexey Lyashkov [ 04/Sep/14 ] |
| Comment by Chris Horn [ 04/Sep/14 ] |
Alexey, I'm having trouble understanding your description of this fourth problem. Is the following description correct? When a client reconnects after a network flap, we do not currently wake up any RPCs in the (reverse) import queue (specifically the imp_sending_list of the reverse import). This means we need to wait for the original request to time out before the server can resend the request. |
| Comment by Alexey Lyashkov [ 05/Sep/14 ] |
|
Chris, your description is correct. Thanks for rephrasing! |
| Comment by Oleg Drokin [ 05/Sep/14 ] |
|
I want to note here too that the test_10d added by one of the patches fails 100% of the time in testing, which implies that it's either incorrect, or the actual fix fails at fixing the issue at hand. |
| Comment by Alexey Lyashkov [ 05/Sep/14 ] |
|
Oleg, which patch are you pointing to? The first patch (test only) is expected to fail the newly added tests; the second patch should not fail, since it fixes the issue. |
| Comment by Chris Horn [ 05/Sep/14 ] |
|
We discovered the bugs addressed by this change while investigating some non-POSIX compliant behavior exhibited by Lustre. Below is a description of the problem based on my own understanding and the fixes that were proposed to address it.

Higher layers of Lustre (CLIO) generally rely on lower layers to enforce POSIX compliance. In this case, the Lustre Distributed Lock Manager (LDLM) and ptlrpc layers are interacting in such a way that results in inappropriate errors being returned to the client.

The interaction revolves around a pair of clients performing I/O. One client (the writer) creates and writes data to a single file striped across eight OSTs. A second client (the reader) reads the data written by the writer. The reader requests a protected read lock for each stripe of the file from the corresponding OST. Upon receipt of the lock enqueue request, the OSS notes that it has already granted a conflicting lock to the writer. As a result the server sends a blocking AST (BL AST) to the writer. This is a request that the writer cancel its lock so that it may be granted to the reader. Upon receipt of the BL AST the writer should first reply that it has received the BL AST, and then, after flushing any pages to the server, send a lock cancel request back to the server. When the server receives both the reply to the BL AST and the lock cancel request it can then grant the lock to the reader via a completion AST (CP AST). The server sends a CP AST to the reader, who must then acknowledge receipt of the CP AST before it can use the resource covered by the requested lock.

In Lustre, different components communicate via import and export pairs. An import is for sending requests and receiving replies, and an export is for receiving requests and sending replies. However, it is not possible to send a request via an export. As a result, servers utilize a reverse import to send AST requests to clients.
A reverse import converts an import and export pair into a corresponding export and import pair. Currently, a new reverse import is created whenever the server creates an export for a client (re)connection. This prevents us from being able to resend requests that were linked to the old reverse import. Historically this has not been problematic, as servers did not have the ability to resend ASTs; with the AST resend patch, that is no longer the case.

This particular bug (there are other potential flavors) arises if the reader reconnects to an OST granting the lock while the OSS is trying to deliver the CP AST (or, equivalently, if the client is unable to acknowledge receipt of the CP AST). Based on Lustre trace data we determined that an OSS was unable to deliver a CP AST to the reader. While the OSS was waiting for the CP AST acknowledgement, the reader reconnected to the OST granting the lock. As mentioned above, this created a new reverse import for this client. When the OSS attempted to resend the CP AST (after a timeout) it found that the old import for the reader had been destroyed. It was thus unable to resend the request, and the request was immediately failed with a status of -EIO. When LDLM interpreted the failed request, it did not handle the -EIO request status appropriately: LDLM converted the -EIO error into -EAGAIN, which was then returned to the client.

Two fixes are proposed to address different aspects of this bug:
1. Client-side: have LDLM handle the failed request status correctly rather than converting -EIO into -EAGAIN.
2. Server-side: change the reverse import life cycle so that it survives client reconnects.
|
| Comment by Oleg Drokin [ 04/Feb/15 ] |
|
Ok. So this is a problem that has been there for quite a while, I see? In other words, I guess I am asking: does anybody have any compelling reasons why this should be treated differently than what's described above (i.e. as a blocker, because I am overlooking something and it's a new disastrous failure of epic proportions instead)? |
| Comment by Cory Spitz [ 19/Aug/15 ] |
|
http://review.whamcloud.com/#/c/11750 still needs help getting through review. |
| Comment by James A Simmons [ 16/Sep/15 ] |
|
It failed Oleg's review process. He listed the backtrace he got. |
| Comment by Alexey Lyashkov [ 17/Sep/15 ] |
|
Oleg's backtraces relate to a different bug: the wrong OBD device release process, where we may keep an export alive after the OBD device is freed. However, I am still unable to update the patch, since I lack access to Intel Gerrit with a Google account and password login is blocked by the Intel admins. |
| Comment by James A Simmons [ 17/Sep/15 ] |
|
Alex, email your latest |
| Comment by Gerrit Updater [ 19/Sep/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/11750/ |
| Comment by Peter Jones [ 19/Sep/15 ] |
|
Landed for 2.8 |
| Comment by Andreas Dilger [ 26/Oct/15 ] |
|
This patch caused a regression on master. See |