[LU-1573] avoid data corruption for direct io data Created: 27/Jun/12 Updated: 03/Feb/17 Resolved: 03/Feb/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.10.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Alexey Lyashkov | Assignee: | Alex Zhuravlev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
any lustre version |
||
| Issue Links: |
|
||||
| Story Points: | 4 | ||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 4001 | ||||
| Description |
|
when we call a shutdown (without -f) we a set a 'notransno' flag to put all requests in replay queue. case 'A': LCONSOLE_WARN("Failing over %s\n", obd->obd_name); obd->obd_fail = 1; obd->obd_no_transno = 1; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ if that will be raced with obd_commitrw_write() which process a DIO request - reply will be sent without last_commited trasno update. If user land application will be reused a same buffer for different data after exit from write(2) call, ptlrpc will started to replay that request, but send a invalid data to the OST. so we a corrupt data on OST side. replicate that bug is very easy. we found that bug in testing DIO under failover. |
| Comments |
| Comment by Alexey Lyashkov [ 27/Jun/12 ] |
|
remote: http://review.whamcloud.com/3197 patch created as port of 2.1 code, so question about create same for osd_io.c and possible for osd-zfs. |
| Comment by Andreas Dilger [ 04/Jul/12 ] |
|
Alex, I was looking to see how this would apply to OFD/osd-ldiskfs/osd-zfs, but it is not at all obvious due to changes in this code. Could you (or Mike, as you see fit) please look into porting this patch for the updated OFD/OSD code. |
| Comment by Alex Zhuravlev [ 05/Jul/12 ] |
|
shouldn't we just always set transno to 0 on OSS side for sync requests? |
| Comment by Oleg Drokin [ 05/Jul/12 ] |
|
I personally think this is not really a "critical" issue. Sure the corruption is real, but the condition is not all that normal. Soft failovers are usually only done in testing for simplicity and speed, or for administrative reasons. I guess the only case not like this is the fail back once the primary node comes back online. |
| Comment by Alexey Lyashkov [ 05/Jul/12 ] |
|
Alex, for sync requests - we need always reply with request transno <= committed transno, or don't send reply at all. |
| Comment by Alex Zhuravlev [ 05/Jul/12 ] |
|
for sync request transno makes no sense, IMHO. |
| Comment by Nathan Rutman [ 21/Nov/12 ] |
|
Xyratex-bug-id: MRP-542 |
| Comment by Alex Zhuravlev [ 18/Dec/12 ] |
|
not sure what's expected in this ticket.. as I said before, if RPC is not a subject to replay, then just do not set transno in reply. |
| Comment by Alexey Lyashkov [ 15/Mar/13 ] |
|
Alex, that is not enough because DIO page have a pointer to user page, instead of buffered IO have a copy user data inside. |
| Comment by Alex Zhuravlev [ 15/Mar/13 ] |
|
sorry, I don't follow... it doesn't matter where the buffers are, imo. sync request means OST replies when the write is committed. in turn, this mean recovery/replay is not needed in the first place. then setting transno to 0 should be enough. |
| Comment by Alexey Lyashkov [ 27/Mar/13 ] |
|
when we have set a no_trasno - we have stop sending a transo updates to a clients (same way used to create recovery barrier in tests). after it - all requests put in recovery queue as rq_transno isn't less then last_transo from a clients. Ptlrpc don't have a differences between buffered write (where kernel have a copy of userspace data) and DIO - don't have a copy and have a direct pointer to userspace. both types of requests put in replay queue to able to replay after fail. that description enough? or that is fixes in 2.4 in different way? may you verify DIO requests don't put in recovery queue after recovery barrier? |
| Comment by Alex Zhuravlev [ 27/Mar/13 ] |
|
you're confusing last_committed (which we do not update when notransno flag is set) and per-rpc transno which we set always: void target_committed_to_req(struct ptlrpc_request *req) if (!exp->exp_obd->obd_no_transno && req->rq_repmsg != NULL) |
| Comment by Alex Zhuravlev [ 26/Apr/13 ] |
|
as per discussion with Alexey at LUG: DIO should be waiting for IO completion on OST (otherwise it can get stuck awaiting for last transno to be transfered by ping), given so it makes sense to set transno to 0 in this case. |
| Comment by Alexey Lyashkov [ 26/Apr/13 ] |
|
so my original patch looks fine. that is drop a reply for a no trasno mode, simulating to waiting a commit from journal and prevent a moving from sending to replay list. |
| Comment by Gerrit Updater [ 18/Jan/17 ] |
|
James Nunez (james.a.nunez@intel.com) uploaded a new patch: https://review.whamcloud.com/24940 |
| Comment by Gerrit Updater [ 03/Feb/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/16680/ |
| Comment by Peter Jones [ 03/Feb/17 ] |
|
Landed for 2.10 |