[LU-763] DIO write does not force sync journal commit on OST Created: 14/Oct/11 Updated: 29/May/17 Resolved: 29/May/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 1.8.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Eric Barton (Inactive) | Assignee: | Oleg Drokin |
| Resolution: | Cannot Reproduce | Votes: | 1 |
| Labels: | None | ||
| Severity: | 2 |
| Rank (Obsolete): | 10463 |
| Description |
|
When using sgp_dd to write to a Lustre file with async journal commits enabled, the JBD history shows that journal entries are committed only at the timeout interval, not synchronously on each write. This will cause data loss if the OST crashes after the data is written but before the journal is flushed. |
| Comments |
| Comment by Peter Jones [ 14/Oct/11 ] |
|
Oleg, can you please look into this one? Thanks, Peter |
| Comment by Andreas Dilger [ 14/Oct/11 ] |
|
Eric, the clients will do bulk RPC recovery if the I/O does not commit, because the transno for the RPC has not yet been reported as committed. This is also needed for ZFS, which would suffer terribly if it had to do sync writes without the ZIL. |
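The transno-based recovery described in the comment above can be illustrated with a toy model. This is only a sketch of the idea (the client retains each request until the server reports a last-committed transno at or beyond it), not Lustre code; the class and method names are hypothetical.

```python
class ReplayQueue:
    """Toy model of requests a client retains for possible replay.

    Hypothetical names; this only mirrors the idea that a request stays
    queued until the server's last-committed transno covers it.
    """

    def __init__(self):
        self.pending = {}  # transno -> request payload

    def on_reply(self, transno, request, last_committed):
        # The reply carries the transno assigned to this request and the
        # highest transno the server has committed to stable storage.
        self.pending[transno] = request
        self.on_commit(last_committed)

    def on_commit(self, last_committed):
        # Drop every request whose transaction is known to be on disk.
        for transno in [t for t in self.pending if t <= last_committed]:
            del self.pending[transno]

    def to_replay(self):
        # After a server restart, uncommitted requests are replayed
        # in transno order.
        return [self.pending[t] for t in sorted(self.pending)]
```

In this model a write that was acknowledged but not yet committed remains in `to_replay()` until a later reply raises `last_committed` past its transno.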
| Comment by Johann Lombardi (Inactive) [ 14/Oct/11 ] |
|
Hm, weird: OBD_BRW_ASYNC is not set for DIO writes, so we should really be triggering a journal flush and waiting for it. |
| Comment by Eric Barton (Inactive) [ 14/Oct/11 ] |
|
Sorry, my bad: I forgot to state the whole point of this bug, which is that sgp_dd had "dio=1" set, i.e. the client was doing O_DIRECT writes, which should force sync journal writes since the client cannot replay such writes. |
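For readers unfamiliar with O_DIRECT, the following is a minimal sketch of what a direct write looks like from user space on Linux. It is a generic illustration, not how sgp_dd implements dio=1; the function name and the 1 MiB default size are assumptions. O_DIRECT generally requires an aligned buffer and an aligned transfer size, which is why an anonymous mmap (page-aligned, zero-filled) is used here.

```python
import mmap
import os


def direct_write(path, size=1 << 20):
    """Write `size` zero bytes to `path` with O_DIRECT (Linux only).

    Hypothetical helper: the anonymous mmap provides a page-aligned,
    zero-filled buffer, which satisfies typical O_DIRECT alignment rules.
    Returns the number of bytes the kernel reports as written.
    """
    buf = mmap.mmap(-1, size)
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o644)
        try:
            return os.write(fd, buf)
        finally:
            os.close(fd)
    finally:
        buf.close()
```

Some filesystems reject O_DIRECT at open or write time, so a caller should be prepared for `OSError`.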
| Comment by Oleg Drokin [ 14/Oct/11 ] |
|
I just performed a number of tests and I don't think I can reproduce this at all. Tried on 1.8.6-wc and 1.8.7-rc1. Here is my testcase:
|
| Comment by Eric Barton (Inactive) [ 16/Oct/11 ] |
|
Shame there aren't timestamps in the transaction log to prove that the non-dio I/O all happened after the last small transaction.

The program doing the writing when we observed DIO writes not doing sync journal commits was sgp_dd, run as follows:

    sgp_dd time=1 bs=512 bpt=2048 thr=16 dio=1 of=/mnt/lustre/ost0/tf2.out if=/dev/zero count=8192000 &

The file was in a directory where the default layout was 1 stripe on ost0. I can't imagine why any of that would make a difference, but I don't think we can rule it out.

For the full evidence, look in the commitT-OUT5sec.tar.bz2 attached to NTAP-3. The transaction history is printed in writeJBDstats.before.out and writeJBDstats.after.out, and you can compare the two to see what occurred during the test run. See runWrite8OSTs.ksh for what got run.

<Note - edited this comment because commitT-OUT10sec.tar.bz2 didn't set dio=1 in the sgp_dd commands run> |
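To make the I/O pattern of the sgp_dd command quoted above easier to read, the arithmetic can be sketched as follows. It assumes the usual sgp_dd meanings of the operands (bs is the block size in bytes, bpt the blocks per transfer, count the number of blocks, thr the number of worker threads); the helper name is hypothetical.

```python
def sgp_dd_geometry(bs, bpt, count, thr):
    """Derive transfer sizes from sgp_dd-style operands (hypothetical helper)."""
    transfer_bytes = bs * bpt          # bytes moved per transfer
    total_bytes = bs * count           # bytes written over the whole run
    transfers = -(-count // bpt)       # ceiling division: number of transfers
    return {
        "transfer_bytes": transfer_bytes,
        "total_bytes": total_bytes,
        "transfers": transfers,
        "avg_transfers_per_thread": transfers / thr,
    }


g = sgp_dd_geometry(bs=512, bpt=2048, count=8192000, thr=16)
print(g)
```

For the operands in the ticket this gives 1 MiB per transfer, 4000 transfers, and 4,194,304,000 bytes in total, spread over 16 threads (250 transfers per thread on average).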
| Comment by Oleg Drokin [ 16/Oct/11 ] |
|
I just checked the sgp_dd man page here: http://linux.die.net/man/8/sgp_dd The man page tells me that sgp_dd defaults to non-direct I/O, and that if you want direct I/O you must pass the dio=1 flag, which seems to be absent from the command line provided. Regarding the timestamps in the transaction history, you can fully trust me that I did check the history in between the dd commands (and I did a sync before, too). At the completion of the direct I/O dd there were only short transactions in the log; the long one was added after the normal I/O dd was run. |
| Comment by Andreas Dilger [ 29/May/17 ] |
|
Close old ticket. |