[LU-4388] fsync on client does not cause OST_SYNCs to be issued Created: 17/Dec/13 Updated: 14/Jun/18 Resolved: 24/Apr/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.0, Lustre 2.6.0 |
| Fix Version/s: | Lustre 2.6.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Nathaniel Clark | Assignee: | Zhenyu Xu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | MB | ||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 12048 | ||||
| Description |
|
Issuing an fsync() only causes MDS_SYNC's to be issued. Testing: |
| Comments |
| Comment by Andreas Dilger [ 19/Dec/13 ] |
|
Looking at the ll_fsync() code path I see a number of gaps in the handling. This should be passing through the file range for fsync, since it is possible to sync only a range of the file. The Lustre OST protocol has always supported this, and it makes sense to implement it correctly. On the OST side, it looks like ofd_sync() passes start/end correctly, but the new ofd_sync_hdl() does not. The whole callchain for tgt_sync->dt_object_sync->osd_object_sync->do_sync() need to take start and end arguments, and then pass it down to f_op->fsync() for those newer variants of fsync() that allow specifying a range of the file. I pushed http://review.whamcloud.com/8626 to fix handling of {start, end}limits for fsync, but this will not fix the core problem of OST_SYNC not being sent at all. Looking at the twisty maze of CLIO that the fsync call passes through, it seems like the fsync() call should generate an OST_SYNC RPC, but I didn't look at actual debug logs to trace the callpath to see where it actually goes wrong. It is probably worthwhile to create a test case that writes 900kB (less than one RPC, so it isn't sent immediately), fsync (via "multiop -y"), immediately fail the OST (mark it read-only) and abort recovery when fsync() returns, then remount the client and verify that the data actually made it to disk. There are sanity.sh test_24 {a,b}() that test fsync() behaviour in terms of returning an error to the caller on failure, but there isn't a test that verifies fsync() is actually working correctly. Since we've had problems in this area several times (it is not a problem that can be seen easily, since the data is normally written to disk within a few seconds by itself anyway), it probably makes sense to start by writing this test first, and then using it to debug the problem. |
| Comment by Andreas Dilger [ 20/Dec/13 ] |
|
Note that my patch is not addressing the core issue here, just a problem that I found when I was looking at this bug originally. Someone with more CLIO skill or time to debug needs to look at this problem. |
| Comment by Peter Jones [ 24/Dec/13 ] |
|
Bobijam Could you please look into this one? Thanks Peter |
| Comment by Zhenyu Xu [ 24/Dec/13 ] |
|
Nathaniel, Can you describe the problem in detail? What's the full commands and the respective outputs? |
| Comment by Zhenyu Xu [ 27/Dec/13 ] |
|
dd ... conv=fsync issues several ll_writepages() with wbc->sync_mode == WB_SYNC_ALL and it calls after write finishes, it calls ll_fsync() with datasync==0 00000080:00000001:0.0:1388123592.960635:0:20741:0:(rw.c:1112:ll_writepages()) Process entered ... 00000080:00200000:0.0:1388123592.960646:0:20741:0:(file.c:2742:cl_sync_file_range()) VFS Op:inode=[0x200000400:0x1:0x0](ffff8800006da138), sync_mode=1 ... 00000080:00000001:1.0:1388123592.973789:0:20741:0:(file.c:2778:cl_sync_file_range()) Process leaving (rc=0 : 0 : 0) 00000080:00000001:1.0:1388123592.973791:0:20741:0:(rw.c:1147:ll_writepages()) Process leaving (rc=0 : 0 : 0) 00000080:00000001:1.0:1388123592.973794:0:20741:0:(file.c:2804:ll_fsync()) Process entered 00000080:00200000:1.0:1388123592.973794:0:20741:0:(file.c:2807:ll_fsync()) VFS Op:inode=[0x200000400:0x1:0x0](ffff8800006da138), datasync=0 |
| Comment by Zhenyu Xu [ 27/Dec/13 ] |
|
Nathaniel, Use conv=fdatasync will issue the OST_SYNC. dd conv=fsync does not set the datasync parameter for fsync(), conv=fdatasync set it. |
| Comment by Andreas Dilger [ 27/Dec/13 ] |
|
Does this mean there is a bug in out code that it doesn't send an OST_SYNC in case of conv=fsync? |
| Comment by Zhenyu Xu [ 27/Dec/13 ] |
|
A little bit confused about the datasync parameter of fsync(), is it datasync=1 means only sync data while datasync=0 means sync data+metadata? |
| Comment by Andreas Dilger [ 30/Dec/13 ] |
|
Looking at the comments around vfs_fsync() and vfs_fsync_range() * vfs_fsync - perform a fsync or fdatasync on a file
* @file: file to sync
* @datasync: only perform a fdatasync operation
*
* Write back data and metadata for @file to disk. If @datasync is
* set only metadata needed to access modified file data is written.
It does appear that datasync=1 means sync data and enough metadata to get the data back (e.g. i_size and block allocations) while datasync=0 means sync everything. See also http://man7.org/linux/man-pages/man2/fsync.2.html
So if datasync=0 then we should always be sending OST_SYNC RPCs, but if datasync=1 then that is only needed if the size is changed or if blocks were allocated. That is a difficult/impossible decision for a client to make, so the OST_SYNC RPC should be sent to the OST with a DATASYNC flag and let the OSD decide. For older branches it probably makes sense to always send OST_SYNC. |
| Comment by Jinshan Xiong (Inactive) [ 31/Dec/13 ] |
|
I see. So this is a misunderstanding of @datasync parameter. This bug may exist in all series of lustre client implementation. Thanks for explanation. |
| Comment by Zhenyu Xu [ 31/Dec/13 ] |
|
patch tracking at http://review.whamcloud.com/8684 |
| Comment by Jodi Levi (Inactive) [ 12/Mar/14 ] |
|
Patch has landed to Master. Please reopen if more work is needed in this ticket. |
| Comment by Andreas Dilger [ 21/Mar/14 ] |
|
The patch http://review.whamcloud.com/8626 has not landed yet. |
| Comment by Jodi Levi (Inactive) [ 24/Apr/14 ] |
|
Patch landed to Master. |
| Comment by Hans Henrik Happe [ 15/May/15 ] |
|
Should this have been fixed after 2.5.0 or is it only 2.6.0? I see the same problem with b2_5 on zfs (fdatasync okay and slow, fsync not touching the disk). I think it is important that fsync works as expected. |
| Comment by Zhenyu Xu [ 16/May/15 ] |
|
2.6.0 has the patches and 2.5.0 does not. |
| Comment by Hans Henrik Happe [ 17/May/15 ] |
|
Given the severity (data loss, inconsistent databases, etc..), shouldn't it be included in the maintenance release? |
| Comment by Gerrit Updater [ 18/May/15 ] |
|
Bobi Jam (bobijam@hotmail.com) uploaded a new patch: http://review.whamcloud.com/14840 |
| Comment by Gerrit Updater [ 25/Jun/15 ] |
|
Bobi Jam (bobijam@hotmail.com) uploaded a new patch: http://review.whamcloud.com/15390 |