[LU-14640] ASSERTION( !PageDirty(lnb[i].lnb_page) ) in osd_write_commit()
Created: 25/Apr/21 Updated: 22/Jan/22 Resolved: 31/Aug/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Upstream |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Alex Zhuravlev | Assignee: | Arshad Hussain |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
Lustre: DEBUG MARKER: == sanity-benchmark test fsx: fsx ==================================================================== 22:56:45 (1619287005)
LustreError: 8995:0:(osd_io.c:1563:osd_write_commit()) ASSERTION( !PageDirty(lnb[i].lnb_page) ) failed:
LustreError: 8995:0:(osd_io.c:1563:osd_write_commit()) LBUG
Pid: 8995, comm: ll_ost_io00_009 4.18.0 #36 SMP Thu Mar 25 14:56:29 MSK 2021
Call Trace:
[<0>] libcfs_call_trace+0x76/0xa0 [libcfs]
[<0>] lbug_with_loc+0x3e/0x80 [libcfs]
[<0>] osd_write_commit+0x33a/0x9a0 [osd_ldiskfs]
[<0>] ofd_commitrw+0xbb2/0x2d90 [ofd]
[<0>] tgt_brw_write+0xeb7/0x22f0 [ptlrpc]
[<0>] tgt_request_handle+0xbe0/0x1970 [ptlrpc]
[<0>] ptlrpc_main+0x134f/0x30e0 [ptlrpc]

bisected to:
cb037f305c 2020-12-14 | LU-14160 fallocate: Add punch mode to fallocate [Arshad Hussain] |
| Comments |
| Comment by Andreas Dilger [ 25/Apr/21 ] |
|
Hi Arshad, could you please take a look. I suspect this is being caused by fallocate(PUNCH_HOLE) on non-page boundaries, which is leaving the partially-zeroed page in the buffer cache. With osd-ldiskfs we don't leave any pages in the buffer cache, so that we don't need to invalidate them later when doing large direct read/write operations. These buffers (potentially at both the start and end of the punch) should be treated the same way as a regular truncate where it zeroes the partial page, submits it for write, then invalidates the buffer afterward. |
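To make the "non-page boundaries" case concrete, here is a minimal shell sketch that reports which pages a punch range covers only partially. The helper name `partial_pages` and the 4096-byte default page size are assumptions for illustration; it is not part of Lustre and does not touch any file.

```shell
#!/bin/bash
# Hypothetical helper (not Lustre code): list the pages that a punch of
# [offset, offset + length) covers only partially.
partial_pages() {
    local offset=$1 length=$2 page_size=${3:-4096}
    local end=$((offset + length))
    local out=""
    # The first page is partial when the punch starts inside a page.
    if (( offset % page_size != 0 )); then
        out="start:$((offset / page_size))"
    fi
    # The last page is partial when the punch stops inside a page.
    if (( end % page_size != 0 )); then
        out="${out:+$out }end:$((end / page_size))"
    fi
    echo "${out:-none}"
}

partial_pages 4096 4090   # prints "end:1"
```

With a page-aligned offset and a 4090-byte length, only the end page is partial, so only that edge would need the zero, write, then invalidate handling described above.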
| Comment by Arshad Hussain [ 25/Apr/21 ] |
|
Hi Andreas, I am looking into this... |
| Comment by Arshad Hussain [ 26/Apr/21 ] |
|
Hi Andreas,
> ... These buffers (potentially at both the start and end of the punch) should be treated the same way as a regular truncate where it zeroes the partial page, submits it for write, then invalidates the buffer afterward.
Please correct me if I am wrong, but it looks like fallocate(PUNCH) is currently doing a partial page flush:
osd_trans_stop()
---->osd_process_truncates()
-------->osd_execute_punch()
{
osd_partial_page_flush(start); ---> calls (filemap_fdatawrite_range)
osd_partial_page_flush(end); ---> calls (filemap_fdatawrite_range)
}
Unfortunately, on my local setup I could not get the sanity-benchmark (fsx) test to reproduce the assertion yet, even when running it in a loop. I am also forcing a partial punch via the script below for verification. This is also passing.
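#!/bin/bash
# Punch a nearly page-sized hole at each page offset so that the end of
# every punch falls inside a page (offset is aligned, length is not).
fname=/mnt/lustre/A
# iflag=fullblock avoids short reads from the pipe, so dd writes full 4096-byte blocks
yes 'a' | dd of="$fname" bs=4096 count=100 iflag=fullblock
sync; sleep 1
for i in {100..1}; do
    offset=$((i * 4096))
    length=4090
    echo "Running: fallocate -p -o $offset -l $length $fname"
    fallocate -p -o "$offset" -l "$length" "$fname"
done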
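The script above always starts on a page boundary, so only the end of each punch is partial. A variant that leaves both edges unaligned could be sketched as follows. It is a dry run that only prints the commands; the function name `gen_unaligned_punches` and the 7- and 5-byte offsets are arbitrary choices for illustration, and it has not been shown to reproduce the assertion.

```shell
#!/bin/bash
# Hypothetical dry-run generator: print fallocate punch commands whose
# start and end both fall inside a page. Nothing is executed here.
gen_unaligned_punches() {
    local fname=$1 pages=$2 page_size=${3:-4096}
    local i offset length
    for ((i = 1; i < pages; i++)); do
        # Start 7 bytes past a page boundary and stop 5 bytes short of
        # the boundary two pages later, so both edge pages are partial.
        offset=$((i * page_size + 7))
        length=$((2 * page_size - 12))
        echo "fallocate -p -o $offset -l $length $fname"
    done
}

# Placeholder path taken from the script above; prints two commands.
gen_unaligned_punches /mnt/lustre/A 3
```

Piping the output to `sh` would run the punches against the named file, which should only be done on a test filesystem.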
|
| Comment by Arshad Hussain [ 26/Apr/21 ] |
|
Hi Alex, I could not get the sanity-benchmark (fsx) test to reproduce the assertion on my local setup. Could you help me with your setup details? I am trying to reproduce this on the current master (a2b5290 |
| Comment by Alex Zhuravlev [ 26/Apr/21 ] |
|
nothing really special - /tmp in ram (tmpfs), then REFORMAT=y ONLY=fsx ONLY_REPEAT=1000 sh sanity-benchmark.sh |
| Comment by Arshad Hussain [ 26/Apr/21 ] |
|
> nothing really special - /tmp in ram (tmpfs), then REFORMAT=y ONLY=fsx ONLY_REPEAT=1000 sh sanity-benchmark.sh
I will try again with an increased loop count. Thanks! |
| Comment by Gerrit Updater [ 27/Apr/21 ] |
|
Arshad Hussain (arshad.hussain@aeoncomputing.com) uploaded a new patch: https://review.whamcloud.com/43462 |
| Comment by Gerrit Updater [ 31/Aug/21 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/43462/ |
| Comment by Peter Jones [ 31/Aug/21 ] |
|
Landed for 2.15 |