[LU-14640] ASSERTION( !PageDirty(lnb[i].lnb_page) in osd_write_commit() Created: 25/Apr/21  Updated: 22/Jan/22  Resolved: 31/Aug/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Upstream
Fix Version/s: Lustre 2.15.0

Type: Bug
Priority: Minor
Reporter: Alex Zhuravlev
Assignee: Arshad Hussain
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Related
is related to LU-14402 LBUG: osd_write_commit() ASSERTION( !... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   
Lustre: DEBUG MARKER: == sanity-benchmark test fsx: fsx ==================================================================== 22:56:45 (1619287005)
LustreError: 8995:0:(osd_io.c:1563:osd_write_commit()) ASSERTION( !PageDirty(lnb[i].lnb_page) ) failed: 
LustreError: 8995:0:(osd_io.c:1563:osd_write_commit()) LBUG
Pid: 8995, comm: ll_ost_io00_009 4.18.0 #36 SMP Thu Mar 25 14:56:29 MSK 2021
Call Trace:
[<0>] libcfs_call_trace+0x76/0xa0 [libcfs]
[<0>] lbug_with_loc+0x3e/0x80 [libcfs]
[<0>] osd_write_commit+0x33a/0x9a0 [osd_ldiskfs]
[<0>] ofd_commitrw+0xbb2/0x2d90 [ofd]
[<0>] tgt_brw_write+0xeb7/0x22f0 [ptlrpc]
[<0>] tgt_request_handle+0xbe0/0x1970 [ptlrpc]
[<0>] ptlrpc_main+0x134f/0x30e0 [ptlrpc]

bisected to:

cb037f305c 2020-12-14 | LU-14160 fallocate: Add punch mode to fallocate [Arshad Hussain]


 Comments   
Comment by Andreas Dilger [ 25/Apr/21 ]

Hi Arshad, could you please take a look.

I suspect this is being caused by fallocate(PUNCH_HOLE) on non-page boundaries, which is leaving the partially-zeroed page in the buffer cache. With osd-ldiskfs we don't leave any pages in the buffer cache, so that we don't need to invalidate them later when doing large direct read/write operations. These buffers (potentially at both the start and end of the punch) should be treated the same way as a regular truncate where it zeroes the partial page, submits it for write, then invalidates the buffer afterward.
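
A minimal sketch of the boundary arithmetic behind this suspicion, assuming a 4096-byte page size. The helper name partial_pages is hypothetical and not part of Lustre; it only reports which pages a punch covers partially, that is, the pages that would need the zero, write, invalidate treatment described above.

```shell
#!/bin/bash
# Hypothetical helper (not part of Lustre): report which pages a hole punch
# only partially covers, assuming a fixed 4096-byte page size.
PAGE_SIZE=4096

partial_pages() {
    local offset=$1 length=$2
    local end=$((offset + length))   # first byte after the hole
    local out=""
    # The head page is partial when the hole does not start on a page boundary.
    if (( offset % PAGE_SIZE != 0 )); then
        out="head=$((offset / PAGE_SIZE))"
    fi
    # The tail page is partial when the hole does not end on a page boundary.
    if (( end % PAGE_SIZE != 0 )); then
        out="${out:+$out }tail=$((end / PAGE_SIZE))"
    fi
    echo "${out:-aligned}"
}

partial_pages 4096 4090   # prints: tail=1
partial_pages 100 8192    # prints: head=0 tail=2
partial_pages 8192 4096   # prints: aligned
```

A punch that starts and ends on page boundaries leaves no partial page, so only the first two calls describe ranges that could leave a partially zeroed page behind.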

Comment by Arshad Hussain [ 25/Apr/21 ]

Hi Andreas, I am looking into this...

Comment by Arshad Hussain [ 26/Apr/21 ]

Hi Andreas,

> ... These buffers (potentially at both the start and end of the punch) should be treated the same way as a regular truncate where it zeroes the partial page, submits it for write, then invalidates the buffer afterward. 

Please correct me if I am wrong, but it looks like fallocate(PUNCH) already does a partial page flush:

 

osd_trans_stop()
---->osd_process_truncates()
-------->osd_execute_punch()
             {
                     osd_partial_page_flush(start);  ---> calls (filemap_fdatawrite_range)
                     osd_partial_page_flush(end);    ---> calls (filemap_fdatawrite_range)
             }

Unfortunately, I could not get the sanity-benchmark (fsx) test to reproduce the assertion on my local setup, even when running it in a loop.

I am also forcing a partial punch via a script for verification. This also passes.

 

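#!/bin/bash
# Punch a non-page-aligned hole (4090 bytes) at each 4096-byte offset.
fname=/mnt/lustre/A
# iflag=fullblock keeps dd from writing short blocks when reading from a pipe
yes 'a' | dd of="$fname" bs=4096 count=100 iflag=fullblock
sync; sleep 1
for i in {100..1}; do
   offset=$((i * 4096))
   length=4090
   echo "Running: fallocate -p -o $offset -l $length $fname"
   fallocate -p -o "$offset" -l "$length" "$fname"
done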
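
The script above only shows that the punches complete without tripping the assertion. A possible follow-up check, sketched here under assumptions, is to confirm that a punched range reads back as zeros. The function name range_is_zero is hypothetical, and the demo uses dd from /dev/zero as a stand-in for the punch so that it does not depend on the filesystem supporting fallocate -p.

```shell
#!/bin/bash
# Hypothetical check (not from the ticket): succeed only if every byte in
# [offset, offset+length) of a file reads back as zero.
range_is_zero() {
    local file=$1 offset=$2 length=$3
    local nonzero
    # Read the range, drop NUL bytes, and count whatever is left.
    nonzero=$(dd if="$file" bs=1 skip="$offset" count="$length" 2>/dev/null |
              tr -d '\0' | wc -c)
    [ "$nonzero" -eq 0 ]
}

# Demo on a scratch file filled with 'a'. Overwriting with zeros stands in
# for the punch here (an assumption made to keep the demo portable).
tmp=$(mktemp)
head -c 8192 /dev/zero | tr '\0' 'a' > "$tmp"
dd if=/dev/zero of="$tmp" bs=1 seek=4096 count=4090 conv=notrunc 2>/dev/null
range_is_zero "$tmp" 4096 4090 && echo "punched range is zero"
range_is_zero "$tmp" 4096 4096 || echo "bytes past the punch are intact"
rm -f "$tmp"
```

On a Lustre mount the same function could be called after each fallocate -p in the loop, with the same offset and length.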

 

 

Comment by Arshad Hussain [ 26/Apr/21 ]

Hi Alex,

I could not get the sanity-benchmark (fsx) test to reproduce the assertion on my local setup. Could you share your setup details? I am trying to reproduce this on the current master (a2b5290 LU-14613 tests: fix sanity/60f on a local setup).

Comment by Alex Zhuravlev [ 26/Apr/21 ]

nothing really special - /tmp in ram (tmpfs), then REFORMAT=y ONLY=fsx ONLY_REPEAT=1000 sh sanity-benchmark.sh

Comment by Arshad Hussain [ 26/Apr/21 ]

>nothing really special - /tmp in ram (tmpfs), then REFORMAT=y ONLY=fsx ONLY_REPEAT=1000 sh sanity-benchmark.sh

I will try with an increased loop count. Thanks!

Comment by Gerrit Updater [ 27/Apr/21 ]

Arshad Hussain (arshad.hussain@aeoncomputing.com) uploaded a new patch: https://review.whamcloud.com/43462
Subject: LU-14640 osd: non aligned flush
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 48a492c9d87163dc8c55a4ad1042bbab9a07448c

Comment by Gerrit Updater [ 31/Aug/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/43462/
Subject: LU-14640 osd: ASSERTION(!PageDirty(lnb[i].lnb_page)
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 76c5e8ed9560fe232bcc0c2ee0069dbdb8411565

Comment by Peter Jones [ 31/Aug/21 ]

Landed for 2.15

Generated at Sat Feb 10 03:11:29 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.