[LU-14814] sanity test_398b: osc_cache_writeback_range ASSERTION( hp == 0 && discard == 0 ) failed
Created: 04/Jul/21  Updated: 06/Aug/21  Resolved: 29/Jul/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.15.0

Type: Bug
Priority: Minor
Reporter: Patrick Farrell
Assignee: Patrick Farrell
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates LU-13725 sanity test_398b: osc_cache_writeback... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   
[16265.830571] LustreError: 901425:0:(osc_cache.c:2983:osc_cache_writeback_range()) ASSERTION( hp == 0 && discard == 0 ) failed: 
[16265.832824] LustreError: 901425:0:(osc_cache.c:2983:osc_cache_writeback_range()) LBUG
[16265.834345] Pid: 901425, comm: fio 4.18.0-240.22.1.el8_3.x86_64 #1 SMP Thu Apr 8 19:01:30 UTC 2021
[16265.836046] Call Trace TBD:
[16265.836920] [<0>] libcfs_call_trace+0x6f/0x90 [libcfs]
[16265.837978] [<0>] lbug_with_loc+0x43/0x80 [libcfs]
[16265.839049] [<0>] osc_cache_writeback_range+0xb0f/0xe80 [osc]
[16265.840187] [<0>] osc_lock_flush+0x1bf/0x270 [osc]
[16265.841155] [<0>] osc_lock_lockless_cancel+0x37/0xe0 [osc]
[16265.842461] [<0>] cl_lock_cancel+0x69/0x140 [obdclass]
[16265.843569] [<0>] lov_lock_cancel+0x55/0x190 [lov]
[16265.844555] [<0>] cl_lock_cancel+0x69/0x140 [obdclass]
[16265.845595] [<0>] cl_lock_release+0x4d/0x120 [obdclass]
[16265.846648] [<0>] cl_io_unlock+0x10a/0x280 [obdclass]
[16265.847676] [<0>] cl_io_loop+0xb2/0x1e0 [obdclass]
[16265.848800] [<0>] ll_file_io_generic+0x447/0xc40 [lustre]
[16265.849880] [<0>] ll_file_write_iter+0x4ff/0x620 [lustre]
[16265.851003] [<0>] new_sync_write+0x124/0x170
[16265.851864] [<0>] vfs_write+0xa5/0x1a0
[16265.852641] [<0>] ksys_pwrite64+0x61/0xa0
[16265.853490] [<0>] do_syscall_64+0x5b/0x1a0
[16265.854343] [<0>] entry_SYSCALL_64_after_hwframe+0x65/0xca
[16265.855423] Kernel panic - not syncing: LBUG

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_398b - trevis-28vm9 crashed during sanity test_398b



 Comments   
Comment by Patrick Farrell [ 04/Jul/21 ]

I am inclined to think this is an existing bug, but it has started showing up after the recent parallel DIO changes, and 398b does test those.

I am still thinking about it. It needs more debugging, so I will probably push a debug patch, since I can't seem to reproduce it locally.

Comment by Gerrit Updater [ 06/Jul/21 ]

Patrick Farrell (pfarrell@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/44152
Subject: LU-14814 osc: osc: Do not flush on lockless cancel
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 80d9d3fed2e20d9a24381d282e3bacc2bfda0a29

Comment by Andreas Dilger [ 12/Jul/21 ]

+1 on master: https://testing.whamcloud.com/test_sets/f45cbd1e-26f9-4386-804e-ce8cdc2923dd

Comment by Andreas Dilger [ 23/Jul/21 ]

+1 on master: https://testing.whamcloud.com/test_sets/1d9f8fdc-276f-4c39-9e6a-fb2553b6d946

Comment by Alex Zhuravlev [ 27/Jul/21 ]

+1 on master: https://testing.whamcloud.com/test_sets/8f896db7-1cbf-484b-8a9d-a0c3c4ee7c92

Comment by Gerrit Updater [ 28/Jul/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/44152/
Subject: LU-14814 osc: osc: Do not flush on lockless cancel
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 6717c573ed90da9175e3c93c19759ea2dcd38bec

Comment by Peter Jones [ 29/Jul/21 ]

Landed for 2.15
