[LU-17116] losetup --direct-io=on triggers LASSERT(ll_dio_aio) in ll_file_io_generic Created: 14/Sep/23  Updated: 14/Sep/23  Resolved: 14/Sep/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Guillaume Courrier Assignee: Guillaume Courrier
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Related
is related to LU-16695 switch Lustre to use IOCB_APPEND and ... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

When trying to use direct I/O on loop devices to format and mount an XFS filesystem backed by a Lustre file, we ran into the following crash:

PID: 51772  TASK: ffff950e22f117c0  CPU: 1   COMMAND: "loop4"
 #0 [ffffafe700c8b8a0] machine_kexec at ffffffffb9659a5e
 #1 [ffffafe700c8b8f8] __crash_kexec at ffffffffb975928d
 #2 [ffffafe700c8b9c0] panic at ffffffffb96b1498
 #3 [ffffafe700c8ba48] ll_direct_IO_impl at ffffffffc1755194 [lustre]
 #4 [ffffafe700c8bb08] generic_file_read_iter at ffffffffb981cd1f
 #5 [ffffafe700c8bb50] vvp_io_read_start at ffffffffc17673be [lustre]
 #6 [ffffafe700c8bbe8] cl_io_start at ffffffffc09e673d [obdclass]
 #7 [ffffafe700c8bc10] cl_io_loop at ffffffffc09e9d1a [obdclass]
 #8 [ffffafe700c8bc48] ll_file_io_generic at ffffffffc170e510 [lustre]
 #9 [ffffafe700c8bd40] ll_file_read_iter at ffffffffc170f9a6 [lustre]
#10 [ffffafe700c8bdb0] lo_rw_aio at ffffffffc08037a9 [loop]
#11 [ffffafe700c8be28] loop_queue_work at ffffffffc0804bc7 [loop]
#12 [ffffafe700c8bee0] kthread_worker_fn at ffffffffb96d5224
#13 [ffffafe700c8bf10] kthread at ffffffffb96d4802
#14 [ffffafe700c8bf50] ret_from_fork at ffffffffba000242 

This crash is triggered by `LASSERT(ll_dio_aio)` in `ll_file_io_generic`. The loop block device requests direct I/O by setting `kiocb::ki_flags = IOCB_DIRECT`, but Lustre only checks `file::f_flags & O_DIRECT` to decide whether a request is DIO. The two checks can disagree, so the read/write code paths can run as direct I/O without the variables they expect having been set up.
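
To illustrate the mismatch described above, here is a minimal user-space sketch in C. It is not Lustre or kernel code: `struct mock_file`, `struct mock_kiocb`, the `MOCK_*` constants and all function names are hypothetical stand-ins, and the flag values are illustrative only. It assumes, as the report describes, that the loop driver marks each request with `IOCB_DIRECT` while the backing file was opened without `O_DIRECT`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins only; the real flag values live in kernel headers. */
#define MOCK_O_DIRECT    0x4000u   /* models O_DIRECT in file::f_flags */
#define MOCK_IOCB_DIRECT 0x20000u  /* models IOCB_DIRECT in kiocb::ki_flags */

/* Hypothetical, heavily reduced models of struct file and struct kiocb. */
struct mock_file {
    unsigned int f_flags;          /* flags given when the file was opened */
};

struct mock_kiocb {
    struct mock_file *ki_filp;     /* file the request targets */
    unsigned int ki_flags;         /* per-request flags */
};

/* The check described in the report: only the open flags are consulted. */
bool dio_by_file_flags(const struct mock_kiocb *iocb)
{
    return (iocb->ki_filp->f_flags & MOCK_O_DIRECT) != 0;
}

/* A per-request check: consult the flags carried by the kiocb itself. */
bool dio_by_iocb_flags(const struct mock_kiocb *iocb)
{
    return (iocb->ki_flags & MOCK_IOCB_DIRECT) != 0;
}

/* True when the two checks give different answers for the same request. */
bool dio_checks_disagree(const struct mock_kiocb *iocb)
{
    return dio_by_file_flags(iocb) != dio_by_iocb_flags(iocb);
}

/*
 * Build a request the way the report says the loop device does:
 * direct I/O is requested on the kiocb, and the file flags are left as-is.
 */
struct mock_kiocb mock_loop_dio_request(struct mock_file *file)
{
    assert(file != NULL);
    struct mock_kiocb iocb = { .ki_filp = file, .ki_flags = MOCK_IOCB_DIRECT };
    return iocb;
}
```

For a `mock_file` with `f_flags == 0`, a request built by `mock_loop_dio_request()` is direct I/O according to `dio_by_iocb_flags()` but not according to `dio_by_file_flags()`, which is the kind of inconsistency the report attributes the assertion to.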

This crash was produced with:

truncate -s 100M /mnt/lustre/disk
losetup -f -b 4096 --direct-io=on /mnt/lustre/disk

 



 Comments   
Comment by Guillaume Courrier [ 14/Sep/23 ]

I have a patch ready to fix this issue. I will submit it after I create a test for it.

Comment by Patrick Farrell [ 14/Sep/23 ]

This isn't exactly a duplicate of LU-16695, but as Guillaume noted, it is fixed by LU-16695.  I'm noting the relationship and will close as a dupe now that we've got them linked.

Comment by Patrick Farrell [ 14/Sep/23 ]

This is fixed by the patch in LU-16695, so I am marking this as a duplicate.

Generated at Sat Feb 10 03:32:46 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.