[LU-17116] losetup --direct-io=on trigger LASSERT(ll_dio_aio) in ll_file_io_generic Created: 14/Sep/23 Updated: 14/Sep/23 Resolved: 14/Sep/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | Guillaume Courrier | Assignee: | Guillaume Courrier |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
While using direct I/O on a loop device to format and mount an XFS filesystem backed by a Lustre file, we ran into the following crash:
PID: 51772 TASK: ffff950e22f117c0 CPU: 1 COMMAND: "loop4"
#0 [ffffafe700c8b8a0] machine_kexec at ffffffffb9659a5e
#1 [ffffafe700c8b8f8] __crash_kexec at ffffffffb975928d
#2 [ffffafe700c8b9c0] panic at ffffffffb96b1498
#3 [ffffafe700c8ba48] ll_direct_IO_impl at ffffffffc1755194 [lustre]
#4 [ffffafe700c8bb08] generic_file_read_iter at ffffffffb981cd1f
#5 [ffffafe700c8bb50] vvp_io_read_start at ffffffffc17673be [lustre]
#6 [ffffafe700c8bbe8] cl_io_start at ffffffffc09e673d [obdclass]
#7 [ffffafe700c8bc10] cl_io_loop at ffffffffc09e9d1a [obdclass]
#8 [ffffafe700c8bc48] ll_file_io_generic at ffffffffc170e510 [lustre]
#9 [ffffafe700c8bd40] ll_file_read_iter at ffffffffc170f9a6 [lustre]
#10 [ffffafe700c8bdb0] lo_rw_aio at ffffffffc08037a9 [loop]
#11 [ffffafe700c8be28] loop_queue_work at ffffffffc0804bc7 [loop]
#12 [ffffafe700c8bee0] kthread_worker_fn at ffffffffb96d5224
#13 [ffffafe700c8bf10] kthread at ffffffffb96d4802
#14 [ffffafe700c8bf50] ret_from_fork at ffffffffba000242
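This crash is triggered by `LASSERT(ll_dio_aio)` in `ll_file_io_generic`. The loop block device requests direct I/O by setting `IOCB_DIRECT` in `kiocb::ki_flags`, but Lustre only checks `file::f_flags & O_DIRECT` to decide whether an I/O is direct. The two checks disagree: the generic read path (`generic_file_read_iter`) takes its direct I/O branch because `IOCB_DIRECT` is set and reaches `ll_direct_IO_impl` (see the backtrace above), while the DIO state that Lustre prepares only when it sees `O_DIRECT`, such as `ll_dio_aio`, was never set up. The read/write code paths therefore find an inconsistent state, and the assertion fails.
This crash was reproduced with:
`truncate -s 100M /mnt/lustre/disk`
`losetup -f -b 4096 --direct-io=on /mnt/lustre/disk`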
|
| Comments |
| Comment by Guillaume Courrier [ 14/Sep/23 ] |
|
I have a patch ready to fix this issue. I will submit it once I have written a test for it. |
| Comment by Patrick Farrell [ 14/Sep/23 ] |
|
This isn't exactly a duplicate of |
| Comment by Patrick Farrell [ 14/Sep/23 ] |
|
This is fixed by the patch in |