[LU-17213] BUG: unable to handle kernel paging request at ll_direct_IO+0xd50 Created: 19/Oct/23  Updated: 09/Nov/23  Resolved: 09/Nov/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Bug
Priority: Trivial
Reporter: Alex Zhuravlev
Assignee: Patrick Farrell
Resolution: Fixed
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Hit while running the racer test:

[ 2837.860185] BUG: unable to handle kernel paging request at ffff9fa21b1adfe8
[ 2837.860349] PGD 8ae01067 P4D 8ae01067 PUD 1d137c067 PMD 1d07a0067 PTE 800ffffe64e52060
[ 2837.860454] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 2837.860524] CPU: 2 PID: 187710 Comm: lfs Tainted: G        W  O     --------- -  - 4.18.0 #2
[ 2837.860641] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc36 04/01/2014
[ 2837.860786] RIP: 0010:ll_direct_IO+0xd50/0x11a0 [lustre]
[ 2837.860860] Code: e8 95 77 21 ff 48 8b 3c 24 4d 89 fe 89 c2 4c 89 f6 4c 8b 7c 24 40 e8 cf 40 39 ff 80 7c 24 60 00 74 16 4c 89 f7 e8 b0 24 39 ff <41> f6 86 e0 00 00 00 01 0f 84 92 03 00 00 8b 05 48 25 23 ff 83 e0
[ 2837.861099] RSP: 0018:ffff9fa2358f3aa8 EFLAGS: 00010286
[ 2837.861169] RAX: 0000000000000000 RBX: fffffffffffffffc RCX: ffff9fa2358f39f0
[ 2837.861269] RDX: 0000000000000001 RSI: 800ffffe64e52060 RDI: 0000000000000286
[ 2837.861370] RBP: 0000000000002b9f R08: ffff9fa29b1ad000 R09: 0000000000000000
[ 2837.861472] R10: 000000019b1ad000 R11: 000000000019b1ad R12: 0000000000000b9f
[ 2837.861573] R13: 0000000000002b9f R14: ffff9fa21b1adf08 R15: ffff9fa2358f3c38
[ 2837.861674] FS:  00007f3bdbc3a440(0000) GS:ffff9fa249c00000(0000) knlGS:0000000000000000
[ 2837.861775] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2837.861860] CR2: ffff9fa21b1adfe8 CR3: 0000000195caa001 CR4: 0000000000370ea0
[ 2837.861968] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2837.862070] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2837.862172] Call Trace:
[ 2837.862214]  generic_file_direct_write+0x8c/0x160
[ 2837.862287]  __generic_file_write_iter+0xb2/0x1c0
[ 2837.862366]  ? lov_object_maxbytes+0x29/0x40 [lov]
[ 2837.862455]  vvp_io_write_start+0x397/0xc40 [lustre]
[ 2837.862564]  ? cl_lock_request+0x61/0x1d0 [obdclass]
[ 2837.862656]  cl_io_start+0x55/0x110 [obdclass]
[ 2837.862747]  cl_io_loop+0x95/0x200 [obdclass]
[ 2837.862832]  ll_file_io_generic+0x3f8/0xd90 [lustre]
[ 2837.862909]  ? __lock_acquire.isra.16+0x211/0x5b0
[ 2837.862996]  ll_file_write_iter+0x5f0/0x890 [lustre]
[ 2837.863072]  ? __lock_acquire.isra.16+0x2f3/0x5b0
[ 2837.863145]  new_sync_write+0xfa/0x130
[ 2837.863199]  vfs_write+0xb9/0x1c0
[ 2837.863253]  ksys_pwrite64+0x5f/0xa0
[ 2837.863307]  ? ksys_lseek+0x5d/0xa0

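This appears to be caused by the following code. The LASSERT() reads sdio->csd_creator_free after cl_sub_dio_free(sdio) has already freed sdio, which is a use-after-free: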

				cl_sync_io_note(env, &sdio->csd_sync, result);
				if (sync_submit) {
					cl_sub_dio_free(sdio);
					LASSERT(sdio->csd_creator_free);


 Comments   
Comment by Gerrit Updater [ 19/Oct/23 ]

"Patrick Farrell <pfarrell@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52757
Subject: LU-17213 llite: check sdio before freeing it
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 42d22113eb07ac5f1431d27efa4c87f69a969b3e

Comment by Gerrit Updater [ 08/Nov/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52757/
Subject: LU-17213 llite: check sdio before freeing it
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 21295b169bd70c68cd99e2db6bac3fa60a8f2c83

Comment by Peter Jones [ 09/Nov/23 ]

Landed for 2.16

Generated at Sat Feb 10 03:33:34 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.