[LU-17453] Use dget_parent/dput during d_revalidate Created: 22/Jan/24  Updated: 29/Jan/24

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Shaun Tancheff Assignee: Shaun Tancheff
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

Description

There appears to be a race in d_revalidate that can be triggered by parallel-scale-nfsv3.

In any case, taking a reference with dget_parent and releasing it with dput prevents the dentry from disappearing while it is being validated.

The race can result in a crash:

[ 1998.665129][T23978] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 1998.672417][T23978] #PF: supervisor read access in kernel mode
[ 1998.675733][T23978] #PF: error_code(0x0000) - not-present page
[ 1998.679427][T23978] PGD 0 P4D 0 
[ 1998.683876][T23978] Oops: 0000 [#1] PREEMPT SMP PTI
[ 1998.686690][T23978] CPU: 4 PID: 23978 Comm: dd Kdump: loaded Tainted: G        W  OE     N 5.14.21-150400.24.41-default #1 SLE15-SP4 e37e7aadb4e42246eb51815d42fa73d67a617d00
[ 1998.698157][T23978] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 1998.702225][T23978] RIP: 0010:path_openat+0x81a/0x1080
[ 1998.705221][T23978] Code: 00 f0 ff ff 49 89 c3 48 89 44 24 10 48 8b 4c 24 30 0f 86 c6 fa ff ff e9 fb fc ff ff 48 8b 5a 30 8b 15 c6 4c 9f 01 85 d2 75 0d <0f> b7 03 66 25 00 f0 66 3d 00 10 74 28 8b 0d ab 4c 9f 01 85 c9 75
[ 1998.723281][T23978] RSP: 0018:ffffa86744fc7c50 EFLAGS: 00010246
[ 1998.727185][T23978] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000004b00000000
[ 1998.731275][T23978] RDX: 0000000000000000 RSI: 0000000000000064 RDI: ffff8a19d4465418
[ 1998.736350][T23978] RBP: ffffa86744fc7e3c R08: 00000000000090c8 R09: 0000000000000001
[ 1998.742956][T23978] R10: ffff8a19dc9ad190 R11: ffff8a19cb28d300 R12: 0000000000008042
[ 1998.751696][T23978] R13: 0000000000000000 R14: ffffa86744fc7d00 R15: ffff8a19c2101600
[ 1998.757626][T23978] FS:  00007f1fb2ceb740(0000) GS:ffff8a19fbd00000(0000) knlGS:0000000000000000
[ 1998.769153][T23978] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1998.774199][T23978] CR2: 0000000000000000 CR3: 000000011e8aa000 CR4: 00000000000406e0
[ 1998.777925][T23978] Call Trace:
[ 1998.780410][T23978]  <TASK>
[ 1998.782562][T23978]  ? do_filp_open+0xd9/0x140
[ 1998.789075][T23978]  do_filp_open+0xc5/0x140
[ 1998.792461][T23978]  ? _raw_spin_unlock+0xa/0x30
[ 1998.794971][T23978]  ? kmem_cache_alloc+0x4d/0x4c0
[ 1998.797279][T23978]  ? _raw_spin_unlock+0xa/0x30
[ 1998.800835][T23978]  ? do_sys_openat2+0x23e/0x310
[ 1998.810720][T23978]  do_sys_openat2+0x23e/0x310
[ 1998.815004][T23978]  do_sys_open+0x57/0x80
[ 1998.818226][T23978]  do_syscall_64+0x5b/0x80
[ 1998.821460][T23978]  ? syscall_exit_to_user_mode+0x18/0x40
[ 1998.825997][T23978]  ? _raw_spin_unlock+0xa/0x30
[ 1998.829523][T23978]  ? filp_close+0x51/0x80
[ 1998.836459][T23978]  ? syscall_exit_to_user_mode+0x18/0x40
[ 1998.839667][T23978]  ? do_syscall_64+0x67/0x80
[ 1998.842921][T23978]  ? exc_page_fault+0x67/0x150
[ 1998.846876][T23978]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 1998.850821][T23978] RIP: 0033:0x7f1fb27e077d
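
For readers unfamiliar with the pattern, the dget_parent/dput idea from the description can be sketched as a small userspace model. This is not the Lustre patch and not kernel code: every model_* name below is hypothetical, the reference count is a plain atomic integer, and the handling of concurrent d_parent changes that the real dget_parent has to do is left out entirely. It only illustrates why a held reference keeps the parent alive while it is inspected.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace model only (hypothetical names): NOT the kernel's struct dentry. */
struct model_dentry {
    atomic_int refcount;           /* stands in for the dentry reference count */
    struct model_dentry *parent;   /* stands in for d_parent; NULL for the root */
    bool has_inode;                /* stands in for "d_inode is not NULL" */
};

int model_live_dentries;           /* number of model dentries not yet freed */

/* Allocate a dentry holding one reference for the caller; a child pins its parent. */
struct model_dentry *model_dentry_new(struct model_dentry *parent, bool has_inode)
{
    struct model_dentry *d = calloc(1, sizeof(*d));

    if (!d)
        return NULL;
    atomic_init(&d->refcount, 1);
    d->parent = parent;
    d->has_inode = has_inode;
    if (parent)
        atomic_fetch_add(&parent->refcount, 1);
    model_live_dentries++;
    return d;
}

/* Drop one reference; free on the last one and release the parent in turn. */
void model_dput(struct model_dentry *d)
{
    while (d) {
        int old = atomic_fetch_sub(&d->refcount, 1);
        struct model_dentry *parent;

        assert(old > 0);
        if (old != 1)
            return;
        parent = d->parent;
        free(d);
        model_live_dentries--;
        d = parent;
    }
}

/*
 * Simplified stand-in for dget_parent: return the parent with an extra
 * reference held. The model treats the root as its own parent.
 */
struct model_dentry *model_dget_parent(struct model_dentry *d)
{
    struct model_dentry *parent = d->parent ? d->parent : d;

    atomic_fetch_add(&parent->refcount, 1);
    return parent;
}

/* Revalidate-style check: pin the parent, inspect it, then unpin it. */
bool model_revalidate(struct model_dentry *d)
{
    struct model_dentry *parent = model_dget_parent(d);
    bool valid = parent->has_inode;   /* safe: parent cannot be freed here */

    model_dput(parent);
    return valid;
}
```

In this model, a caller that reads d->parent directly and then dereferences it has no guarantee the parent is still allocated once other references are dropped; the caller that goes through model_dget_parent does, until its matching model_dput.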


Comments
Comment by Gerrit Updater [ 22/Jan/24 ]

"Shaun Tancheff <shaun.tancheff@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53757
Subject: LU-17453 llite: use dget_parent in d_revalidate
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c9491a1fe5bf6a1d023639511c3552befa28d5e2

Comment by Patrick Farrell [ 22/Jan/24 ]

Can you give any more details about how this race manifests?  What goes wrong?

Comment by Shaun Tancheff [ 23/Jan/24 ]

Updated description to include the crash back-trace.

Comment by Gerrit Updater [ 29/Jan/24 ]

"Shaun Tancheff <shaun.tancheff@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53850
Subject: LU-17453 test: use dget_parent to access dentry.d_parent
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 76514113901fdb3dd507b7c5c17757580ccf03e9

Generated at Sat Feb 10 03:35:34 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.