[LU-13216] Kernel NULL pointer dereference in lustre_msg_set_conn_cnt() Created: 07/Feb/20  Updated: 17/Mar/20  Resolved: 17/Mar/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.14.0

Type: Bug Priority: Minor
Reporter: Sebastien Buisson Assignee: Sebastien Buisson
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

When running the auster test suite with SHARED_KEY enabled, sanity-sec test_28 crashes due to a kernel NULL pointer dereference in lustre_msg_set_conn_cnt(). This function is called from sptlrpc_req_refresh_ctx() via ctx_refresh_timeout().

[10565.205946] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[10565.207453] IP: [<ffffffffc09f0a0c>] lustre_msg_set_conn_cnt+0xc/0xa0 [ptlrpc]
[10565.208767] PGD 80000000797c8067 PUD 7accd067 PMD 0 
[10565.209685] Oops: 0000 [#1] SMP 
[10565.210291] Modules linked in: obdecho(OE) ptlrpc_gss(OE) mgc(OE) lustre(OE) lmv(OE) mdc(OE) fid(OE) osc(OE) lov(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core sunrpc iosf_mbi crc32_pclmul ghash_clmulni_intel ppdev aesni_intel joydev lrw gf128mul glue_helper ablk_helper cryptd i2c_piix4 pcspkr parport_pc virtio_balloon parport ip_tables ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk 8139too crct10dif_pclmul crct10dif_common crc32c_intel ata_piix serio_raw libata 8139cp virtio_pci virtio_ring
[10565.224100]  virtio mii floppy [last unloaded: libcfs]
[10565.224871] CPU: 0 PID: 21330 Comm: bash Kdump: loaded Tainted: G           OE  ------------   3.10.0-1062.9.1.el7.x86_64 #1
[10565.226677] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[10565.227736] task: ffff8b8afbeb41c0 ti: ffff8b8adf28c000 task.ti: ffff8b8adf28c000
[10565.229220] RIP: 0010:[<ffffffffc09f0a0c>]  [<ffffffffc09f0a0c>] lustre_msg_set_conn_cnt+0xc/0xa0 [ptlrpc]
[10565.230845] RSP: 0018:ffff8b8adf28f630  EFLAGS: 00010246
[10565.231719] RAX: ffff8b8afafd3800 RBX: ffff8b8afb5a8a00 RCX: ffff8b8adf28ffd8
[10565.232866] RDX: 0000000000000100 RSI: 0000000000000001 RDI: 0000000000000000
[10565.234022] RBP: ffff8b8adf28f640 R08: ffff8b8afb5a8a50 R09: ffff8b8aeb749680
[10565.235166] R10: ffffffff8d9a093d R11: ffff8b8af9354f00 R12: ffff8b8aeb749680
[10565.236317] R13: 0000000000000000 R14: ffff8b8aeb749698 R15: ffff8b8aeb749778
[10565.237469] FS:  00007efeee08d740(0000) GS:ffff8b8affc00000(0000) knlGS:0000000000000000
[10565.238759] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[10565.239692] CR2: 0000000000000008 CR3: 000000007afda000 CR4: 00000000000606f0
[10565.240848] Call Trace:
[10565.241304]  [<ffffffffc0a1d558>] sptlrpc_req_refresh_ctx+0x3c8/0xa50 [ptlrpc]
[10565.242632]  [<ffffffffc078a369>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
[10565.243806]  [<ffffffffc0a1dd60>] sptlrpc_import_check_ctx+0x180/0x3b0 [ptlrpc]
[10565.245007]  [<ffffffffc09b46e6>] ldlm_lock_match_with_skip+0x216/0x7f0 [ptlrpc]
[10565.246228]  [<ffffffff8d98eb44>] ? vsnprintf+0x234/0x6a0
[10565.247150]  [<ffffffffc0b8a5e9>] mdc_lock_match+0xb9/0x180 [mdc]
[10565.248146]  [<ffffffffc0b8db9b>] mdc_revalidate_lock+0x12b/0x1f0 [mdc]
[10565.249228]  [<ffffffffc0b8df02>] mdc_intent_lock+0x2a2/0x560 [mdc]
[10565.250310]  [<ffffffffc0c2c470>] ? ll_md_need_convert+0x1b0/0x1b0 [lustre]
[10565.251481]  [<ffffffffc09c32a0>] ? ldlm_expired_completion_wait+0x220/0x220 [ptlrpc]
[10565.252778]  [<ffffffffc0b91370>] ? mdc_changelog_cdev_finish+0x1f0/0x1f0 [mdc]
[10565.253995]  [<ffffffffc0bcdd4a>] lmv_intent_lock+0x47a/0xaf0 [lmv]
[10565.255018]  [<ffffffff8d733682>] ? from_kgid+0x12/0x20
[10565.255881]  [<ffffffffc0c2c787>] ? ll_i2suppgid+0x37/0x40 [lustre]
[10565.256918]  [<ffffffffc0c2c7c3>] ? ll_i2gids+0x33/0xb0 [lustre]
[10565.257900]  [<ffffffff8d733682>] ? from_kgid+0x12/0x20
[10565.258772]  [<ffffffffc0c2c470>] ? ll_md_need_convert+0x1b0/0x1b0 [lustre]
[10565.259915]  [<ffffffffc0bfbaee>] ll_inode_revalidate+0x12e/0x690 [lustre]
[10565.261045]  [<ffffffffc0bfc253>] ll_inode_permission+0x203/0x3f0 [lustre]
[10565.262162]  [<ffffffff8d8559b7>] ? __follow_mount_rcu+0x37/0x100
[10565.263156]  [<ffffffff8d8565d1>] __inode_permission+0x71/0xd0
[10565.264107]  [<ffffffff8d856648>] inode_permission+0x18/0x50
[10565.265033]  [<ffffffff8d85a6ae>] link_path_walk+0x27e/0x8b0
[10565.265948]  [<ffffffff8d7bd99b>] ? unlock_page+0x2b/0x30
[10565.266825]  [<ffffffff8d85ae4a>] path_lookupat+0x7a/0x8b0
[10565.267715]  [<ffffffff8d824ef5>] ? kmem_cache_alloc+0x35/0x1f0
[10565.268675]  [<ffffffff8d85c45f>] ? getname_flags+0x4f/0x1a0
[10565.269587]  [<ffffffff8d85b6ab>] filename_lookup+0x2b/0xc0
[10565.270484]  [<ffffffff8d85d5f7>] user_path_at_empty+0x67/0xc0
[10565.271434]  [<ffffffff8d7f3ecd>] ? handle_mm_fault+0x39d/0x9b0
[10565.272388]  [<ffffffff8d85d661>] user_path_at+0x11/0x20
[10565.273249]  [<ffffffff8d850343>] vfs_fstatat+0x63/0xc0
[10565.274099]  [<ffffffff8d8506fe>] SYSC_newstat+0x2e/0x60
[10565.274962]  [<ffffffff8dd8de21>] ? system_call_after_swapgs+0xae/0x146
[10565.276028]  [<ffffffff8dd8de15>] ? system_call_after_swapgs+0xa2/0x146
[10565.277096]  [<ffffffff8dd8de21>] ? system_call_after_swapgs+0xae/0x146
[10565.278163]  [<ffffffff8dd8de15>] ? system_call_after_swapgs+0xa2/0x146
[10565.279227]  [<ffffffff8dd8de21>] ? system_call_after_swapgs+0xae/0x146
[10565.280294]  [<ffffffff8dd8de15>] ? system_call_after_swapgs+0xa2/0x146
[10565.281368]  [<ffffffff8dd8de21>] ? system_call_after_swapgs+0xae/0x146
[10565.282515]  [<ffffffff8dd8de15>] ? system_call_after_swapgs+0xa2/0x146
[10565.283593]  [<ffffffff8dd8de21>] ? system_call_after_swapgs+0xae/0x146
[10565.284709]  [<ffffffff8dd8de15>] ? system_call_after_swapgs+0xa2/0x146
[10565.285797]  [<ffffffff8dd8de21>] ? system_call_after_swapgs+0xae/0x146
[10565.286860]  [<ffffffff8d850bbe>] SyS_newstat+0xe/0x10
[10565.287695]  [<ffffffff8dd8dede>] system_call_fastpath+0x25/0x2a
[10565.288665]  [<ffffffff8dd8de21>] ? system_call_after_swapgs+0xae/0x146
[10565.289730] Code: c0 c7 05 0c ec 07 00 00 00 04 00 e8 af ca c7 ff 48 c7 c7 e0 f5 a6 c0 e8 e3 16 c9 ff 0f 1f 00 66 66 66 66 90 55 48 89 e5 41 54 53 <81> 7f 08 d3 0b d0 0b 48 89 fb 75 1d 41 89 f4 ba 98 00 00 00 31 
[10565.294849] RIP  [<ffffffffc09f0a0c>] lustre_msg_set_conn_cnt+0xc/0xa0 [ptlrpc]
[10565.296089]  RSP <ffff8b8adf28f630>
[10565.296684] CR2: 0000000000000008


 Comments   
Comment by Sebastien Buisson [ 07/Feb/20 ]

The problem seems to be due to patch c1fad6a9a5 ("LU-10467 ptlrpc: convert waiting in sptlrpc_req_refresh_ctx()").
https://review.whamcloud.com/35987

This patch converts the waiting routine in sptlrpc_req_refresh_ctx(), but it slightly changes the function's behavior when the timeout is 0. With the initial implementation, a zero timeout meant the timeout callback was not called before starting an infinite, interruptible wait. With the new implementation, the timeout callback is called before going into the infinite, interruptible wait.
Unfortunately, when called with a zero timeout, sptlrpc_req_refresh_ctx() is not supposed to try to refresh the request, which is precisely what the timeout callback ctx_refresh_timeout() does. When it does try, it can hit the bug whose stack trace is shown above.

I will propose a patch to address this regression.

Comment by Gerrit Updater [ 07/Feb/20 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/37473
Subject: LU-13216 ptlrpc: do not refresh req in case of zero timeout
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 07c8fb64d3e4191a423314ffba0918c205d52793

Comment by Gerrit Updater [ 17/Mar/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37473/
Subject: LU-13216 ptlrpc: sptlrpc_req_refresh_ctx's timeout semantic
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 0b09d826149f4baadce305df63396bf86eb20cf7

Comment by Peter Jones [ 17/Mar/20 ]

Landed for 2.14

Generated at Sat Feb 10 02:59:23 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.