[LU-13216] Kernel NULL pointer dereference in lustre_msg_set_conn_cnt() Created: 07/Feb/20 Updated: 17/Mar/20 Resolved: 17/Mar/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.14.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Sebastien Buisson | Assignee: | Sebastien Buisson |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
When running auster test suite with SHARED_KEY enabled, sanity-sec test_28 crashes because of a kernel NULL pointer dereference in lustre_msg_set_conn_cnt(). This function gets called from sptlrpc_req_refresh_ctx() via ctx_refresh_timeout().

[10565.205946] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[10565.207453] IP: [<ffffffffc09f0a0c>] lustre_msg_set_conn_cnt+0xc/0xa0 [ptlrpc]
[10565.208767] PGD 80000000797c8067 PUD 7accd067 PMD 0
[10565.209685] Oops: 0000 [#1] SMP
[10565.210291] Modules linked in: obdecho(OE) ptlrpc_gss(OE) mgc(OE) lustre(OE) lmv(OE) mdc(OE) fid(OE) osc(OE) lov(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core sunrpc iosf_mbi crc32_pclmul ghash_clmulni_intel ppdev aesni_intel joydev lrw gf128mul glue_helper ablk_helper cryptd i2c_piix4 pcspkr parport_pc virtio_balloon parport ip_tables ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk 8139too crct10dif_pclmul crct10dif_common crc32c_intel ata_piix serio_raw libata 8139cp virtio_pci virtio_ring
[10565.224100] virtio mii floppy [last unloaded: libcfs]
[10565.224871] CPU: 0 PID: 21330 Comm: bash Kdump: loaded Tainted: G OE ------------ 3.10.0-1062.9.1.el7.x86_64 #1
[10565.226677] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[10565.227736] task: ffff8b8afbeb41c0 ti: ffff8b8adf28c000 task.ti: ffff8b8adf28c000
[10565.229220] RIP: 0010:[<ffffffffc09f0a0c>] [<ffffffffc09f0a0c>] lustre_msg_set_conn_cnt+0xc/0xa0 [ptlrpc]
[10565.230845] RSP: 0018:ffff8b8adf28f630 EFLAGS: 00010246
[10565.231719] RAX: ffff8b8afafd3800 RBX: ffff8b8afb5a8a00 RCX: ffff8b8adf28ffd8
[10565.232866] RDX: 0000000000000100 RSI: 0000000000000001 RDI: 0000000000000000
[10565.234022] RBP: ffff8b8adf28f640 R08: ffff8b8afb5a8a50 R09: ffff8b8aeb749680
[10565.235166] R10: ffffffff8d9a093d R11: ffff8b8af9354f00 R12: ffff8b8aeb749680
[10565.236317] R13: 0000000000000000 R14: ffff8b8aeb749698 R15: ffff8b8aeb749778
[10565.237469] FS: 00007efeee08d740(0000) GS:ffff8b8affc00000(0000) knlGS:0000000000000000
[10565.238759] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[10565.239692] CR2: 0000000000000008 CR3: 000000007afda000 CR4: 00000000000606f0
[10565.240848] Call Trace:
[10565.241304] [<ffffffffc0a1d558>] sptlrpc_req_refresh_ctx+0x3c8/0xa50 [ptlrpc]
[10565.242632] [<ffffffffc078a369>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
[10565.243806] [<ffffffffc0a1dd60>] sptlrpc_import_check_ctx+0x180/0x3b0 [ptlrpc]
[10565.245007] [<ffffffffc09b46e6>] ldlm_lock_match_with_skip+0x216/0x7f0 [ptlrpc]
[10565.246228] [<ffffffff8d98eb44>] ? vsnprintf+0x234/0x6a0
[10565.247150] [<ffffffffc0b8a5e9>] mdc_lock_match+0xb9/0x180 [mdc]
[10565.248146] [<ffffffffc0b8db9b>] mdc_revalidate_lock+0x12b/0x1f0 [mdc]
[10565.249228] [<ffffffffc0b8df02>] mdc_intent_lock+0x2a2/0x560 [mdc]
[10565.250310] [<ffffffffc0c2c470>] ? ll_md_need_convert+0x1b0/0x1b0 [lustre]
[10565.251481] [<ffffffffc09c32a0>] ? ldlm_expired_completion_wait+0x220/0x220 [ptlrpc]
[10565.252778] [<ffffffffc0b91370>] ? mdc_changelog_cdev_finish+0x1f0/0x1f0 [mdc]
[10565.253995] [<ffffffffc0bcdd4a>] lmv_intent_lock+0x47a/0xaf0 [lmv]
[10565.255018] [<ffffffff8d733682>] ? from_kgid+0x12/0x20
[10565.255881] [<ffffffffc0c2c787>] ? ll_i2suppgid+0x37/0x40 [lustre]
[10565.256918] [<ffffffffc0c2c7c3>] ? ll_i2gids+0x33/0xb0 [lustre]
[10565.257900] [<ffffffff8d733682>] ? from_kgid+0x12/0x20
[10565.258772] [<ffffffffc0c2c470>] ? ll_md_need_convert+0x1b0/0x1b0 [lustre]
[10565.259915] [<ffffffffc0bfbaee>] ll_inode_revalidate+0x12e/0x690 [lustre]
[10565.261045] [<ffffffffc0bfc253>] ll_inode_permission+0x203/0x3f0 [lustre]
[10565.262162] [<ffffffff8d8559b7>] ? __follow_mount_rcu+0x37/0x100
[10565.263156] [<ffffffff8d8565d1>] __inode_permission+0x71/0xd0
[10565.264107] [<ffffffff8d856648>] inode_permission+0x18/0x50
[10565.265033] [<ffffffff8d85a6ae>] link_path_walk+0x27e/0x8b0
[10565.265948] [<ffffffff8d7bd99b>] ? unlock_page+0x2b/0x30
[10565.266825] [<ffffffff8d85ae4a>] path_lookupat+0x7a/0x8b0
[10565.267715] [<ffffffff8d824ef5>] ? kmem_cache_alloc+0x35/0x1f0
[10565.268675] [<ffffffff8d85c45f>] ? getname_flags+0x4f/0x1a0
[10565.269587] [<ffffffff8d85b6ab>] filename_lookup+0x2b/0xc0
[10565.270484] [<ffffffff8d85d5f7>] user_path_at_empty+0x67/0xc0
[10565.271434] [<ffffffff8d7f3ecd>] ? handle_mm_fault+0x39d/0x9b0
[10565.272388] [<ffffffff8d85d661>] user_path_at+0x11/0x20
[10565.273249] [<ffffffff8d850343>] vfs_fstatat+0x63/0xc0
[10565.274099] [<ffffffff8d8506fe>] SYSC_newstat+0x2e/0x60
[10565.274962] [<ffffffff8dd8de21>] ? system_call_after_swapgs+0xae/0x146
[10565.276028] [<ffffffff8dd8de15>] ? system_call_after_swapgs+0xa2/0x146
[10565.277096] [<ffffffff8dd8de21>] ? system_call_after_swapgs+0xae/0x146
[10565.278163] [<ffffffff8dd8de15>] ? system_call_after_swapgs+0xa2/0x146
[10565.279227] [<ffffffff8dd8de21>] ? system_call_after_swapgs+0xae/0x146
[10565.280294] [<ffffffff8dd8de15>] ? system_call_after_swapgs+0xa2/0x146
[10565.281368] [<ffffffff8dd8de21>] ? system_call_after_swapgs+0xae/0x146
[10565.282515] [<ffffffff8dd8de15>] ? system_call_after_swapgs+0xa2/0x146
[10565.283593] [<ffffffff8dd8de21>] ? system_call_after_swapgs+0xae/0x146
[10565.284709] [<ffffffff8dd8de15>] ? system_call_after_swapgs+0xa2/0x146
[10565.285797] [<ffffffff8dd8de21>] ? system_call_after_swapgs+0xae/0x146
[10565.286860] [<ffffffff8d850bbe>] SyS_newstat+0xe/0x10
[10565.287695] [<ffffffff8dd8dede>] system_call_fastpath+0x25/0x2a
[10565.288665] [<ffffffff8dd8de21>] ? system_call_after_swapgs+0xae/0x146
[10565.289730] Code: c0 c7 05 0c ec 07 00 00 00 04 00 e8 af ca c7 ff 48 c7 c7 e0 f5 a6 c0 e8 e3 16 c9 ff 0f 1f 00 66 66 66 66 90 55 48 89 e5 41 54 53 <81> 7f 08 d3 0b d0 0b 48 89 fb 75 1d 41 89 f4 ba 98 00 00 00 31
[10565.294849] RIP [<ffffffffc09f0a0c>] lustre_msg_set_conn_cnt+0xc/0xa0 [ptlrpc]
[10565.296089] RSP <ffff8b8adf28f630>
[10565.296684] CR2: 0000000000000008 |
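The register dump is consistent with lustre_msg_set_conn_cnt() being handed a NULL message pointer: RDI (the first argument in the x86-64 calling convention) is 0, CR2 is 8, and the faulting bytes `<81> 7f 08 d3 0b d0 0b` decode as a 32-bit compare of the value at offset 8 from RDI against an immediate. The following is only a sketch of that arithmetic. The struct `msg_head_sketch` and the helper names are hypothetical, and the layout (two 32-bit fields ahead of the magic) is an assumption made for illustration, not a copy of the Lustre headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the first fields of a message header.
 * Assumption: two 32-bit fields precede the magic, putting it at offset 8. */
struct msg_head_sketch {
	uint32_t bufcount;
	uint32_t secflvr;
	uint32_t magic;
};

/* Address touched when reading `magic` through a pointer whose value is `base`. */
static uintptr_t magic_field_address(uintptr_t base)
{
	return base + offsetof(struct msg_head_sketch, magic);
}

/* Decode a little-endian 32-bit immediate as stored in the x86 instruction stream. */
static uint32_t le32(const uint8_t b[4])
{
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Compare the values copied from the oops against the sketch. */
static void check_oops_values(void)
{
	/* The four bytes that follow "<81> 7f 08" in the Code: line. */
	const uint8_t imm[4] = { 0xd3, 0x0b, 0xd0, 0x0b };

	/* RDI = 0 in the dump; the faulting address CR2 = 8. */
	assert(magic_field_address(0x0) == 0x8);
	/* The immediate being compared against the field at offset 8. */
	assert(le32(imm) == 0x0bd00bd3u);
}
```

Calling `check_oops_values()` passes under the stated layout assumption, which is why the fault address is 8 rather than 0: the read targets a field inside the structure, not the start of it.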
| Comments |
| Comment by Sebastien Buisson [ 07/Feb/20 ] |
|
The problem seems to be due to patch c1fad6a9a5 (" This patch aims at converting the waiting routine in sptlrpc_req_refresh_ctx(), but it slightly changes the routine's behavior when the timeout is 0. With the initial implementation, a zero timeout meant the timeout callback was not called before starting an infinite, interruptible wait. With the new implementation, the timeout callback gets called before going into the infinite, interruptible wait. I will propose a patch to address this regression. |
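The behavioral difference described in this comment can be modeled in a few lines of user-space C. This is a simplified sketch, not the Lustre code: `fake_req`, `on_timeout`, `wait_old`, `wait_new` and `wait_guarded` are hypothetical names, the model assumes the awaited condition never becomes true during the timed phase, and `wait_guarded` shows only one possible way to restore the old semantics, not necessarily what the landed patch does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request stand-in; it only counts timeout callback invocations. */
struct fake_req {
	int timeout_cb_calls;
};

typedef void (*timeout_cb_t)(struct fake_req *req);

/* Stand-in for the timeout callback. */
static void on_timeout(struct fake_req *req)
{
	req->timeout_cb_calls++;
}

/* Original behavior as described: a zero timeout skips the timed phase,
 * so the callback never runs before the infinite wait. */
static void wait_old(struct fake_req *req, long timeout, timeout_cb_t cb)
{
	assert(cb != NULL);
	if (timeout > 0)
		cb(req);	/* timed phase expired (model assumption) */
	/* the infinite, interruptible wait would start here */
}

/* Regressed behavior as described: the timed phase always runs, and a
 * zero timeout expires immediately, so the callback fires. */
static void wait_new(struct fake_req *req, long timeout, timeout_cb_t cb)
{
	assert(cb != NULL);
	(void)timeout;
	cb(req);		/* fires even when timeout == 0 */
	/* the infinite, interruptible wait would start here */
}

/* One possible guard (an assumption, not the actual fix): skip the
 * timed phase and its callback when the timeout is zero. */
static void wait_guarded(struct fake_req *req, long timeout, timeout_cb_t cb)
{
	assert(cb != NULL);
	if (timeout != 0)
		wait_new(req, timeout, cb);
	/* else: go straight to the infinite, interruptible wait */
}
```

With a zero timeout, `wait_old` and `wait_guarded` leave `timeout_cb_calls` at 0 while `wait_new` raises it to 1, which is the extra callback invocation the comment identifies as the regression.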
| Comment by Gerrit Updater [ 07/Feb/20 ] |
|
Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/37473 |
| Comment by Gerrit Updater [ 17/Mar/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37473/ |
| Comment by Peter Jones [ 17/Mar/20 ] |
|
Landed for 2.14 |