[LU-13216] Kernel NULL pointer dereference in lustre_msg_set_conn_cnt()

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Lustre 2.14.0
    • None
    • None
    • 3

    Description

      When running the auster test suite with SHARED_KEY enabled, sanity-sec test_28 crashes with a kernel NULL pointer dereference in lustre_msg_set_conn_cnt(). This function is called from sptlrpc_req_refresh_ctx() via the timeout callback ctx_refresh_timeout().

      [10565.205946] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      [10565.207453] IP: [<ffffffffc09f0a0c>] lustre_msg_set_conn_cnt+0xc/0xa0 [ptlrpc]
      [10565.208767] PGD 80000000797c8067 PUD 7accd067 PMD 0 
      [10565.209685] Oops: 0000 [#1] SMP 
      [10565.210291] Modules linked in: obdecho(OE) ptlrpc_gss(OE) mgc(OE) lustre(OE) lmv(OE) mdc(OE) fid(OE) osc(OE) lov(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core sunrpc iosf_mbi crc32_pclmul ghash_clmulni_intel ppdev aesni_intel joydev lrw gf128mul glue_helper ablk_helper cryptd i2c_piix4 pcspkr parport_pc virtio_balloon parport ip_tables ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk 8139too crct10dif_pclmul crct10dif_common crc32c_intel ata_piix serio_raw libata 8139cp virtio_pci virtio_ring
      [10565.224100]  virtio mii floppy [last unloaded: libcfs]
      [10565.224871] CPU: 0 PID: 21330 Comm: bash Kdump: loaded Tainted: G           OE  ------------   3.10.0-1062.9.1.el7.x86_64 #1
      [10565.226677] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [10565.227736] task: ffff8b8afbeb41c0 ti: ffff8b8adf28c000 task.ti: ffff8b8adf28c000
      [10565.229220] RIP: 0010:[<ffffffffc09f0a0c>]  [<ffffffffc09f0a0c>] lustre_msg_set_conn_cnt+0xc/0xa0 [ptlrpc]
      [10565.230845] RSP: 0018:ffff8b8adf28f630  EFLAGS: 00010246
      [10565.231719] RAX: ffff8b8afafd3800 RBX: ffff8b8afb5a8a00 RCX: ffff8b8adf28ffd8
      [10565.232866] RDX: 0000000000000100 RSI: 0000000000000001 RDI: 0000000000000000
      [10565.234022] RBP: ffff8b8adf28f640 R08: ffff8b8afb5a8a50 R09: ffff8b8aeb749680
      [10565.235166] R10: ffffffff8d9a093d R11: ffff8b8af9354f00 R12: ffff8b8aeb749680
      [10565.236317] R13: 0000000000000000 R14: ffff8b8aeb749698 R15: ffff8b8aeb749778
      [10565.237469] FS:  00007efeee08d740(0000) GS:ffff8b8affc00000(0000) knlGS:0000000000000000
      [10565.238759] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [10565.239692] CR2: 0000000000000008 CR3: 000000007afda000 CR4: 00000000000606f0
      [10565.240848] Call Trace:
      [10565.241304]  [<ffffffffc0a1d558>] sptlrpc_req_refresh_ctx+0x3c8/0xa50 [ptlrpc]
      [10565.242632]  [<ffffffffc078a369>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
      [10565.243806]  [<ffffffffc0a1dd60>] sptlrpc_import_check_ctx+0x180/0x3b0 [ptlrpc]
      [10565.245007]  [<ffffffffc09b46e6>] ldlm_lock_match_with_skip+0x216/0x7f0 [ptlrpc]
      [10565.246228]  [<ffffffff8d98eb44>] ? vsnprintf+0x234/0x6a0
      [10565.247150]  [<ffffffffc0b8a5e9>] mdc_lock_match+0xb9/0x180 [mdc]
      [10565.248146]  [<ffffffffc0b8db9b>] mdc_revalidate_lock+0x12b/0x1f0 [mdc]
      [10565.249228]  [<ffffffffc0b8df02>] mdc_intent_lock+0x2a2/0x560 [mdc]
      [10565.250310]  [<ffffffffc0c2c470>] ? ll_md_need_convert+0x1b0/0x1b0 [lustre]
      [10565.251481]  [<ffffffffc09c32a0>] ? ldlm_expired_completion_wait+0x220/0x220 [ptlrpc]
      [10565.252778]  [<ffffffffc0b91370>] ? mdc_changelog_cdev_finish+0x1f0/0x1f0 [mdc]
      [10565.253995]  [<ffffffffc0bcdd4a>] lmv_intent_lock+0x47a/0xaf0 [lmv]
      [10565.255018]  [<ffffffff8d733682>] ? from_kgid+0x12/0x20
      [10565.255881]  [<ffffffffc0c2c787>] ? ll_i2suppgid+0x37/0x40 [lustre]
      [10565.256918]  [<ffffffffc0c2c7c3>] ? ll_i2gids+0x33/0xb0 [lustre]
      [10565.257900]  [<ffffffff8d733682>] ? from_kgid+0x12/0x20
      [10565.258772]  [<ffffffffc0c2c470>] ? ll_md_need_convert+0x1b0/0x1b0 [lustre]
      [10565.259915]  [<ffffffffc0bfbaee>] ll_inode_revalidate+0x12e/0x690 [lustre]
      [10565.261045]  [<ffffffffc0bfc253>] ll_inode_permission+0x203/0x3f0 [lustre]
      [10565.262162]  [<ffffffff8d8559b7>] ? __follow_mount_rcu+0x37/0x100
      [10565.263156]  [<ffffffff8d8565d1>] __inode_permission+0x71/0xd0
      [10565.264107]  [<ffffffff8d856648>] inode_permission+0x18/0x50
      [10565.265033]  [<ffffffff8d85a6ae>] link_path_walk+0x27e/0x8b0
      [10565.265948]  [<ffffffff8d7bd99b>] ? unlock_page+0x2b/0x30
      [10565.266825]  [<ffffffff8d85ae4a>] path_lookupat+0x7a/0x8b0
      [10565.267715]  [<ffffffff8d824ef5>] ? kmem_cache_alloc+0x35/0x1f0
      [10565.268675]  [<ffffffff8d85c45f>] ? getname_flags+0x4f/0x1a0
      [10565.269587]  [<ffffffff8d85b6ab>] filename_lookup+0x2b/0xc0
      [10565.270484]  [<ffffffff8d85d5f7>] user_path_at_empty+0x67/0xc0
      [10565.271434]  [<ffffffff8d7f3ecd>] ? handle_mm_fault+0x39d/0x9b0
      [10565.272388]  [<ffffffff8d85d661>] user_path_at+0x11/0x20
      [10565.273249]  [<ffffffff8d850343>] vfs_fstatat+0x63/0xc0
      [10565.274099]  [<ffffffff8d8506fe>] SYSC_newstat+0x2e/0x60
      [10565.274962]  [<ffffffff8dd8de21>] ? system_call_after_swapgs+0xae/0x146
      [10565.276028]  [<ffffffff8dd8de15>] ? system_call_after_swapgs+0xa2/0x146
      [10565.277096]  [<ffffffff8dd8de21>] ? system_call_after_swapgs+0xae/0x146
      [10565.278163]  [<ffffffff8dd8de15>] ? system_call_after_swapgs+0xa2/0x146
      [10565.279227]  [<ffffffff8dd8de21>] ? system_call_after_swapgs+0xae/0x146
      [10565.280294]  [<ffffffff8dd8de15>] ? system_call_after_swapgs+0xa2/0x146
      [10565.281368]  [<ffffffff8dd8de21>] ? system_call_after_swapgs+0xae/0x146
      [10565.282515]  [<ffffffff8dd8de15>] ? system_call_after_swapgs+0xa2/0x146
      [10565.283593]  [<ffffffff8dd8de21>] ? system_call_after_swapgs+0xae/0x146
      [10565.284709]  [<ffffffff8dd8de15>] ? system_call_after_swapgs+0xa2/0x146
      [10565.285797]  [<ffffffff8dd8de21>] ? system_call_after_swapgs+0xae/0x146
      [10565.286860]  [<ffffffff8d850bbe>] SyS_newstat+0xe/0x10
      [10565.287695]  [<ffffffff8dd8dede>] system_call_fastpath+0x25/0x2a
      [10565.288665]  [<ffffffff8dd8de21>] ? system_call_after_swapgs+0xae/0x146
      [10565.289730] Code: c0 c7 05 0c ec 07 00 00 00 04 00 e8 af ca c7 ff 48 c7 c7 e0 f5 a6 c0 e8 e3 16 c9 ff 0f 1f 00 66 66 66 66 90 55 48 89 e5 41 54 53 <81> 7f 08 d3 0b d0 0b 48 89 fb 75 1d 41 89 f4 ba 98 00 00 00 31 
      [10565.294849] RIP  [<ffffffffc09f0a0c>] lustre_msg_set_conn_cnt+0xc/0xa0 [ptlrpc]
      [10565.296089]  RSP <ffff8b8adf28f630>
      [10565.296684] CR2: 0000000000000008
      


        Activity

          Peter Jones added a comment:

          Landed for 2.14


          Gerrit Updater added a comment:

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37473/
          Subject: LU-13216 ptlrpc: sptlrpc_req_refresh_ctx's timeout semantic
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 0b09d826149f4baadce305df63396bf86eb20cf7

          Gerrit Updater added a comment:

          Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/37473
          Subject: LU-13216 ptlrpc: do not refresh req in case of zero timeout
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 07c8fb64d3e4191a423314ffba0918c205d52793

          Sebastien Buisson added a comment:

          The problem seems to be due to patch c1fad6a9a5 ("LU-10467 ptlrpc: convert waiting in sptlrpc_req_refresh_ctx()").
          https://review.whamcloud.com/35987

          This patch converts the waiting routine in sptlrpc_req_refresh_ctx(), but it slightly changes its behavior when the timeout is 0. With the initial implementation, a zero timeout meant the timeout callback was not called before starting an infinite, interruptible wait. With the new implementation, the timeout callback is called before going into the infinite, interruptible wait.
          Unfortunately, when called with a zero timeout, sptlrpc_req_refresh_ctx() is not supposed to try to refresh the request, which is precisely what the timeout callback ctx_refresh_timeout() does. When it tries, it can hit the crash whose stack trace is shown above.

          I will propose a patch to address this regression.

          People

            Assignee:
            Sebastien Buisson
            Reporter:
            Sebastien Buisson
            Votes:
            0
            Watchers:
            3

            Dates

              Created:
              Updated:
              Resolved: