Lustre / LU-815

"BUG: unable to handle kernel NULL pointer dereference" in lprocfs_rd_import()

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.9.0
    • None
    • None
    • Lustre-2.0, RHEL6.0
    • 3
    • 24,449
    • 6530

    Description

      We've been hitting this problem for several months when reading "/proc/fs/lustre/osc/<OST>/import".

      I saw that there may be a related patch (BZ#22032 - WC's git: 839280926956f16552194fe803ba21096770ebc4), which was integrated into Lustre-2.1. What do you think? If the BZ#22032 patch is not related, does this look like a known problem to you?

      ==============================================================================
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
      IP: [<ffffffffa0482d3d>] lprocfs_rd_import+0x32d/0x6b0 [obdclass]
      PGD c7cf9f067 PUD ae9bcc067 PMD 0
      Oops: 0000 [#1] SMP
      last sysfs file: /sys/devices/pci0000:00/0000:00:05.0/0000:05:00.0/infiniband/mlx4_0/ports/1/rate
      CPU 5
      Modules linked in: sit(U) tunnel4(U) lmv(U) mgc(U) lustre(U) lov(U) osc(U) mdc(U) lquota(U) fid(U) fld(U) ko2iblnd(U)
      ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) rdma_ucm(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ib_cm(U)
      ib_sa(U) ib_uverbs(U) ib_umad(U) mlx4_ib(U) mlx4_core(U) ib_mthca(U) ib_mad(U) ib_core(U) ipmi_devintf(U) ipmi_si(U)
      ipmi_msghandler(U) iptable_filter(U) ip_tables(U) x_tables(U) nfs(U) lockd(U) fscache(U) nfs_acl(U) auth_rpcgss(U) sunrpc(U)
      acpi_cpufreq(U) freq_table(U) vtune_drv(U) autofs4(U) ipv6(U) sg(U) i7core_edac(U) edac_core(U) i2c_i801(U) i2c_core(U)
      igb(U) ioatdma(U) dca(U) iTCO_wdt(U) iTCO_vendor_support(U) ext3(U) jbd(U) mbcache(U) sd_mod(U) crc_t10dif(U) usbhid(U)
      hid(U) ehci_hcd(U) ahci(U) uhci_hcd(U) dm_mod(U) [last unloaded: libcfs]

      Pid: 29413, comm: grep Not tainted 2.6.32-30.el6.Bull.14.x86_64 #1 bullx super-node
      RIP: 0010:[<ffffffffa0482d3d>] [<ffffffffa0482d3d>] lprocfs_rd_import+0x32d/0x6b0 [obdclass]
      RSP: 0018:ffff8806e57ffd78 EFLAGS: 00010206
      RAX: 0000000000000000 RBX: ffff880c7db5a000 RCX: 0000000000000038
      RDX: ffff880c6fd42105 RSI: 00000000fffffffe RDI: 0000000000000013
      RBP: ffff8806e57ffe38 R08: 0000000000000000 R09: 00000000fffffffe
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 0000000000000105 R14: 0000000000000000 R15: 0000000000001000
      FS: 00002b8d09d85f60(0000) GS:ffff88088e440000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000018 CR3: 00000009da84e000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process grep (pid: 29413, threadinfo ffff8806e57fe000, task ffff8807bf266c50)
      Stack:
      0000000000000000 ffffea0029a6ddf8 00000200011cce48 00000010e68896a0
      <0> ffff880a4323e948 ffff880c7db5a000 ffff880a4323e438 ffff880c6fd42000
      <0> ffff8806e57ffde8 ffff880880001d80 ffff880c7d7c2300 00000000000000d0
      Call Trace:
      [<ffffffff8113e377>] ? alloc_pages_current+0x87/0xd0
      [<ffffffffa0480651>] lprocfs_fops_read+0xd1/0x1e0 [obdclass]
      [<ffffffff811b6a36>] proc_reg_read+0x76/0xb0
      [<ffffffff81157f55>] vfs_read+0xb5/0x1a0
      [<ffffffff810c5282>] ? audit_syscall_entry+0x252/0x280
      [<ffffffff81158091>] sys_read+0x51/0x90
      [<ffffffff8100c172>] system_call_fastpath+0x16/0x1b
      Code: 18 08 75 a2 48 8b 9d 68 ff ff ff 66 ff 83 78 02 00 00 48 8b 43 60 44 8b 83 28 02 00 00 44 8b b3 14 01 00 00 44
      8b a3 24 02 00 00 <48> 8b 78 18 44 89 85 58 ff ff ff e8 d3 5e dc ff 49 63 fd 48 03
      RIP [<ffffffffa0482d3d>] lprocfs_rd_import+0x32d/0x6b0 [obdclass]
      RSP <ffff8806e57ffd78>
      ==============================================================================

      Further in-depth analysis clearly indicates that this problem comes from a race between a process reading the
      "/proc/fs/lustre/osc/<OST>/import" special file through the lprocfs layer and other Lustre layers that
      manipulate imports.

      Thanks,

      People

        Andreas Dilger (adilger)
        Lustre Bull (lustre-bull, inactive)