Lustre / LU-2256

NULL pointer dereference in seq_server_alloc_meta+0x51e/0x700

Details

    • Bug
    • Resolution: Duplicate
    • Major
    • None
    • Lustre 2.4.0
    • 3
    • 5394

    Description

      Our Grove-Test MDS crashed with the following NULL pointer dereference:

      2012-10-30 23:11:40 BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
      2012-10-30 23:11:40 IP: [<ffffffffa05c393e>] seq_server_alloc_meta+0x51e/0x700 [fid]
      2012-10-30 23:11:40 PGD 20168e8067 PUD 2010af0067 PMD 0 
      2012-10-30 23:11:40 Oops: 0000 [#1] SMP 
      2012-10-30 23:11:40 last sysfs file: /sys/devices/pci0000:80/0000:80:02.2/0000:83:00.0/host7/port-7:1/expander-7:1/port-7:1:18/end_device-7:1:18/target7:0:42/7:0:42:0/timeout
      2012-10-30 23:11:40 CPU 13 
      2012-10-30 23:11:40 Modules linked in: osp(U) mdt(U) mdd(U) lod(U) mgs(U) mgc(U) osd_zfs(U) lquota(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) ko2iblnd(U) lnet(U) libcfs(U) acpi_cpufreq freq_table mperf sha512_generic sha256_generic ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ib_sa mlx4_ib ib_mad ib_core dm_mirror dm_region_hash dm_log dm_round_robin dm_multipath dm_mod vhost_net macvtap macvlan tun kvm zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate sg ses enclosure sd_mod crc_t10dif isci libsas wmi mpt2sas scsi_transport_sas raid_class sb_edac edac_core i2c_i801 i2c_core ahci iTCO_wdt iTCO_vendor_support ioatdma shpchp ipv6 nfs lockd fscache nfs_acl auth_rpcgss sunrpc mlx4_en mlx4_core igb dca [last unloaded: libcfs]
      2012-10-30 23:11:40 
      2012-10-30 23:11:40 Pid: 33877, comm: mdt_mdss_0001 Tainted: P        W  ----------------   2.6.32-220.23.1.1chaos.ch5.x86_64 #1 appro 2620x-in/S2600GZ
      2012-10-30 23:11:40 RIP: 0010:[<ffffffffa05c393e>]  [<ffffffffa05c393e>] seq_server_alloc_meta+0x51e/0x700 [fid]
      2012-10-30 23:11:40 RSP: 0018:ffff881fc88f9ca0  EFLAGS: 00010246
      2012-10-30 23:11:40 RAX: 0000000000000000 RBX: 0000000200005dd8 RCX: 00000002000061c0
      2012-10-30 23:11:40 RDX: 00000000000003e8 RSI: ffff880f6f574cc0 RDI: ffff880fc9589f40
      2012-10-30 23:11:40 RBP: ffff881fc88f9ce0 R08: 0000000000000000 R09: ffff881b34578600
      2012-10-30 23:11:40 R10: 0000000000000009 R11: ffffffffa0c86db0 R12: ffff881b345787e8
      2012-10-30 23:11:40 R13: ffff880f6f574d30 R14: ffff880f6f574cc0 R15: ffff880fc9589f40
      2012-10-30 23:11:40 FS:  00002aaaab47e700(0000) GS:ffff8810788a0000(0000) knlGS:0000000000000000
      2012-10-30 23:11:40 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      2012-10-30 23:11:40 CR2: 0000000000000010 CR3: 00000020110da000 CR4: 00000000000406e0
      2012-10-30 23:11:40 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      2012-10-30 23:11:40 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      2012-10-30 23:11:40 Process mdt_mdss_0001 (pid: 33877, threadinfo ffff881fc88f8000, task ffff881fc88f2aa0)
      2012-10-30 23:11:40 Stack:
      2012-10-30 23:11:40  ffff881fc88f9cb0 ffff8809c74e3000 ffff880fc9589940 ffff8809c74e3000
      2012-10-30 23:11:40 <0> ffff880fc9589940 ffff880fc9589f40 ffff881b345787e8 00000000ffffffea
      2012-10-30 23:11:40 <0> ffff881fc88f9d30 ffffffffa05c3e9f ffff881fc88f9d10 ffffc900c5ea4988
      2012-10-30 23:11:40 Call Trace:
      2012-10-30 23:11:40  [<ffffffffa05c3e9f>] seq_query+0x37f/0x6d0 [fid]
      2012-10-30 23:11:40  [<ffffffffa0eef322>] mdt_handle_common+0x932/0x1760 [mdt]
      2012-10-30 23:11:40  [<ffffffffa0ef01c5>] mdt_mdss_handle+0x15/0x20 [mdt]
      2012-10-30 23:11:40  [<ffffffffa0bee8cc>] ptlrpc_server_handle_request+0x41c/0xe00 [ptlrpc]
      2012-10-30 23:11:40  [<ffffffffa088a6be>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      2012-10-30 23:11:40  [<ffffffffa089c14f>] ? lc_watchdog_touch+0x6f/0x180 [libcfs]
      2012-10-30 23:11:40  [<ffffffffa0be5c79>] ? ptlrpc_wait_event+0xa9/0x2a0 [ptlrpc]
      2012-10-30 23:11:40  [<ffffffff81051ba3>] ? __wake_up+0x53/0x70
      2012-10-30 23:11:40  [<ffffffffa0befebc>] ptlrpc_main+0xc0c/0x19f0 [ptlrpc]
      2012-10-30 23:11:40  [<ffffffffa0bef2b0>] ? ptlrpc_main+0x0/0x19f0 [ptlrpc]
      2012-10-30 23:11:40  [<ffffffff8100c14a>] child_rip+0xa/0x20
      2012-10-30 23:11:40  [<ffffffffa0bef2b0>] ? ptlrpc_main+0x0/0x19f0 [ptlrpc]
      2012-10-30 23:11:40  [<ffffffffa0bef2b0>] ? ptlrpc_main+0x0/0x19f0 [ptlrpc]
      2012-10-30 23:11:40  [<ffffffff8100c140>] ? child_rip+0x0/0x20
      2012-10-30 23:11:40 Code: d1 fc ff ff 66 0f 1f 84 00 00 00 00 00 49 8b 86 f8 00 00 00 49 8b 96 e8 00 00 00 4c 89 f6 49 8b 5e 30 49 8b 0e 4c 89 ff 48 8b 00 <48> 8b 40 10 48 8b 40 28 48 63 80 40 01 00 00 49 89 5e 18 49 0f 
      2012-10-30 23:11:40 RIP  [<ffffffffa05c393e>] seq_server_alloc_meta+0x51e/0x700 [fid]
      2012-10-30 23:11:40  RSP <ffff881fc88f9ca0>
      2012-10-30 23:11:40 CR2: 0000000000000010
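
      For triage, the key fields of the oops can be pulled out mechanically. The following is a minimal Python sketch, not part of any Lustre or kernel tooling; the function name summarize_oops and the trimmed OOPS excerpt (timestamps removed, Code line shortened to the bytes around the marker) are assumptions made for illustration. It reports the faulting address, the symbol and offset, any general-purpose registers holding zero, and the instruction bytes starting at the faulting IP. Decoded by hand, the marked bytes 48 8b 40 10 appear to be mov 0x10(%rax),%rax, which would be consistent with RAX being 0 and CR2 being 0x10.

```python
import re

# Trimmed excerpt of the oops above (timestamps removed, Code line shortened).
OOPS = """\
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<ffffffffa05c393e>] seq_server_alloc_meta+0x51e/0x700 [fid]
RAX: 0000000000000000 RBX: 0000000200005dd8 RCX: 00000002000061c0
Code: 4c 89 ff 48 8b 00 <48> 8b 40 10 48 8b 40 28
"""


def summarize_oops(text):
    """Extract fault address, faulting symbol, zero registers, and fault bytes."""
    fault = re.search(r"NULL pointer dereference at ([0-9a-f]+)", text)
    ip = re.search(
        r"IP: \[<([0-9a-f]+)>\] (\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+) \[(\w+)\]",
        text,
    )
    # General-purpose registers are printed as "RAX: <16 hex digits>".
    regs = {
        name: int(value, 16)
        for name, value in re.findall(r"(R[A-Z0-9]{2}): ([0-9a-f]{16})", text)
    }
    # The kernel marks the byte at the faulting IP with <..> in the Code line.
    tokens = re.search(r"Code: (.*)", text).group(1).split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return {
        "fault_addr": int(fault.group(1), 16),
        "symbol": ip.group(2),
        "offset": int(ip.group(3), 16),
        "size": int(ip.group(4), 16),
        "module": ip.group(5),
        "null_regs": [name for name, value in regs.items() if value == 0],
        "fault_bytes": [tok.strip("<>") for tok in tokens[start:]],
    }


if __name__ == "__main__":
    for key, value in summarize_oops(OOPS).items():
        print(key, value)
```

      A small fault address combined with a zero base register usually indicates a field read through a NULL structure pointer; which structure that is in seq_server_alloc_meta would still need to be confirmed against the fid module's disassembly or the crash dump.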
      

      The specific Lustre version installed:

      2012-10-29 16:37:23 Lustre: Lustre: Build Version: 2.3.54-1chaos-2surya1-2surya1--PRISTINE-2.6.32-220.23.1.1chaos.ch5.x86_64
      

      This is the 2.3.54 tag, plus our local LLNL patches that haven't landed yet and a couple of patches for LU-2139 (which shouldn't affect MDS stability).

      We successfully captured a crash dump, if it is needed:

      2012-10-30 23:27:02 The dumpfile is saved to /var/crash/vmcore-grove-mds2-2012-10-31-06:11:59.
      

            People

              wc-triage WC Triage
              prakash Prakash Surya (Inactive)