Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-7951

NULL pointer dereference in vvp_io_write_start

    XMLWordPrintable

Details

    • Bug
    • Resolution: Unresolved
    • Critical
    • None
    • Lustre 2.5.5
    • TOSS 2.4-7 (RHEL 6.7), Lustre 2.5.5-3chaos_2.6.32_573.18.1.1chaos.ch5.4.x86_64.x86_64
    • 3
    • 9223372036854775807

    Description

      On one compute cluster we are seeing multiple crashes per day. Console log shows:

      <ConMan> Console [syrah143] log at 2016-03-16 03:00:00 PDT.
      2016-03-16 03:47:44 BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
      2016-03-16 03:47:44 IP: [<ffffffffa0b987cd>] vvp_io_write_start+0x2ad/0x3d0 [lustre]
      2016-03-16 03:47:44 PGD fe9c89067 PUD fc112d067 PMD 0
      2016-03-16 03:47:44 Oops: 0002 [#1] SMP
      2016-03-16 03:47:44 last sysfs file: /sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/infiniband/qib0/ports/1/state
      2016-03-16 03:47:44 CPU 28
      2016-03-16 03:47:44 Modules linked in: lmv(U) mgc(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ptlrpc(U) obdclass(U) acpi_cpufreq freq_table mperf ko2iblnd(U) lnet(U) sha512_generic crc32c_intel libcfs(U) ipmi_devintf ipmi_si ipmi_msghandler ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun kvm uinput microcode iTCO_wdt iTCO_vendor_support wmi isci libsas scsi_transport_sas ahci joydev sb_edac edac_core lpc_ich mfd_core i2c_i801 ioatdma ib_qib(U) ib_mad ib_core ib_addr xt_owner nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_LOG xt_multiport iptable_filter ip_tables ipv6 nfs lockd fscache auth_rpcgss nfs_acl sunrpc igb dca i2c_algo_bit i2c_core ptp pps_core [last unloaded: cpufreq_ondemand]
       
      2016-03-16 03:47:44 Pid: 88396, comm: react_charge_tr Not tainted 2.6.32-573.18.1.1chaos.ch5.4.x86_64 #1 Cray Inc. SERVER-GB512X-CN/S2600JF
      2016-03-16 03:47:44 RIP: 0010:[<ffffffffa0b987cd>]  [<ffffffffa0b987cd>] vvp_io_write_start+0x2ad/0x3d0 [lustre]
      2016-03-16 03:47:44 RSP: 0018:ffff880818b3f788  EFLAGS: 00010202
      2016-03-16 03:47:44 RAX: 0000000000000000 RBX: 0000000000d162ac RCX: 0000000000000001
      2016-03-16 03:47:44 RDX: ffff880a1321c9d4 RSI: ffffffffa0bc95a0 RDI: ffff880a1321c9d4
      2016-03-16 03:47:44 RBP: ffff880818b3f7d8 R08: ffff880b49965650 R09: ffff880e736d0ee8
      2016-03-16 03:47:44 R10: 0000000000000000 R11: 0000000000000000 R12: ffff880b49965608
      2016-03-16 03:47:44 R13: ffff880a1321c638 R14: ffff880e736d0ee8 R15: 0000000000001000
      2016-03-16 03:47:44 FS:  00002aaaaf64db20(0000) GS:ffff88085c580000(0000) knlGS:0000000000000000
      2016-03-16 03:47:44 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      2016-03-16 03:47:44 CR2: 0000000000000080 CR3: 0000000fc6de5000 CR4: 00000000000407e0
      2016-03-16 03:47:44 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      2016-03-16 03:47:44 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      2016-03-16 03:47:44 Process react_charge_tr (pid: 88396, threadinfo ffff880818b3c000, task ffff88083045f520)
      2016-03-16 03:47:44 Stack:
      2016-03-16 03:47:44  ffff88093562cf08 ffff880b49965608 ffff88093562cf08 ffff88093562cf08
      2016-03-16 03:47:44 <d> ffff880818b3f808 ffff880e736d0ee8 ffff880b49965608 ffff88093562cf08
      2016-03-16 03:47:44 <d> ffff880b49965608 ffff880b61cadd80 ffff880818b3f808 ffffffffa0671e5a
      2016-03-16 03:47:44 Call Trace:
      2016-03-16 03:47:44  [<ffffffffa0671e5a>] cl_io_start+0x6a/0x140 [obdclass]
      2016-03-16 03:47:44  [<ffffffffa0b67579>] ll_cl_init+0x3a9/0x570 [lustre]
      2016-03-16 03:47:44  [<ffffffffa066cd72>] ? cl_lock_mutex_try+0x112/0x120 [obdclass]
      2016-03-16 03:47:44  [<ffffffffa0b67933>] ll_prepare_write+0x53/0x170 [lustre]
      2016-03-16 03:47:44  [<ffffffffa0b845ee>] ll_write_begin+0x7e/0x1a0 [lustre]
      2016-03-16 03:47:44  [<ffffffff81128443>] generic_file_buffered_write+0x123/0x300
      2016-03-16 03:47:44  [<ffffffff8107ecf7>] ? current_fs_time+0x27/0x30
      2016-03-16 03:47:44  [<ffffffff81129f60>] __generic_file_aio_write+0x260/0x490
      2016-03-16 03:47:44  [<ffffffff8112a218>] generic_file_aio_write+0x88/0x100
      2016-03-16 03:47:44  [<ffffffffa0b98676>] vvp_io_write_start+0x156/0x3d0 [lustre]
      2016-03-16 03:47:44  [<ffffffffa0671e5a>] cl_io_start+0x6a/0x140 [obdclass]
      2016-03-16 03:47:44  [<ffffffffa0676964>] cl_io_loop+0xb4/0x1b0 [obdclass]
      2016-03-16 03:47:44  [<ffffffffa0b35356>] ll_file_io_generic+0x1a6/0x750 [lustre]
      2016-03-16 03:47:44  [<ffffffffa0b40169>] ll_file_aio_write+0x1b9/0x6f0 [lustre]
      2016-03-16 03:47:44  [<ffffffffa0b3ffb0>] ? ll_file_aio_write+0x0/0x6f0 [lustre]
      2016-03-16 03:47:44  [<ffffffff81192a3b>] do_sync_readv_writev+0xfb/0x140
      2016-03-16 03:47:44  [<ffffffffa0ab8a83>] ? lov_io_fini+0x383/0x480 [lov]
      2016-03-16 03:47:44  [<ffffffffa0b6a95b>] ? ll_stats_ops_tally+0x7b/0xa0 [lustre]
      2016-03-16 03:47:44  [<ffffffff810a1d50>] ? autoremove_wake_function+0x0/0x40
      2016-03-16 03:47:44  [<ffffffff81233a16>] ? security_file_permission+0x16/0x20
      2016-03-16 03:47:44  [<ffffffff81193b26>] do_readv_writev+0xd6/0x1f0
      2016-03-16 03:47:44  [<ffffffff811930fc>] ? generic_file_llseek_size+0x8c/0xd0
      2016-03-16 03:47:44  [<ffffffffa0b359d2>] ? ll_file_seek+0xd2/0x2b0 [lustre]
      2016-03-16 03:47:44  [<ffffffff81193c86>] vfs_writev+0x46/0x60
      2016-03-16 03:47:44  [<ffffffff81193db1>] sys_writev+0x51/0xd0
      2016-03-16 03:47:44  [<ffffffff8100b112>] system_call_fastpath+0x16/0x1b
      2016-03-16 03:47:44 Code: a4 24 f4 00 00 00 fe e9 d9 fe ff ff 66 90 41 8b 4c 24 78 85 c9 0f 84 0e fe ff ff 49 8b 5d 68 49 89 5c 24 60 49 8b 86 b0 00 00 00 <48> 89 98 80 00 00 00 e9 f2 fd ff ff 0f 1f 80 00 00 00 00 8b 15
      2016-03-16 03:47:44 RIP  [<ffffffffa0b987cd>] vvp_io_write_start+0x2ad/0x3d0 [lustre]
      2016-03-16 03:47:44  RSP <ffff880818b3f788>
      2016-03-16 03:47:44 CR2: 0000000000000080
      

      Latest node crashdump is 1.2GB compressed, so please advise if you want it.

      Attachments

        Issue Links

          Activity

            People

              bobijam Zhenyu Xu
              charr Cameron Harr
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated: