LU-3281: IO Fails - client stack overrun

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Lustre 2.4.0, Lustre 2.8.0
    • Lustre 2.4.0
    • hyperion/LLNL
    • 3
    • 8119

    Description

      While testing the fix for LU-3188 (http://review.whamcloud.com/#change,6191),
      the client panics immediately when running IOR.

      2013-05-05 12:06:15 Lustre: DEBUG MARKER: == test iorssf: iorssf == 12:06:15
      2013-05-05 12:30:42 BUG: scheduling while atomic: ior/5692/0x10000002
      2013-05-05 12:30:42 BUG: unable to handle kernel paging request at 0000000315c2e000
      2013-05-05 12:30:42 IP: [<ffffffff810568e4>] update_curr+0x144/0x1f0
      2013-05-05 12:30:42 PGD 106a964067 PUD 0
      2013-05-05 12:30:42 Thread overran stack, or stack corrupted
      2013-05-05 12:30:42 Oops: 0000 [#1] SMP
      2013-05-05 12:30:42 last sysfs file: /sys/devices/pci0000:00/0000:00:02.0/0000:03:00.0/infiniband/mlx4_0/ports/1/pkeys/127
      2013-05-05 12:30:42 CPU 9
      2013-05-05 12:30:42 Modules linked in: lmv(U) mgc(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) ko2iblnd(U) lnet(U) sha512_generic sha256_generic libcfs(U) ipmi_devintf acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr mlx4_ib ib_sa ib_mad iw_cxgb4 iw_cxgb3 ib_core ext4 mbcache jbd2 dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun kvm sg sd_mod crc_t10dif wmi dcdbas sb_edac edac_core ahci i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp ioatdma nfs lockd fscache auth_rpcgss nfs_acl sunrpc mlx4_en mlx4_core igb dca ptp pps_core be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: cpufreq_ondemand]
      2013-05-05 12:30:43
      
      2013-05-05 12:30:43 Pid: 5692, comm: ior Not tainted 2.6.32-358.2.1.el6.x86_64 #1 Dell Inc. PowerEdge C6220/0HYFFG
      2013-05-05 12:30:43 RIP: 0010:[<ffffffff810568e4>]  [<ffffffff810568e4>] update_curr+0x144/0x1f0
      2013-05-05 12:30:43 RSP: 0018:ffff88089c423db8  EFLAGS: 00010086
      2013-05-05 12:30:43 RAX: ffff880840d79540 RBX: 0000000072806048 RCX: ffff880877f101c0
      2013-05-05 12:30:43 RDX: 00000000000192d8 RSI: 0000000000000000 RDI: ffff880840d79578
      2013-05-05 12:30:43 RBP: ffff88089c423de8 R08: ffffffff8160bb65 R09: 0000000000000007
      2013-05-05 12:30:43 R10: 0000000000000010 R11: 0000000000000007 R12: ffff88089c436768
      2013-05-05 12:30:43 R13: 00000000007c9fa8 R14: 0000082565f22284 R15: ffff880840d79540
      2013-05-05 12:30:43 FS:  00002aaaafebf8c0(0000) GS:ffff88089c420000(0000) knlGS:0000000000000000
      2013-05-05 12:30:43 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      2013-05-05 12:30:43 CR2: 0000000315c2e000 CR3: 000000106aea6000 CR4: 00000000000407e0
      2013-05-05 12:30:43 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      2013-05-05 12:30:43 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      2013-05-05 12:30:43 Process ior (pid: 5692, threadinfo ffff880872806000, task ffff880840d79540)
      2013-05-05 12:30:43 Stack:
      2013-05-05 12:30:43  ffff88089c423dc8 ffffffff81013783 ffff880840d79578 ffff88089c436768
      2013-05-05 12:30:43 <d> 0000000000000000 0000000000000000 ffff88089c423e18 ffffffff81056e9b
      2013-05-05 12:30:43 <d> ffff88089c436700 0000000000000009 0000000000016700 0000000000000009
      2013-05-05 12:30:43 Call Trace:
      2013-05-05 12:30:43  <IRQ>
      2013-05-05 12:30:43  [<ffffffff81013783>] ? native_sched_clock+0x13/0x80
      2013-05-05 12:30:43 BUG: unable to handle kernel paging request at 000000000001182f
      2013-05-05 12:30:43 IP: [<ffffffff8100f4dd>] print_context_stack+0xad/0x140
      2013-05-05 12:30:43 PGD 106a964067 PUD 106a825067 PMD 0
      2013-05-05 12:30:43 Thread overran stack, or stack corrupted
      2013-05-05 12:30:43 Oops: 0000 [#2] SMP
      2013-05-05 12:30:43 last sysfs file: /sys/devices/pci0000:00/0000:00:02.0/0000:03:00.0/infiniband/mlx4_0/ports/1/pkeys/127
      2013-05-05 12:30:43 CPU 9
      

      Same symptom as LU-3188: continuous stack dumps until the node crashes.
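
      For triaging the attached console logs, a minimal sketch along the following lines may help count the failure markers seen above. The marker strings are copied from the excerpt in this report; the function name summarize_console and the regular expression are illustrative assumptions, not part of Lustre or any kernel tooling.

```python
import re
from collections import Counter

# Marker strings copied verbatim from the console excerpt in this report.
MARKERS = {
    "sched_atomic": "BUG: scheduling while atomic",
    "paging_fault": "BUG: unable to handle kernel paging request",
    "stack_overrun": "Thread overran stack, or stack corrupted",
}

# Matches lines such as "Oops: 0000 [#1] SMP" and captures the oops counter.
OOPS_RE = re.compile(r"Oops: \w+ \[#(\d+)\]")


def summarize_console(lines):
    """Hypothetical helper: count known markers and track the highest oops number."""
    counts = Counter()
    highest_oops = 0
    for line in lines:
        for name, text in MARKERS.items():
            if text in line:
                counts[name] += 1
        match = OOPS_RE.search(line)
        if match:
            highest_oops = max(highest_oops, int(match.group(1)))
    return dict(counts), highest_oops


# Sample lines taken from the excerpt above.
sample = [
    "2013-05-05 12:30:42 BUG: scheduling while atomic: ior/5692/0x10000002",
    "2013-05-05 12:30:42 BUG: unable to handle kernel paging request at 0000000315c2e000",
    "2013-05-05 12:30:42 Thread overran stack, or stack corrupted",
    "2013-05-05 12:30:42 Oops: 0000 [#1] SMP",
    "2013-05-05 12:30:43 BUG: unable to handle kernel paging request at 000000000001182f",
    "2013-05-05 12:30:43 Thread overran stack, or stack corrupted",
    "2013-05-05 12:30:43 Oops: 0000 [#2] SMP",
]

summary, highest_oops = summarize_console(sample)
print(summary, highest_oops)
```

      The same function accepts an open file object, so it could be pointed at one of the attached logs (for example console.iwc113) to see how many oopses were emitted before the node went down.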

      Attachments

        1. console.iwc113 (103 kB)
        2. console.iwc21 (79 kB)
        3. console.iwc4 (93 kB)

            People

              Assignee: bobijam Zhenyu Xu
              Reporter: cliffw Cliff White (Inactive)