Lustre / LU-3281

IO Fails - client stack overrun

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Lustre 2.4.0
    • Fix Version/s: Lustre 2.4.0, Lustre 2.8.0
    • Labels:
    • Environment: hyperion/LLNL
    • Severity: 3
    • Rank (Obsolete): 8119

      Description

      Testing the fix for LU-3188 (http://review.whamcloud.com/#change,6191).
      The client panics immediately when running IOR.

      2013-05-05 12:06:15 Lustre: DEBUG MARKER: == test iorssf: iorssf == 12:06:15
      2013-05-05 12:30:42 BUG: scheduling while atomic: ior/5692/0x10000002
      2013-05-05 12:30:42 BUG: unable to handle kernel paging request at 0000000315c2e000
      2013-05-05 12:30:42 IP: [<ffffffff810568e4>] update_curr+0x144/0x1f0
      2013-05-05 12:30:42 PGD 106a964067 PUD 0
      2013-05-05 12:30:42 Thread overran stack, or stack corrupted
      2013-05-05 12:30:42 Oops: 0000 [#1] SMP
      2013-05-05 12:30:42 last sysfs file: /sys/devices/pci0000:00/0000:00:02.0/0000:03:00.0/infiniband/mlx4_0/ports/1/pkeys/127
      2013-05-05 12:30:42 CPU 9
      2013-05-05 12:30:42 Modules linked in: lmv(U) mgc(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) ko2iblnd(U) lnet(U) sha512_generic sha256_generic libcfs(U) ipmi_devintf acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr mlx4_ib ib_sa ib_mad iw_cxgb4 iw_cxgb3 ib_core ext4 mbcache jbd2 dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun kvm sg sd_mod crc_t10dif wmi dcdbas sb_edac edac_core ahci i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp ioatdma nfs lockd fscache auth_rpcgss nfs_acl sunrpc mlx4_en mlx4_core igb dca ptp pps_core be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: cpufreq_ondemand]
      2013-05-05 12:30:43
      
      2013-05-05 12:30:43 Pid: 5692, comm: ior Not tainted 2.6.32-358.2.1.el6.x86_64 #1 Dell Inc. PowerEdge C6220/0HYFFG
      2013-05-05 12:30:43 RIP: 0010:[<ffffffff810568e4>]  [<ffffffff810568e4>] update_curr+0x144/0x1f0
      2013-05-05 12:30:43 RSP: 0018:ffff88089c423db8  EFLAGS: 00010086
      2013-05-05 12:30:43 RAX: ffff880840d79540 RBX: 0000000072806048 RCX: ffff880877f101c0
      2013-05-05 12:30:43 RDX: 00000000000192d8 RSI: 0000000000000000 RDI: ffff880840d79578
      2013-05-05 12:30:43 RBP: ffff88089c423de8 R08: ffffffff8160bb65 R09: 0000000000000007
      2013-05-05 12:30:43 R10: 0000000000000010 R11: 0000000000000007 R12: ffff88089c436768
      2013-05-05 12:30:43 R13: 00000000007c9fa8 R14: 0000082565f22284 R15: ffff880840d79540
      2013-05-05 12:30:43 FS:  00002aaaafebf8c0(0000) GS:ffff88089c420000(0000) knlGS:0000000000000000
      2013-05-05 12:30:43 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      2013-05-05 12:30:43 CR2: 0000000315c2e000 CR3: 000000106aea6000 CR4: 00000000000407e0
      2013-05-05 12:30:43 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      2013-05-05 12:30:43 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      2013-05-05 12:30:43 Process ior (pid: 5692, threadinfo ffff880872806000, task ffff880840d79540)
      2013-05-05 12:30:43 Stack:
      2013-05-05 12:30:43  ffff88089c423dc8 ffffffff81013783 ffff880840d79578 ffff88089c436768
      2013-05-05 12:30:43 <d> 0000000000000000 0000000000000000 ffff88089c423e18 ffffffff81056e9b
      2013-05-05 12:30:43 <d> ffff88089c436700 0000000000000009 0000000000016700 0000000000000009
      2013-05-05 12:30:43 Call Trace:
      2013-05-05 12:30:43  <IRQ>
      2013-05-05 12:30:43  [<ffffffff81013783>] ? native_sched_clock+0x13/0x80
      2013-05-05 12:30:43 BUG: unable to handle kernel paging request at 000000000001182f
      2013-05-05 12:30:43 IP: [<ffffffff8100f4dd>] print_context_stack+0xad/0x140
      2013-05-05 12:30:43 PGD 106a964067 PUD 106a825067 PMD 0
      2013-05-05 12:30:43 Thread overran stack, or stack corrupted
      2013-05-05 12:30:43 Oops: 0000 [#2] SMP
      2013-05-05 12:30:43 last sysfs file: /sys/devices/pci0000:00/0000:00:02.0/0000:03:00.0/infiniband/mlx4_0/ports/1/pkeys/127
      2013-05-05 12:30:43 CPU 9
      

      Same as LU-3188: continuous stack dumps until the node crashes.
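The attached console logs are long, so a small script can help confirm that a given log shows the same pattern as the excerpt above (repeated `BUG:` lines, "Thread overran stack" notices, and an increasing Oops counter). The following is a minimal sketch, not part of the original report: the function name `summarize_console` and the returned field names are hypothetical, and it assumes every console line starts with a `YYYY-MM-DD HH:MM:SS` timestamp as in the excerpt.

```python
import re
from collections import Counter

# Assumption: each console line begins with a "YYYY-MM-DD HH:MM:SS" timestamp,
# as in the excerpt quoted in this report.
LINE_RE = re.compile(r"^\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(.*)$")
# Matches e.g. "Oops: 0000 [#2] SMP" and captures the oops sequence number.
OOPS_RE = re.compile(r"^Oops: \S+ \[#(\d+)\]")


def summarize_console(lines):
    """Count BUG, stack-overrun, and Oops lines in a console log (hypothetical helper)."""
    counts = Counter()
    first_bug = None
    last_oops = 0
    for raw in lines:
        match = LINE_RE.match(raw)
        if not match:
            continue  # skip lines without a timestamp
        stamp, msg = match.group(1), match.group(2).strip()
        if msg.startswith("BUG:"):
            counts["bug"] += 1
            if first_bug is None:
                first_bug = (stamp, msg)
        elif msg.startswith("Thread overran stack"):
            counts["stack_overrun"] += 1
        else:
            oops = OOPS_RE.match(msg)
            if oops:
                counts["oops"] += 1
                last_oops = max(last_oops, int(oops.group(1)))
    return {
        "bug": counts["bug"],
        "stack_overrun": counts["stack_overrun"],
        "oops": counts["oops"],
        "last_oops": last_oops,
        "first_bug": first_bug,
    }


# A few lines copied from the excerpt in this report.
SAMPLE = [
    "2013-05-05 12:30:42 BUG: scheduling while atomic: ior/5692/0x10000002",
    "2013-05-05 12:30:42 BUG: unable to handle kernel paging request at 0000000315c2e000",
    "2013-05-05 12:30:42 Thread overran stack, or stack corrupted",
    "2013-05-05 12:30:42 Oops: 0000 [#1] SMP",
    "2013-05-05 12:30:43",
    "2013-05-05 12:30:43  <IRQ>",
    "2013-05-05 12:30:43 BUG: unable to handle kernel paging request at 000000000001182f",
    "2013-05-05 12:30:43 Thread overran stack, or stack corrupted",
    "2013-05-05 12:30:43 Oops: 0000 [#2] SMP",
]

if __name__ == "__main__":
    print(summarize_console(SAMPLE))
```

To run it against one of the attachments, pass the open file instead of `SAMPLE`, for example `summarize_console(open("console.iwc113"))`. This assumes the attachment uses the same timestamped line format as the excerpt.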

        Attachments

        1. console.iwc113
          103 kB
        2. console.iwc21
          79 kB
        3. console.iwc4
          93 kB

              People

              • Assignee: bobijam Zhenyu Xu
              • Reporter: cliffw Cliff White (Inactive)
