Lustre / LU-7777

TOSS 3 client kernel panic in ll_get_dir_page()

Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • None
    • Lustre 2.5.5
    • toss-release-3.0-36alpha.ch6 lustre 2.5.5-3chaos-CHANGED-3.10.0-327.0.0.1chaos.ch6.x86_64
    • 3

    Description

      We've started some client testing of TOSS 3 alpha against servers running toss-release-2.4-2.ch5.4.

      This is more of a heads-up; perhaps newer builds of the client are already in use somewhere? If not, and you cannot reproduce the problem, I'll open an Intel issue.

      We can reliably trigger a kernel panic by running active I/O against one file system (cp -ar <something large>) while simultaneously running ls -l in the other mounted file system, or occasionally in the same file system from a different session or node. This happens regardless of the backing fs of the file system.
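
      The reproduction steps above can be scripted. The following is a minimal sketch only, not the procedure actually used in this report: the function name run_repro, its parameters, and all paths in the usage comment are hypothetical. It starts the copy in the background and keeps listing a directory until the copy exits.

```python
import subprocess
import time


def run_repro(copy_cmd, list_cmd, max_lists=1000, pause=0.0):
    """Run list_cmd repeatedly while copy_cmd is in flight.

    Returns (copy_returncode, number_of_listings_run).
    """
    # Start the large copy in the background (the "active I/O" side).
    copier = subprocess.Popen(
        copy_cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    listings = 0
    try:
        while listings < max_lists:
            # The directory listing is the operation that hit the panic.
            subprocess.run(
                list_cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                check=False,
            )
            listings += 1
            # Stop looping once the copy has exited.
            if copier.poll() is not None:
                break
            time.sleep(pause)
    finally:
        # If max_lists was reached first, block until the copy finishes.
        copier.wait()
    return copier.returncode, listings


# Hypothetical usage; every path below is a placeholder:
#   run_repro(["cp", "-ar", "/path/to/something/large", "/mnt/fs_a/target"],
#             ["ls", "-l", "/mnt/fs_b/some/dir"])
```

      On an affected client the node would be expected to panic before the function returns, so the return value is only meaningful on a node that survives the run.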

      Here's a representative trace:
      12:27:45 [10713.441784] BUG: unable to handle kernel paging request at 000000011df5000a
      12:27:45 [10713.442712] IP: [<ffffffffa0d20ed5>] ll_get_dir_page+0x3e5/0x1650 [lustre]
      12:27:45 [10713.442712] PGD 40dacd067 PUD 0
      12:27:45 [10713.442712] Oops: 0000 [#1] SMP
      12:27:45 [10713.442712] Modules linked in: lmv(OE) fld(OE) mgc(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) fid(OE) ksocklnd(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) sha512_generic crypto_null libcfs(OE) xprtrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic crct10dif_common ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa iTCO_wdt iTCO_vendor_support ipmi_devintf coretemp kvm radeon dcdbas ib_mthca ttm i5000_edac drm_kms_helper ib_mad lpc_ich edac_core mfd_core ib_core drm ipmi_si serio_raw pcspkr ib_addr ipmi_msghandler i5k_amb shpchp acpi_cpufreq binfmt_misc ip_tables nfsv4 dns_resolver nfsv3 nfs fscache lockd grace nfs_acl sunrpc broadcom tg3 bnx2 igb dca i2c_algo_bit i2c_core e1000e ptp pps_core e1000
      12:27:45 [10713.442712] CPU: 2 PID: 15157 Comm: bash Tainted: G OE ------------ 3.10.0-327.0.0.1chaos.ch6.x86_64 #1
      12:27:45 [10713.442712] Hardware name: Dell Inc. PowerEdge 1950/0NK937, BIOS 1.1.0 06/21/2006
      12:27:45 [10713.442712] task: ffff8803fd255080 ti: ffff8801310e8000 task.ti: ffff8801310e8000
      12:27:45 [10713.442712] RIP: 0010:[<ffffffffa0d20ed5>] [<ffffffffa0d20ed5>] ll_get_dir_page+0x3e5/0x1650 [lustre]
      12:27:45 [10713.442712] RSP: 0018:ffff8801310ebcb0 EFLAGS: 00010002
      12:27:45 [10713.442712] RAX: 0000000000000001 RBX: ffff880409560108 RCX: 0000000000000000
      12:27:45 [10713.442712] RDX: 000000011df5000a RSI: ffff8801310ebc60 RDI: 0000000000000000
      12:27:45 [10713.442712] RBP: ffff8801310ebdc8 R08: 0000000000000000 R09: ffffffffffffffff
      12:27:45 [10713.442712] R10: 0000000000000000 R11: 0000000000000220 R12: 0000000000000000
      12:27:45 [10713.442712] R13: fffffffffffffffe R14: 000000011df5000a R15: ffff880409560258
      12:27:45 [10713.442712] FS: 00002aaaaab086c0(0000) GS:ffff88042fc80000(0000) knlGS:0000000000000000
      12:27:45 [10713.442712] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      12:27:45 [10713.442712] CR2: 000000011df5000a CR3: 0000000361820000 CR4: 00000000000007e0
      12:27:45 [10713.442712] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      12:27:45 [10713.442712] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      12:27:45 [10713.442712] Stack:
      12:27:45 [10713.442712] ffff8801310ebce8 ffff8801310ebdf0 ffff8801310ebcd0 ffff880409560270
      12:27:46 [10713.442712] ffff880409560258 ffff880409560390 ffff8801310ebce8 a707e6e818891384
      12:27:46 [10713.442712] 0000000000000000 ffffffffa0d4191a ffff8801310ebd28 ffffffffa0d6b520
      12:27:46 [10713.442712] Call Trace:
      12:27:46 [10713.442712] [<ffffffffa0d4191a>] ? __ll_inode_revalidate_it+0x1ea/0xc00 [lustre]
      12:27:46 [10713.442712] [<ffffffffa0d6b520>] ? ll_symlink+0x1d0/0x1d0 [lustre]
      12:27:46 [10713.442712] [<ffffffff8120836e>] ? mntput_no_expire+0x3e/0x120
      12:27:46 [10713.442712] [<ffffffff81208474>] ? mntput+0x24/0x40
      12:27:46 [10713.442712] [<ffffffffa0d221e5>] ll_dir_read+0xa5/0x380 [lustre]
      12:27:46 [10713.442712] [<ffffffff811fb990>] ? fillonedir+0xf0/0xf0
      12:27:46 [10713.442712] [<ffffffff81647f9a>] ? __slab_free+0x10e/0x277
      12:27:46 [10713.442712] [<ffffffffa0d2258b>] ll_readdir+0xcb/0x360 [lustre]
      12:27:46 [10713.442712] [<ffffffff811fb990>] ? fillonedir+0xf0/0xf0
      12:27:46 [10713.442712] [<ffffffff811fb990>] ? fillonedir+0xf0/0xf0
      12:27:46 [10713.442712] [<ffffffff811fb870>] vfs_readdir+0xb0/0xe0
      12:27:46 [10713.442712] [<ffffffff811fbce5>] SyS_getdents+0x95/0x130
      12:27:46 [10713.442712] [<ffffffff8165c6c9>] system_call_fastpath+0x16/0x1b
      12:27:46 [10713.442712] Code: 89 85 00 ff ff ff e8 6b 29 93 e0 49 8d 7f 08 b9 01 00 00 00 4c 89 ea 4c 89 f6 e8 87 53 5e e0 85 c0 0f 8e bf 05 00 00 4c 8b 75 88 <49> 8b 06 f6 c4 80 0f 85 27 11 00 00 f0 41 ff 46 1c 66 66 66 66
      12:27:46 [10713.442712] RIP [<ffffffffa0d20ed5>] ll_get_dir_page+0x3e5/0x1650 [lustre]
      12:27:46 [10713.442712] RSP <ffff8801310ebcb0>
      12:27:46 [10713.442712] CR2: 000000011df5000a
      12:27:46 [10713.442712] ---[ end trace f520f054f8c4c1cb ]---
      12:27:46 [10713.442712] Kernel panic - not syncing: Fatal exception
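
      When comparing traces like this one across several crashes, it can help to pull out the faulting address (CR2), the faulting symbol and offset, and any general-purpose registers that hold the same value as CR2. The sketch below is illustrative only: summarize_oops and the regular expressions are hypothetical helpers written for this log layout, and SAMPLE is a handful of lines copied from the trace above. In this trace RDX and R14 both equal CR2, which is consistent with the faulting instruction dereferencing a bad pointer held in one of them.

```python
import re

# A 16-hex-digit register value, e.g. "R14: 000000011df5000a" or "CR2: ...".
REG_RE = re.compile(r"\b(R[A-Z0-9]{2}|CR2):\s*([0-9a-f]{16})\b")
# The "IP: [<addr>] symbol+0xoff/0xsize" line near the top of the oops.
IP_RE = re.compile(r"\bIP: \[<([0-9a-f]+)>\] (\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")

SAMPLE = """\
12:27:45 [10713.442712] IP: [<ffffffffa0d20ed5>] ll_get_dir_page+0x3e5/0x1650 [lustre]
12:27:45 [10713.442712] RAX: 0000000000000001 RBX: ffff880409560108 RCX: 0000000000000000
12:27:45 [10713.442712] RDX: 000000011df5000a RSI: ffff8801310ebc60 RDI: 0000000000000000
12:27:45 [10713.442712] R13: fffffffffffffffe R14: 000000011df5000a R15: ffff880409560258
12:27:45 [10713.442712] CR2: 000000011df5000a CR3: 0000000361820000 CR4: 00000000000007e0
"""


def summarize_oops(text):
    """Extract the fault address, faulting symbol, and matching registers."""
    regs = {name: int(value, 16) for name, value in REG_RE.findall(text)}
    fault_addr = regs.pop("CR2", None)

    ip = None
    match = IP_RE.search(text)
    if match:
        ip = {
            "addr": int(match.group(1), 16),
            "symbol": match.group(2),
            "offset": int(match.group(3), 16),
            "size": int(match.group(4), 16),
        }

    # Registers whose value equals the faulting address are candidates for
    # the pointer that was dereferenced.
    holders = sorted(name for name, value in regs.items() if value == fault_addr)
    return {"fault_addr": fault_addr, "ip": ip, "regs_holding_fault_addr": holders}


summary = summarize_oops(SAMPLE)
print(hex(summary["fault_addr"]), summary["ip"]["symbol"],
      hex(summary["ip"]["offset"]), summary["regs_holding_fault_addr"])
# prints: 0x11df5000a ll_get_dir_page 0x3e5 ['R14', 'RDX']
```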

            People

              Assignee: bobijam Zhenyu Xu
              Reporter: ruth.klundt@gmail.com Ruth Klundt (Inactive)