LU-5460: Lustre client crash

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • None
    • Lustre 2.4.1
    • None
    • 3
    • 15209

    Description

      Hi,

      One of our clients, which exports Lustre over NFS, crashed, wrote a crash dump and rebooted overnight. I'm including the vmcore-dmesg here in case it contains anything useful for you. I don't think we've seen this one before, so it must be rare. The full vmcore is available on request.

      <4>general protection fault: 0000 [#1] SMP 
      <4>last sysfs file: /sys/devices/system/node/node1/numastat
      <4>CPU 21 
      <4>Modules linked in: tcp_diag inet_diag mptctl mptbase ipmi_devintf dell_rbu nfsd exportfs autofs4 lmv(U) mgc(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic sha256_generic crc32c_intel libcfs(U) nfs lockd fscache auth_rpcgss nfs_acl sunrpc bonding 8021q garp stp llc ipv6 uinput raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx power_meter sg bnx2x libcrc32c mdio bnx2 dcdbas microcode serio_raw iTCO_wdt iTCO_vendor_support i7core_edac edac_core ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix mpt2sas scsi_transport_sas raid_class dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
      <4>
      <4>Pid: 17387, comm: ldlm_bl_40 Not tainted 2.6.32-358.18.1.el6_lustre.x86_64 #1 Dell Inc. PowerEdge R610/0F0XJ6
      <4>RIP: 0010:[<ffffffffa058ba3e>]  [<ffffffffa058ba3e>] cl_lock_mutex_get+0x2e/0xd0 [obdclass]
      <4>RSP: 0018:ffff88046ec63c30  EFLAGS: 00010203
      <4>RAX: 5a5a5a5a5a5a5a5a RBX: ffff880572210d10 RCX: ffff880a789650b8
      <4>RDX: ffff8808e88c5448 RSI: ffff880a58758a18 RDI: ffff880572210d10
      <4>RBP: ffff88046ec63c50 R08: ffffffffa05ab7ee R09: 0000000000000000
      <4>R10: 5a5a5a5a5a5a5a5a R11: 5a5a5a5a5a5a5a5a R12: ffff880a58758a18
      <4>R13: ffff88079d568678 R14: ffff880952923b70 R15: ffff880a58758a18
      <4>FS:  00007ff150e51700(0000) GS:ffff880028340000(0000) knlGS:0000000000000000
      <4>CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      <4>CR2: 00007ffbd6c4a9d4 CR3: 0000000c235b4000 CR4: 00000000000007e0
      <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      <4>Process ldlm_bl_40 (pid: 17387, threadinfo ffff88046ec62000, task ffff880bb62b0aa0)
      <4>Stack:
      <4> ffff880b4f421740 ffff880572210d10 ffff880a58758a18 ffff88079d568678
      <4><d> ffff88046ec63c70 ffffffffa0a014b9 ffff880572210d10 ffff8808e88c5420
      <4><d> ffff88046ec63cc0 ffffffffa0a01c39 ffff880b4f421740 ffff8808e88c5448
      <4>Call Trace:
      <4> [<ffffffffa0a014b9>] lovsub_parent_lock+0x49/0x120 [lov]
      <4> [<ffffffffa0a01c39>] lovsub_lock_state+0x79/0x1b0 [lov]
      <4> [<ffffffffa0589718>] cl_lock_state_signal+0x68/0x160 [obdclass]
      <4> [<ffffffffa0589865>] cl_lock_state_set+0x55/0x190 [obdclass]
      <4> [<ffffffffa058a8b3>] cl_lock_delete0+0x53/0x1d0 [obdclass]
      <4> [<ffffffffa058ab83>] cl_lock_delete+0x153/0x1a0 [obdclass]
      <4> [<ffffffffa0968ac6>] osc_ldlm_blocking_ast+0x146/0x350 [osc]
      <4> [<ffffffffa06b91bc>] ldlm_cancel_callback+0x6c/0x1a0 [ptlrpc]
      <4> [<ffffffffa06d341a>] ldlm_cli_cancel_local+0x8a/0x470 [ptlrpc]
      <4> [<ffffffffa06d670e>] ldlm_cli_cancel_list_local+0xee/0x290 [ptlrpc]
      <4> [<ffffffffa06dc1b0>] ldlm_bl_thread_main+0x100/0x3d0 [ptlrpc]
      <4> [<ffffffff81063410>] ? default_wake_function+0x0/0x20
      <4> [<ffffffffa06dc0b0>] ? ldlm_bl_thread_main+0x0/0x3d0 [ptlrpc]
      <4> [<ffffffff8100c0ca>] child_rip+0xa/0x20
      <4> [<ffffffffa06dc0b0>] ? ldlm_bl_thread_main+0x0/0x3d0 [ptlrpc]
      <4> [<ffffffffa06dc0b0>] ? ldlm_bl_thread_main+0x0/0x3d0 [ptlrpc]
      <4> [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      <4>Code: e5 41 55 41 54 53 48 83 ec 08 0f 1f 44 00 00 65 48 8b 04 25 c0 cb 00 00 48 39 86 90 00 00 00 48 89 fb 49 89 f4 74 56 48 8b 46 28 <4c> 8b 28 e8 ba 58 ff ff 41 0f b6 b5 96 00 00 00 85 f6 74 23 8b 
      <1>RIP  [<ffffffffa058ba3e>] cl_lock_mutex_get+0x2e/0xd0 [obdclass]
      <4> RSP <ffff88046ec63c30>
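
      When reading a register dump like the one above, it can help to flag registers whose eight bytes are all the same non-zero value. Here RAX, R10 and R11 all hold 5a5a5a5a5a5a5a5a. A repeated-byte value like this often points at poisoned or freed memory, but that interpretation is an assumption and is not confirmed anywhere in this report. The following is a minimal sketch of such a check; the function name repeated_byte_registers and the regular expression are hypothetical helpers written for illustration, not part of Lustre or any kernel tooling.

```python
import re

# Matches "NAME: <16 hex digits>" pairs as printed in the oops register dump
# (for example "RAX: 5a5a5a5a5a5a5a5a"). Fields with other layouts, such as
# "RSP: 0018:..." or "EFLAGS: 00010203", are intentionally not matched.
REG_RE = re.compile(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b")


def repeated_byte_registers(dmesg_text):
    """Return {register: byte} for registers whose 8 bytes are identical and non-zero."""
    hits = {}
    for name, value in REG_RE.findall(dmesg_text):
        first = value[:2]
        # All-zero registers are common and uninteresting, so skip them.
        if first != "00" and value == first * 8:
            hits[name] = int(first, 16)
    return hits


# Three lines copied from the dump in this report.
sample = """\
<4>RAX: 5a5a5a5a5a5a5a5a RBX: ffff880572210d10 RCX: ffff880a789650b8
<4>RBP: ffff88046ec63c50 R08: ffffffffa05ab7ee R09: 0000000000000000
<4>R10: 5a5a5a5a5a5a5a5a R11: 5a5a5a5a5a5a5a5a R12: ffff880a58758a18
"""

flagged = repeated_byte_registers(sample)
for reg, byte in flagged.items():
    print(f"{reg} is filled with byte 0x{byte:02x}")
```

      The sketch only highlights suspicious values. Confirming which object was involved would still require the full vmcore.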
      

          People

            jay Jinshan Xiong (Inactive)
            daire Daire Byrne (Inactive)