Lustre / LU-4886

Kernel Panic "cl_lock_put"


Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.1.5
    • Component/s: None
    • Environment: Linux 2.6.32-279.19.1.el6_lustre.x86_64 #1 SMP
    • Severity: 3
    • Rank: 13522

    Description

      We have a kernel crash on a Lustre 2.1.5 client with the following log:

      [root@r01 ~]# crash /usr/lib/debug/lib/modules/2.6.32-279.19.1.el6_lustre.x86_64/vmlinux /var/crash/127.0.0.1-2014-04-11-17\:36\:14/vmcore

      crash 6.0.4-2.el6
      Copyright (C) 2002-2012 Red Hat, Inc.
      Copyright (C) 2004, 2005, 2006 IBM Corporation
      Copyright (C) 1999-2006 Hewlett-Packard Co
      Copyright (C) 2005, 2006 Fujitsu Limited
      Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
      Copyright (C) 2005 NEC Corporation
      Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
      Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
      This program is free software, covered by the GNU General Public License,
      and you are welcome to change it and/or distribute copies of it under
      certain conditions. Enter "help copying" to see the conditions.
      This program has absolutely no warranty. Enter "help warranty" for details.

      GNU gdb (GDB) 7.3.1
      Copyright (C) 2011 Free Software Foundation, Inc.
      License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law. Type "show copying"
      and "show warranty" for details.
      This GDB was configured as "x86_64-unknown-linux-gnu"...

      KERNEL: /usr/lib/debug/lib/modules/2.6.32-279.19.1.el6_lustre.x86_64/vmlinux
      DUMPFILE: /var/crash/127.0.0.1-2014-04-11-17:36:14/vmcore [PARTIAL DUMP]
      CPUS: 16
      DATE: Fri Apr 11 17:35:10 2014
      UPTIME: 12 days, 06:09:56
      LOAD AVERAGE: 1.22, 1.18, 1.57
      TASKS: 604
      NODENAME: r01
      RELEASE: 2.6.32-279.19.1.el6_lustre.x86_64
      VERSION: #1 SMP Wed Mar 20 16:37:18 PDT 2013
      MACHINE: x86_64 (2400 Mhz)
      MEMORY: 12 GB
      PANIC: ""
      PID: 28331
      COMMAND: "ldlm_bl_00"
      TASK: ffff880334904aa0 [THREAD_INFO: ffff88023c770000]
      CPU: 6
      STATE: TASK_RUNNING (PANIC)

      crash> log

      ...

      Pid: 28331, comm: ldlm_bl_00 Not tainted 2.6.32-279.19.1.el6_lustre.x86_64 #1 Supermicro X8DTH-i/6/iF/6F/X8DTH
      RIP: 0010:[<ffffffffa04de478>] [<ffffffffa04de478>] cl_lock_put+0x118/0x490 [obdclass]
      RSP: 0018:ffff88023c771c40 EFLAGS: 00010246
      RAX: 0000000000000001 RBX: 5a5a5a5a5a5a5a5a RCX: ffff8801410d37b8
      RDX: ffffffffa04ff485 RSI: 5a5a5a5a5a5a5a5a RDI: ffff880181fc3930
      RBP: ffff88023c771c70 R08: ffffffffa04ef540 R09: 00000000000002f4
      R10: 00000000deadbeef R11: 0000000000000000 R12: ffff880181fc3930
      R13: ffff880181fc3930 R14: ffff88018be93420 R15: ffff88023c771ca0
      FS: 00007f63a402f700(0000) GS:ffff8801c5840000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000003b1faabc30 CR3: 00000001cf1a7000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process ldlm_bl_00 (pid: 28331, threadinfo ffff88023c770000, task ffff880334904aa0)
      Stack:
      ffff88023c771c70 ffff880132aaf000 ffff88018be93420 ffff880181fc3930
      <d> ffff88018be93420 ffff88023c771ca0 ffff88023c771ce0 ffffffffa0953b30
      <d> ffff880200000000 ffff8801410d37b8 ffff8801410d37b8 00000002a0393092
      Call Trace:
      [<ffffffffa0953b30>] osc_ldlm_blocking_ast+0xb0/0x380 [osc]
      [<ffffffffa05e4cc0>] ldlm_cancel_callback+0x60/0x100 [ptlrpc]
      [<ffffffffa05ff14b>] ldlm_cli_cancel_local+0x7b/0x380 [ptlrpc]
      [<ffffffffa0602fd8>] ldlm_cli_cancel+0x58/0x3a0 [ptlrpc]
      [<ffffffffa0952af1>] osc_lock_cancel+0xe1/0x1b0 [osc]
      [<ffffffffa04d544d>] ? cl_env_nested_get+0x5d/0xc0 [obdclass]
      [<ffffffffa04db225>] cl_lock_cancel0+0x75/0x160 [obdclass]
      [<ffffffffa04dbf0b>] cl_lock_cancel+0x13b/0x140 [obdclass]
      [<ffffffffa0953bba>] osc_ldlm_blocking_ast+0x13a/0x380 [osc]
      [<ffffffffa0606123>] ldlm_handle_bl_callback+0x123/0x2e0 [ptlrpc]
      [<ffffffffa0606561>] ldlm_bl_thread_main+0x281/0x3d0 [ptlrpc]
      [<ffffffff8105fa40>] ? default_wake_function+0x0/0x20
      [<ffffffffa06062e0>] ? ldlm_bl_thread_main+0x0/0x3d0 [ptlrpc]
      [<ffffffff8100c0ca>] child_rip+0xa/0x20
      [<ffffffffa06062e0>] ? ldlm_bl_thread_main+0x0/0x3d0 [ptlrpc]
      [<ffffffffa06062e0>] ? ldlm_bl_thread_main+0x0/0x3d0 [ptlrpc]
      [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      Code: 00 00 00 00 00 c7 05 5c db 06 00 01 00 00 00 48 c7 c7 a0 bf 54 a0 8b 13 4c 8b 45 08 31 c0 e8 d0 d9 ea ff eb 12 66 0f 1f 44 00 00 <48> 8b 43 28 48 8b 40 08 4c 8b 68 18 f0 ff 0b 0f 94 c0 84 c0 74
      RIP [<ffffffffa04de478>] cl_lock_put+0x118/0x490 [obdclass]
      RSP <ffff88023c771c40>
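
      If it helps, further context can be pulled from the same vmcore with standard crash commands; the following is only a sketch of typical follow-up steps, not output we have already collected:

      crash> mod -S                 (load debuginfo for the Lustre modules, if available)
      crash> bt -f                  (full backtrace of the ldlm_bl_00 task, including frame contents)
      crash> dis -l cl_lock_put     (disassembly of the faulting function around RIP)
      crash> ps | grep ldlm         (other ldlm threads present at the time of the panic)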

      Cluster configuration:

      Lustre Server MGS/MDS - mmp-2
      Lustre Servers OSS - n11, n12, n13, n14, n15, n21, n22, n23, n24, n25
      Lustre Clients - r01, r02, r03, r04, mmp-1, vn-1, cln01, cln02, cln03, cln04

      (see the attached diagram "20140113 - Hardware Diagram v0.1_R3.gif")

      Environment:
      Linux 2.6.32-279.19.1.el6_lustre.x86_64 #1 SMP

      Mount points:
      OSS:
      /dev/md11 on /lustre/ost type lustre (rw,noauto,_netdev,abort_recov)

      MGS/MDS:
      /dev/lustre_mgs on /lustre/mgs type lustre (rw,noauto,_netdev,abort_recov)
      /dev/lustre_mdt1 on /lustre/mdt1 type lustre (rw,noauto,_netdev,abort_recov)

      Clients (r01, r02, r03, r04, mmp-1, vn-1):
      mmp-2@tcp:mmp-1@tcp:/lustre1 on /array1 type lustre (rw,noauto,_netdev,noflock,abort_recov,lazystatfs)

      Clients (cln01, cln02, cln03, cln04):
      mmp-2@tcp:mmp-1@tcp:/lustre1 on /array1 type lustre (rw,noauto,_netdev,localflock,abort_recov,lazystatfs)
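
      For reference, the client mount output above corresponds to fstab entries of roughly the following form (a sketch reconstructed from the mount options shown; the actual /etc/fstab lines on the nodes may differ):

      # r01-r04, mmp-1, vn-1
      mmp-2@tcp:mmp-1@tcp:/lustre1  /array1  lustre  noauto,_netdev,noflock,abort_recov,lazystatfs  0 0

      # cln01-cln04 (only the flock option differs)
      mmp-2@tcp:mmp-1@tcp:/lustre1  /array1  lustre  noauto,_netdev,localflock,abort_recov,lazystatfs  0 0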

      Stripe config:
      [root@mmp-1 ~]# lfs getstripe /array1/.
      /array1/.
      stripe_count: 1 stripe_size: 1048576 stripe_offset: -1
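
      The same default layout could be reproduced with lfs setstripe; a sketch, assuming the values reported by lfs getstripe above (on 2.1.x the stripe size flag is -s; newer releases prefer -S):

      lfs setstripe -s 1048576 -c 1 -i -1 /array1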

      kdump config:
      core_collector makedumpfile -c --message-level 1 -d 31
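
      For completeness, a sketch of the relevant /etc/kdump.conf lines; the dump path shown is the RHEL 6 default and matches the vmcore location above:

      path /var/crash
      core_collector makedumpfile -c --message-level 1 -d 31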

      We have the crash dump file and are ready to upload it if it is needed for analysis.


People

    Assignee: wc-triage (WC Triage)
    Reporter: rustequal (Rustem Bikboulatov)
