Lustre / LU-1854

System crash when reading /proc/fs/lustre/ost/OSS/ost_create/req_history


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Lustre 2.4.0
    • Lustre 2.3.0
    • Environment: Lustre 2.2.93, bullxlinux distribution (based on Red Hat 6.2), kernel 2.6.32-220
    • 3
    • 4442

    Description

      The Lustre version is 2.2.93.

      Reading the file /proc/fs/lustre/ost/OSS/ost_create/req_history crashed the system with an LBUG on ASSERTION( !list_empty(&svcpt->scp_hist_reqs) ).

      Here is some information from the core dump:

            KERNEL: /usr/lib/debug/lib/modules/2.6.32-220.23.1.bl6.Bull.28.8.x86_64/vmlinux
          DUMPFILE: /var/crash/127.0.0.1-2012-09-07-09:51:01/vmcore  [PARTIAL DUMP]
              CPUS: 16
              DATE: Fri Sep  7 09:50:45 2012
            UPTIME: 1 days, 19:58:49
      LOAD AVERAGE: 0.05, 0.05, 0.05
             TASKS: 1006
          NODENAME: mo88
           RELEASE: 2.6.32-220.23.1.bl6.Bull.28.8.x86_64
           VERSION: #1 SMP Thu Jul 5 17:34:18 CEST 2012
           MACHINE: x86_64  (2199 Mhz)
            MEMORY: 32 GB
             PANIC: "Kernel panic - not syncing: LBUG"
               PID: 29617
           COMMAND: "cat"
              TASK: ffff8806e65437d0  [THREAD_INFO: ffff8804dbf1c000]
               CPU: 9
             STATE: TASK_RUNNING (PANIC)
      
      crash> bt
      PID: 29617  TASK: ffff8806e65437d0  CPU: 9   COMMAND: "cat"
       #0 [ffff8804dbf1fbf0] machine_kexec at ffffffff8102895b
       #1 [ffff8804dbf1fc50] crash_kexec at ffffffff810a4622
       #2 [ffff8804dbf1fd20] panic at ffffffff81484647
       #3 [ffff8804dbf1fda0] lbug_with_loc at ffffffffa0680f6b [libcfs]
       #4 [ffff8804dbf1fdc0] ptlrpc_lprocfs_svc_req_history_seek at ffffffffa0c30104 [ptlrpc]
       #5 [ffff8804dbf1fdd0] ptlrpc_lprocfs_svc_req_history_next at ffffffffa0c301e1 [ptlrpc]
       #6 [ffff8804dbf1fe20] seq_read at ffffffff81185e9a
       #7 [ffff8804dbf1fea0] proc_reg_read at ffffffff811c84ee
       #8 [ffff8804dbf1fef0] vfs_read at ffffffff81163a15
       #9 [ffff8804dbf1ff30] sys_read at ffffffff81163b51
      #10 [ffff8804dbf1ff80] system_call_fastpath at ffffffff810030f2
          RIP: 0000003dc64d83f0  RSP: 00007fff6cb0c9e0  RFLAGS: 00010206
          RAX: 0000000000000000  RBX: ffffffff810030f2  RCX: 00000000024a7030
          RDX: 0000000000008000  RSI: 000000000249f000  RDI: 0000000000000003
          RBP: 000000000249f000   R8: 0000000000000003   R9: 0000000001000000
          R10: 0000000000008fff  R11: 0000000000000246  R12: ffffffffffff8000
          R13: 0000000000000003  R14: 0000000000008000  R15: 0000000000000003
          ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b
      
      crash> dmesg | tail -n 50
      Lustre: fsperf-OST0005: Now serving fsperf-OST0005 on /dev/dm-11 with recovery enabled
      Lustre: 27386:0:(ldlm_lib.c:2110:target_recovery_init()) RECOVERY: service fsperf-OST000a, 1 recoverable clients, last_transno 1340929
      Lustre: 27386:0:(ldlm_lib.c:2110:target_recovery_init()) Skipped 3 previous similar messages
      Lustre: fsperf-OST000a: Now serving fsperf-OST000a on /dev/dm-26 with recovery enabled
      Lustre: Skipped 3 previous similar messages
      Lustre: 27419:0:(ldlm_lib.c:2110:target_recovery_init()) RECOVERY: service fsperf-OST0001, 1 recoverable clients, last_transno 1340929
      Lustre: 27419:0:(ldlm_lib.c:2110:target_recovery_init()) Skipped 6 previous similar messages
      Lustre: fsperf-OST0001: Now serving fsperf-OST0001 on /dev/dm-16 with recovery enabled
      Lustre: Skipped 6 previous similar messages
      LustreError: 137-5: UUID 'fsperf-OST000f_UUID' is not available for connect (no target)
      Lustre: fsperf-OST0001: Will be in recovery for at least 5:00, or until 1 client reconnects
      Lustre: fsperf-OST000b: Recovery over after 0:01, of 1 clients 1 recovered and 0 were evicted.
      Lustre: fsperf-OST000b: received MDS connection from 32.0.0.39@o2ib1
      Lustre: Skipped 14 previous similar messages
      Lustre: fsperf-OST0006: Recovery over after 0:01, of 1 clients 1 recovered and 0 were evicted.
      Lustre: fsperf-OST000e: received MDS connection from 32.0.0.39@o2ib1
      Lustre: Skipped 13 previous similar messages
      Lustre: Echo OBD driver; http://www.lustre.org/
      mlx4_core 0000:04:00.0: vpd r/w failed.  This is likely a firmware bug on this device.  Contact the card vendor for a firmware update.
      mlx4_core 0000:82:00.0: vpd r/w failed.  This is likely a firmware bug on this device.  Contact the card vendor for a firmware update.
      process `cat' is using deprecated sysctl (syscall) net.ipv6.neigh.default.retrans_time; Use net.ipv6.neigh.default.retrans_time_ms instead.
      LustreError: 29617:0:(lproc_ptlrpc.c:431:ptlrpc_lprocfs_svc_req_history_seek()) ASSERTION( !list_empty(&svcpt->scp_hist_reqs) ) failed: 
      LustreError: 29617:0:(lproc_ptlrpc.c:431:ptlrpc_lprocfs_svc_req_history_seek()) LBUG
      Pid: 29617, comm: cat
      
      Call Trace:
       [<ffffffffa0680905>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
       [<ffffffffa0680f17>] lbug_with_loc+0x47/0xb0 [libcfs]
       [<ffffffffa0c30104>] ptlrpc_lprocfs_svc_req_history_seek+0xf4/0x100 [ptlrpc]
       [<ffffffffa0c301e1>] ptlrpc_lprocfs_svc_req_history_next+0x71/0x1b0 [ptlrpc]
       [<ffffffff81185e9a>] seq_read+0x24a/0x3f0
       [<ffffffff811c84ee>] proc_reg_read+0x7e/0xc0
       [<ffffffff81163a15>] vfs_read+0xb5/0x1a0
       [<ffffffff810c0e1a>] ? audit_syscall_entry+0x26a/0x290
       [<ffffffff81163b51>] sys_read+0x51/0x90
       [<ffffffff810030f2>] system_call_fastpath+0x16/0x1b
      
      Kernel panic - not syncing: LBUG
      Pid: 29617, comm: cat Not tainted 2.6.32-220.23.1.bl6.Bull.28.8.x86_64 #1
      Call Trace:
       [<ffffffff81484640>] ? panic+0x78/0x143
       [<ffffffffa0680f6b>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
       [<ffffffffa0c30104>] ? ptlrpc_lprocfs_svc_req_history_seek+0xf4/0x100 [ptlrpc]
       [<ffffffffa0c301e1>] ? ptlrpc_lprocfs_svc_req_history_next+0x71/0x1b0 [ptlrpc]
       [<ffffffff81185e9a>] ? seq_read+0x24a/0x3f0
       [<ffffffff811c84ee>] ? proc_reg_read+0x7e/0xc0
       [<ffffffff81163a15>] ? vfs_read+0xb5/0x1a0
       [<ffffffff810c0e1a>] ? audit_syscall_entry+0x26a/0x290
       [<ffffffff81163b51>] ? sys_read+0x51/0x90
       [<ffffffff810030f2>] ? system_call_fastpath+0x16/0x1b
      
      crash> files
      PID: 29617  TASK: ffff8806e65437d0  CPU: 9   COMMAND: "cat"
      ROOT: /    CWD: /root
       FD       FILE            DENTRY           INODE       TYPE PATH
        0 ffff88045cba6180 ffff880217b5b800 ffff88046a2e89c8 FIFO 
        1 ffff88045cba6600 ffff880217b5bbc0 ffff8802ace6e148 FIFO 
        2 ffff88045cba6cc0 ffff880217b5b380 ffff88023fd6a048 FIFO 
        3 ffff880872bc5240 ffff880519514480 ffff880874370d38 REG  /proc/fs/lustre/ost/OSS/ost_create/req_history
      
      
      

      I can provide additional information from the dump if needed.

    People

      Liang Zhen (Inactive)
      Gregoire Pichon
