Details
-
Bug
-
Resolution: Fixed
-
Major
-
Lustre 2.3.0
-
lustre 2.2.93
bullxlinux distribution (based on redhat 6.2)
kernel 2.6.32-220
-
3
-
4442
Description
The Lustre version is 2.2.93.
When reading the file /proc/fs/lustre/ost/OSS/ost_create/req_history, the system crashed with an LBUG: ASSERTION( !list_empty(&svcpt->scp_hist_reqs) ) failed.
Here is some information from the core dump:
      KERNEL: /usr/lib/debug/lib/modules/2.6.32-220.23.1.bl6.Bull.28.8.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2012-09-07-09:51:01/vmcore  [PARTIAL DUMP]
        CPUS: 16
        DATE: Fri Sep 7 09:50:45 2012
      UPTIME: 1 days, 19:58:49
LOAD AVERAGE: 0.05, 0.05, 0.05
       TASKS: 1006
    NODENAME: mo88
     RELEASE: 2.6.32-220.23.1.bl6.Bull.28.8.x86_64
     VERSION: #1 SMP Thu Jul 5 17:34:18 CEST 2012
     MACHINE: x86_64 (2199 Mhz)
      MEMORY: 32 GB
       PANIC: "Kernel panic - not syncing: LBUG"
         PID: 29617
     COMMAND: "cat"
        TASK: ffff8806e65437d0 [THREAD_INFO: ffff8804dbf1c000]
         CPU: 9
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 29617 TASK: ffff8806e65437d0 CPU: 9 COMMAND: "cat"
 #0 [ffff8804dbf1fbf0] machine_kexec at ffffffff8102895b
 #1 [ffff8804dbf1fc50] crash_kexec at ffffffff810a4622
 #2 [ffff8804dbf1fd20] panic at ffffffff81484647
 #3 [ffff8804dbf1fda0] lbug_with_loc at ffffffffa0680f6b [libcfs]
 #4 [ffff8804dbf1fdc0] ptlrpc_lprocfs_svc_req_history_seek at ffffffffa0c30104 [ptlrpc]
 #5 [ffff8804dbf1fdd0] ptlrpc_lprocfs_svc_req_history_next at ffffffffa0c301e1 [ptlrpc]
 #6 [ffff8804dbf1fe20] seq_read at ffffffff81185e9a
 #7 [ffff8804dbf1fea0] proc_reg_read at ffffffff811c84ee
 #8 [ffff8804dbf1fef0] vfs_read at ffffffff81163a15
 #9 [ffff8804dbf1ff30] sys_read at ffffffff81163b51
#10 [ffff8804dbf1ff80] system_call_fastpath at ffffffff810030f2
    RIP: 0000003dc64d83f0 RSP: 00007fff6cb0c9e0 RFLAGS: 00010206
    RAX: 0000000000000000 RBX: ffffffff810030f2 RCX: 00000000024a7030
    RDX: 0000000000008000 RSI: 000000000249f000 RDI: 0000000000000003
    RBP: 000000000249f000 R8: 0000000000000003 R9: 0000000001000000
    R10: 0000000000008fff R11: 0000000000000246 R12: ffffffffffff8000
    R13: 0000000000000003 R14: 0000000000008000 R15: 0000000000000003
    ORIG_RAX: 0000000000000000 CS: 0033 SS: 002b

crash> dmesg | tail -n 50
Lustre: fsperf-OST0005: Now serving fsperf-OST0005 on /dev/dm-11 with recovery enabled
Lustre: 27386:0:(ldlm_lib.c:2110:target_recovery_init()) RECOVERY: service fsperf-OST000a, 1 recoverable clients, last_transno 1340929
Lustre: 27386:0:(ldlm_lib.c:2110:target_recovery_init()) Skipped 3 previous similar messages
Lustre: fsperf-OST000a: Now serving fsperf-OST000a on /dev/dm-26 with recovery enabled
Lustre: Skipped 3 previous similar messages
Lustre: 27419:0:(ldlm_lib.c:2110:target_recovery_init()) RECOVERY: service fsperf-OST0001, 1 recoverable clients, last_transno 1340929
Lustre: 27419:0:(ldlm_lib.c:2110:target_recovery_init()) Skipped 6 previous similar messages
Lustre: fsperf-OST0001: Now serving fsperf-OST0001 on /dev/dm-16 with recovery enabled
Lustre: Skipped 6 previous similar messages
LustreError: 137-5: UUID 'fsperf-OST000f_UUID' is not available for connect (no target)
Lustre: fsperf-OST0001: Will be in recovery for at least 5:00, or until 1 client reconnects
Lustre: fsperf-OST000b: Recovery over after 0:01, of 1 clients 1 recovered and 0 were evicted.
Lustre: fsperf-OST000b: received MDS connection from 32.0.0.39@o2ib1
Lustre: Skipped 14 previous similar messages
Lustre: fsperf-OST0006: Recovery over after 0:01, of 1 clients 1 recovered and 0 were evicted.
Lustre: fsperf-OST000e: received MDS connection from 32.0.0.39@o2ib1
Lustre: Skipped 13 previous similar messages
Lustre: Echo OBD driver; http://www.lustre.org/
mlx4_core 0000:04:00.0: vpd r/w failed. This is likely a firmware bug on this device. Contact the card vendor for a firmware update.
mlx4_core 0000:82:00.0: vpd r/w failed. This is likely a firmware bug on this device. Contact the card vendor for a firmware update.
process `cat' is using deprecated sysctl (syscall) net.ipv6.neigh.default.retrans_time; Use net.ipv6.neigh.default.retrans_time_ms instead.
LustreError: 29617:0:(lproc_ptlrpc.c:431:ptlrpc_lprocfs_svc_req_history_seek()) ASSERTION( !list_empty(&svcpt->scp_hist_reqs) ) failed:
LustreError: 29617:0:(lproc_ptlrpc.c:431:ptlrpc_lprocfs_svc_req_history_seek()) LBUG
Pid: 29617, comm: cat

Call Trace:
 [<ffffffffa0680905>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa0680f17>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa0c30104>] ptlrpc_lprocfs_svc_req_history_seek+0xf4/0x100 [ptlrpc]
 [<ffffffffa0c301e1>] ptlrpc_lprocfs_svc_req_history_next+0x71/0x1b0 [ptlrpc]
 [<ffffffff81185e9a>] seq_read+0x24a/0x3f0
 [<ffffffff811c84ee>] proc_reg_read+0x7e/0xc0
 [<ffffffff81163a15>] vfs_read+0xb5/0x1a0
 [<ffffffff810c0e1a>] ? audit_syscall_entry+0x26a/0x290
 [<ffffffff81163b51>] sys_read+0x51/0x90
 [<ffffffff810030f2>] system_call_fastpath+0x16/0x1b

Kernel panic - not syncing: LBUG
Pid: 29617, comm: cat Not tainted 2.6.32-220.23.1.bl6.Bull.28.8.x86_64 #1
Call Trace:
 [<ffffffff81484640>] ? panic+0x78/0x143
 [<ffffffffa0680f6b>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
 [<ffffffffa0c30104>] ? ptlrpc_lprocfs_svc_req_history_seek+0xf4/0x100 [ptlrpc]
 [<ffffffffa0c301e1>] ? ptlrpc_lprocfs_svc_req_history_next+0x71/0x1b0 [ptlrpc]
 [<ffffffff81185e9a>] ? seq_read+0x24a/0x3f0
 [<ffffffff811c84ee>] ? proc_reg_read+0x7e/0xc0
 [<ffffffff81163a15>] ? vfs_read+0xb5/0x1a0
 [<ffffffff810c0e1a>] ? audit_syscall_entry+0x26a/0x290
 [<ffffffff81163b51>] ? sys_read+0x51/0x90
 [<ffffffff810030f2>] ? system_call_fastpath+0x16/0x1b

crash> files
PID: 29617 TASK: ffff8806e65437d0 CPU: 9 COMMAND: "cat"
ROOT: / CWD: /root
 FD  FILE              DENTRY            INODE             TYPE  PATH
  0  ffff88045cba6180  ffff880217b5b800  ffff88046a2e89c8  FIFO
  1  ffff88045cba6600  ffff880217b5bbc0  ffff8802ace6e148  FIFO
  2  ffff88045cba6cc0  ffff880217b5b380  ffff88023fd6a048  FIFO
  3  ffff880872bc5240  ffff880519514480  ffff880874370d38  REG   /proc/fs/lustre/ost/OSS/ost_create/req_history
I can provide additional information from the dump if needed.
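For illustration only, here is a minimal userspace sketch of the pattern the assertion appears to guard. It is not the Lustre code and not a proposed patch: `hist_part`, `hist_req`, and `hist_seek` are hypothetical names, and the list helpers merely imitate the kernel's `list_head`. The sketch assumes the seek routine can be called while the history list is empty, and it reports that case as an error to the caller instead of asserting on it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Minimal circular doubly linked list, imitating the kernel's list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static int list_empty(const struct list_head *h)
{
    return h->next == h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Hypothetical stand-in for one service partition's request history. */
struct hist_part {
    struct list_head hist_reqs;
};

/* Hypothetical history entry; 'link' must stay the first member. */
struct hist_req {
    struct list_head link;
    unsigned long long seq;
};

/*
 * Hypothetical seek: find the first entry whose seq is >= the wanted one.
 * Returns 0 and sets *found on success, or -ENOENT when nothing matches.
 * An empty history is treated as "nothing to show", not as a fatal state.
 */
static int hist_seek(struct hist_part *p, unsigned long long seq,
                     struct hist_req **found)
{
    struct list_head *pos;

    /* A NULL output pointer is a caller bug, so asserting here is fine. */
    assert(found != NULL);

    /* An empty list is a legitimate runtime state: report it, do not assert. */
    if (list_empty(&p->hist_reqs))
        return -ENOENT;

    for (pos = p->hist_reqs.next; pos != &p->hist_reqs; pos = pos->next) {
        /* Valid because 'link' is the first member of struct hist_req. */
        struct hist_req *r = (struct hist_req *)pos;

        if (r->seq >= seq) {
            *found = r;
            return 0;
        }
    }
    return -ENOENT;
}
```

With this shape, a reader such as `cat` that hits an empty history would simply see no entries, since the iterator can stop when the seek returns `-ENOENT`.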