Details
-
Bug
-
Resolution: Cannot Reproduce
-
Major
-
None
-
Lustre 2.1.2
-
None
-
OSS and MDS
2.6.32-220.17.1.el6_lustre.x86_64
lustre-source-2.1.2-2.6.32_220.17.1.el6_lustre.x86_64.x86_64
lustre-2.1.2-2.6.32_220.17.1.el6_lustre.x86_64.x86_64
kernel-2.6.32-220.17.1.el6_lustre.x86_64
kernel-ib-1.5.3-2.6.32_220.17.1.el6_lustre.x86_64.x86_64
kernel-firmware-2.6.32-220.17.1.el6_lustre.x86_64
kernel-ib-devel-1.5.3-2.6.32_220.17.1.el6_lustre.x86_64.x86_64
lustre-tests-2.1.2-2.6.32_220.17.1.el6_lustre.x86_64.x86_64
kernel-headers-2.6.32-220.17.1.el6_lustre.x86_64
lustre-ldiskfs-3.3.0-2.6.32_220.17.1.el6_lustre.x86_64.x86_64
kernel-mft-2.7.1-2.6.32_220.17.1.el6_lustre.x86_64.x86_64
kernel-devel-2.6.32-220.17.1.el6_lustre.x86_64
lustre-modules-2.1.2-2.6.32_220.17.1.el6_lustre.x86_64.x86_64
Client side
2.6.32-220.23.1.el6.x86_64
lustre-client-2.1.2-2.6.32_220.23.1.el6.x86_64.x86_64
lustre-client-modules-2.1.2-2.6.32_220.23.1.el6.x86_64.x86_64
-
3
-
4211
Description
A production client crashed while running a user job, with the following LBUG. The log dump is quite large, so I have attached it as a file.
LustreError: 3260:0:(events.c:419:ptlrpc_master_callback()) ASSERTION(callback == request_out_callback || callback == reply_in_callback || callback == client_bulk_callback || callback == request_in_callback || callback == reply_out_callback || callback == server_bulk_callback) failed
LustreError: 3260:0:(events.c:419:ptlrpc_master_callback()) LBUG
Pid: 3260, comm: kiblnd_sd_07
Call Trace:
[<ffffffffa044c855>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa044ce95>] lbug_with_loc+0x75/0xe0 [libcfs]
[<ffffffffa0457d86>] libcfs_assertion_failed+0x66/0x70 [libcfs]
[<ffffffffa06473c6>] ptlrpc_master_callback+0xb6/0xc0 [ptlrpc]
[<ffffffffa04c0a8c>] lnet_enq_event_locked+0x6c/0xc0 [lnet]
[<ffffffffa04c0b7c>] lnet_finalize+0x9c/0x280 [lnet]
[<ffffffffa07523ca>] kiblnd_recv+0x10a/0x580 [ko2iblnd]
[<ffffffff8100bc0e>] ? apic_timer_interrupt+0xe/0x20
[<ffffffffa04c4188>] lnet_ni_recv+0xd8/0x350 [lnet]
[<ffffffffa04c44e6>] lnet_recv_put+0xe6/0x120 [lnet]
[<ffffffffa04cae1f>] lnet_parse+0x135f/0x1a80 [lnet]
[<ffffffffa0752afb>] kiblnd_handle_rx+0x2bb/0x5f0 [ko2iblnd]
[<ffffffff8104da6d>] ? check_preempt_curr+0x6d/0x90
[<ffffffff8105e89c>] ? try_to_wake_up+0x24c/0x3e0
[<ffffffffa0753723>] kiblnd_rx_complete+0x2a3/0x3e0 [ko2iblnd]
[<ffffffff8105ea42>] ? default_wake_function+0x12/0x20
[<ffffffff8104cab9>] ? __wake_up_common+0x59/0x90
[<ffffffffa07538c2>] kiblnd_complete+0x62/0xe0 [ko2iblnd]
[<ffffffffa0753c3d>] kiblnd_scheduler+0x2fd/0x770 [ko2iblnd]
[<ffffffff8105ea30>] ? default_wake_function+0x0/0x20
[<ffffffffa0753940>] ? kiblnd_scheduler+0x0/0x770 [ko2iblnd]
[<ffffffff8100c14a>] child_rip+0xa/0x20
[<ffffffffa0753940>] ? kiblnd_scheduler+0x0/0x770 [ko2iblnd]
[<ffffffffa0753940>] ? kiblnd_scheduler+0x0/0x770 [ko2iblnd]
[<ffffffff8100c140>] ? child_rip+0x0/0x20
LustreError: dumping log to /tmp/lustre-log.1344612108.3260
Aug 10 16:21:47 sand-1-12 kernel: LustreError: 3260:0:(events.c:419:ptlrpc_master_callback()) ASSERTION(callback == request_out_callback || callback == reply_in_callback || callback == client_bulk_callback || callback == request_in_callback || callback == reply_out_callback || callback == server_bulk_callback) failed
Aug 10 16:21:47 sand-1-12 kernel: LustreError: 3260:0:(events.c:419:ptlrpc_master_callback()) LBUG
Aug 10 16:21:47 sand-1-12 kernel: Pid: 3260, comm: kiblnd_sd_07
Aug 10 16:21:47 sand-1-12 kernel:
Aug 10 16:21:47 sand-1-12 kernel: Call Trace:
Aug 10 16:21:47 sand-1-12 kernel: [<ffffffffa044c855>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Aug 10 16:21:47 sand-1-12 kernel: [<ffffffffa044ce95>] lbug_with_loc+0x75/0xe0 [libcfs]
Aug 10 16:21:47 sand-1-12 kernel: [<ffffffffa0457d86>] libcfs_assertion_failed+0x66/0x70 [libcfs]
Aug 10 16:21:47 sand-1-12 kernel: [<ffffffffa06473c6>] ptlrpc_master_callback+0xb6/0xc0 [ptlrpc]
Aug 10 16:21:47 sand-1-12 kernel: [<ffffffffa04c0a8c>] lnet_enq_event_locked+0x6c/0xc0 [lnet]
Aug 10 16:21:47 sand-1-12 kernel: [<ffffffffa04c0b7c>] lnet_finalize+0x9c/0x280 [lnet]
Aug 10 16:21:47 sand-1-12 kernel: [<ffffffffa07523ca>] kiblnd_recv+0x10a/0x580 [ko2iblnd]
Aug 10 16:21:47 sand-1-12 kernel: [<ffffffff8100bc0e>] ? apic_timer_interrupt+0xe/0x20
Aug 10 16:21:47 sand-1-12 kernel: [<ffffffffa04c4188>] lnet_ni_recv+0xd8/0x350 [lnet]
Aug 10 16:21:47 sand-1-12 kernel: [<ffffffffa04c44e6>] lnet_recv_put+0xe6/0x120 [lnet]
Aug 10 16:21:47 sand-1-12 kernel: [<ffffffffa0753723>] kiblnd_rx_complete+0x2a3/0x3e0 [ko2iblnd]
Aug 10 16:21:48 sand-1-12 kernel: [<ffffffff8105ea42>] ? default_wake_function+0x12/0x20
Aug 10 16:21:48 sand-1-12 kernel: [<ffffffff8104cab9>] ? __wake_up_common+0x59/0x90
Aug 10 16:21:48 sand-1-12 kernel: [<ffffffffa07538c2>] kiblnd_complete+0x62/0xe0 [ko2iblnd]
Aug 10 16:21:48 sand-1-12 kernel: [<ffffffffa0753c3d>] kiblnd_scheduler+0x2fd/0x770 [ko2iblnd]
Aug 10 16:21:48 sand-1-12 kernel: [<ffffffff8105ea30>] ? default_wake_function+0x0/0x20
Aug 10 16:21:48 sand-1-12 kernel: [<ffffffffa0753940>] ? kiblnd_scheduler+0x0/0x770 [ko2iblnd]
Aug 10 16:21:48 sand-1-12 kernel: [<ffffffff8100c14a>] child_rip+0xa/0x20
Aug 10 16:21:48 sand-1-12 kernel: [<ffffffffa0753940>] ? kiblnd_scheduler+0x0/0x770 [ko2iblnd]
Aug 10 16:21:48 sand-1-12 kernel: [<ffffffffa0753940>] ? kiblnd_scheduler+0x0/0x770 [ko2iblnd]
Aug 10 16:21:48 sand-1-12 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
Aug 10 16:21:48 sand-1-12 kernel:
Aug 10 16:21:48 sand-1-12 kernel: LustreError: dumping log to /tmp/lustre-log.1344612108.3260
BUG: soft lockup - CPU#0 stuck for 67s! [kiblnd_sd_04:3257]
Modules linked in: lmv(U) mgc(U) lustre(U) lov(U) osc(U) lquota(U) mdc(U) fid(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) nfsd exportfs acpi_cpufreq
BUG: soft lockup - CPU#1 stuck for 67s! [kiblnd_sd_08:3261]
801 i2c_core iTCO_wdt iTCO_vendor_support shpchp ioatdma ipv6 sd_mod crc_t10dif ahci igb dca nfs lockd fscache nfs_acl auth_rpcgss sunrpc [last unloaded: scsi_wait_scan]
CPU 1
801 i2c_core iTCO_wdt iTCO_vendor_support shpchp ioatdma ipv6 sd_mod crc_t10dif ahci igb dca nfs lockd fscache nfs_acl auth_rpcgss sunrpc [last unloaded: scsi_wait_scan]
Pid: 3261, comm: kiblnd_sd_08 Tainted: G W ---------------- 2.6.32-220.23.1.el6.x86_64 #1 Dell Inc. PowerEdge C6220/0WTH3T
RIP: 0010:[<ffffffff814efb41>] [<ffffffff814efb41>] _spin_lock+0x21/0x30
RSP: 0018:ffff88086a909be0 EFLAGS: 00000293
RAX: 000000000000081f RBX: ffff88086a909be0 RCX: ffff8807bb5f14e0
RDX: 000000000000081d RSI: 0000000000000050 RDI: ffffffffa04df340
RBP: ffffffff8100bc0e R08: 0000000000000246 R09: 0000000000000012
R10: 0000000000000000 R11: 0000000000000400 R12: 0000000000000000
R13: ffff8806dcf599d0 R14: ffff88086a909cac R15: ffff88086a909ca8
FS: 00007febefe5d700(0000) GS:ffff880044620000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fba30b90008 CR3: 0000000952c82000 CR4: 00000000000406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kiblnd_sd_08 (pid: 3261, threadinfo ffff88086a908000, task ffff88086a85aa80)
Stack:
ffff88086a909ce0 ffffffffa04c9ee6 ffff88086a909d00 0000000300000002
<0> ffff88086a909dd8 0000000000000400 0000000000000001 0000000000000000
<0> 0000000000000002 ffff880044675fe8 ffff880044676018 ffff880044675fe8
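For anyone matching the attached dump against the syslog entries: the dump path /tmp/lustre-log.1344612108.3260 appears to end in an epoch timestamp followed by the PID of the thread that hit the LBUG (3260, kiblnd_sd_07). The sketch below decodes it under that assumption; the `<prefix>.<epoch seconds>.<pid>` layout is inferred from the values in this report, and the helper name `parse_lustre_log_name` is hypothetical.

```python
from datetime import datetime, timezone


def parse_lustre_log_name(path):
    """Split a dump path of the assumed form <prefix>.<epoch_seconds>.<pid>.

    The layout is an assumption inferred from this report, not a
    documented format.
    """
    # Split from the right so dots in the directory part are left alone.
    prefix, epoch, pid = path.rsplit(".", 2)
    when = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    return prefix, when, int(pid)


prefix, when, pid = parse_lustre_log_name("/tmp/lustre-log.1344612108.3260")
print(prefix, when.isoformat(), pid)
```

The decoded time is in UTC and comes out one hour earlier than the 16:21:48 syslog entries, which would be consistent with the client logging in a UTC+1 local time zone.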
Attachments
Issue Links
- duplicates
-
LU-3010 client crashes on RHEL6 with Lustre 1.8.8
-
- Resolved
-