Details
-
Bug
-
Resolution: Duplicate
-
Major
-
None
-
Lustre 2.5.3
-
MDS/OSS:
- RHEL 6.6 w/ Bull kernel 2.6.32-504.16.2.el6.Bull.74.x86_64
- Lustre 2.5.3.90
- OFED 3.12
Routers/Clients:
- RHEL 7.1
- Lustre 2.7.0
- MLNX_OFED 2.3
-
3
Description
The LustreError "(mdt_xattr.c:131:mdt_getxattr_one()) getxattr failed: -2" triggers the following kernel BUG on the MDS of one of our filesystems.
2015-06-30 13:40:01
2015-06-30 13:45:01 Lustre: DEBUG MARKER: Tue Jun 30 13:45:01 2015
2015-06-30 13:45:01
2015-06-30 13:50:01 Lustre: DEBUG MARKER: Tue Jun 30 13:50:01 2015
2015-06-30 13:50:01
2015-06-30 13:50:32 LustreError: 20168:0:(mdt_xattr.c:131:mdt_getxattr_one()) getxattr failed: -2
2015-06-30 13:50:32 BUG: unable to handle kernel NULL pointer dereference at (null)
2015-06-30 13:50:32 IP: [<ffffffff8129c452>] sg_next+0x2/0x30
2015-06-30 13:50:32 PGD 0
2015-06-30 13:50:32 Oops: 0000 [#1] SMP
2015-06-30 13:50:32 last sysfs file: /sys/devices/pci0000:80/0000:80:05.0/0000:85:00.0/host8/rport-8:0-0/target8:0:0/8:0:0:0/state
2015-06-30 13:50:32 CPU 23
2015-06-30 13:50:32 Modules linked in: osp(U) mdd(U) lfsck(U) lod(U) mdt(U) mgs(U) mgc(U) fsfilt_ldiskfs(U) osd_ldiskfs(U) ldiskfs(U) lustre(U) lov(U) osc(U) mdc(U) lquota(U) fid(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic crc32c_intel libcfs(U) nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf bonding 8021q garp stp llc rdma_ucm(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ib_cm(U) ipv6 ib_uverbs(U) ib_umad(U) mlx4_ib(U) ib_sa(U) ib_mad(U) ib_core(U) mlx4_core(U) dm_round_robin scsi_dh_rdac dm_multipath mic(U) uinput ipmi_devintf ipmi_si ipmi_msghandler sg lpc_ich mfd_core ioatdma lpfc scsi_transport_fc scsi_tgt igb dca i2c_algo_bit i2c_core ptp pps_core compat(U) ext4 jbd2 mbcache sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod megaraid_sas [last unloaded: scsi_wait_scan]
2015-06-30 13:50:32
2015-06-30 13:50:32 Pid: 20168, comm: mdt03_049 Not tainted 2.6.32-504.16.2.el6.Bull.74.x86_64 #1 BULL bullx super-node
2015-06-30 13:50:32 RIP: 0010:[<ffffffff8129c452>] [<ffffffff8129c452>] sg_next+0x2/0x30
2015-06-30 13:50:32 RSP: 0018:ffff880847f518c8 EFLAGS: 00010246
2015-06-30 13:50:32 RAX: 0000000000000000 RBX: ffff88046e524000 RCX: 0000000000000000
2015-06-30 13:50:32 RDX: 0000000000000101 RSI: ffffc900175fb240 RDI: 0000000000000000
2015-06-30 13:50:32 RBP: ffff880847f51940 R08: ffffea003958cb98 R09: 0000000000000301
2015-06-30 13:50:32 R10: 0000000000001000 R11: 0000000000000000 R12: ffff880c78a28940
2015-06-30 13:50:32 R13: ffff88046e536000 R14: ffffc900175fb240 R15: ffff880c7980c090
2015-06-30 13:50:32 FS: 0000000000000000(0000) GS:ffff880c8e540000(0000) knlGS:0000000000000000
2015-06-30 13:50:32 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
2015-06-30 13:50:32 CR2: 0000000000000000 CR3: 0000000001a85000 CR4: 00000000000007e0
2015-06-30 13:50:32 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2015-06-30 13:50:32 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
2015-06-30 13:50:32 Process mdt03_049 (pid: 20168, threadinfo ffff880847f50000, task ffff88084585aab0)
2015-06-30 13:50:32 Stack:
2015-06-30 13:50:32 ffffffffa09cb46e ffff880847f51fd8 ffff88084585aab0 ffffffff00000101
2015-06-30 13:50:32 <d> ffffffff81a9ac80 ffff880c78a289c0 0000030100000001 ffff880847f51920
2015-06-30 13:50:32 <d> ffffffff8105e100 ffff880876a8d558 ffff880c796ffb80 ffffc900175fb240
2015-06-30 13:50:32 Call Trace:
2015-06-30 13:50:32 [<ffffffffa09cb46e>] ? kiblnd_map_tx+0x19e/0x540 [ko2iblnd]
2015-06-30 13:50:32 [<ffffffff8105e100>] ? __dequeue_entity+0x30/0x50
2015-06-30 13:50:32 [<ffffffffa09cbe0a>] kiblnd_setup_rd_iov+0x13a/0x2b0 [ko2iblnd]
2015-06-30 13:50:32 [<ffffffffa09d151a>] kiblnd_send+0x5da/0x9b0 [ko2iblnd]
2015-06-30 13:50:32 [<ffffffffa05e6d6b>] lnet_ni_send+0x4b/0xf0 [lnet]
2015-06-30 13:50:32 [<ffffffffa05eafa5>] lnet_send+0x655/0xb80 [lnet]
2015-06-30 13:50:32 [<ffffffffa05ec00a>] LNetPut+0x31a/0x860 [lnet]
2015-06-30 13:50:32 [<ffffffffa07fbc40>] ptl_send_buf+0x1e0/0x550 [ptlrpc]
2015-06-30 13:50:32 [<ffffffffa081b8b8>] ? at_measured+0x108/0x380 [ptlrpc]
2015-06-30 13:50:32 [<ffffffffa083c445>] ? null_authorize+0x75/0x100 [ptlrpc]
2015-06-30 13:50:32 [<ffffffffa07fc22b>] ptlrpc_send_reply+0x27b/0x7f0 [ptlrpc]
2015-06-30 13:50:32 [<ffffffffa07c7054>] target_send_reply_msg+0x54/0x190 [ptlrpc]
2015-06-30 13:50:32 [<ffffffffa07c7576>] target_send_reply+0x3e6/0x720 [ptlrpc]
2015-06-30 13:50:32 [<ffffffffa0ea2df9>] mdt_handle_common+0x5d9/0x1470 [mdt]
2015-06-30 13:50:32 [<ffffffffa0edf645>] mds_regular_handle+0x15/0x20 [mdt]
2015-06-30 13:50:32 [<ffffffffa0811ee5>] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
2015-06-30 13:50:32 [<ffffffffa053e4ce>] ? cfs_timer_arm+0xe/0x10 [libcfs]
2015-06-30 13:50:32 [<ffffffffa054f7d5>] ? lc_watchdog_touch+0x65/0x170 [libcfs]
2015-06-30 13:50:32 [<ffffffffa080a919>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
2015-06-30 13:50:32 [<ffffffff81057819>] ? __wake_up_common+0x59/0x90
2015-06-30 13:50:32 [<ffffffffa081466d>] ptlrpc_main+0xaed/0x1770 [ptlrpc]
2015-06-30 13:50:32 [<ffffffffa0813b80>] ? ptlrpc_main+0x0/0x1770 [ptlrpc]
2015-06-30 13:50:32 [<ffffffff8109e71e>] kthread+0x9e/0xc0
2015-06-30 13:50:32 [<ffffffff8100c20a>] child_rip+0xa/0x20
2015-06-30 13:50:32 [<ffffffff8109e680>] ? kthread+0x0/0xc0
2015-06-30 13:50:32 [<ffffffff8100c200>] ? child_rip+0x0/0x20
2015-06-30 13:50:32 Code: 5c 41 5d 41 5e 41 5f c9 c3 55 48 c7 c2 30 cc 29 81 be 80 00 00 00 48 89 e5 e8 6b ff ff ff c9 c3 66 0f 1f 84 00 00 00 00 00 31 c0 <f6> 07 02 55 48 89 e5 75 0d 48 8b 57 20 48 8d 47 20 f6 c2 01 75
2015-06-30 13:50:32 RIP [<ffffffff8129c452>] sg_next+0x2/0x30
2015-06-30 13:50:32 RSP <ffff880847f518c8>
2015-06-30 13:50:32 CR2: 0000000000000000
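When reading a console trace like the one above, it can help to separate the frames the kernel considers part of the call chain from those marked with "?", which are addresses found on the stack that the unwinder could not confirm. The following is a minimal sketch of such a filter, not a tool used in this ticket: the names parse_frames and reliable_chain are hypothetical, and it assumes the usual "[<addr>] [?] symbol+0xoff/0xsize [module]" layout of a 2.6.32-era oops. Pass it only the Call Trace portion, since the IP/RIP lines use the same pattern.

```python
import re

# One frame of a Linux oops "Call Trace:", for example:
#   [<ffffffffa09cb46e>] ? kiblnd_map_tx+0x19e/0x540 [ko2iblnd]
# The "?" and the trailing "[module]" are both optional.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{16})>\]\s+(?P<guess>\?\s+)?"
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)


def parse_frames(text):
    """Return one dict per frame found in `text` (hypothetical helper)."""
    frames = []
    for m in FRAME_RE.finditer(text):
        frames.append({
            "addr": int(m.group("addr"), 16),
            "symbol": m.group("sym"),
            "offset": int(m.group("off"), 16),
            # Frames without a "[module]" suffix belong to the core kernel.
            "module": m.group("mod") or "kernel",
            # A "?" means the unwinder could not confirm this frame.
            "reliable": m.group("guess") is None,
        })
    return frames


def reliable_chain(text):
    """List 'symbol [module]' for frames not marked with '?'."""
    return [
        f"{f['symbol']} [{f['module']}]"
        for f in parse_frames(text)
        if f["reliable"]
    ]
```

Calling reliable_chain() on the Call Trace text gives the confirmed chain from kiblnd_setup_rd_iov down to child_rip, which can then be compared against the crash "bt" output.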
We hit this issue on our new filesystem only. This filesystem is dedicated to our Lustre 2.7 clients/routers. We have already had 4 occurrences since the beginning of this week.
The same LustreError is also reported on the console of the MDS of a second filesystem, but without a kernel BUG. The main difference is that the clients/routers of this second filesystem run the same software stack as the servers (RHEL 6.6 / Lustre 2.5.3.90 / OFED 3.12).
Here are some traces from the crash (bt/bt -f):
KERNEL: /usr/lib/debug/lib/modules/2.6.32-504.16.2.el6.Bull.74.x86_64/vmlinux
DUMPFILE: vmcore [PARTIAL DUMP]
CPUS: 32
DATE: Tue Jun 30 13:50:31 2015
UPTIME: 19:57:11
LOAD AVERAGE: 0.10, 0.06, 0.10
TASKS: 1685
NODENAME: mds2
RELEASE: 2.6.32-504.16.2.el6.Bull.74.x86_64
VERSION: #1 SMP Tue Apr 28 01:43:42 CEST 2015
MACHINE: x86_64 (2266 Mhz)
MEMORY: 64 GB
PANIC: "Oops: 0000 [#1] SMP " (check log for details)
PID: 20168
COMMAND: "mdt03_049"
TASK: ffff88084585aab0 [THREAD_INFO: ffff880847f50000]
CPU: 23
STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 20168 TASK: ffff88084585aab0 CPU: 23 COMMAND: "mdt03_049"
#0 [ffff880847f514b0] machine_kexec at ffffffff8103b71b
#1 [ffff880847f51510] crash_kexec at ffffffff810c9942
#2 [ffff880847f515e0] oops_end at ffffffff8152f070
#3 [ffff880847f51610] no_context at ffffffff8104c80b
#4 [ffff880847f51660] __bad_area_nosemaphore at ffffffff8104ca95
#5 [ffff880847f516b0] bad_area_nosemaphore at ffffffff8104cb63
#6 [ffff880847f516c0] __do_page_fault at ffffffff8104d25c
#7 [ffff880847f517e0] do_page_fault at ffffffff81530fbe
#8 [ffff880847f51810] page_fault at ffffffff8152e375
    [exception RIP: sg_next+2]
    RIP: ffffffff8129c452 RSP: ffff880847f518c8 RFLAGS: 00010246
    RAX: 0000000000000000 RBX: ffff88046e524000 RCX: 0000000000000000
    RDX: 0000000000000101 RSI: ffffc900175fb240 RDI: 0000000000000000
    RBP: ffff880847f51940 R8: ffffea003958cb98 R9: 0000000000000301
    R10: 0000000000001000 R11: 0000000000000000 R12: ffff880c78a28940
    R13: ffff88046e536000 R14: ffffc900175fb240 R15: ffff880c7980c090
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#9 [ffff880847f518c8] kiblnd_map_tx at ffffffffa09cb46e [ko2iblnd]
#10 [ffff880847f51948] kiblnd_setup_rd_iov at ffffffffa09cbe0a [ko2iblnd]
#11 [ffff880847f519a8] kiblnd_send at ffffffffa09d151a [ko2iblnd]
#12 [ffff880847f51a48] lnet_ni_send at ffffffffa05e6d6b [lnet]
#13 [ffff880847f51a68] lnet_send at ffffffffa05eafa5 [lnet]
#14 [ffff880847f51ad8] LNetPut at ffffffffa05ec00a [lnet]
#15 [ffff880847f51b38] ptl_send_buf at ffffffffa07fbc40 [ptlrpc]
#16 [ffff880847f51be8] ptlrpc_send_reply at ffffffffa07fc22b [ptlrpc]
#17 [ffff880847f51c68] target_send_reply_msg at ffffffffa07c7054 [ptlrpc]
#18 [ffff880847f51c98] target_send_reply at ffffffffa07c7576 [ptlrpc]
#19 [ffff880847f51d08] mdt_handle_common at ffffffffa0ea2df9 [mdt]
#20 [ffff880847f51d58] mds_regular_handle at ffffffffa0edf645 [mdt]
#21 [ffff880847f51d68] ptlrpc_server_handle_request at ffffffffa0811ee5 [ptlrpc]
#22 [ffff880847f51e48] ptlrpc_main at ffffffffa081466d [ptlrpc]
#23 [ffff880847f51ee8] kthread at ffffffff8109e71e
#24 [ffff880847f51f48] kernel_thread at ffffffff8100c20a

crash> dis -rl ffffffffa09cb46e
0xffffffffa09cb2d0 <kiblnd_map_tx>: push %rbp
0xffffffffa09cb2d1 <kiblnd_map_tx+1>: mov %rsp,%rbp
0xffffffffa09cb2d4 <kiblnd_map_tx+4>: push %r15
0xffffffffa09cb2d6 <kiblnd_map_tx+6>: push %r14
0xffffffffa09cb2d8 <kiblnd_map_tx+8>: push %r13
0xffffffffa09cb2da <kiblnd_map_tx+10>: push %r12
0xffffffffa09cb2dc <kiblnd_map_tx+12>: push %rbx
0xffffffffa09cb2dd <kiblnd_map_tx+13>: sub $0x48,%rsp
0xffffffffa09cb2e1 <kiblnd_map_tx+17>: nopl 0x0(%rax,%rax,1)
0xffffffffa09cb2e6 <kiblnd_map_tx+22>: mov %ecx,-0x44(%rbp)
0xffffffffa09cb2e9 <kiblnd_map_tx+25>: mov 0x10(%rsi),%rax
0xffffffffa09cb2ed <kiblnd_map_tx+29>: mov %rdx,%rbx
0xffffffffa09cb2f0 <kiblnd_map_tx+32>: mov 0x50(%rdi),%rdi
0xffffffffa09cb2f4 <kiblnd_map_tx+36>: mov %rsi,%r14
0xffffffffa09cb2f7 <kiblnd_map_tx+39>: mov 0x40(%rax),%r12
0xffffffffa09cb2fb <kiblnd_map_tx+43>: mov %rdi,-0x50(%rbp)
0xffffffffa09cb2ff <kiblnd_map_tx+47>: cmp %rdx,0x80(%rsi)
0xffffffffa09cb306 <kiblnd_map_tx+54>: setne %al
0xffffffffa09cb309 <kiblnd_map_tx+57>: movzbl %al,%edx
0xffffffffa09cb30c <kiblnd_map_tx+60>: add $0x1,%edx
0xffffffffa09cb30f <kiblnd_map_tx+63>: mov %edx,-0x48(%rbp)
0xffffffffa09cb312 <kiblnd_map_tx+66>: mov %edx,0xb0(%r14)
0xffffffffa09cb319 <kiblnd_map_tx+73>: mov %ecx,0x88(%rsi)
0xffffffffa09cb31f <kiblnd_map_tx+79>: mov 0x8(%r12),%rdi
0xffffffffa09cb324 <kiblnd_map_tx+84>: mov 0x90(%rsi),%r13
0xffffffffa09cb32b <kiblnd_map_tx+91>: mov 0x2b8(%rdi),%rax
0xffffffffa09cb332 <kiblnd_map_tx+98>: test %rax,%rax
0xffffffffa09cb335 <kiblnd_map_tx+101>: je 0xffffffffa09cb430 <kiblnd_map_tx+352>
0xffffffffa09cb33b <kiblnd_map_tx+107>: mov %edx,%ecx
0xffffffffa09cb33d <kiblnd_map_tx+109>: mov %r13,%rsi
0xffffffffa09cb340 <kiblnd_map_tx+112>: mov -0x44(%rbp),%edx
0xffffffffa09cb343 <kiblnd_map_tx+115>: callq *0x28(%rax)
0xffffffffa09cb346 <kiblnd_map_tx+118>: xor %edx,%edx
0xffffffffa09cb348 <kiblnd_map_tx+120>: xor %r15d,%r15d
0xffffffffa09cb34b <kiblnd_map_tx+123>: test %eax,%eax
0xffffffffa09cb34d <kiblnd_map_tx+125>: mov %eax,0x4(%rbx)
0xffffffffa09cb350 <kiblnd_map_tx+128>: jne 0xffffffffa09cb3b9 <kiblnd_map_tx+233>
0xffffffffa09cb352 <kiblnd_map_tx+130>: jmpq 0xffffffffa09cb3f0 <kiblnd_map_tx+288>
0xffffffffa09cb357 <kiblnd_map_tx+135>: nopw 0x0(%rax,%rax,1)
0xffffffffa09cb360 <kiblnd_map_tx+144>: mov %edx,-0x60(%rbp)
0xffffffffa09cb363 <kiblnd_map_tx+147>: mov %rcx,-0x68(%rbp)
0xffffffffa09cb367 <kiblnd_map_tx+151>: callq *0x40(%rax)
0xffffffffa09cb36a <kiblnd_map_tx+154>: mov -0x60(%rbp),%edx
0xffffffffa09cb36d <kiblnd_map_tx+157>: mov -0x68(%rbp),%rcx
0xffffffffa09cb371 <kiblnd_map_tx+161>: lea 0x0(%r13,%r13,2),%rsi
0xffffffffa09cb376 <kiblnd_map_tx+166>: mov %eax,0x8(%rbx,%rsi,4)
0xffffffffa09cb37a <kiblnd_map_tx+170>: mov 0x8(%r12),%rdi
0xffffffffa09cb37f <kiblnd_map_tx+175>: mov %rcx,%rsi
0xffffffffa09cb382 <kiblnd_map_tx+178>: add 0x90(%r14),%rsi
0xffffffffa09cb389 <kiblnd_map_tx+185>: mov 0x2b8(%rdi),%rax
0xffffffffa09cb390 <kiblnd_map_tx+192>: test %rax,%rax
0xffffffffa09cb393 <kiblnd_map_tx+195>: je 0xffffffffa09cb3e8 <kiblnd_map_tx+280>
0xffffffffa09cb395 <kiblnd_map_tx+197>: mov %edx,-0x60(%rbp)
0xffffffffa09cb398 <kiblnd_map_tx+200>: callq *0x38(%rax)
0xffffffffa09cb39b <kiblnd_map_tx+203>: mov -0x60(%rbp),%edx
0xffffffffa09cb39e <kiblnd_map_tx+206>: lea 0x0(%r13,%r13,2),%rcx
0xffffffffa09cb3a3 <kiblnd_map_tx+211>: add $0x1,%edx
0xffffffffa09cb3a6 <kiblnd_map_tx+214>: shl $0x2,%rcx
0xffffffffa09cb3aa <kiblnd_map_tx+218>: mov %rax,0xc(%rcx,%rbx,1)
0xffffffffa09cb3af <kiblnd_map_tx+223>: add 0x8(%rbx,%rcx,1),%r15d
0xffffffffa09cb3b4 <kiblnd_map_tx+228>: cmp %edx,0x4(%rbx)
0xffffffffa09cb3b7 <kiblnd_map_tx+231>: jbe 0xffffffffa09cb3f0 <kiblnd_map_tx+288>
0xffffffffa09cb3b9 <kiblnd_map_tx+233>: mov 0x8(%r12),%rdi
0xffffffffa09cb3be <kiblnd_map_tx+238>: movslq %edx,%r13
0xffffffffa09cb3c1 <kiblnd_map_tx+241>: mov %r13,%rcx
0xffffffffa09cb3c4 <kiblnd_map_tx+244>: shl $0x5,%rcx
0xffffffffa09cb3c8 <kiblnd_map_tx+248>: mov 0x2b8(%rdi),%rax
0xffffffffa09cb3cf <kiblnd_map_tx+255>: mov %rcx,%rsi
0xffffffffa09cb3d2 <kiblnd_map_tx+258>: add 0x90(%r14),%rsi
0xffffffffa09cb3d9 <kiblnd_map_tx+265>: test %rax,%rax
0xffffffffa09cb3dc <kiblnd_map_tx+268>: jne 0xffffffffa09cb360 <kiblnd_map_tx+144>
0xffffffffa09cb3de <kiblnd_map_tx+270>: mov 0x18(%rsi),%eax
0xffffffffa09cb3e1 <kiblnd_map_tx+273>: jmp 0xffffffffa09cb371 <kiblnd_map_tx+161>
0xffffffffa09cb3e3 <kiblnd_map_tx+275>: nopl 0x0(%rax,%rax,1)
0xffffffffa09cb3e8 <kiblnd_map_tx+280>: mov 0x10(%rsi),%rax
0xffffffffa09cb3ec <kiblnd_map_tx+284>: jmp 0xffffffffa09cb39e <kiblnd_map_tx+206>
0xffffffffa09cb3ee <kiblnd_map_tx+286>: xchg %ax,%ax
0xffffffffa09cb3f0 <kiblnd_map_tx+288>: mov %rbx,%rsi
0xffffffffa09cb3f3 <kiblnd_map_tx+291>: mov %r12,%rdi
0xffffffffa09cb3f6 <kiblnd_map_tx+294>: callq 0xffffffffa09be470 <kiblnd_find_rd_dma_mr>
0xffffffffa09cb3fb <kiblnd_map_tx+299>: test %rax,%rax
0xffffffffa09cb3fe <kiblnd_map_tx+302>: je 0xffffffffa09cb492 <kiblnd_map_tx+450>
0xffffffffa09cb404 <kiblnd_map_tx+308>: cmp %rbx,0x80(%r14)
0xffffffffa09cb40b <kiblnd_map_tx+315>: je 0xffffffffa09cb58e <kiblnd_map_tx+702>
0xffffffffa09cb411 <kiblnd_map_tx+321>: mov 0x1c(%rax),%eax
0xffffffffa09cb414 <kiblnd_map_tx+324>: mov %eax,(%rbx)
0xffffffffa09cb416 <kiblnd_map_tx+326>: xor %r8d,%r8d
0xffffffffa09cb419 <kiblnd_map_tx+329>: add $0x48,%rsp
0xffffffffa09cb41d <kiblnd_map_tx+333>: mov %r8d,%eax
0xffffffffa09cb420 <kiblnd_map_tx+336>: pop %rbx
0xffffffffa09cb421 <kiblnd_map_tx+337>: pop %r12
0xffffffffa09cb423 <kiblnd_map_tx+339>: pop %r13
0xffffffffa09cb425 <kiblnd_map_tx+341>: pop %r14
0xffffffffa09cb427 <kiblnd_map_tx+343>: pop %r15
0xffffffffa09cb429 <kiblnd_map_tx+345>: leaveq
0xffffffffa09cb42a <kiblnd_map_tx+346>: retq
0xffffffffa09cb42b <kiblnd_map_tx+347>: nopl 0x0(%rax,%rax,1)
0xffffffffa09cb430 <kiblnd_map_tx+352>: mov (%rdi),%r15
0xffffffffa09cb433 <kiblnd_map_tx+355>: test %r15,%r15
0xffffffffa09cb436 <kiblnd_map_tx+358>: je 0xffffffffa09cb596 <kiblnd_map_tx+710>
0xffffffffa09cb43c <kiblnd_map_tx+364>: mov 0x1c0(%r15),%rcx
0xffffffffa09cb443 <kiblnd_map_tx+371>: test %rcx,%rcx
0xffffffffa09cb446 <kiblnd_map_tx+374>: mov %rcx,-0x58(%rbp)
0xffffffffa09cb44a <kiblnd_map_tx+378>: je 0xffffffffa09cb596 <kiblnd_map_tx+710>
0xffffffffa09cb450 <kiblnd_map_tx+384>: mov -0x44(%rbp),%r9d
0xffffffffa09cb454 <kiblnd_map_tx+388>: test %r9d,%r9d
0xffffffffa09cb457 <kiblnd_map_tx+391>: jle 0xffffffffa09cb476 <kiblnd_map_tx+422>
0xffffffffa09cb459 <kiblnd_map_tx+393>: mov %r13,%rax
0xffffffffa09cb45c <kiblnd_map_tx+396>: xor %edx,%edx
0xffffffffa09cb45e <kiblnd_map_tx+398>: xchg %ax,%ax
0xffffffffa09cb460 <kiblnd_map_tx+400>: add $0x1,%edx
0xffffffffa09cb463 <kiblnd_map_tx+403>: mov %rax,%rdi
0xffffffffa09cb466 <kiblnd_map_tx+406>: mov %edx,-0x60(%rbp)
0xffffffffa09cb469 <kiblnd_map_tx+409>: callq 0xffffffff8129c450 <sg_next>
0xffffffffa09cb46e <kiblnd_map_tx+414>: mov -0x60(%rbp),%edx

crash> bt -f
PID: 20168 TASK: ffff88084585aab0 CPU: 23 COMMAND: "mdt03_049"
#0 [ffff880847f514b0] machine_kexec at ffffffff8103b71b
    ffff880847f514b8: 00000000030a1000 ffff8800030a1000
    ffff880847f514c8: 00000000030a0000 ffff880847f51818
    ffff880847f514d8: 8800000000000000 000000000000ffff
    ffff880847f514e8: ffff880847f51818 ffff880847f51518
    ffff880847f514f8: 0000000000000009 ffff88084585aab0
    ffff880847f51508: ffff880847f515d8 ffffffff810c9942
#1 [ffff880847f51510] crash_kexec at ffffffff810c9942
    ffff880847f51518: ffff880c7980c090 ffffc900175fb240
    ffff880847f51528: ffff88046e536000 ffff880c78a28940
    ffff880847f51538: ffff880847f51940 ffff88046e524000
    ffff880847f51548: 0000000000000000 0000000000001000
    ffff880847f51558: 0000000000000301 ffffea003958cb98
    ffff880847f51568: 0000000000000000 0000000000000000
    ffff880847f51578: 0000000000000101 ffffc900175fb240
    ffff880847f51588: 0000000000000000 ffffffffffffffff
    ffff880847f51598: ffffffff8129c452 0000000000000010
    ffff880847f515a8: 0000000000010246 ffff880847f518c8
    ffff880847f515b8: 0000000000000018 ffff880847f51618
    ffff880847f515c8: 0000000000000246 ffff880847f51818
    ffff880847f515d8: ffff880847f51608 ffffffff8152f070
#2 [ffff880847f515e0] oops_end at ffffffff8152f070
    ffff880847f515e8: 0000000000000000 ffff880847f51818
    ffff880847f515f8: 0000000000000000 0000000000000009
    ffff880847f51608: ffff880847f51658 ffffffff8104c80b
#3 [ffff880847f51610] no_context at ffffffff8104c80b
    ffff880847f51618: ffffffff81531036 000000000000000a
    ffff880847f51628: ffff88047fe9be00 0000000000000000
    ffff880847f51638: 0000000000000000 ffff880847f51818
    ffff880847f51648: ffff88084585aab0 0000000000030001
    ffff880847f51658: ffff880847f516a8 ffffffff8104ca95
#4 [ffff880847f51660] __bad_area_nosemaphore at ffffffff8104ca95
    ffff880847f51668: ffffffff8134382e ffff88047fe9be00
    ffff880847f51678: 0000000000000000 0000000000000028
    ffff880847f51688: 0000000000000000 0000000000000000
    ffff880847f51698: ffffc900175fb240 ffff88084585aab0
    ffff880847f516a8: ffff880847f516b8 ffffffff8104cb63
#5 [ffff880847f516b0] bad_area_nosemaphore at ffffffff8104cb63
    ffff880847f516b8: ffff880847f517d8 ffffffff8104d25c
#6 [ffff880847f516c0] __do_page_fault at ffffffff8104d25c
    ffff880847f516c8: 000000000001da70 000000000000004e
    ffff880847f516d8: ffff880847f51818 0000000000000000
    ffff880847f516e8: 0000000000000068 0000000000000000
    ffff880847f516f8: ffffffff8100bb8e ffff880847f51810
    ffff880847f51708: 0000000000000000 0000000000000001
    ffff880847f51718: ffffffff816490a0 0000000000000000
    ffff880847f51728: 0000000000010500 0000000000001ba9
    ffff880847f51738: ffff880c8e540000 0000000000000046
    ffff880847f51748: 0000000000000246 ffffffffffffff10
    ffff880847f51758: ffffffff81075d81 0000000000000010
    ffff880847f51768: 0000000000000246 ffff880847f51780
    ffff880847f51778: 0000000000000018 ffffc90017a2d032
    ffff880847f51788: 0000000000000246 ffffffff00000017
    ffff880847f51798: ffffffff8115c6dd 0000000000000010
    ffff880847f517a8: 0000000000000286 ffff880847f51818
    ffff880847f517b8: 0000000000000000 0000000000000000
    ffff880847f517c8: ffffc900175fb240 ffff880c7980c090
    ffff880847f517d8: ffff880847f51808 ffffffff81530fbe
#7 [ffff880847f517e0] do_page_fault at ffffffff81530fbe
    ffff880847f517e8: 0000000000000001 ffff880c78a28940
    ffff880847f517f8: ffff88046e536000 ffffc900175fb240
    ffff880847f51808: ffff880847f51940 ffffffff8152e375
#8 [ffff880847f51810] page_fault at ffffffff8152e375
    [exception RIP: sg_next+2]
    RIP: ffffffff8129c452 RSP: ffff880847f518c8 RFLAGS: 00010246
    RAX: 0000000000000000 RBX: ffff88046e524000 RCX: 0000000000000000
    RDX: 0000000000000101 RSI: ffffc900175fb240 RDI: 0000000000000000
    RBP: ffff880847f51940 R8: ffffea003958cb98 R9: 0000000000000301
    R10: 0000000000001000 R11: 0000000000000000 R12: ffff880c78a28940
    R13: ffff88046e536000 R14: ffffc900175fb240 R15: ffff880c7980c090
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
    ffff880847f51818: ffff880c7980c090 ffffc900175fb240
    ffff880847f51828: ffff88046e536000 ffff880c78a28940
    ffff880847f51838: ffff880847f51940 ffff88046e524000
    ffff880847f51848: 0000000000000000 0000000000001000
    ffff880847f51858: 0000000000000301 ffffea003958cb98
    ffff880847f51868: 0000000000000000 0000000000000000
    ffff880847f51878: 0000000000000101 ffffc900175fb240
    ffff880847f51888: 0000000000000000 ffffffffffffffff
    ffff880847f51898: ffffffff8129c452 0000000000000010
    ffff880847f518a8: 0000000000010246 ffff880847f518c8
    ffff880847f518b8: 0000000000000018 ffff880847f51940
    ffff880847f518c8: ffffffffa09cb46e
#9 [ffff880847f518c8] kiblnd_map_tx at ffffffffa09cb46e [ko2iblnd]
    ffff880847f518d0: ffff880847f51fd8 ffff88084585aab0
    ffff880847f518e0: ffffffff00000101 ffffffff81a9ac80
    ffff880847f518f0: ffff880c78a289c0 0000030100000001
    ffff880847f51900: ffff880847f51920 ffffffff8105e100
    ffff880847f51910: ffff880876a8d558 ffff880c796ffb80 <=== HERE rbx: ffff880c796ffb80
    ffff880847f51920: ffffc900175fb240 ffff88046e524000
    ffff880847f51930: 0000000000000000 ffff8810639ee640
    ffff880847f51940: ffff880847f519a0 ffffffffa09cbe0a
#10 [ffff880847f51948] kiblnd_setup_rd_iov at ffffffffa09cbe0a [ko2iblnd]
    ffff880847f51950: ffff880800000378 ffff880c002ffef8
    ffff880847f51960: ffff88046e53c000 ffffc90048a88000
    ffff880847f51970: 0005000221056e78 ffff88105fff6200
    ffff880847f51980: 0000000000300270 ffffc900175fb240
    ffff880847f51990: 0005000221056e78 0000000000000001
    ffff880847f519a0: ffff880847f51a40 ffffffffa09d151a
#11 [ffff880847f519a8] kiblnd_send at ffffffffa09d151a [ko2iblnd]
    ffff880847f519b0: 0000000000300270 ffff8810638aa000
    ffff880847f519c0: ffffc90048788488 0000000000000000
    ffff880847f519d0: ffff881000000001 000000006714f9c0
    ffff880847f519e0: ffff8810639ee630 ffff880c796ffb80
    ffff880847f519f0: 000000000000ec08 ffff88105fff6200
    ffff880847f51a00: 0005000221056e78 0000000000003039
    ffff880847f51a10: ffff880847f51a60 ffff880c796ffb80
    ffff880847f51a20: ffff88105fff6200 ffff88105fff6200 <=== HERE r12: ffff88105fff6200
    ffff880847f51a30: 0000000000000000 0000000000050003
    ffff880847f51a40: ffff880847f51a60 ffffffffa05e6d6b
#12 [ffff880847f51a48] lnet_ni_send at ffffffffa05e6d6b [lnet]
    ffff880847f51a50: ffff880c796ffb80 0000000000000000
    ffff880847f51a60: ffff880847f51ad0 ffffffffa05eafa5
#13 [ffff880847f51a68] lnet_send at ffffffffa05eafa5 [lnet]
    ffff880847f51a70: ffff880847f51ac0 0000000000000000
    ffff880847f51a80: 000500030a64b972 0005000221056e03
    ffff880847f51a90: ffff880847f51ad0 ffff8804780fd180
    ffff880847f51aa0: 000500030a64b972 ffff88105fff6200
    ffff880847f51ab0: ffff8810639ee5c0 0000000000000003
    ffff880847f51ac0: 00000000002c3f4b 0000000000000001
    ffff880847f51ad0: ffff880847f51b30 ffffffffa05ec00a
#14 [ffff880847f51ad8] LNetPut at ffffffffa05ec00a [lnet]
    ffff880847f51ae0: 0000000147f51b30 0005000221056e03
    ffff880847f51af0: 000500030a64b972 0000000000003039
    ffff880847f51b00: ffff880847f51bd0 ffff880459fb2240
    ffff880847f51b10: 0000000000000000 ffffc90048788108
    ffff880847f51b20: 0000000000000023 0000000000000001
    ffff880847f51b30: ffff880847f51be0 ffffffffa07fbc40
#15 [ffff880847f51b38] ptl_send_buf at ffffffffa07fbc40 [ptlrpc]
    ffff880847f51b40: 00055926b072ae94 00000001000000c0
    ffff880847f51b50: 0000000000000000 ffffc90048788000
    ffff880847f51b60: 0000000000000023 ffffffffa081b8b8
    ffff880847f51b70: 0030027047f51b88 ffffc90048788070
    ffff880847f51b80: ffffc90048788108 0000000100300270
    ffff880847f51b90: 0000000048788108 ffffc90048788000
    ffff880847f51ba0: 0000000000000023 ffffffffa083c445
    ffff880847f51bb0: ffff88105ac1ec00 ffff88105ac1ec00
    ffff880847f51bc0: ffffc90048788000 ffff880459fb2240
    ffff880847f51bd0: 0000000000000000 0000000000000001
    ffff880847f51be0: ffff880847f51c60 ffffffffa07fc22b
#16 [ffff880847f51be8] ptlrpc_send_reply at ffffffffa07fc22b [ptlrpc]
    ffff880847f51bf0: ffff88100000000a 00055926b072ae94
    ffff880847f51c00: ffff8808000000c0 ffff8804573e5e40
    ffff880847f51c10: ffff8804573e5e58 ffff8804573e5e70
    ffff880847f51c20: ffff880847f51c70 ffff880850ce0a80
    ffff880847f51c30: ffff880847f51c30 ffff88105ac1ec00
    ffff880847f51c40: 0000000000000000 ffffffffa0f18580
    ffff880847f51c50: ffff88106aa76000 000000000000030c
    ffff880847f51c60: ffff880847f51c90 ffffffffa07c7054
#17 [ffff880847f51c68] target_send_reply_msg at ffffffffa07c7054 [ptlrpc]
    ffff880847f51c70: ffff880847f51ce0 ffffc90048788000
    ffff880847f51c80: ffff88105ac1ec00 ffffffffa0f18580
    ffff880847f51c90: ffff880847f51d00 ffffffffa07c7576
#18 [ffff880847f51c98] target_send_reply at ffffffffa07c7576 [ptlrpc]
    ffff880847f51ca0: ffff8810653a0880 0000000000001001
    ffff880847f51cb0: ffffc90048788108 0000000d0012ef0c
    ffff880847f51cc0: ffff880847f51ce0 00000000a0802b6c
    ffff880847f51cd0: ffffc9004027b9e8 ffff88105ac1ec00
    ffff880847f51ce0: ffff881062860000 ffffffffa0f18580
    ffff880847f51cf0: ffff88105ac1efa0 0000000000000000
    ffff880847f51d00: ffff880847f51d50 ffffffffa0ea2df9
#19 [ffff880847f51d08] mdt_handle_common at ffffffffa0ea2df9 [mdt]
    ffff880847f51d10: ffff880847f51d30 ffffffff00000002
    ffff880847f51d20: ffff880847f51d60 ffff88105ac1ec00
    ffff880847f51d30: ffff880850ce0a80 ffff88106aa76000
    ffff880847f51d40: ffff881067ecd940 ffff88105ac1ef40
    ffff880847f51d50: ffff880847f51d60 ffffffffa0edf645
#20 [ffff880847f51d58] mds_regular_handle at ffffffffa0edf645 [mdt]
    ffff880847f51d60: ffff880847f51e40 ffffffffa0811ee5
#21 [ffff880847f51d68] ptlrpc_server_handle_request at ffffffffa0811ee5 [ptlrpc]
    ffff880847f51d70: ffff880847f51d80 ffffffffa053e4ce
    ffff880847f51d80: ffff880847f51da0 ffffffffa054f7d5
    ffff880847f51d90: ffff881067ecd940 ffff88106aa76000
    ffff880847f51da0: ffff880847f51e40 ffffffffa080a919
    ffff880847f51db0: ffff880847f51e00 ffffffff81057819
    ffff880847f51dc0: ffff88106af7bdc0 0000000300000000
    ffff880847f51dd0: ffff88106af7bdc0 ffff88106aa76080
    ffff880847f51de0: 0000000000000282 0000000000000014
    ffff880847f51df0: 0000000000000001 0000000000000282
    ffff880847f51e00: 0000000055928287 00000000000884c1
    ffff880847f51e10: ffff880847f51e40 ffff881067ecd940
    ffff880847f51e20: ffff88106aa76000 0000000000000040
    ffff880847f51e30: 0000000000000005 ffff880850ce0a80
    ffff880847f51e40: ffff880847f51ee0 ffffffffa081466d
#22 [ffff880847f51e48] ptlrpc_main at ffffffffa081466d [ptlrpc]
    ffff880847f51e50: ffff880847f51e60 ffff88106aa76204
    ffff880847f51e60: ffff88106aa76080 0000000047f51fd8
    ffff880847f51e70: ffff880850ce0a80 ffff8810670b4800
    ffff880847f51e80: 000000004585aab0 ffff88106aa76068
    ffff880847f51e90: ffff88106aa76048 ffff881067ecd978
    ffff880847f51ea0: ffff8810670b4800 ffff88106aa76030
    ffff880847f51eb0: ffff880847f51ee0 ffff881062805780
    ffff880847f51ec0: ffff880847f51ef8 ffffffffa0813b80
    ffff880847f51ed0: ffff881067ecd940 ffff880479c40ab0
    ffff880847f51ee0: ffff880847f51f40 ffffffff8109e71e
#23 [ffff880847f51ee8] kthread at ffffffff8109e71e
    ffff880847f51ef0: ffffffff00000000 5a5a5a5a00000000
    ffff880847f51f00: 5a5a5a5a00000000 ffff880847f51f08
    ffff880847f51f10: ffff880847f51f08 0000000000000000
    ffff880847f51f20: ffff881062805780 ffffffff81ebb4c8
    ffff880847f51f30: ffff880479c40ab0 0000000000000000
    ffff880847f51f40: ffff880479c47f40 ffffffff8100c20a
#24 [ffff880847f51f48] kernel_thread at ffffffff8100c20a
Could you help me troubleshoot this issue? Is there anything specific I should look for in the crash dump?
Issue Links
- duplicates
-
LU-6158 always shrink_capsule in mdt_getxattr_all
- Resolved