[LU-5678] kernel crash due to NULL pointer dereference in kiblnd_pool_alloc_node() Created: 29/Sep/14  Updated: 14/Jun/18  Resolved: 09/Jul/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0, Lustre 2.8.0, Lustre 2.5.4
Fix Version/s: Lustre 2.7.0

Type: Bug Priority: Major
Reporter: Li Xi (Inactive) Assignee: Amir Shehata (Inactive)
Resolution: Fixed Votes: 0
Labels: p4b, patch

Issue Links:
Related
is related to LU-7099 Crash in kiblnd_pool_alloc_node Resolved
Severity: 3
Rank (Obsolete): 15902

 Description   

After the OSTs were unmounted, while running lustre_rmmod, two OSSs crashed for the same reason.

<4>Lustre: Failing over lustre-OST0019
<4>Lustre: Skipped 2 previous similar messages
<4>Lustre: server umount lustre-OST0019 complete
<4>Lustre: Skipped 2 previous similar messages
<4>Lustre: 28870:0:(client.c:1907:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1404434289/real 1404434289] req@ffff881ff8a42c00 x1472650587225264/t0(0) o251->MGC10.0.10.151@o2ib@10.0.10.151@o2ib:26/25 lens 224/224 e 0 to 1 dl 1404434295 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
<4>Lustre: server umount lustre-OST0014 complete
<4>Lustre: Skipped 4 previous similar messages
<3>LNetError: 32984:0:(lib-move.c:1937:lnet_parse()) 10.0.8.230@o2ib, src 10.0.8.230@o2ib: Dropping PUT (error -108 looking up sender)
<1>BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
<1>IP: [<ffffffffa09e2409>] kiblnd_pool_alloc_node+0x49/0x2a0 [ko2iblnd]
<4>PGD 0
<4>Oops: 0000 [#1] SMP
<4>last sysfs file: /sys/devices/pci0000:40/0000:40:03.0/0000:42:00.0/host7/target7:0:0/7:0:0:66/state
<4>CPU 0
<4>Modules linked in: jbd2 ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic sha256_generic crc32c_intel libcfs(U) dm_round_robin autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc ib_srp(U) scsi_transport_srp(U) 8021q garp stp llc acpi_cpufreq freq_table mperf rdma_ucm(U) ib_ucm(U) rdma_cm(U) iw_cm(U) ib_ipoib(U) ib_cm(U) ib_uverbs(U) ib_umad(U) mlx5_ib(U) mlx5_core(U) mlx4_en(U) mlx4_ib(U) ib_sa(U) ib_mad(U) ib_core(U) ib_addr(U) ipv6 mlx4_core(U) compat(U) dm_multipath vhost_net macvtap macvlan tun kvm knem(U) uinput microcode iTCO_wdt iTCO_vendor_support dcdbas power_meter ses enclosure sg shpchp tg3 ptp pps_core lpc_ich mfd_core ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif ahci wmi megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: fld]
<4>
<4>Pid: 32983, comm: kiblnd_sd_00_02 Not tainted 2.6.32-431.17.1.el6_lustre.2.5.18.ddn2.x86_64 #1 Dell Inc. PowerEdge R620/01W23F
<4>RIP: 0010:[<ffffffffa09e2409>] [<ffffffffa09e2409>] kiblnd_pool_alloc_node+0x49/0x2a0 [ko2iblnd]
<4>RSP: 0018:ffff88100bfa5a30 EFLAGS: 00010207
<4>RAX: 0000000000000000 RBX: ffff88202619ad80 RCX: 000000000000003f
<4>RDX: 0000000000000010 RSI: 0000000000000002 RDI: ffff88202619ad80
<4>RBP: ffff88100bfa5a70 R08: 4730000000000000 R09: 3980000000000000
<4>R10: 0000000000000000 R11: 0000000000000000 R12: ffff88202619adb0
<4>R13: ffff88100bfa5a38 R14: ffff88202619ad90 R15: 0000000000000012
<4>FS: 0000000000000000(0000) GS:ffff880061c00000(0000) knlGS:0000000000000000
<4>CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>CR2: 0000000000000010 CR3: 0000001025add000 CR4: 00000000001407f0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process kiblnd_sd_00_02 (pid: 32983, threadinfo ffff88100bfa4000, task ffff88100a87cae0)
<4>Stack:
<4> 00000000000007ed 0000000000000000 ffff88100bfa5a70 ffff88100b7874f0
<4><d> ffff880ff19b2ea8 ffff880ff19b2eb8 ffff88100d33c680 0000000000000012
<4><d> ffff88100bfa5a90 ffffffffa09ed149 ffff88100b382000 ffff880ff19b2e00
<4>Call Trace:
<4> [<ffffffffa09ed149>] kiblnd_get_idle_tx+0x29/0x2c0 [ko2iblnd]
<4> [<ffffffffa09f0765>] kiblnd_check_sends+0x425/0x610 [ko2iblnd]
<4> [<ffffffffa09f2d7e>] kiblnd_post_rx+0x15e/0x3b0 [ko2iblnd]
<4> [<ffffffffa09f30e6>] kiblnd_recv+0x116/0x560 [ko2iblnd]
<4> [<ffffffffa057ae9b>] lnet_ni_recv+0xbb/0x320 [lnet]
<4> [<ffffffffa057bcc1>] lnet_drop_message+0x81/0xa0 [lnet]
<4> [<ffffffffa05805a2>] lnet_parse+0x1b2/0x18c0 [lnet]
<4> [<ffffffff8106a39b>] ? enqueue_task_fair+0xfb/0x100
<4> [<ffffffff81061b6e>] ? try_to_wake_up+0x24e/0x3e0
<4> [<ffffffffa03a3bf4>] ? mlx5_ib_poll_cq+0x1b4/0xbf0 [mlx5_ib]
<4> [<ffffffffa09f37fb>] kiblnd_handle_rx+0x2cb/0x640 [ko2iblnd]
<4> [<ffffffff81061d12>] ? default_wake_function+0x12/0x20
<4> [<ffffffffa09f44e3>] kiblnd_rx_complete+0x2d3/0x420 [ko2iblnd]
<4> [<ffffffffa09f4692>] kiblnd_complete+0x62/0xe0 [ko2iblnd]
<4> [<ffffffffa09f4a4a>] kiblnd_scheduler+0x33a/0x7b0 [ko2iblnd]
<4> [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
<4> [<ffffffffa09f4710>] ? kiblnd_scheduler+0x0/0x7b0 [ko2iblnd]
<4> [<ffffffff8109ab56>] kthread+0x96/0xa0
<4> [<ffffffff8100c20a>] child_rip+0xa/0x20
<4> [<ffffffff8109aac0>] ? kthread+0x0/0xa0
<4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
<4>Code: 44 00 00 4c 8d 67 30 4c 8d 77 10 4c 8d 6d c8 48 89 fb 48 89 df e8 48 86 b4 e0 48 8b 43 30 4c 39 e0 48 89 45 c8 74 77 48 8d 50 10 <48> 39 50 10 74 61 83 40 30 01 48 8b 15 a6 b4 22 e1 48 8b 45 c8
<1>RIP [<ffffffffa09e2409>] kiblnd_pool_alloc_node+0x49/0x2a0 [ko2iblnd]
<4> RSP <ffff88100bfa5a30>
<4>CR2: 0000000000000010

The following is some output from the crash utility:
crash /usr/lib/debug/lib/modules/2.6.32-431.17.1.el6_lustre.2.5.18.ddn2.x86_64/vmlinux /var/crash/127.0.0.1-2014-07-04-09\:38\:48/vmcore
crash> mod -s ko2iblnd
crash> disas kiblnd_pool_alloc_node
Dump of assembler code for function kiblnd_pool_alloc_node:
0xffffffffa09db3c0 <+0>: push %rbp
0xffffffffa09db3c1 <+1>: mov %rsp,%rbp
0xffffffffa09db3c4 <+4>: sub $0x40,%rsp
0xffffffffa09db3c8 <+8>: mov %rbx,-0x28(%rbp)
0xffffffffa09db3cc <+12>: mov %r12,-0x20(%rbp)
0xffffffffa09db3d0 <+16>: mov %r13,-0x18(%rbp)
0xffffffffa09db3d4 <+20>: mov %r14,-0x10(%rbp)
0xffffffffa09db3d8 <+24>: mov %r15,-0x8(%rbp)
0xffffffffa09db3dc <+28>: nopl 0x0(%rax,%rax,1)
0xffffffffa09db3e1 <+33>: lea 0x30(%rdi),%r12
0xffffffffa09db3e5 <+37>: lea 0x10(%rdi),%r14
0xffffffffa09db3e9 <+41>: lea -0x38(%rbp),%r13
0xffffffffa09db3ed <+45>: mov %rdi,%rbx
0xffffffffa09db3f0 <+48>: mov %rbx,%rdi
0xffffffffa09db3f3 <+51>: callq 0xffffffff8152aa40 <_spin_lock>
0xffffffffa09db3f8 <+56>: mov 0x30(%rbx),%rax
0xffffffffa09db3fc <+60>: cmp %r12,%rax
0xffffffffa09db3ff <+63>: mov %rax,-0x38(%rbp)
0xffffffffa09db403 <+67>: je 0xffffffffa09db47c <kiblnd_pool_alloc_node+188>
0xffffffffa09db405 <+69>: lea 0x10(%rax),%rdx
0xffffffffa09db409 <+73>: cmp %rdx,0x10(%rax)
– Crash here: RAX is NULL, so the load from 0x10(%rax) faults at address 0x10 (matching CR2)
0xffffffffa09db40d <+77>: je 0xffffffffa09db470 <kiblnd_pool_alloc_node+176>
0xffffffffa09db40f <+79>: addl $0x1,0x30(%rax)
0xffffffffa09db413 <+83>: mov -0x1edcdb5a(%rip),%rdx # 0xffffffff81c0d8c0
0xffffffffa09db41a <+90>: mov -0x38(%rbp),%rax
0xffffffffa09db41e <+94>: add $0x493e0,%rdx
0xffffffffa09db425 <+101>: mov %rdx,0x28(%rax)
0xffffffffa09db429 <+105>: mov -0x38(%rbp),%rax
0xffffffffa09db42d <+109>: mov 0x10(%rax),%r12
0xffffffffa09db431 <+113>: mov %r12,%rdi
0xffffffffa09db434 <+116>: callq 0xffffffff81294630 <list_del>
0xffffffffa09db439 <+121>: mov 0x78(%rbx),%rax
0xffffffffa09db43d <+125>: test %rax,%rax
0xffffffffa09db440 <+128>: je 0xffffffffa09db44b <kiblnd_pool_alloc_node+139>
0xffffffffa09db442 <+130>: mov %r12,%rsi
0xffffffffa09db445 <+133>: mov -0x38(%rbp),%rdi
0xffffffffa09db449 <+137>: callq *%rax
0xffffffffa09db44b <+139>: incw (%rbx)
0xffffffffa09db44e <+142>: data32 xchg %ax,%ax
0xffffffffa09db451 <+145>: mov %r12,%rax
0xffffffffa09db454 <+148>: mov -0x28(%rbp),%rbx
0xffffffffa09db458 <+152>: mov -0x20(%rbp),%r12
0xffffffffa09db45c <+156>: mov -0x18(%rbp),%r13
0xffffffffa09db460 <+160>: mov -0x10(%rbp),%r14
0xffffffffa09db464 <+164>: mov -0x8(%rbp),%r15
0xffffffffa09db468 <+168>: leaveq
0xffffffffa09db469 <+169>: retq
0xffffffffa09db46a <+170>: nopw 0x0(%rax,%rax,1)
0xffffffffa09db470 <+176>: mov (%rax),%rax
0xffffffffa09db473 <+179>: cmp %r12,%rax
0xffffffffa09db476 <+182>: mov %rax,-0x38(%rbp)
0xffffffffa09db47a <+186>: jne 0xffffffffa09db405 <kiblnd_pool_alloc_node+69>
0xffffffffa09db47c <+188>: mov 0x58(%rbx),%r15d
0xffffffffa09db480 <+192>: test %r15d,%r15d
0xffffffffa09db483 <+195>: jne 0xffffffffa09db558 <kiblnd_pool_alloc_node+408>
0xffffffffa09db489 <+201>: mov -0x1edcdbd0(%rip),%rax # 0xffffffff81c0d8c0
0xffffffffa09db490 <+208>: cmp 0x50(%rbx),%rax
0xffffffffa09db494 <+212>: js 0xffffffffa09db645 <kiblnd_pool_alloc_node+645>
0xffffffffa09db49a <+218>: movl $0x1,0x58(%rbx)
0xffffffffa09db4a1 <+225>: incw (%rbx)
0xffffffffa09db4a4 <+228>: data32 xchg %ax,%ax
0xffffffffa09db4a7 <+231>: testb $0x2,-0x4deee9(%rip) # 0xffffffffa04fc5c5
0xffffffffa09db4ae <+238>: je 0xffffffffa09db510 <kiblnd_pool_alloc_node+336>
0xffffffffa09db4b0 <+240>: testb $0x8,-0x4deef6(%rip) # 0xffffffffa04fc5c1
0xffffffffa09db4b7 <+247>: je 0xffffffffa09db510 <kiblnd_pool_alloc_node+336>
0xffffffffa09db4b9 <+249>: mov %r14,%rdx
0xffffffffa09db4bc <+252>: mov $0xffffffffa09f42c8,%rsi
0xffffffffa09db4c3 <+259>: mov $0xffffffffa0a01760,%rdi
0xffffffffa09db4ca <+266>: xor %eax,%eax
0xffffffffa09db4cc <+268>: movl $0x800,0x2629a(%rip) # 0xffffffffa0a01770 <msgdata.69576+16>
0xffffffffa09db4d6 <+278>: movq $0xffffffffa09f3528,0x2627f(%rip) # 0xffffffffa0a01760 <msgdata.69576>
0xffffffffa09db4e1 <+289>: movq $0xffffffffa09f2660,0x2627c(%rip) # 0xffffffffa0a01768 <msgdata.69576+8>
0xffffffffa09db4ec <+300>: movl $0x763,0x2627e(%rip) # 0xffffffffa0a01774 <msgdata.69576+20>
0xffffffffa09db4f6 <+310>: movq $0x0,0x2627f(%rip) # 0xffffffffa0a01780 <msgdata.69576+32>
0xffffffffa09db501 <+321>: movl $0x200,0x2626d(%rip) # 0xffffffffa0a01778 <msgdata.69576+24>
0xffffffffa09db50b <+331>: callq 0xffffffffa04db8d0
0xffffffffa09db510 <+336>: mov 0x5c(%rbx),%esi
0xffffffffa09db513 <+339>: mov %r13,%rdx
0xffffffffa09db516 <+342>: mov %rbx,%rdi
0xffffffffa09db519 <+345>: callq *0x68(%rbx)
0xffffffffa09db51c <+348>: mov %rbx,%rdi
0xffffffffa09db51f <+351>: mov %eax,%r15d
0xffffffffa09db522 <+354>: callq 0xffffffff8152aa40 <_spin_lock>
0xffffffffa09db527 <+359>: test %r15d,%r15d
0xffffffffa09db52a <+362>: movl $0x0,0x58(%rbx)
0xffffffffa09db531 <+369>: jne 0xffffffffa09db5d8 <kiblnd_pool_alloc_node+536>
0xffffffffa09db537 <+375>: mov 0x38(%rbx),%rsi
0xffffffffa09db53b <+379>: mov -0x38(%rbp),%rdi
0xffffffffa09db53f <+383>: mov %r12,%rdx
0xffffffffa09db542 <+386>: callq 0xffffffff812946d0 <__list_add>
0xffffffffa09db547 <+391>: incw (%rbx)
0xffffffffa09db54a <+394>: data32 xchg %ax,%ax
0xffffffffa09db54d <+397>: jmpq 0xffffffffa09db3f0 <kiblnd_pool_alloc_node+48>
0xffffffffa09db552 <+402>: nopw 0x0(%rax,%rax,1)
0xffffffffa09db558 <+408>: incw (%rbx)
0xffffffffa09db55b <+411>: data32 xchg %ax,%ax
0xffffffffa09db55e <+414>: testb $0x2,-0x4defa0(%rip) # 0xffffffffa04fc5c5
0xffffffffa09db565 <+421>: je 0xffffffffa09db5c7 <kiblnd_pool_alloc_node+519>
0xffffffffa09db567 <+423>: testb $0x8,-0x4defad(%rip) # 0xffffffffa04fc5c1
0xffffffffa09db56e <+430>: je 0xffffffffa09db5c7 <kiblnd_pool_alloc_node+519>
0xffffffffa09db570 <+432>: mov %r14,%rdx
0xffffffffa09db573 <+435>: mov $0xffffffffa09f4280,%rsi
0xffffffffa09db57a <+442>: mov $0xffffffffa0a017a0,%rdi
0xffffffffa09db581 <+449>: xor %eax,%eax
0xffffffffa09db583 <+451>: movl $0x800,0x26223(%rip) # 0xffffffffa0a017b0 <msgdata.69574+16>
0xffffffffa09db58d <+461>: movq $0xffffffffa09f3528,0x26208(%rip) # 0xffffffffa0a017a0 <msgdata.69574>
0xffffffffa09db598 <+472>: movq $0xffffffffa09f2660,0x26205(%rip) # 0xffffffffa0a017a8 <msgdata.69574+8>
0xffffffffa09db5a3 <+483>: movl $0x755,0x26207(%rip) # 0xffffffffa0a017b4 <msgdata.69574+20>
0xffffffffa09db5ad <+493>: movq $0x0,0x26208(%rip) # 0xffffffffa0a017c0 <msgdata.69574+32>
0xffffffffa09db5b8 <+504>: movl $0x200,0x261f6(%rip) # 0xffffffffa0a017b8 <msgdata.69574+24>
0xffffffffa09db5c2 <+514>: callq 0xffffffffa04db8d0
0xffffffffa09db5c7 <+519>: callq 0xffffffff81527c70 <schedule>
0xffffffffa09db5cc <+524>: jmpq 0xffffffffa09db3f0 <kiblnd_pool_alloc_node+48>
0xffffffffa09db5d1 <+529>: nopl 0x0(%rax)
0xffffffffa09db5d8 <+536>: mov -0x1edcdd1f(%rip),%rax # 0xffffffff81c0d8c0
0xffffffffa09db5df <+543>: mov %r14,%rdx
0xffffffffa09db5e2 <+546>: mov $0xffffffffa09f4248,%rsi
0xffffffffa09db5e9 <+553>: mov $0xffffffffa0a01720,%rdi
0xffffffffa09db5f0 <+560>: add $0x3e8,%rax
0xffffffffa09db5f6 <+566>: mov %rax,0x50(%rbx)
0xffffffffa09db5fa <+570>: xor %eax,%eax
0xffffffffa09db5fc <+572>: movl $0x800,0x2612a(%rip) # 0xffffffffa0a01730 <msgdata.69578+16>
0xffffffffa09db606 <+582>: movq $0xffffffffa09f3528,0x2610f(%rip) # 0xffffffffa0a01720 <msgdata.69578>
0xffffffffa09db611 <+593>: movq $0xffffffffa09f2660,0x2610c(%rip) # 0xffffffffa0a01728 <msgdata.69578+8>
0xffffffffa09db61c <+604>: movl $0x76e,0x2610e(%rip) # 0xffffffffa0a01734 <msgdata.69578+20>
0xffffffffa09db626 <+614>: movq $0xffffffffa0a01750,0x2610f(%rip) # 0xffffffffa0a01740 <msgdata.69578+32>
0xffffffffa09db631 <+625>: movl $0x20000,0x260fd(%rip) # 0xffffffffa0a01738 <msgdata.69578+24>
0xffffffffa09db63b <+635>: callq 0xffffffffa04db8d0
0xffffffffa09db640 <+640>: jmpq 0xffffffffa09db547 <kiblnd_pool_alloc_node+391>
0xffffffffa09db645 <+645>: incw (%rbx)
0xffffffffa09db648 <+648>: data32 xchg %ax,%ax
0xffffffffa09db64b <+651>: xor %r12d,%r12d
0xffffffffa09db64e <+654>: jmpq 0xffffffffa09db451 <kiblnd_pool_alloc_node+145>
End of assembler dump.

It seems ps->ps_pool_list was corrupted, since it contained a NULL entry.
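
Below is a minimal userspace sketch of why a NULL entry on ps_pool_list faults at exactly 0x10 (a sketch only; the struct layout is simplified and inferred from the disassembly above, not copied from the real kib_pool_t definition): po_list sits at offset 0x00 of the pool, so &pool->po_free_list on a NULL pool is address 0x10, which matches CR2 = 0000000000000010 and the faulting instruction cmp %rdx,0x10(%rax) with RAX = 0.

#include <stddef.h>
#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

/* Simplified layout mirroring the disassembly (hypothetical, not the real
 * kib_pool_t): po_list at offset 0x00, po_free_list at offset 0x10. */
struct kib_pool_sketch {
        struct list_head po_list;       /* chained on ps->ps_pool_list */
        struct list_head po_free_list;  /* list of free nodes          */
};

int main(void)
{
        struct kib_pool_sketch *pool = NULL;    /* the corrupted list entry */

        printf("offsetof(po_free_list) = 0x%zx\n",
               offsetof(struct kib_pool_sketch, po_free_list));
        /* cfs_list_empty(&pool->po_free_list) would load *(pool + 0x10),
         * i.e. read address 0x10, the NULL dereference in the oops. */
        printf("&pool->po_free_list = %p\n", (void *)&pool->po_free_list);
        return 0;
}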



 Comments   
Comment by Li Xi (Inactive) [ 29/Sep/14 ]

There might be better solutions, but the following patch helps us avoid this crash.

http://review.whamcloud.com/12104

Comment by Peter Jones [ 29/Sep/14 ]

Amir

Could you please review this patch?

Thanks

Peter

Comment by Liang Zhen (Inactive) [ 29/Sep/14 ]

I think we probably should not send NOOPs in o2iblnd anymore when LNet is shutting down.

Comment by Liang Zhen (Inactive) [ 01/Oct/14 ]

Thanks Lixi, I have reviewed this patch. It is in the right direction, but it can be simplified to two lines.

Comment by Liang Zhen (Inactive) [ 10/Nov/14 ]

Lixi, I updated your patch due to a problem found by Isaac. Could you check it?

Comment by Liang Zhen (Inactive) [ 25/Nov/14 ]

I think I found the real issue here, in kiblnd_post_rx():

        kib_conn_t         *conn = rx->rx_conn;
        ......
        rc = ib_post_recv(conn->ibc_cmid->qp, &rx->rx_wrq, &bad_wrq);
        if (rc != 0) {
                CERROR("Can't post rx for %s: %d, bad_wrq: %p\n",
                       libcfs_nid2str(conn->ibc_peer->ibp_nid), rc, bad_wrq);
                rx->rx_nob = 0;
        }

At this point, because we have already posted the RX, we no longer own it, and we do not own rx_conn either: another thread may poll this RX, drop it, and release its reference on the connection, which means all of the lines below can refer to an already destroyed connection.

        spin_lock(&conn->ibc_lock);
        if (credit == IBLND_POSTRX_PEER_CREDIT)
                conn->ibc_outstanding_credits++;
        else
                conn->ibc_reserved_credits++;
        spin_unlock(&conn->ibc_lock);

        kiblnd_check_sends(conn);

I will post a new patch to address this issue.
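
As an illustration of the direction (a sketch only, not the literal change in http://review.whamcloud.com/12852; kiblnd_conn_addref()/kiblnd_conn_decref() are the o2iblnd connection refcount helpers assumed here, in line with the patch subject "connection refcount fix" below), the idea is to pin the connection before handing the RX back to the HCA, so that the credit accounting and kiblnd_check_sends() afterwards cannot race with connection teardown:

        kib_conn_t         *conn = rx->rx_conn;

        /* Sketch: take an extra reference so "conn" stays valid even if
         * another thread completes this RX and drops the other references. */
        kiblnd_conn_addref(conn);

        rc = ib_post_recv(conn->ibc_cmid->qp, &rx->rx_wrq, &bad_wrq);
        if (rc != 0) {
                CERROR("Can't post rx for %s: %d, bad_wrq: %p\n",
                       libcfs_nid2str(conn->ibc_peer->ibp_nid), rc, bad_wrq);
                rx->rx_nob = 0;
        }

        spin_lock(&conn->ibc_lock);
        if (credit == IBLND_POSTRX_PEER_CREDIT)
                conn->ibc_outstanding_credits++;
        else
                conn->ibc_reserved_credits++;
        spin_unlock(&conn->ibc_lock);

        kiblnd_check_sends(conn);

        kiblnd_conn_decref(conn);       /* drop the extra reference */

The key point is simply that the extra reference keeps the connection alive until the last access in this function, regardless of what other threads do with the posted RX.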

Comment by Gerrit Updater [ 25/Nov/14 ]

Liang Zhen (liang.zhen@intel.com) uploaded a new patch: http://review.whamcloud.com/12852
Subject: LU-5678 o2iblnd: connection refcount fix
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8d865f36efeb2ad3eeb1fa5094683661fe463f35

Comment by Isaac Huang (Inactive) [ 02/Dec/14 ]

BTW, looks like we had a similar issue a very long time ago:
https://projectlava.xyratex.com/show_bug.cgi?id=21911

And there seemed to be some unfinished work there, which is why I kept it open. I'll look into that.

Comment by Liang Zhen (Inactive) [ 02/Dec/14 ]

Isaac, could you take a look at this patch when you have time? http://review.whamcloud.com/#/c/12718/
it should be able to improve lock usage for this piece of code, although I'd like to have the above patch in our releases first.

Comment by Isaac Huang (Inactive) [ 03/Dec/14 ]

Comments posted.

Comment by Gerrit Updater [ 10/Jan/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12852/
Subject: LU-5678 o2iblnd: connection refcount fix for kiblnd_post_rx
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 01eb2aefde36892d7a9576b6c6dd18da8529933f

Comment by Jodi Levi (Inactive) [ 15/Jan/15 ]

Patch landed to Master. Patches for other versions tracked externally.

Comment by Sebastien Piechurski [ 02/Apr/15 ]

We still hit this bug after applying patch http://review.whamcloud.com/12852/ on top of Lustre 2.5.3.
What is the status of patch http://review.whamcloud.com/#/c/12718/ ?

Comment by Liang Zhen (Inactive) [ 07/Apr/15 ]

Sebastien, I will recheck this problem.

Comment by Peter Jones [ 23/Apr/15 ]

Amir

Could you please look into this issue?

Thanks

Peter

Comment by Amir Shehata (Inactive) [ 30/Apr/15 ]

Sebastien, to clarify: have you tried applying http://review.whamcloud.com/#/c/12718/ and did it fix your issue?

Comment by Sebastien Buisson (Inactive) [ 30/Apr/15 ]

Hi Amir,

We get the following error when trying to access http://review.whamcloud.com/#/c/12718/ : "The page you requested was not found, or you do not have permission to view this page."
So we were not able to access this patch in the first place.

Cheers,
Sebastien.

Comment by Peter Jones [ 30/Apr/15 ]

Sebastien

Do you have a reliable reproducer for this issue?

Peter

Comment by Amir Shehata (Inactive) [ 30/Apr/15 ]

It would be helpful if you could attach the vmcore and vmlinux from the crash, so I can investigate further.

Comment by Sebastien Piechurski [ 26/May/15 ]

Hi Amir,

I have uploaded a tarball with a dump and the matching vmlinux and module files on the FTP site under /uploads/LU-5678/LU-5678-2015-04-01.tar.xz

Regards,

Sebastien.

Comment by Amir Shehata (Inactive) [ 10/Jun/15 ]

Hi Sebastien,

Can you please let me know the version of Lustre you're running and list all the patches that have been applied to it?

Also, regarding the core, is it possible to upload the System.map file?

Have you ever tried out this patch: http://review.whamcloud.com/12104

I also want to confirm: do you see this crash on shutdown? Is it consistently reproducible?

thanks

Comment by Sebastien Piechurski [ 10/Jun/15 ]

Hi Amir,

The version of Lustre is based on 2.5.3 plus some patches. I don't have a complete mapping of these patches to the review.whamcloud.com site, but I uploaded them to the FTP site under uploads/LU-5678/lustre-bullpatches.tar.gz. Most of them reference a Jira number and/or a review page; the others are specific to our distribution.
I also uploaded the System.map file in the same ftp directory.

The crash is seen when unmounting a target, either on an OSS or an MDS, but it is not consistently reproducible.
We see it frequently when unmounting a large number of targets across a large number of servers that have been running for quite some time.

Comment by Amir Shehata (Inactive) [ 11/Jun/15 ]

Thanks for the extra info. I'm currently investigating and will update the bug as soon as I make more progress.

Comment by Amir Shehata (Inactive) [ 12/Jun/15 ]
crash> bt
PID: 9622   TASK: ffff881066c50080  CPU: 1   COMMAND: "kiblnd_sd_00_02"
 #0 [ffff880f23ee3630] machine_kexec at ffffffff8103b71b
 #1 [ffff880f23ee3690] crash_kexec at ffffffff810c9852
 #2 [ffff880f23ee3760] oops_end at ffffffff8152ec30
 #3 [ffff880f23ee3790] no_context at ffffffff8104c80b
 #4 [ffff880f23ee37e0] __bad_area_nosemaphore at ffffffff8104ca95
 #5 [ffff880f23ee3830] bad_area_nosemaphore at ffffffff8104cb63
 #6 [ffff880f23ee3840] __do_page_fault at ffffffff8104d2bf
 #7 [ffff880f23ee3960] do_page_fault at ffffffff81530b7e
 #8 [ffff880f23ee3990] page_fault at ffffffff8152df35
    [exception RIP: kiblnd_pool_alloc_node+73]
    RIP: ffffffffa0b77439  RSP: ffff880f23ee3a40  RFLAGS: 00010207
    RAX: 0000000000000000  RBX: ffff880fec59ce40  RCX: 000000000000003f
    RDX: 0000000000000010  RSI: 0000000000000002  RDI: ffff880fec59ce40
    RBP: ffff880f23ee3a80   R8: 72f8000000000000   R9: 97c0000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: ffff880fec59ce70
    R13: ffff880f23ee3a48  R14: ffff880fec59ce50  R15: 0000000000000012
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff880f23ee3a88] kiblnd_get_idle_tx at ffffffffa0b81fa9 [ko2iblnd]
#10 [ffff880f23ee3aa8] kiblnd_check_sends at ffffffffa0b857b5 [ko2iblnd]
#11 [ffff880f23ee3b08] kiblnd_post_rx at ffffffffa0b87dd8 [ko2iblnd]
#12 [ffff880f23ee3b58] kiblnd_recv at ffffffffa0b882c6 [ko2iblnd]
#13 [ffff880f23ee3be8] lnet_ni_recv at ffffffffa05f9ecb [lnet]
#14 [ffff880f23ee3c38] lnet_drop_message at ffffffffa05facf1 [lnet]
#15 [ffff880f23ee3c78] lnet_parse at ffffffffa05ff672 [lnet]
#16 [ffff880f23ee3d58] kiblnd_handle_rx at ffffffffa0b889db [ko2iblnd]
#17 [ffff880f23ee3da8] kiblnd_rx_complete at ffffffffa0b896c3 [ko2iblnd]
#18 [ffff880f23ee3df8] kiblnd_complete at ffffffffa0b89872 [ko2iblnd]
#19 [ffff880f23ee3e08] kiblnd_scheduler at ffffffffa0b89c2a [ko2iblnd]
#20 [ffff880f23ee3ee8] kthread at ffffffff8109e66e
#21 [ffff880f23ee3f48] kernel_thread at ffffffff8100c20a
crash> 

(gdb) l *kiblnd_pool_alloc_node+73
0x3469 is in kiblnd_pool_alloc_node (/home/ashehata/LU-5678/lnet/klnds/o2iblnd/o2iblnd.c:1855).
1850            int                    rc;
1851
1852     again:
1853            spin_lock(&ps->ps_lock);
1854            cfs_list_for_each_entry(pool, &ps->ps_pool_list, po_list) {
1855                    if (cfs_list_empty(&pool->po_free_list))
1856                            continue;
1857
1858                    pool->po_allocated ++;
1859                    pool->po_deadline = cfs_time_shift(IBLND_POOL_DEADLINE);
(gdb) 

We're looking at the possibility that this might not be the same race condition as the one addressed by the previous patch.

Comment by Amir Shehata (Inactive) [ 15/Jun/15 ]

Is it possible to enable net and malloc debugging:

lctl set_param debug=+net
lctl set_param debug=+malloc

And try to reproduce the issue? I'm hoping to get some insight into the system state before the crash occurs.

Comment by Peter Jones [ 09/Jul/15 ]

Given that the fix that has landed to master seems to have met the needs of the original reporter, I suggest that we close this ticket and that Bull open a new ticket to track any similar issue that is still outstanding.
