[LU-2967] list_del corruption - client crashes Created: 14/Mar/13  Updated: 19/Jun/13  Resolved: 19/Jun/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.5
Fix Version/s: None

Type: Bug
Priority: Blocker
Reporter: Cliff White (Inactive)
Assignee: Yang Sheng
Resolution: Fixed
Votes: 0
Labels: mq213
Environment:

Hyperion/LLNL - SWL testing


Issue Links:
Related
is related to LU-3461 Kernel update [RHEL6.4 2.6.32-358.11.... Resolved
is related to LU-2473 ldiskfs RHEL6.4 support Resolved
Severity: 3
Rank (Obsolete): 7232

 Description   

After multiple hours of SWL runs, multiple clients crashed.
First example:

2013-03-14 06:13:47 ------------[ cut here ]------------
2013-03-14 06:13:47 WARNING: at lib/list_debug.c:51 list_del+0x8d/0xa0() (Tainted: G        W  ---------------   )
2013-03-14 06:13:47 Hardware name: XS23-TY
2013-03-14 06:13:47 list_del corruption. next->prev should be ffff8801aee8bc50, but was 0504000006000001
2013-03-14 06:13:47 Modules linked in: lmv(U) mgc(U) lustre(U) lov(U) osc(U) lquota(U) mdc(U) fid(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ib_sa mlx4_ib ib_mad ib_core dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun kvm dcdbas i2c_i801 i2c_core ahci iTCO_wdt iTCO_vendor_support ioatdma dca i7core_edac edac_core shpchp ipv6 nfs lockd fscache nfs_acl auth_rpcgss sunrpc mlx4_en mlx4_core e1000e [last unloaded: cpufreq_ondemand]
2013-03-14 06:13:47 Pid: 3160, comm: ipoib Tainted: G        W  ---------------    2.6.32-279.22.1.el6.x86_64 #1
2013-03-14 06:13:47 Call Trace:
2013-03-14 06:13:47  [<ffffffff8106a2a7>] ? warn_slowpath_common+0x87/0xc0
2013-03-14 06:13:47  [<ffffffff8106a396>] ? warn_slowpath_fmt+0x46/0x50
2013-03-14 06:13:47  [<ffffffff81279f0d>] ? list_del+0x8d/0xa0
2013-03-14 06:13:47  [<ffffffffa0347619>] ? ipoib_cm_tx_reap+0xc9/0x510 [ib_ipoib]
2013-03-14 06:13:47  [<ffffffffa0347550>] ? ipoib_cm_tx_reap+0x0/0x510 [ib_ipoib]
2013-03-14 06:13:47  [<ffffffff8108b370>] ? worker_thread+0x170/0x2a0
2013-03-14 06:13:47  [<ffffffff81090be0>] ? autoremove_wake_function+0x0/0x40
2013-03-14 06:13:47  [<ffffffff8108b200>] ? worker_thread+0x0/0x2a0
2013-03-14 06:13:47  [<ffffffff81090876>] ? kthread+0x96/0xa0
2013-03-14 06:13:47  [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
2013-03-14 06:13:47  [<ffffffff810907e0>] ? kthread+0x0/0xa0
2013-03-14 06:13:47  [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
2013-03-14 06:13:47 ---[ end trace e1288d85056fd00d ]---
2013-03-14 06:13:47 BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
2013-03-14 06:13:47 IP: [<ffffffff81279e9b>] list_del+0x1b/0xa0
2013-03-14 06:13:47 PGD 174282067 PUD 145d8f067 PMD 0
2013-03-14 06:13:47 Oops: 0000 [#1] SMP
2013-03-14 06:13:47 last sysfs file: /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/net/eth1/statistics/tx_errors
2013-03-14 06:13:47 CPU 2
2013-03-14 06:13:47 Modules linked in: lmv(U) mgc(U) lustre(U) lov(U) osc(U) lquota(U) mdc(U) fid(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ib_sa mlx4_ib ib_mad ib_core dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun kvm dcdbas i2c_i801 i2c_core ahci iTCO_wdt iTCO_vendor_support ioatdma dca i7core_edac edac_core shpchp ipv6 nfs lockd fscache nfs_acl auth_rpcgss sunrpc mlx4_en mlx4_core e1000e [last unloaded: cpufreq_ondemand]
2013-03-14 06:13:47
2013-03-14 06:13:47 Pid: 3160, comm: ipoib Tainted: G        W  ---------------    2.6.32-279.22.1.el6.x86_64 #1 Dell        XS23-TY     /XS23-TY
2013-03-14 06:13:47 RIP: 0010:[<ffffffff81279e9b>]  [<ffffffff81279e9b>] list_del+0x1b/0xa0
2013-03-14 06:13:47 RSP: 0018:ffff880339053db0  EFLAGS: 00010046
2013-03-14 06:13:47 RAX: 0000000000000000 RBX: ffff8801b082f8d0 RCX: 0000000000004aef
2013-03-14 06:13:47 RDX: 0000000000000246 RSI: ffff8801bb8444d0 RDI: ffff8801b082f8d0
2013-03-14 06:13:47 RBP: ffff880339053dc0 R08: ffff8801b082f8d0 R09: 0000000000000000
2013-03-14 06:13:47 R10: ffff8801c0065680 R11: 0000000000000000 R12: ffff8801ba034020
2013-03-14 06:13:47 R13: 0000000000000246 R14: ffff8801ba697e80 R15: ffff8801ba0346e0
2013-03-14 06:13:47 FS:  0000000000000000(0000) GS:ffff880028240000(0000) knlGS:0000000000000000
2013-03-14 06:13:47 CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
2013-03-14 06:13:47 CR2: 0000000000000008 CR3: 00000001a4639000 CR4: 00000000000006e0
2013-03-14 06:13:47 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2013-03-14 06:13:47 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
2013-03-14 06:13:47 Process ipoib (pid: 3160, threadinfo ffff880339052000, task ffff880339256040)
2013-03-14 06:13:47 Stack:
2013-03-14 06:13:47  0000000109b77ac5 ffff8801b082f8c0 ffff880339053e30 ffffffffa0347619
2013-03-14 06:13:47 <d> ffff88033c1acaa0 ffff880339256040 ffff8801ba0352e8 ffff8801ba034340
2013-03-14 06:13:47 <d> ffff880339053e30 ffffffff00000002 ffffe8fe62609a40 ffffe8fe62609a40
2013-03-14 06:13:47 Call Trace:
2013-03-14 06:13:47  [<ffffffffa0347619>] ipoib_cm_tx_reap+0xc9/0x510 [ib_ipoib]
2013-03-14 06:13:47  [<ffffffffa0347550>] ? ipoib_cm_tx_reap+0x0/0x510 [ib_ipoib]
2013-03-14 06:13:47  [<ffffffff8108b370>] worker_thread+0x170/0x2a0
2013-03-14 06:13:47  [<ffffffff81090be0>] ? autoremove_wake_function+0x0/0x40
2013-03-14 06:13:47  [<ffffffff8108b200>] ? worker_thread+0x0/0x2a0
2013-03-14 06:13:47  [<ffffffff81090876>] kthread+0x96/0xa0
2013-03-14 06:13:47  [<ffffffff8100c0ca>] child_rip+0xa/0x20
2013-03-14 06:13:47  [<ffffffff810907e0>] ? kthread+0x0/0xa0
2013-03-14 06:13:47  [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
2013-03-14 06:13:47 Code: 4c 8b ad e8 fe ff ff e9 db fd ff ff 90 90 90 90 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 47 08 4c 8b 00 4c 39 c7 75 39 48 8b 03 <4c> 8b 40 08 4c 39 c3 75 4c 48 8b 53 08 48 89 50 08 48 89 02 48
2013-03-14 06:13:47 RIP  [<ffffffff81279e9b>] list_del+0x1b/0xa0
2013-03-14 06:13:47  RSP <ffff880339053db0>
2013-03-14 06:13:47 CR2: 0000000000000008
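
A note on reading this first oops: decoding the Code: line, the faulting instruction at list_del+0x1b appears to be the load of next->prev, with RAX holding entry->next. With RAX at 0, that read targets address 8, which matches CR2. The sketch below reproduces the arithmetic in user space. It assumes a 64-bit (LP64) build, and the helper name next_prev_address is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same two-pointer layout as the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/*
 * Address that list_del() reads when it checks next->prev, given the
 * value of entry->next (RAX in the register dump). Hypothetical helper.
 */
static uint64_t next_prev_address(uint64_t next)
{
    /* Sanity check: prev sits one pointer past the start of the struct. */
    assert(offsetof(struct list_head, prev) == sizeof(void *));
    return next + offsetof(struct list_head, prev);
}
```

On such a build, next_prev_address(0) gives 0x8, the address reported in the "NULL pointer dereference" line.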

Second Example:

2013-03-14 07:15:50 ------------[ cut here ]------------
2013-03-14 07:15:50 WARNING: at lib/list_debug.c:30 __list_add+0x8f/0xa0() (Tainted: G        W  ---------------   )
2013-03-14 07:15:50 Hardware name: XS23-TY
2013-03-14 07:15:50 list_add corruption. prev->next should be next (ffff8801af5ed2d0), but was ffff88033b3addd0. (prev=ffff8801ba25f2e8).
2013-03-14 07:15:50 Modules linked in: lmv(U) mgc(U) lustre(U) lov(U) osc(U) lquota(U) mdc(U) fid(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ib_sa mlx4_ib ib_mad ib_core dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun kvm dcdbas iTCO_wdt iTCO_vendor_support i2c_i801 i2c_core ahci i7core_edac edac_core ioatdma dca shpchp ipv6 nfs lockd fscache nfs_acl auth_rpcgss sunrpc mlx4_en mlx4_core e1000e [last unloaded: cpufreq_ondemand]
2013-03-14 07:15:50 Pid: 4328, comm: kiblnd_sd_07 Tainted: G        W  ---------------    2.6.32-279.22.1.el6.x86_64 #1
2013-03-14 07:15:50 Call Trace:
2013-03-14 07:15:50  <IRQ>  [<ffffffff8106a2a7>] ? warn_slowpath_common+0x87/0xc0
2013-03-14 07:15:50  [<ffffffff8106a396>] ? warn_slowpath_fmt+0x46/0x50
2013-03-14 07:15:50  [<ffffffff81279faf>] ? __list_add+0x8f/0xa0
2013-03-14 07:15:50  [<ffffffffa033fb7e>] ? ipoib_cm_destroy_tx+0x6e/0xc0 [ib_ipoib]
2013-03-14 07:15:50  [<ffffffffa0337b39>] ? ipoib_neigh_dtor+0x89/0xf0 [ib_ipoib]
2013-03-14 07:15:50  [<ffffffffa0337bc8>] ? ipoib_neigh_reclaim+0x28/0x30 [ib_ipoib]
2013-03-14 07:15:50  [<ffffffff810de635>] ? __rcu_process_callbacks+0x135/0x350
2013-03-14 07:15:50  [<ffffffff81012a69>] ? read_tsc+0x9/0x20
2013-03-14 07:15:50  [<ffffffff810de87b>] ? rcu_process_callbacks+0x2b/0x50
2013-03-14 07:15:50  [<ffffffff81072ac1>] ? __do_softirq+0xc1/0x1e0
2013-03-14 07:15:50  [<ffffffff81095760>] ? hrtimer_interrupt+0x140/0x250
2013-03-14 07:15:50  [<ffffffff8100c1cc>] ? call_softirq+0x1c/0x30
2013-03-14 07:15:50  [<ffffffff8100de05>] ? do_softirq+0x65/0xa0
2013-03-14 07:15:50  [<ffffffff810728a5>] ? irq_exit+0x85/0x90
2013-03-14 07:15:50  [<ffffffff814f2360>] ? smp_apic_timer_interrupt+0x70/0x9b
2013-03-14 07:15:50  [<ffffffff8100bb93>] ? apic_timer_interrupt+0x13/0x20
2013-03-14 07:15:50  <EOI>  [<ffffffff814ec947>] ? _spin_unlock_irqrestore+0x17/0x20
2013-03-14 07:15:50  [<ffffffffa0322a46>] ? mlx4_ib_poll_cq+0x2c6/0x7f0 [mlx4_ib]
2013-03-14 07:15:50  [<ffffffffa07a4478>] ? kiblnd_scheduler+0xf8/0x760 [ko2iblnd]
2013-03-14 07:15:50  [<ffffffff8105fa40>] ? default_wake_function+0x0/0x20
2013-03-14 07:15:50  [<ffffffffa07a4380>] ? kiblnd_scheduler+0x0/0x760 [ko2iblnd]
2013-03-14 07:15:50  [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
2013-03-14 07:15:50  [<ffffffffa07a4380>] ? kiblnd_scheduler+0x0/0x760 [ko2iblnd]
2013-03-14 07:15:50  [<ffffffffa07a4380>] ? kiblnd_scheduler+0x0/0x760 [ko2iblnd]
2013-03-14 07:15:50  [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
2013-03-14 07:15:50 ---[ end trace ceec6f0d4be48403 ]---
2013-03-14 07:15:50 general protection fault: 0000 [#1] SMP
2013-03-14 07:15:50 last sysfs file: /sys/devices/virtual/dmi/id/sys_vendor
2013-03-14 07:15:50 CPU 0
2013-03-14 07:15:50 Modules linked in: lmv(U) mgc(U) lustre(U) lov(U) osc(U) lquota(U) mdc(U) fid(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ib_sa mlx4_ib ib_mad ib_core dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun kvm dcdbas iTCO_wdt iTCO_vendor_support i2c_i801 i2c_core ahci i7core_edac edac_core ioatdma dca shpchp ipv6 nfs lockd fscache nfs_acl auth_rpcgss sunrpc mlx4_en mlx4_core e1000e [last unloaded: cpufreq_ondemand]
2013-03-14 07:15:50
2013-03-14 07:15:50 Pid: 3208, comm: ipoib Tainted: G        W  ---------------    2.6.32-279.22.1.el6.x86_64 #1 Dell        XS23-TY     /XS23-TY
2013-03-14 07:15:50 RIP: 0010:[<ffffffff81279e9b>]  [<ffffffff81279e9b>] list_del+0x1b/0xa0
2013-03-14 07:15:50 RSP: 0018:ffff8801bba1ddb0  EFLAGS: 00010046
2013-03-14 07:15:50 RAX: dead000000100100 RBX: ffff8801af5ed2d0 RCX: 000000000000b9d4
2013-03-14 07:15:50 RDX: 0000000000000246 RSI: ffff8801bfe979d0 RDI: ffff8801af5ed2d0
2013-03-14 07:15:50 RBP: ffff8801bba1ddc0 R08: ffff8801af5ed2d0 R09: 0000000000000000
2013-03-14 07:15:50 R10: ffff8801c0065880 R11: 0000000000000000 R12: ffff8801ba25e020
2013-03-14 07:15:50 R13: 0000000000000246 R14: ffff8801ba021400 R15: ffff8801ba25e6e0
2013-03-14 07:15:50 FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
2013-03-14 07:15:50 CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
2013-03-14 07:15:50 CR2: 00002aaab80041f8 CR3: 0000000175615000 CR4: 00000000000006f0
2013-03-14 07:15:50 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2013-03-14 07:15:50 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
2013-03-14 07:15:50 Process ipoib (pid: 3208, threadinfo ffff8801bba1c000, task ffff8801bb536080)
2013-03-14 07:15:50 Stack:
2013-03-14 07:15:50  0000000109f05306 ffff8801af5ed2c0 ffff8801bba1de30 ffffffffa0340619
2013-03-14 07:15:50 <d> ffffffff81a8d020 ffff8801bb536080 ffff8801ba25f2e8 ffff8801ba25e340
2013-03-14 07:15:50 <d> 00000078bba1de30 0000000000000000 ffff8801bba1de10 ffffe8fe62609a40
2013-03-14 07:15:50 Call Trace:
2013-03-14 07:15:50  [<ffffffffa0340619>] ipoib_cm_tx_reap+0xc9/0x510 [ib_ipoib]
2013-03-14 07:15:50  [<ffffffffa0340550>] ? ipoib_cm_tx_reap+0x0/0x510 [ib_ipoib]
2013-03-14 07:15:50  [<ffffffff8108b370>] worker_thread+0x170/0x2a0
2013-03-14 07:15:50  [<ffffffff81090be0>] ? autoremove_wake_function+0x0/0x40
2013-03-14 07:15:50  [<ffffffff8108b200>] ? worker_thread+0x0/0x2a0
2013-03-14 07:15:50  [<ffffffff81090876>] kthread+0x96/0xa0
2013-03-14 07:15:50  [<ffffffff8100c0ca>] child_rip+0xa/0x20
2013-03-14 07:15:50  [<ffffffff810907e0>] ? kthread+0x0/0xa0
2013-03-14 07:15:50  [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
2013-03-14 07:15:50 Code: 4c 8b ad e8 fe ff ff e9 db fd ff ff 90 90 90 90 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 47 08 4c 8b 00 4c 39 c7 75 39 48 8b 03 <4c> 8b 40 08 4c 39 c3 75 4c 48 8b 53 08 48 89 50 08 48 89 02 48
2013-03-14 07:15:50 RIP  [<ffffffff81279e9b>] list_del+0x1b/0xa0
2013-03-14 07:15:50  RSP <ffff8801bba1ddb0>


 Comments   
Comment by Oleg Drokin [ 15/Mar/13 ]

This is really a crash in the o2ib driver itself; I suspect it has nothing to do with Lustre.

Comment by Andreas Dilger [ 15/Mar/13 ]

That holds unless the problem is due to memory corruption (use after free) or similar. Though if the crash was in ipoib on all of the clients, this is definitely not even related to o2iblnd.

Did we update OFED recently by any chance? I recall seeing some patches for OFED, but I have no idea if this is relevant for 2.1.5.

Maybe worthwhile to ask LLNL if there was some hiccup on the IB fabric and if they have seen this problem before?

Comment by Minh Diep [ 20/Mar/13 ]

Cliff, what is the last known good kernel that passed this, and what is the kernel now?

Comment by Oleg Drokin [ 20/Mar/13 ]

Looking at the changelog for 279.22.1.el6 that introduced this I see:

BZ#880085
Previously, the IP over Infiniband (IPoIB) driver maintained state information about neighbors on the network by attaching it to the core network's neighbor structure. However, due to a race condition between the freeing of the core network neighbor struct and the freeing of the IPoIB network struct, a use after free condition could happen, resulting in either a kernel oops or 4 or 8 bytes of kernel memory being zeroed when it was not supposed to be. These patches decouple the IPoIB neighbor struct from the core networking stack's neighbor struct so that there is no race between the freeing of one and the freeing of the other.

So this must be it: the failure is in the neighbor handling code, but I do not have sufficient permissions in the RH Bugzilla to check the patch.
I think it's time to file a bug with RH.
We first hit it going from lnxrel="279.14.1.el6" to lnxrel="279.22.1.el6".

Comment by Cliff White (Inactive) [ 20/Mar/13 ]

279.14.1 would be the last kernel that passed.

Comment by Minh Diep [ 20/Mar/13 ]

Have you run the same test on master, which has version 279.19.1?

Comment by Minh Diep [ 20/Mar/13 ]

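The changes around that function in ipoib_cm.c between 14.1 and 22.1 are below. Note the argument order of the diff command: the 14.1 tree is the second file, so its lines are the ones marked ">".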

[root@fat-amd-4 infiniband]# diff ulp/ipoib/ipoib_cm.c /root/kernel14/linux-2.6.32-279.14.1.el6/drivers/infiniband/ulp/ipoib/ipoib_cm.c
812c812,814
< ipoib_neigh_free(neigh);
---
> if (neigh->ah)
> ipoib_put_ah(neigh->ah);
> ipoib_neigh_free(dev, neigh);
1229c1231,1233
< ipoib_neigh_free(neigh);
---
> if (neigh->ah)
> ipoib_put_ah(neigh->ah);
> ipoib_neigh_free(dev, neigh);
1276c1280
< tx->neigh->daddr + 4);
---
> tx->neigh->dgid.raw);
1301c1305
< qpn = IPOIB_QPN(neigh->daddr);
---
> qpn = IPOIB_QPN(neigh->neighbour->ha);
1317c1321,1323
< ipoib_neigh_free(neigh);
---
> if (neigh->ah)
> ipoib_put_ah(neigh->ah);
> ipoib_neigh_free(dev, neigh);

Comment by Cliff White (Inactive) [ 20/Mar/13 ]

Yes, the test failing is SWL which is run routinely.

Comment by Oleg Drokin [ 21/Mar/13 ]

RedHat bug (confirmed, with a reference to fix): https://bugzilla.redhat.com/show_bug.cgi?id=913645

Comment by Peter Jones [ 02/Apr/13 ]

Yangsheng

Please confirm when a kernel update exists which fixes this Red Hat bug

thanks

Peter

Comment by Yang Sheng [ 02/Apr/13 ]

The latest kernel, 2.6.32-358.2.1.el6, still does not include the fix (upstream commit fa16ebed31f336e41970f3f0ea9e8279f6be2d27).

Comment by Oleg Drokin [ 05/Apr/13 ]

While RedHat waits for confirmation that the fix is effective, here is a change to pull in the upstream fix. The master version is at http://review.whamcloud.com/5952

Also, I just realized that we are not really sure if master is good enough to withstand SWL run at this time, so I made a b2_1 patch too: http://review.whamcloud.com/5953 (it reverts back to the problematic commit that was used originally for this bugreport, but with the fix added on top).

Comment by Yang Sheng [ 19/Jun/13 ]

The 2.6.32-358.11.1.el6 update already includes this fix (LU-3461), so closing this one.
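
The kernel releases named in this ticket bracket the problem: first hit on 279.22.1.el6, still unfixed in 358.2.1.el6, fixed in 358.11.1.el6. A small sketch for checking a `uname -r` string against that window follows. The struct and function names are hypothetical, and releases the ticket does not mention (for example 279.19.1) are classified purely by their position relative to the window, not by any test result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* The three numbers after "2.6.32-" in a RHEL6 kernel release string. */
struct el6_release {
    int major, minor, patch;
};

/* Parse e.g. "2.6.32-279.22.1.el6.x86_64". Returns 0 on success, -1 otherwise. */
static int parse_el6_release(const char *uname_r, struct el6_release *out)
{
    assert(uname_r != NULL && out != NULL);
    int n = sscanf(uname_r, "2.6.32-%d.%d.%d",
                   &out->major, &out->minor, &out->patch);
    return n == 3 ? 0 : -1;
}

/* Three-way comparison: negative, zero, or positive like strcmp(). */
static int cmp_release(const struct el6_release *a, const struct el6_release *b)
{
    if (a->major != b->major)
        return a->major < b->major ? -1 : 1;
    if (a->minor != b->minor)
        return a->minor < b->minor ? -1 : 1;
    if (a->patch != b->patch)
        return a->patch < b->patch ? -1 : 1;
    return 0;
}

/*
 * Returns 1 if the release lies in [279.22.1, 358.11.1), the window this
 * ticket reports as affected, 0 if it lies outside, -1 if unparseable.
 */
static int in_reported_bad_window(const char *uname_r)
{
    static const struct el6_release first_bad = {279, 22, 1};
    static const struct el6_release first_fixed = {358, 11, 1};
    struct el6_release r;

    if (parse_el6_release(uname_r, &r) != 0)
        return -1;
    return cmp_release(&r, &first_bad) >= 0 &&
           cmp_release(&r, &first_fixed) < 0;
}
```

A result of 0 only means the release is outside the window documented here; it is not a statement that the kernel is otherwise free of the problem.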

Generated at Sat Feb 10 01:29:48 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.