[LU-6596] GPF: RIP [<ffffffffa076924b>] ptlrpc_replay_next+0xdb/0x380 [ptlrpc] Created: 13/May/15 Updated: 02/Nov/15 Resolved: 02/Nov/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Bruno Travouillon (Inactive) | Assignee: | Bruno Faccini (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Environment: |
kernel 2.6.32-504.12.2.el6 |
||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Since their update from 2.5.3 to 2.5.3.90, one of our customers has been hitting the following GPF on client nodes while the MDT is in recovery.

<4>general protection fault: 0000 [#1] SMP
<4>last sysfs file: /sys/module/ipv6/initstate
<4>CPU 21
<4>Modules linked in: iptable_mangle iptable_filter lmv(U) mgc(U) lustre(U) lov(U) osc(U) mdc(U) lquota(U) fid(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic crc32c_intel libcfs(U) nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables rdma_ucm(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ib_cm(U) ipv6 ib_uverbs(U) ib_umad(U) mlx4_ib(U) ib_sa(U) mlx4_core(U) ib_mthca(U) ib_mad(U) ib_core(U) mic(U) uinput ipmi_si ipmi_msghandler sg compat(U) lpc_ich mfd_core ioatdma myri10ge igb dca i2c_algo_bit i2c_core ptp pps_core ext4 jbd2 mbcache ahci sd_mod crc_t10dif dm_mirror dm_region_hash dm_log dm_mod megaraid_sas [last unloaded: scsi_wait_scan]
<4>
<4>Pid: 10457, comm: ptlrpcd_rcv Not tainted 2.6.32-504.12.2.el6.Bull.72.x86_64 #1 BULL bullx super-node
<4>RIP: 0010:[<ffffffffa076924b>] [<ffffffffa076924b>] ptlrpc_replay_next+0xdb/0x380 [ptlrpc]
<4>RSP: 0018:ffff88086d375bb0 EFLAGS: 00010296
<4>RAX: 5a5a5a5a5a5a5a5a RBX: ffff88107ad2d800 RCX: ffff8806e2f65d10
<4>RDX: ffff88107ad2d8b0 RSI: ffff88086d375c1c RDI: ffff88107ad2d800
<4>RBP: ffff88086d375be0 R08: 0000000000000000 R09: 0000000000000000
<4>R10: ffff88087ca2fe50 R11: 0000000000000000 R12: 0000000000000000
<4>R13: ffff88107ad2da90 R14: ffff88086d375c1c R15: ffff881e7f308000
<4>FS: 0000000000000000(0000) GS:ffff88089c540000(0000) knlGS:0000000000000000
<4>CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>CR2: 00002ba018b2c000 CR3: 0000000001a85000 CR4: 00000000000007e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process ptlrpcd_rcv (pid: 10457, threadinfo ffff88086d374000, task ffff88087b2b6040)
<4>Stack:
<4> 0000000000000000 ffff88107ad2d800 ffff880917d81800 ffff88107ad2da90
<4><d> 0000000000000000 ffff880d22697cc0 ffff88086d375c40 ffffffffa078e570
<4><d> 0000000000000000 ffff88107b316078 0000000000000000 ffff880d22697cc0
<4>Call Trace:
<4> [<ffffffffa078e570>] ptlrpc_import_recovery_state_machine+0x360/0xc30 [ptlrpc]
<4> [<ffffffffa078fc69>] ptlrpc_connect_interpret+0x779/0x21d0 [ptlrpc]
<4> [<ffffffffa0784d6b>] ? ptlrpc_pinger_commit_expected+0x1b/0x90 [ptlrpc]
<4> [<ffffffffa076605d>] ptlrpc_check_set+0x31d/0x1c20 [ptlrpc]
<4> [<ffffffff81087fdb>] ? try_to_del_timer_sync+0x7b/0xe0
<4> [<ffffffffa0792613>] ptlrpcd_check+0x533/0x550 [ptlrpc]
<4> [<ffffffffa0792b2b>] ptlrpcd+0x20b/0x370 [ptlrpc]
<4> [<ffffffff81064b90>] ? default_wake_function+0x0/0x20
<4> [<ffffffffa0792920>] ? ptlrpcd+0x0/0x370 [ptlrpc]
<4> [<ffffffff8109e66e>] kthread+0x9e/0xc0
<4> [<ffffffff8100c20a>] child_rip+0xa/0x20
<4> [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
<4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
<4>Code: c0 00 00 00 48 8b 00 48 39 c2 48 89 83 c0 00 00 00 75 18 eb 23 0f 1f 00 48 8b 00 48 39 c2 48 89 83 c0 00 00 00 0f 84 8c 00 00 00 <4c> 3b 60 f0 4c 8d b8 f0 fe ff ff 73 e0 4d 85 ff 74 7a f6 83 95
<1>RIP [<ffffffffa076924b>] ptlrpc_replay_next+0xdb/0x380 [ptlrpc]
<4> RSP <ffff88086d375bb0>

Our MDS is an active/passive HA cluster. This GPF can occur during failover or failback of the MDT. It occurred on 05/04, 05/07 and 05/12.
During the last occurrence, we lost 200 compute nodes and 2 login nodes. The stack looks like LU-6022. Could you help us on this one? |
| Comments |
| Comment by Bruno Travouillon (Inactive) [ 13/May/15 ] |
|
FTR, our build includes |
| Comment by Peter Jones [ 13/May/15 ] |
|
Bruno F, could you please advise on this one? Thanks, Peter |
| Comment by Bruno Faccini (Inactive) [ 18/May/15 ] |
|
Hello Bruno, |
| Comment by Sebastien Buisson (Inactive) [ 20/May/15 ] |
|
Hi Bruno, As for the patches that we apply on top of 2.5.3.90, here is the list you requested:
Cheers, |
| Comment by Bruno Travouillon (Inactive) [ 20/May/15 ] |
|
Hello Bruno, The customer site is a black site. We can schedule an on-site meeting to work on the crash dump if you want to. |
| Comment by Oleg Drokin [ 24/Jul/15 ] |
|
I just hit this (or so I think) during my regular testing of master. I have a crashdump, so that should give us quite a bit of extra info.

<1>[26215.496409] BUG: unable to handle kernel paging request at ffffffffffffffe8
<1>[26215.496730] IP: [<ffffffffa16cf30b>] ptlrpc_replay_next+0xdb/0x370 [ptlrpc]
<4>[26215.497079] PGD 1a27067 PUD 1a28067 PMD 0
<4>[26215.497358] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
<4>[26215.497650] last sysfs file: /sys/devices/system/cpu/possible
<4>[26215.497954] CPU 7
<4>[26215.497996] Modules linked in: lustre ofd osp lod ost mdt mdd mgs osd_ldiskfs ldiskfs lquota lfsck obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass ksocklnd lnet libcfs zfs(P) zcommon(P) znvpair(P) zavl(P) zunicode(P) spl zlib_deflate exportfs jbd sha512_generic sha256_generic ext4 jbd2 mbcache virtio_console virtio_balloon i2c_piix4 i2c_core virtio_blk virtio_net virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache auth_rpcgss nfs_acl sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: libcfs]
<4>[26215.500111]
<4>[26215.500111] Pid: 14755, comm: ptlrpcd_rcv Tainted: P --------------- 2.6.32-rhe6.6-debug #1 Red Hat KVM
<4>[26215.500111] RIP: 0010:[<ffffffffa16cf30b>] [<ffffffffa16cf30b>] ptlrpc_replay_next+0xdb/0x370 [ptlrpc]
<4>[26215.500111] RSP: 0018:ffff880098b55b10 EFLAGS: 00010286
<4>[26215.500111] RAX: 0000000000000000 RBX: ffff8800789617f0 RCX: 0000000000000000
<4>[26215.500111] RDX: ffff8800789618b8 RSI: 0000000000000000 RDI: ffff8800b051ef70
<4>[26215.500111] RBP: ffff880098b55b40 R08: 00000000fffffffb R09: 00000000fffffffe
<4>[26215.500111] R10: 0000000000000000 R11: 000000000000005a R12: 0000000000000000
<4>[26215.500111] R13: ffff880078961ad0 R14: ffff880098b55b7c R15: ffff880075689ce8
<4>[26215.500111] FS: 0000000000000000(0000) GS:ffff8800063c0000(0000) knlGS:0000000000000000
<4>[26215.500111] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>[26215.500111] CR2: ffffffffffffffe8 CR3: 000000008a94b000 CR4: 00000000000006e0
<4>[26215.500111] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[26215.500111] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[26215.500111] Process ptlrpcd_rcv (pid: 14755, threadinfo ffff880098b54000, task ffff8800b43d02c0)
<4>[26215.500111] Stack:
<4>[26215.500111] 0000000000000000 ffff8800789617f0 0000000000000000 ffffffffa177a633
<4>[26215.500111] <d> ffff8800b03b9ce8 ffff880078961ad0 ffff880098b55ba0 ffffffffa16f3830
<4>[26215.500111] <d> 0000000000000010 ffff880098b55bb0 ffff880098b55b70 ffff8800789617f0
<4>[26215.500111] Call Trace:
<4>[26215.500111] [<ffffffffa16f3830>] ptlrpc_import_recovery_state_machine+0x360/0xc20 [ptlrpc]
<4>[26215.500111] [<ffffffffa16f6bac>] ptlrpc_connect_interpret+0xc4c/0x2570 [ptlrpc]
<4>[26215.500111] [<ffffffffa16ec59a>] ? ptlrpc_update_next_ping+0x4a/0xd0 [ptlrpc]
<4>[26215.500111] [<ffffffffa16cc2c3>] ptlrpc_check_set+0x613/0x1bf0 [ptlrpc]
<4>[26215.500111] [<ffffffff81522574>] ? _spin_lock_irqsave+0x24/0x30
<4>[26215.500111] [<ffffffffa16fa2d3>] ptlrpcd_check+0x3e3/0x630 [ptlrpc]
<4>[26215.500111] [<ffffffffa16fa83b>] ptlrpcd+0x31b/0x500 [ptlrpc]
<4>[26215.500111] [<ffffffff81061630>] ? default_wake_function+0x0/0x20
<4>[26215.500111] [<ffffffffa16fa520>] ? ptlrpcd+0x0/0x500 [ptlrpc]
<4>[26215.500111] [<ffffffff8109ce4e>] kthread+0x9e/0xc0
<4>[26215.500111] [<ffffffff8100c24a>] child_rip+0xa/0x20
<4>[26215.500111] [<ffffffff8109cdb0>] ? kthread+0x0/0xc0
<4>[26215.500111] [<ffffffff8100c240>] ? child_rip+0x0/0x20
<4>[26215.500111] Code: 48 8b 00 48 39 c2 48 89 83 d8 00 00 00 75 1c eb 24 0f 1f 80 00 00 00 00 48 8b 00 48 39 c2 48 89 83 d8 00 00 00 0f 84 7c 00 00 00 <4c> 3b 60 e8 4c 8d 78 80 73 e3 4d 85 ff 74 6d f6 83 f9 02 00 00
<1>[26215.500111] RIP [<ffffffffa16cf30b>] ptlrpc_replay_next+0xdb/0x370 [ptlrpc]
<4>[26215.500111] RSP <ffff880098b55b10>
<4>[26215.500111] CR2: ffffffffffffffe8

(gdb) l *(ptlrpc_replay_next+0xdb)
0x4a33b is in ptlrpc_replay_next (/home/green/git/lustre-release/lustre/ptlrpc/recover.c:129).
124                     while (imp->imp_replay_cursor !=
125                            &imp->imp_committed_list) {
126                             req = list_entry(imp->imp_replay_cursor,
127                                              struct ptlrpc_request,
128                                              rq_replay_list);
129                             if (req->rq_transno > last_transno)
130                                     break;
131
132                             req = NULL;
133                             imp->imp_replay_cursor =
(gdb) p (int)0xffffffffffffffe8
$1 = -24
(gdb) p &((struct ptlrpc_request *) 0x0)->rq_transno
$1 = (__u64 *) 0x68
(gdb) p (int)0xffffffffffffffe8 - 0x68
$2 = -128
(gdb) quit
[green@intelbox lustre-release]$ grep 128 ~/bk/linux-2.6.32-504.3.3.el6-debug/include/asm-generic/errno.h
#define EKEYREVOKED 128 /* Key has been revoked */

Of course, this -128 might be total garbage from somewhere, and some code differences have accumulated since 2.5, so this might or might not be related. Crashdump is on my box in /exports/crashdumps/192.168.10.221-2015-07-23-18\:24\:10, tag in my tree master-20150723 |
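To make the gdb arithmetic above easier to follow, here is a minimal stand-alone C sketch of it. The only inputs are the two values printed in the gdb session (the faulting address 0xffffffffffffffe8 and the 0x68 offset of rq_transno inside struct ptlrpc_request); the wrapper program itself is purely illustrative and is not taken from the Lustre tree.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        /* CR2 from the oops: the address the faulting load touched. */
        const intptr_t fault_addr = (intptr_t)-24;        /* 0xffffffffffffffe8 */

        /* Offset of rq_transno inside struct ptlrpc_request, as printed
         * by gdb: &((struct ptlrpc_request *)0)->rq_transno == 0x68. */
        const intptr_t rq_transno_offset = 0x68;

        /* The crash happens while reading req->rq_transno, so the bogus
         * req pointer must have been fault_addr - rq_transno_offset. */
        const intptr_t req = fault_addr - rq_transno_offset;

        /* Prints -128 (0xffffffffffffff80); -128 happens to match
         * EKEYREVOKED, hence the errno grep above, but as noted there it
         * may just as well be unrelated garbage. */
        printf("bogus req = %ld (0x%llx)\n",
               (long)req, (unsigned long long)(uintptr_t)req);
        return 0;
}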
| Comment by Bruno Faccini (Inactive) [ 23/Aug/15 ] |
|
Hmm, according to the crash-dump info and the related source/assembly code, it seems the problem occurs in the following piece of code in ptlrpc_replay_next():

119         /* Since the imp_committed_list is immutable before
120          * all of it's requests being replayed, it's safe to
121          * use a cursor to accelerate the search */
122         imp->imp_replay_cursor = imp->imp_replay_cursor->next;
123
124         while (imp->imp_replay_cursor !=
125                &imp->imp_committed_list) {
126                 req = list_entry(imp->imp_replay_cursor,
127                                  struct ptlrpc_request,
128                                  rq_replay_list);
129                 if (req->rq_transno > last_transno)
130                         break;
131
132                 req = NULL;
133                 imp->imp_replay_cursor =
134                         imp->imp_replay_cursor->next;
135         }

because imp->imp_replay_cursor has become corrupted, likely due to a race. We may need to add some mutual-exclusion protection to make this code safe. |
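To illustrate the kind of mutual exclusion meant above, here is a minimal sketch (not a landed patch) that wraps the cursor walk in the import spinlock. It assumes imp->imp_lock is the appropriate lock for imp_committed_list/imp_replay_cursor and that the usual ptlrpc headers (lustre_import.h, lustre_net.h) are in scope; a real fix would also have to keep the returned request valid once the lock is dropped, which the sketch approximates with ptlrpc_request_addref().

/* Sketch only: serialize the replay-cursor walk under imp->imp_lock. */
static struct ptlrpc_request *
replay_next_under_lock(struct obd_import *imp, __u64 last_transno)
{
        struct ptlrpc_request *req = NULL;

        spin_lock(&imp->imp_lock);
        imp->imp_replay_cursor = imp->imp_replay_cursor->next;
        while (imp->imp_replay_cursor != &imp->imp_committed_list) {
                req = list_entry(imp->imp_replay_cursor,
                                 struct ptlrpc_request, rq_replay_list);
                if (req->rq_transno > last_transno)
                        break;
                req = NULL;
                imp->imp_replay_cursor = imp->imp_replay_cursor->next;
        }
        if (req != NULL)
                ptlrpc_request_addref(req);  /* keep req valid after unlock */
        spin_unlock(&imp->imp_lock);

        return req;  /* caller drops the reference when done with it */
}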
| Comment by Bruno Faccini (Inactive) [ 24/Aug/15 ] |
|
To be complete: the GPF occurs at line 129, if (req->rq_transno > last_transno), when dereferencing req. This follows directly from imp->imp_replay_cursor having been corrupted, either at the loop initialization on line 122 (imp->imp_replay_cursor = imp->imp_replay_cursor->next;) or inside the while loop at lines 133-134, so that it ends up containing either the LP_POISON pattern (Bull's case), when the freed memory has not yet been reused or overwritten, or zeroes (Oleg's case), when it has already been re-allocated. The previous imp->imp_replay_cursor value cannot be retrieved from the crash-dumps, given the routine's assembly code, so at present we can only reason about the different possible corruption scenarios. Bruno, I have other questions/requests: do you still frequently experience this problem during MDT fail-over/fail-back? If yes, could you try integrating the two patches for |
| Comment by Bruno Faccini (Inactive) [ 28/Aug/15 ] |
|
I have spent some time on-site looking at a bunch of Lustre client crash dumps taken during recovery after an MDS/MDT failure/restart, and here are more facts/details that can be listed: According to all of this, I still feel pretty confident that, as I already indicated, the two patches for |
| Comment by Andreas Dilger [ 13/Oct/15 ] |
|
Should this be closed as a duplicate of |
| Comment by Jinshan Xiong (Inactive) [ 02/Nov/15 ] |