[LU-6022] replay-single test 73c hung: RIP: ptlrpc_replay_next+0xdb/0x380 [ptlrpc]
Created: 12/Dec/14  Updated: 27/Jun/16

Status: Reopened
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0, Lustre 2.5.4
Fix Version/s: Lustre 2.5.4

Type: Bug Priority: Critical
Reporter: Jian Yu Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Duplicate
Related
is related to LU-6596 GPF: RIP [<ffffffffa076924b>] ptlrpc_... Resolved
Severity: 3
Rank (Obsolete): 16779

Description

While verifying patch http://review.whamcloud.com/13025 on the Lustre b2_5 branch, replay-single test 73c hung.

On the client node:

general protection fault: 0000 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/possible
CPU 0
Modules linked in: lustre(U) obdecho(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) libcfs(U) ext2 sha512_generic sha256_generic nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]

Pid: 32476, comm: ptlrpcd_rcv Not tainted 2.6.32-431.29.2.el6.x86_64 #1 Red Hat KVM
RIP: 0010:[<ffffffffa0d79f6b>]  [<ffffffffa0d79f6b>] ptlrpc_replay_next+0xdb/0x380 [ptlrpc] 
RSP: 0018:ffff88007cf75bb0  EFLAGS: 00010296 
RAX: 5a5a5a5a5a5a5a5a RBX: ffff88007d5d4000 RCX: 0000000000000000
RDX: ffff88007d5d40b0 RSI: 0000000000000000 RDI: ffff880037cc4dc0
RBP: ffff88007cf75be0 R08: 00000000ffffff0a R09: 00000000fffffffe
R10: 0000000000000000 R11: 00000000000000be R12: 0000000000000000
R13: ffff88007d5d4288 R14: ffff88007cf75c1c R15: ffff88007c94f400
FS:  0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000003df8cac9d0 CR3: 000000007d7a0000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ptlrpcd_rcv (pid: 32476, threadinfo ffff88007cf74000, task ffff88007bd00040)
Stack:
 ffffffffa0df7b40 ffff88007d5d4000 ffff88007c7a2400 ffff88007d5d4288
<d> 0000000000000000 ffff8800739304c0 ffff88007cf75c40 ffffffffa0d9e380
<d> 0000000000000010 ffff88007cf75c50 ffff88007cf75c10 ffff8800739304c0
Call Trace:
 [<ffffffffa0d9e380>] ptlrpc_import_recovery_state_machine+0x360/0xc30 [ptlrpc] 
 [<ffffffffa0d9fa79>] ptlrpc_connect_interpret+0x779/0x21d0 [ptlrpc] 
 [<ffffffffa0d94b9b>] ? ptlrpc_pinger_commit_expected+0x1b/0x90 [ptlrpc] 
 [<ffffffffa0d76d7d>] ptlrpc_check_set+0x31d/0x1c20 [ptlrpc] 
 [<ffffffff81084a1b>] ? try_to_del_timer_sync+0x7b/0xe0
 [<ffffffffa0da2423>] ptlrpcd_check+0x533/0x550 [ptlrpc] 
 [<ffffffffa0da293b>] ptlrpcd+0x20b/0x370 [ptlrpc] 
 [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
 [<ffffffffa0da2730>] ? ptlrpcd+0x0/0x370 [ptlrpc] 
 [<ffffffff8109abf6>] kthread+0x96/0xa0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109ab60>] ? kthread+0x0/0xa0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20
Code: c0 00 00 00 48 8b 00 48 39 c2 48 89 83 c0 00 00 00 75 18 eb 23 0f 1f 00 48 8b 00 48 39 c2 48 89 83 c0 00 00 00 0f 84 8c 00 00 00 <4c> 3b 60 f0 4c 8d b8 f0 fe ff ff 73 e0 4d 85 ff 74 7a f6 83 8d
RIP  [<ffffffffa0d79f6b>] ptlrpc_replay_next+0xdb/0x380 [ptlrpc] 
 RSP <ffff88007cf75bb0>

Maloo report:
https://testing.hpdd.intel.com/test_sets/e4fc03f8-8160-11e4-b551-5254006e85c2


Generated at Sat Feb 10 01:56:31 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.