[LU-11281] general protection fault in ptlrpc_at_send_early_reply Created: 24/Aug/18  Updated: 15/Mar/19  Resolved: 16/Sep/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.12.0

Type: Bug Priority: Minor
Reporter: Hongchao Zhang Assignee: Hongchao Zhang
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   
2018-08-21 10:00:30 [430508.674512] general protection fault: 0000 [#1] SMP
2018-08-21 10:00:30 [430508.676381] Modules linked in: tcp_diag udp_diag inet_diag iptable_filter osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) sha512_ssse3 sha512_generic crypto_null libcfs(OE) dell_rbu dcdbas rpcrdma ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel ppdev i6300esb lrw gf128mul sg i2c_piix4 glue_helper ablk_helper cryptd parport_pc parport pcspkr nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables ext4 mbcache jbd2 sd_mod sr_mod crc_t10dif cdrom crct10dif_generic ata_generic pata_acpi bochs_drm drm_kms_helper syscopyarea sysfillrect hfi1(OE) sysimgblt fb_sys_fops i2c_algo_bit ttm rdmavt(OE) ata_piix crct10dif_pclmul crct10dif_common drm ib_core libata crc32c_intel e1000 sfablkdriver(OE) igbvf serio_raw i2c_core floppy dm_mirror dm_region_hash dm_log dm_mod
2018-08-21 10:00:30 [430508.695579] CPU: 5 PID: 3488 Comm: ll_ost_create01 Tainted: G           OE  ------------   3.10.0-514.26.2.el7_lustre.2.7.21.1.ddn4.g3b21639.x86_64 #1
2018-08-21 10:00:30 [430508.699599] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
2018-08-21 10:00:30 [430508.701863] task: ffff8805b6a22f10 ti: ffff880d69f00000 task.ti: ffff880d69f00000
2018-08-21 10:00:30 [430508.703853] RIP: 0010:[<ffffffffa0aab2a7>]  [<ffffffffa0aab2a7>] ptlrpc_at_send_early_reply+0x27/0xd50 [ptlrpc]
2018-08-21 10:00:30 [430508.706181] RSP: 0018:ffff880d69f03da8  EFLAGS: 00010282
2018-08-21 10:00:30 [430508.708034] RAX: 5a5a5a5a5a5a5a5a RBX: ffff880e28b02050 RCX: dead000000000200
2018-08-21 10:00:30 [430508.709968] RDX: ffff880d69f03e80 RSI: 00000000c810c80e RDI: ffff880e28b02050
2018-08-21 10:00:30 [430508.711897] RBP: ffff880d69f03de0 R08: ffff880e28b020d0 R09: 0000000000000000
2018-08-21 10:00:30 [430508.713811] R10: 0000000000000000 R11: 0000000000000000 R12: 5a5a5a5a5a5a5a5a
2018-08-21 10:00:30 [430508.715767] R13: ffff88161c538a30 R14: ffff88161c4ea400 R15: ffff880e28b02050
2018-08-21 10:00:30 [430508.717664] FS:  0000000000000000(0000) GS:ffff881663f40000(0000) knlGS:0000000000000000
2018-08-21 10:00:30 [430508.719673] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2018-08-21 10:00:30 [430508.721443] CR2: 00007f3ab10d0660 CR3: 000000164afe7000 CR4: 00000000003407e0
2018-08-21 10:00:30 [430508.723375] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2018-08-21 10:00:30 [430508.725341] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
2018-08-21 10:00:30 [430508.727285] Stack:
2018-08-21 10:00:30 [430508.728774]  ffff88161c538a30 ffff880d69f03dc0 ffff880e28b02050 ffff880e28b020d0
2018-08-21 10:00:30 [430508.730703]  ffff88161c538a30 ffff88161c4ea400 ffff880e28b02050 ffff880d69f03ec0
2018-08-21 10:00:30 [430508.732668]  ffffffffa0ab2707 ffff880d69f03e40 0000000119a50b9c 0000000119a50b9c
2018-08-21 10:00:30 [430508.734537] Call Trace:
2018-08-21 10:00:30 [430508.735978]  [<ffffffffa0ab2707>] ptlrpc_main+0x1257/0x1f60 [ptlrpc]
2018-08-21 10:00:30 [430508.737670]  [<ffffffffa0ab14b0>] ? ptlrpc_register_service+0x1070/0x1070 [ptlrpc]
2018-08-21 10:00:30 [430508.739417]  [<ffffffff810b0a4f>] kthread+0xcf/0xe0
2018-08-21 10:00:30 [430508.740939]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
2018-08-21 10:00:30 [430508.742586]  [<ffffffff81698598>] ret_from_fork+0x58/0x90
2018-08-21 10:00:30 [430508.744361]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
2018-08-21 10:00:30 [430508.746048] Code: ff ff 66 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 53 48 89 fb 48 83 ec 10 48 8b 87 90 01 00 00 4c 8b a7 80 02 00 00 <4c> 8b 70 20 e8 80 e6 63 e0 f6 05 65 8f c7 ff 01 49 89 c5 74 0d
2018-08-21 10:00:30 [430508.750624] RIP  [<ffffffffa0aab2a7>] ptlrpc_at_send_early_reply+0x27/0xd50 [ptlrpc]
2018-08-21 10:00:30 [430508.752365]  RSP <ffff880d69f03da8>
crash> dis -l ptlrpc_at_send_early_reply
/usr/src/debug/lustre-2.7.21.3.ddn23.g2638075/lustre/ptlrpc/service.c: 1278
0xffffffffa0a9f280 <ptlrpc_at_send_early_reply>:        nopl   0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffa0a9f285 <ptlrpc_at_send_early_reply+5>:      push   %rbp
0xffffffffa0a9f286 <ptlrpc_at_send_early_reply+6>:      mov    %rsp,%rbp
0xffffffffa0a9f289 <ptlrpc_at_send_early_reply+9>:      push   %r15
0xffffffffa0a9f28b <ptlrpc_at_send_early_reply+11>:     push   %r14
0xffffffffa0a9f28d <ptlrpc_at_send_early_reply+13>:     push   %r13
0xffffffffa0a9f28f <ptlrpc_at_send_early_reply+15>:     push   %r12
0xffffffffa0a9f291 <ptlrpc_at_send_early_reply+17>:     push   %rbx
0xffffffffa0a9f292 <ptlrpc_at_send_early_reply+18>:     mov    %rdi,%rbx
0xffffffffa0a9f295 <ptlrpc_at_send_early_reply+21>:     sub    $0x10,%rsp
/usr/src/debug/lustre-2.7.21.3.ddn23.g2638075/lustre/ptlrpc/service.c: 1279
0xffffffffa0a9f299 <ptlrpc_at_send_early_reply+25>:     mov    0x190(%rdi),%rax
/usr/src/debug/lustre-2.7.21.3.ddn23.g2638075/lustre/ptlrpc/service.c: 1282
0xffffffffa0a9f2a0 <ptlrpc_at_send_early_reply+32>:     mov    0x280(%rdi),%r12
/usr/src/debug/lustre-2.7.21.3.ddn23.g2638075/lustre/ptlrpc/service.c: 1279
0xffffffffa0a9f2a7 <ptlrpc_at_send_early_reply+39>:     mov    0x20(%rax),%r14
/usr/src/debug/lustre-2.7.21.3.ddn23.g2638075/libcfs/include/libcfs/linux/linux-time.h: 159
0xffffffffa0a9f2ab <ptlrpc_at_send_early_reply+43>:     callq  0xffffffff810e9930 <get_seconds>
/usr/src/debug/lustre-2.7.21.3.ddn23.g2638075/libcfs/include/libcfs/libcfs_debug.h: 220
0xffffffffa0a9f2b0 <ptlrpc_at_send_early_reply+48>:     testb  $0x1,-0x39009b(%rip)        # 0xffffffffa070f21c <libcfs_debug>
/usr/src/debug/lustre-2.7.21.3.ddn23.g2638075/libcfs/include/libcfs/linux/linux-time.h: 159
0xffffffffa0a9f2b7 <ptlrpc_at_send_early_reply+55>:     mov    %rax,%r13
   1278 static int ptlrpc_at_send_early_reply(struct ptlrpc_request *req)
   1279 {
   1280         struct ptlrpc_service_part *svcpt = req->rq_rqbd->rqbd_svcpt;
   1281         struct ptlrpc_request *reqcopy;
   1282         struct lustre_msg *reqmsg;
   1283         cfs_duration_t olddl = req->rq_deadline - cfs_time_current_sec();
   1284         time_t  newdl;
   1285         int rc;
   1286
   1287         ENTRY;
   1288
   1289         if (CFS_FAIL_CHECK(OBD_FAIL_TGT_REPLAY_RECONNECT)) {
   1290                 /* don't send early reply */
   1291                 RETURN(1);
   1292         }

The instruction was at "ptlrpc_at_send_early_reply+0x27/0xd50"

(gdb) l *(ptlrpc_at_send_early_reply+0x27)
0x602d7 is in ptlrpc_at_send_early_reply (/usr/src/debug/lustre-2.7.21.3.ddn23.g2638075/lustre/ptlrpc/service.c:1279).
1274	/usr/src/debug/lustre-2.7.21.3.ddn23.g2638075/lustre/ptlrpc/service.c: No such file or directory.


 Comments   
Comment by Gerrit Updater [ 24/Aug/18 ]

Hongchao Zhang (hongchao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33071
Subject: LU-11281 ptlrpc: race in AT early reply
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: cd5c21ee748e3f38cbd8ae0b8247cb573e2cee5f

Comment by Gerrit Updater [ 16/Sep/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33071/
Subject: LU-11281 ptlrpc: race in AT early reply
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 48e409e65edda13dd647cf37458a7a8ae8218a23

Comment by Peter Jones [ 16/Sep/18 ]

Landed for 2.12

Generated at Sat Feb 10 02:42:31 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.