Details
Bug
Resolution: Fixed
Critical
Lustre 2.10.0
None
3
Description
[ 3773.614054] BUG: unable to handle kernel paging request at ffff88007d283e58
[ 3773.614736] IP: [<ffffffffa02a6e41>] lnet_return_tx_credits_locked+0x1f1/0x480 [lnet]
[ 3773.615880] PGD 2e75067 PUD bcc1a067 PMD bca30067 PTE 800000007d283060
[ 3773.616481] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 3773.617034] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) loop mbcache jbd2 rpcsec_gss_krb5 ata_generic syscopyarea pata_acpi sysfillrect sysimgblt ttm drm_kms_helper i2c_piix4 drm ata_piix serio_raw virtio_blk pcspkr i2c_core libata virtio_console virtio_balloon floppy nfsd ip_tables [last unloaded: libcfs]
[ 3773.621922] CPU: 0 PID: 28885 Comm: socknal_sd00_01 Tainted: G OE ------------ 3.10.0-debug #1
[ 3773.622981] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 3773.623523] task: ffff880082b3aa00 ti: ffff880073cd0000 task.ti: ffff880073cd0000
[ 3773.624614] RIP: 0010:[<ffffffffa02a6e41>] [<ffffffffa02a6e41>] lnet_return_tx_credits_locked+0x1f1/0x480 [lnet]
[ 3773.625715] RSP: 0018:ffff880073cd3cd0 EFLAGS: 00010282
[ 3773.626390] RAX: 0000000000000000 RBX: ffff88006433ee00 RCX: 000000000d6a0d68
[ 3773.626961] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88006671b280
[ 3773.627526] RBP: ffff880073cd3d00 R08: 0000000000000000 R09: 0000000000000000
[ 3773.628099] R10: 0000000000000000 R11: ffff880082b3b2d8 R12: ffff88006a839e00
[ 3773.628664] R13: ffff880073c1ee00 R14: ffff88006a839ea8 R15: ffff88007d283e10
[ 3773.629237] FS: 0000000000000000(0000) GS:ffff8800bc600000(0000) knlGS:0000000000000000
[ 3773.643338] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 3773.643946] CR2: ffff88007d283e58 CR3: 000000002907b000 CR4: 00000000000006f0
[ 3773.644521] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3773.645100] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 3773.645674] Stack:
[ 3773.646158] ffff88007d283e00 ffff88006433ee00 0000000000000000 ffff880020a2cf80
[ 3773.647203] 0000000000000000 ffff88006433ee10 ffff880073cd3d30 ffffffffa02995db
[ 3773.648244] ffff88006433ee00 0000000000000000 ffff880020a2cf80 ffff880020a2cf88
[ 3773.649287] Call Trace:
[ 3773.649789] [<ffffffffa02995db>] lnet_msg_decommit+0xeb/0x700 [lnet]
[ 3773.651054] [<ffffffffa0299f89>] lnet_finalize+0x1e9/0x690 [lnet]
[ 3773.651612] [<ffffffffa01c8fe5>] ksocknal_tx_done+0x85/0x1c0 [ksocklnd]
[ 3773.652215] [<ffffffffa01cdc64>] ksocknal_scheduler+0x234/0x680 [ksocklnd]
[ 3773.652784] [<ffffffff810a4090>] ? wake_up_atomic_t+0x30/0x30
[ 3773.653408] [<ffffffffa01cda30>] ? ksocknal_recv+0x2a0/0x2a0 [ksocklnd]
[ 3773.654011] [<ffffffff810a2eda>] kthread+0xea/0xf0
[ 3773.654551] [<ffffffff810a2df0>] ? kthread_create_on_node+0x140/0x140
[ 3773.655145] [<ffffffff8170fbd8>] ret_from_fork+0x58/0x90
[ 3773.655684] [<ffffffff810a2df0>] ? kthread_create_on_node+0x140/0x140
[ 3773.656497] Code: 01 0f 84 1d 02 00 00 0f b7 43 58 89 c1 66 41 33 4f 48 66 f7 c1 fe ff 0f 85 d5 00 00 00 48 8b 7d d0 be 01 00 00 00 e8 6f cc ff ff <41> 0f b7 47 48 89 c2 66 33 53 58 66 f7 c2 fe ff 0f 84 7c fe ff
[ 3773.658705] RIP [<ffffffffa02a6e41>] lnet_return_tx_credits_locked+0x1f1/0x480 [lnet]
[ 3773.659774] RSP <ffff880073cd3cd0>
[ 3773.660318] CR2: ffff88007d283e58
Suspect code, in lnet_return_tx_credits_locked() (lines 1050-1058):

    if (msg2->msg_tx_cpt != msg->msg_tx_cpt) {
        lnet_net_unlock(msg->msg_tx_cpt);
        lnet_net_lock(msg2->msg_tx_cpt);
    }
    (void) lnet_post_send_locked(msg2, 1);
    /* msg2 may already have been freed by the call above */
    if (msg2->msg_tx_cpt != msg->msg_tx_cpt) {
        lnet_net_unlock(msg2->msg_tx_cpt);
        lnet_net_lock(msg->msg_tx_cpt);
    }
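lnet_post_send_locked() may call lnet_finalize() on msg2 (for example, if it drops the message), and lnet_finalize() can free msg2. The second if statement then reads msg2->msg_tx_cpt, which is a use-after-free whenever msg2 has already been freed.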
Issue Links
is related to: LU-10163 kernel NULL pointer dereference on heavy load (Resolved)