[LU-9064] NULL pointer dereference in ptlrpc_unregister_bulk Created: 31/Jan/17 Updated: 15/Oct/17 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Oleg Drokin | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Hit this in pretty much current master:
[145518.621248] Lustre: DEBUG MARKER: == recovery-small test 115e: read: late Bulk MDunlink and no reply =================================== 07:11:49 (1485778309)
[145518.734117] Lustre: *** cfs_fail_loc=510, val=0***
[145518.734829] BUG: unable to handle kernel NULL pointer dereference at 0000000000000060
[145518.736213] IP: [<ffffffffa0574484>] ptlrpc_unregister_bulk+0x134/0x7c0 [ptlrpc]
[145518.737374] PGD 0
[145518.737966] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[145518.738565] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) loop mbcache jbd2 sha512_generic crypto_null rpcsec_gss_krb5 syscopyarea sysfillrect sysimgblt ata_generic ttm pata_acpi drm_kms_helper ata_piix drm i2c_piix4 pcspkr virtio_console virtio_blk virtio_balloon libata i2c_core serio_raw floppy nfsd ip_tables [last unloaded: libcfs]
[145518.744132] CPU: 4 PID: 31858 Comm: ptlrpcd_00_04 Tainted: G OE ------------ 3.10.0-debug #1
[145518.745218] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[145518.745772] task: ffff880056bdc700 ti: ffff8800678d0000 task.ti: ffff8800678d0000
[145518.746860] RIP: 0010:[<ffffffffa0574484>] [<ffffffffa0574484>] ptlrpc_unregister_bulk+0x134/0x7c0 [ptlrpc]
[145518.748037] RSP: 0018:ffff8800678d3b98 EFLAGS: 00010212
[145518.748587] RAX: 0000000000000000 RBX: ffff88001e021c40 RCX: 0000000000000000
[145518.749625] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff880065820b00
[145518.750655] RBP: ffff8800678d3c38 R08: 0000000000000000 R09: 0000000000000000
[145518.751721] R10: 00000000000000a0 R11: 0000000000000050 R12: 0000000000000000
[145518.752752] R13: 0000000000000000 R14: 00000000588f2eb2 R15: 0000000000000001
[145518.753785] FS: 0000000000000000(0000) GS:ffff8800bc700000(0000) knlGS:0000000000000000
[145518.754833] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[145518.755385] CR2: 0000000000000060 CR3: 00000000aec21000 CR4: 00000000000006e0
[145518.757144] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[145518.758177] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[145518.759210] Stack:
[145518.759698] 0000000000000000 ffff880004ab9f00 ffff88001e021d90 0000000000000000
[145518.761059] 0000000000000000 ffff880004ab9f60 ffff8800678d3c28 ffffffffa01e5eb7
[145518.762120] 0000000000000010 ffff8800678d3c38 ffff8800678d3bf8 00000000381339d6
[145518.762859] Call Trace:
[145518.763376] [<ffffffffa01e5eb7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[145518.763997] [<ffffffffa056cb49>] ptlrpc_check_set.part.21+0x439/0x1e80 [ptlrpc]
[145518.765071] [<ffffffffa056e5eb>] ptlrpc_check_set+0x5b/0xe0 [ptlrpc]
[145518.765685] [<ffffffffa059a34b>] ptlrpcd_check+0x4bb/0x570 [ptlrpc]
[145518.766295] [<ffffffffa059a6bb>] ptlrpcd+0x2bb/0x580 [ptlrpc]
[145518.766923] [<ffffffff810b7ce0>] ? wake_up_state+0x20/0x20
[145518.767507] [<ffffffffa059a400>] ? ptlrpcd_check+0x570/0x570 [ptlrpc]
[145518.768133] [<ffffffff810a2eda>] kthread+0xea/0xf0
[145518.768722] [<ffffffff810a2df0>] ? kthread_create_on_node+0x140/0x140
[145518.769437] [<ffffffff8170fbd8>] ret_from_fork+0x58/0x90
[145518.770135] [<ffffffff810a2df0>] ? kthread_create_on_node+0x140/0x140
[145518.770797] Code: 00 48 8b 55 d0 65 48 33 14 25 28 00 00 00 44 89 e0 0f 85 95 06 00 00 48 83 c4 78 5b 41 5c 41 5d 41 5e 41 5f 5d c3 90 48 8b 45 80 <48> 3b 58 60 0f 85 0e 06 00 00 48 8b 4d 80 8b 81 ec 00 00 00 4c
[145518.773307] RIP [<ffffffffa0574484>] ptlrpc_unregister_bulk+0x134/0x7c0 [ptlrpc]
[145518.774571] RSP <ffff8800678d3b98>
[145518.775161] CR2: 0000000000000060
tag in my tree: master-20170130
Crashdump and modules: /exports/crashdumps/192.168.10.211-2017-01-30-07:11:58/ |
| Comments |
| Comment by Oleg Drokin [ 03/Jul/17 ] |
|
Just hit this once more on 2.10rc1: exact same stack trace in the exact same place, same test. |
| Comment by Oleg Drokin [ 15/Oct/17 ] |
|
I am now hitting this semi-regularly on master-next runs.
[38962.339231] Lustre: DEBUG MARKER: == recovery-small test 115e: read: late Bulk MDunlink and no reply =================================== 07:07:07 (1507720027)
[38962.664478] BUG: unable to handle kernel NULL pointer dereference at 0000000000000060
[38962.667209] IP: [<ffffffffa05b8d44>] ptlrpc_unregister_bulk+0x114/0x7e0 [ptlrpc]
[38962.668790] PGD 0
[38962.669449] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[38962.676703] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_zfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) zlib_deflate jbd2 syscopyarea sysfillrect sysimgblt ata_generic ttm pata_acpi drm_kms_helper ata_piix i2c_piix4 drm floppy virtio_console libata virtio_blk pcspkr virtio_balloon serio_raw i2c_core nfsd ip_tables rpcsec_gss_krb5 [last unloaded: libcfs]
[38962.691966] CPU: 5 PID: 31695 Comm: ptlrpcd_00_15 Tainted: P OE ------------ 3.10.0-debug #2
[38962.694511] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[38962.696271] task: ffff8802db46ab80 ti: ffff8802c1190000 task.ti: ffff8802c1190000
[38962.698601] RIP: 0010:[<ffffffffa05b8d44>] [<ffffffffa05b8d44>] ptlrpc_unregister_bulk+0x114/0x7e0 [ptlrpc]
[38962.702176] RSP: 0018:ffff8802c1193ba0 EFLAGS: 00010212
[38962.704035] RAX: 0000000000000000 RBX: ffff880090b7bc00 RCX: ffff8802db46b450
[38962.705967] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000206
[38962.706677] RBP: ffff8802c1193c40 R08: 0000000000000001 R09: 0000000000000000
[38962.708466] R10: 0000000000000080 R11: ffff8802db46b458 R12: 0000000000000000
[38962.709324] R13: 0000000000000000 R14: 0000000059ddfc88 R15: 0000000000000001
[38962.726828] FS: 0000000000000000(0000) GS:ffff88033e4a0000(0000) knlGS:0000000000000000
[38962.727720] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[38962.728221] CR2: 0000000000000060 CR3: 0000000001c0e000 CR4: 00000000000006e0
[38962.728720] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[38962.729240] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[38962.729864] Stack:
[38962.730272] ffffffffa06a0da0 ffff8802ea835f00 ffff880090b7bc18 ffff880090b7bd80
[38962.731213] 0000000000000000 ffff8800a27eed80 ffff8802c1193c30 ffffffffa01efc07
[38962.732151] 0000000000000010 ffff8802c1193c40 ffff8802c1193c00 000000008683b156
[38962.733286] Call Trace:
[38962.733948] [<ffffffffa01efc07>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[38962.734711] [<ffffffffa05b154e>] ptlrpc_check_set.part.20+0x3ce/0x1d20 [ptlrpc]
[38962.736442] [<ffffffffa05b2efb>] ptlrpc_check_set+0x5b/0xe0 [ptlrpc]
[38962.746540] [<ffffffffa05df67b>] ptlrpcd_check+0x4ab/0x5a0 [ptlrpc]
[38962.747294] [<ffffffffa05dfa1b>] ptlrpcd+0x2ab/0x570 [ptlrpc]
[38962.747987] [<ffffffff810b7cc0>] ? wake_up_state+0x20/0x20
[38962.748692] [<ffffffffa05df770>] ? ptlrpcd_check+0x5a0/0x5a0 [ptlrpc]
[38962.749372] [<ffffffff810a2eba>] kthread+0xea/0xf0
[38962.750046] [<ffffffff810a2dd0>] ? kthread_create_on_node+0x140/0x140
[38962.750739] [<ffffffff8170fb98>] ret_from_fork+0x58/0x90
[38962.751386] [<ffffffff810a2dd0>] ? kthread_create_on_node+0x140/0x140
[38962.752073] Code: 8b 55 d0 65 48 33 14 25 28 00 00 00 44 89 e0 0f 85 d1 06 00 00 48 83 c4 78 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 1f 00 48 8b 45 80 <48> 3b 58 60 0f 85 48 06 00 00 48 8b 4d 80 8b 81 ec 00 00 00 4c
[38962.754727] RIP [<ffffffffa05b8d44>] ptlrpc_unregister_bulk+0x114/0x7e0 [ptlrpc]
[38962.755984] RSP <ffff8802c1193ba0> |
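
Both oopses share the same bytes at the faulting RIP (`<48> 3b 58 60`, preceded by `48 8b 45 80`) and both show RAX = 0 and CR2 = 0x60. The following is a minimal sketch, not taken from the Lustre tree, that decodes just that one instruction form and recomputes the fault address from the dumped registers. The helper names (`decode_cmp_disp8`, `fault_address`) are hypothetical, and the decoder deliberately handles only `REX.W 3B /r` with an 8-bit displacement. It only confirms that a NULL base pointer was dereferenced at offset 0x60; it does not identify which structure or field that is, which would need the crashdump and matching debuginfo.

```python
# Bytes at the faulting RIP, copied from the "Code:" line of the oops
# (the <48> marker shows where RIP points).
FAULT_BYTES = bytes.fromhex("483b5860")

# 64-bit register names indexed by the 3-bit ModRM fields.
# Valid only when REX.R and REX.B are clear, as they are for REX = 0x48.
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_cmp_disp8(code):
    """Decode 'REX.W 3B /r' (CMP r64, r/m64) with a disp8 memory operand.

    Illustrative helper: handles only this narrow encoding and returns
    (compared_register, base_register, displacement).
    """
    rex, opcode, modrm, disp = code[0], code[1], code[2], code[3]
    if rex != 0x48 or opcode != 0x3B:
        raise ValueError("not a REX.W CMP r64, r/m64 instruction")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only the [base + disp8] form without SIB is handled")
    if disp >= 0x80:  # disp8 is sign-extended
        disp -= 0x100
    return REGS[reg], REGS[rm], disp


def fault_address(code, registers):
    """Effective address the instruction reads, given dumped register values."""
    _, base, disp = decode_cmp_disp8(code)
    return (registers[base] + disp) & 0xFFFFFFFFFFFFFFFF


# Register values from the first oops in this ticket.
OOPS_REGS = {"rax": 0x0, "rbx": 0xFFFF88001E021C40}

if __name__ == "__main__":
    reg, base, disp = decode_cmp_disp8(FAULT_BYTES)
    print(f"cmp {reg}, [{base}+{disp:#x}]")             # cmp rbx, [rax+0x60]
    print(hex(fault_address(FAULT_BYTES, OOPS_REGS)))   # 0x60, matching CR2
```

The preceding bytes `48 8b 45 80` encode a load of RAX from `[rbp-0x80]`, so the NULL value appears to come from a local stack slot rather than directly from a function argument register.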