[LU-9064] NULL pointer dereference in ptlrpc_unregister_bulk Created: 31/Jan/17  Updated: 15/Oct/17

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Oleg Drokin Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Hit this in pretty much current master:

[145518.621248] Lustre: DEBUG MARKER: == recovery-small test 115e: read: late Bulk MDunlink and no reply =================================== 07:11:49 (1485778309)
[145518.734117] Lustre: *** cfs_fail_loc=510, val=0***
[145518.734829] BUG: unable to handle kernel NULL pointer dereference at 0000000000000060
[145518.736213] IP: [<ffffffffa0574484>] ptlrpc_unregister_bulk+0x134/0x7c0 [ptlrpc]
[145518.737374] PGD 0 
[145518.737966] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[145518.738565] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) loop mbcache jbd2 sha512_generic crypto_null rpcsec_gss_krb5 syscopyarea sysfillrect sysimgblt ata_generic ttm pata_acpi drm_kms_helper ata_piix drm i2c_piix4 pcspkr virtio_console virtio_blk virtio_balloon libata i2c_core serio_raw floppy nfsd ip_tables [last unloaded: libcfs]
[145518.744132] CPU: 4 PID: 31858 Comm: ptlrpcd_00_04 Tainted: G           OE  ------------   3.10.0-debug #1
[145518.745218] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[145518.745772] task: ffff880056bdc700 ti: ffff8800678d0000 task.ti: ffff8800678d0000
[145518.746860] RIP: 0010:[<ffffffffa0574484>]  [<ffffffffa0574484>] ptlrpc_unregister_bulk+0x134/0x7c0 [ptlrpc]
[145518.748037] RSP: 0018:ffff8800678d3b98  EFLAGS: 00010212
[145518.748587] RAX: 0000000000000000 RBX: ffff88001e021c40 RCX: 0000000000000000
[145518.749625] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff880065820b00
[145518.750655] RBP: ffff8800678d3c38 R08: 0000000000000000 R09: 0000000000000000
[145518.751721] R10: 00000000000000a0 R11: 0000000000000050 R12: 0000000000000000
[145518.752752] R13: 0000000000000000 R14: 00000000588f2eb2 R15: 0000000000000001
[145518.753785] FS:  0000000000000000(0000) GS:ffff8800bc700000(0000) knlGS:0000000000000000
[145518.754833] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[145518.755385] CR2: 0000000000000060 CR3: 00000000aec21000 CR4: 00000000000006e0
[145518.757144] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[145518.758177] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[145518.759210] Stack:
[145518.759698]  0000000000000000 ffff880004ab9f00 ffff88001e021d90 0000000000000000
[145518.761059]  0000000000000000 ffff880004ab9f60 ffff8800678d3c28 ffffffffa01e5eb7
[145518.762120]  0000000000000010 ffff8800678d3c38 ffff8800678d3bf8 00000000381339d6
[145518.762859] Call Trace:
[145518.763376]  [<ffffffffa01e5eb7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[145518.763997]  [<ffffffffa056cb49>] ptlrpc_check_set.part.21+0x439/0x1e80 [ptlrpc]
[145518.765071]  [<ffffffffa056e5eb>] ptlrpc_check_set+0x5b/0xe0 [ptlrpc]
[145518.765685]  [<ffffffffa059a34b>] ptlrpcd_check+0x4bb/0x570 [ptlrpc]
[145518.766295]  [<ffffffffa059a6bb>] ptlrpcd+0x2bb/0x580 [ptlrpc]
[145518.766923]  [<ffffffff810b7ce0>] ? wake_up_state+0x20/0x20
[145518.767507]  [<ffffffffa059a400>] ? ptlrpcd_check+0x570/0x570 [ptlrpc]
[145518.768133]  [<ffffffff810a2eda>] kthread+0xea/0xf0
[145518.768722]  [<ffffffff810a2df0>] ? kthread_create_on_node+0x140/0x140
[145518.769437]  [<ffffffff8170fbd8>] ret_from_fork+0x58/0x90
[145518.770135]  [<ffffffff810a2df0>] ? kthread_create_on_node+0x140/0x140
[145518.770797] Code: 00 48 8b 55 d0 65 48 33 14 25 28 00 00 00 44 89 e0 0f 85 95 06 00 00 48 83 c4 78 5b 41 5c 41 5d 41 5e 41 5f 5d c3 90 48 8b 45 80 <48> 3b 58 60 0f 85 0e 06 00 00 48 8b 4d 80 8b 81 ec 00 00 00 4c 
[145518.773307] RIP  [<ffffffffa0574484>] ptlrpc_unregister_bulk+0x134/0x7c0 [ptlrpc]
[145518.774571]  RSP <ffff8800678d3b98>
[145518.775161] CR2: 0000000000000060

tag in my tree: master-20170130

Crashdump and modules: /exports/crashdumps/192.168.10.211-2017-01-30-07:11:58/
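
For triage, here is a minimal sketch of how the marked instruction in the "Code:" line of the oops above can be decoded by hand. It assumes standard x86-64 encoding and only handles the simple [reg + disp8] addressing form with no REX register extensions. The helper names (faulting_bytes, decode_modrm_disp8) are hypothetical and are not part of Lustre or the crash utility.

```python
# Tail of the "Code:" line from the first oops; <..> marks the faulting byte.
CODE_TAIL = "48 8b 45 80 <48> 3b 58 60 0f 85 0e 06 00 00"

# 64-bit register names indexed by the 3-bit ModRM fields (no REX.R/REX.B).
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def faulting_bytes(code_line):
    """Return the instruction bytes starting at the <xx> marker."""
    tokens = code_line.split()
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return [int(t.strip("<>"), 16) for t in tokens[start:]]


def decode_modrm_disp8(modrm, disp):
    """Split a ModRM byte into (reg, base, signed disp8).

    Only mod=01 without a SIB byte is handled; anything else is rejected.
    """
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("sketch only handles [reg + disp8]")
    signed = disp - 256 if disp >= 128 else disp
    return REG64[reg], REG64[rm], signed


insn = faulting_bytes(CODE_TAIL)
# 0x48 is REX.W (64-bit operands), 0x3b is CMP r64, r/m64.
assert insn[0] == 0x48 and insn[1] == 0x3B
reg, base, disp = decode_modrm_disp8(insn[2], insn[3])
print(f"cmp {reg}, [{base}{disp:+#x}]")

# RAX value taken from the register dump in the oops.
rax = 0x0
print(f"fault address: {rax + disp:#x}")
```

Under these assumptions the faulting instruction is a compare of RBX against the 8 bytes at RAX+0x60, and with RAX = 0 the computed address matches CR2 = 0x60. The four bytes just before the marker (48 8b 45 80) decode the same way to a load of RAX from [rbp-0x80], so the NULL pointer appears to come from a stack slot. Which structure and field this corresponds to cannot be determined from the log alone and would need the crashdump and matching modules.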



 Comments   
Comment by Oleg Drokin [ 03/Jul/17 ]

Just hit this once more on 2.10rc1: exact same stack trace in the exact same place, same test.

Comment by Oleg Drokin [ 15/Oct/17 ]

I am now hitting this semi-regularly on master-next runs.

[38962.339231] Lustre: DEBUG MARKER: == recovery-small test 115e: read: late Bulk MDunlink and no reply =================================== 07:07:07 (1507720027)
[38962.664478] BUG: unable to handle kernel NULL pointer dereference at 0000000000000060
[38962.667209] IP: [<ffffffffa05b8d44>] ptlrpc_unregister_bulk+0x114/0x7e0 [ptlrpc]
[38962.668790] PGD 0 
[38962.669449] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[38962.676703] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_zfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) zlib_deflate jbd2 syscopyarea sysfillrect sysimgblt ata_generic ttm pata_acpi drm_kms_helper ata_piix i2c_piix4 drm floppy virtio_console libata virtio_blk pcspkr virtio_balloon serio_raw i2c_core nfsd ip_tables rpcsec_gss_krb5 [last unloaded: libcfs]
[38962.691966] CPU: 5 PID: 31695 Comm: ptlrpcd_00_15 Tainted: P           OE  ------------   3.10.0-debug #2
[38962.694511] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[38962.696271] task: ffff8802db46ab80 ti: ffff8802c1190000 task.ti: ffff8802c1190000
[38962.698601] RIP: 0010:[<ffffffffa05b8d44>]  [<ffffffffa05b8d44>] ptlrpc_unregister_bulk+0x114/0x7e0 [ptlrpc]
[38962.702176] RSP: 0018:ffff8802c1193ba0  EFLAGS: 00010212
[38962.704035] RAX: 0000000000000000 RBX: ffff880090b7bc00 RCX: ffff8802db46b450
[38962.705967] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000206
[38962.706677] RBP: ffff8802c1193c40 R08: 0000000000000001 R09: 0000000000000000
[38962.708466] R10: 0000000000000080 R11: ffff8802db46b458 R12: 0000000000000000
[38962.709324] R13: 0000000000000000 R14: 0000000059ddfc88 R15: 0000000000000001
[38962.726828] FS:  0000000000000000(0000) GS:ffff88033e4a0000(0000) knlGS:0000000000000000
[38962.727720] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[38962.728221] CR2: 0000000000000060 CR3: 0000000001c0e000 CR4: 00000000000006e0
[38962.728720] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[38962.729240] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[38962.729864] Stack:
[38962.730272]  ffffffffa06a0da0 ffff8802ea835f00 ffff880090b7bc18 ffff880090b7bd80
[38962.731213]  0000000000000000 ffff8800a27eed80 ffff8802c1193c30 ffffffffa01efc07
[38962.732151]  0000000000000010 ffff8802c1193c40 ffff8802c1193c00 000000008683b156
[38962.733286] Call Trace:
[38962.733948]  [<ffffffffa01efc07>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[38962.734711]  [<ffffffffa05b154e>] ptlrpc_check_set.part.20+0x3ce/0x1d20 [ptlrpc]
[38962.736442]  [<ffffffffa05b2efb>] ptlrpc_check_set+0x5b/0xe0 [ptlrpc]
[38962.746540]  [<ffffffffa05df67b>] ptlrpcd_check+0x4ab/0x5a0 [ptlrpc]
[38962.747294]  [<ffffffffa05dfa1b>] ptlrpcd+0x2ab/0x570 [ptlrpc]
[38962.747987]  [<ffffffff810b7cc0>] ? wake_up_state+0x20/0x20
[38962.748692]  [<ffffffffa05df770>] ? ptlrpcd_check+0x5a0/0x5a0 [ptlrpc]
[38962.749372]  [<ffffffff810a2eba>] kthread+0xea/0xf0
[38962.750046]  [<ffffffff810a2dd0>] ? kthread_create_on_node+0x140/0x140
[38962.750739]  [<ffffffff8170fb98>] ret_from_fork+0x58/0x90
[38962.751386]  [<ffffffff810a2dd0>] ? kthread_create_on_node+0x140/0x140
[38962.752073] Code: 8b 55 d0 65 48 33 14 25 28 00 00 00 44 89 e0 0f 85 d1 06 00 00 48 83 c4 78 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 1f 00 48 8b 45 80 <48> 3b 58 60 0f 85 48 06 00 00 48 8b 4d 80 8b 81 ec 00 00 00 4c 
[38962.754727] RIP  [<ffffffffa05b8d44>] ptlrpc_unregister_bulk+0x114/0x7e0 [ptlrpc]
[38962.755984]  RSP <ffff8802c1193ba0>