[LU-8316] BUG: unable to handle kernel NULL pointer dereference at tgt_free_reply_data+0x97/0x330 Created: 22/Jun/16 Updated: 11/May/20 Resolved: 11/May/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0 |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Yang Sheng | Assignee: | Yang Sheng |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
The system crashed while testing under memory pressure:

[432534.561808] Lustre: lustre-MDT0000: Will be in recovery for at least 1:00, or until 2 clients reconnect
[432534.563083] Lustre: Skipped 3 previous similar messages
[432534.593088] BUG: unable to handle kernel NULL pointer dereference at (null)
[432534.594035] IP: [<ffffffffa07d31f7>] tgt_free_reply_data+0x97/0x330 [ptlrpc]
[432534.594035] PGD 3c7cb067 PUD 3836e067 PMD 0
[432534.594035] Oops: 0002 [#1] SMP
[432534.594035] Modules linked in: lustre(OF) ofd(OF) osp(OF) lod(OF) ost(OF) mdt(OF) mdd(OF) mgs(OF) osd_ldiskfs(OF) ldiskfs(OF) lquota(OF) lfsck(OF) obdecho(OF) mgc(OF) lov(OF) osc(OF) mdc(OF) lmv(OF) fid(OF) fld(OF) ptlrpc(OF) obdclass(OF) ksocklnd(OF) lnet(OF) libcfs(OF) loop mbcache jbd2 sha512_generic netconsole sg dm_mirror dm_region_hash dm_log crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd serio_raw virtio_balloon virtio_console dm_mod intel_agp i2c_piix4 intel_gtt nfsd auth_rpcgss nfs_acl lockd sunrpc ip_tables xfs ata_generic libcrc32c virtio_net cirrus syscopyarea sysfillrect sysimgblt virtio_scsi drm_kms_helper virtio_blk ttm drm virtio_pci agpgart ata_piix virtio_ring libata virtio i2c_core [last unloaded: libcfs]
[432534.594035] CPU: 1 PID: 5669 Comm: mdt01_003 Tainted: GF O-------------- 3.10.0-229.7.2.x86_64 #7
[432534.594035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
[432534.594035] task: ffff880025081580 ti: ffff88002da2c000 task.ti: ffff88002da2c000
[432534.594035] RIP: 0010:[<ffffffffa07d31f7>] [<ffffffffa07d31f7>] tgt_free_reply_data+0x97/0x330 [ptlrpc]
[432534.594035] RSP: 0018:ffff88002da2fb90 EFLAGS: 00010293
[432534.594035] RAX: 0000000000000001 RBX: ffff8800133fb8d8 RCX: 0000000000000000
[432534.594035] RDX: 0000000000000000 RSI: ffff88001289f300 RDI: ffff8800133fb8d8
[432534.594035] RBP: ffff88002da2fbd8 R08: ffff8800133fb8d8 R09: 0000000000000000
[432534.594035] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[432534.594035] R13: ffff880000e65718 R14: ffff88001289f3f8 R15: ffff88001289f300
[432534.594035] FS: 0000000000000000(0000) GS:ffff88003fd00000(0000) knlGS:0000000000000000
[432534.594035] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[432534.594035] CR2: 0000000000bfc001 CR3: 000000003c289000 CR4: 00000000001406e0
[432534.594035] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[432534.594035] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[432534.594035] Stack:
[432534.594035] 0000000000000000 ffff88001289f3f8 ffffffff810c486d ffff88002da2fc30
[432534.594035] ffff8800133fb8d8 ffff88001289f300 ffff88001289ef60 ffff88001289f3f8
[432534.594035] ffff880000e65718 ffff88002da2fc30 ffffffffa07d34ee 0000000000000246
[432534.594035] Call Trace:
[432534.594035] [<ffffffff810c486d>] ? trace_hardirqs_on+0xd/0x10
[432534.594035] [<ffffffffa07d34ee>] tgt_release_reply_data+0x5e/0x180 [ptlrpc]
[432534.594035] [<ffffffffa07dc128>] tgt_handle_received_xid+0x98/0xe0 [ptlrpc]
[432534.594035] [<ffffffffa07e1d38>] tgt_request_handle+0xb88/0x1330 [ptlrpc]
[432534.594035] [<ffffffffa078d591>] ptlrpc_server_handle_request+0x231/0xac0 [ptlrpc]
[432534.594035] [<ffffffffa078be15>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
[432534.594035] [<ffffffffa0791790>] ptlrpc_main+0xab0/0x1e10 [ptlrpc]
[432534.594035] [<ffffffff810c486d>] ? trace_hardirqs_on+0xd/0x10
[432534.594035] [<ffffffff8109b842>] ? finish_task_switch+0x42/0x150
[432534.594035] [<ffffffffa0790ce0>] ? ptlrpc_register_service+0xe50/0xe50 [ptlrpc]
[432534.594035] [<ffffffff8109008a>] kthread+0xea/0xf0
[432534.594035] [<ffffffff8108ffa0>] ? kthread_create_on_node+0x140/0x140
[432534.594035] [<ffffffff81571258>] ret_from_fork+0x58/0x90
[432534.594035] [<ffffffff8108ffa0>] ? kthread_create_on_node+0x140/0x140
[432534.594035] Code: c1 fa 1f c1 ea 0c c1 f9 14 41 8d 04 14 25 ff ff 0f 00 29 d0 83 f9 0f 0f 8f 72 02 00 00 49 8b 95 28 04 00 00 48 63 c9 48 8b 14 ca <f0> 0f b3 02 19 c0 85 c0 0f 84 8b 01 00 00 48 85 db 0f 84 1b 02
[432534.594035] RIP [<ffffffffa07d31f7>] tgt_free_reply_data+0x97/0x330 [ptlrpc]
[432534.594035] RSP <ffff88002da2fb90>
[432534.594035] CR2: 0000000000000000
[432534.712915] ---[ end trace 26ac593d02d07dd0 ]---
[432534.714120] Kernel panic - not syncing: Fatal exception

This issue is caused by a wrong return value in the following code:

/* reply_data is supported by MDT targets only for now */
if (strncmp(obd->obd_type->typ_name, LUSTRE_MDT_NAME, 3) != 0)
        RETURN(0);
OBD_ALLOC(lut->lut_reply_bitmap,
          LUT_REPLY_SLOTS_MAX_CHUNKS * sizeof(unsigned long *));
if (lut->lut_reply_bitmap == NULL)
        GOTO(out, rc);
------------------^^
memset(&attr, 0, sizeof(attr));
attr.la_valid = LA_MODE;
I'll push a patch for it. |
| Comments |
| Comment by Gerrit Updater [ 22/Jun/16 ] |
|
Yang Sheng (yang.sheng@intel.com) uploaded a new patch: http://review.whamcloud.com/20918 |
| Comment by Oleg Drokin [ 23/Jun/16 ] |
|
Must be a dup of |
| Comment by Yang Sheng [ 29/Jun/16 ] |
|
Hi Oleg, yes, I think it is almost a dup of |
| Comment by Gerrit Updater [ 05/Jul/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20918/ |
| Comment by Yang Sheng [ 06/Jul/16 ] |
|
Patch landed. Closing the ticket. |
| Comment by Oleg Drokin [ 24/Apr/18 ] |
|
I just had this hit again on current master-next.

[178184.101361] Lustre: DEBUG MARKER: == replay-single test 39: test recovery from unlink llog (test llog_gen_rec) ========================= 02:06:10 (1524377170)
[178187.227063] Turning device loop0 (0x700000) read-only
[178187.304890] Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
[178187.327340] Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-MDT0000
[178189.095109] LustreError: 25219:0:(client.c:1147:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8802930a6c00 x1598423905600144/t0(0) o6->lustre-OST0000-osc-MDT0000@0@lo:28/4 lens 664/432 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
[178189.160680] BUG: unable to handle kernel NULL pointer dereference at (null)
[178189.162062] IP: [<ffffffffa06557d3>] tgt_free_reply_data+0x93/0x370 [ptlrpc]
[178189.163275] PGD 0
[178189.163867] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[178189.164595] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) loop zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) zlib_deflate mbcache jbd2 syscopyarea sysfillrect sysimgblt ata_generic ttm pata_acpi drm_kms_helper drm ata_piix i2c_piix4 libata pcspkr serio_raw virtio_balloon virtio_blk virtio_console i2c_core floppy nfsd ip_tables rpcsec_gss_krb5 [last unloaded: libcfs]
[178189.172341] CPU: 3 PID: 143 Comm: kworker/3:1 Tainted: P OE ------------ 3.10.0-debug #2
[178189.176342] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[178189.177281] Workqueue: obd_zombid obd_zombie_exp_cull [obdclass]
[178189.177975] task: ffff88032107e4c0 ti: ffff880321084000 task.ti: ffff880321084000
[178189.182094] RIP: 0010:[<ffffffffa06557d3>] [<ffffffffa06557d3>] tgt_free_reply_data+0x93/0x370 [ptlrpc]
[178189.183637] RSP: 0018:ffff880321087c68 EFLAGS: 00010293
[178189.184412] RAX: 0000000000000000 RBX: ffff88025d2b5500 RCX: 0000000000000000
[178189.185641] RDX: 0000000000000000 RSI: ffff8800a9644be0 RDI: ffff88025d2b5500
[178189.187161] RBP: ffff880321087cb0 R08: ffff88025d2b5500 R09: 0000000000000000
[178189.188393] R10: 0000000000000000 R11: ffff88028dce37e0 R12: 0000000000000000
[178189.189623] R13: ffff88029383c0b0 R14: ffff88029383c0b0 R15: ffff8800a9644be0
[178189.190921] FS: 0000000000000000(0000) GS:ffff88033e460000(0000) knlGS:0000000000000000
[178189.192189] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[178189.192849] CR2: 0000000000000000 CR3: 0000000001c0e000 CR4: 00000000000006e0
[178189.194304] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[178189.195545] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[178189.196776] Stack:
[178189.197399] ffff88032107e4c0 ffff8800a9644be0 0000000000000000 ffff880321087fd8
[178189.199036] ffff88027eaa9400 ffff8800a9644be0 ffff8800a9644be0 ffff88029383c0b0
[178189.201390] ffff88029383c0b0 ffff880321087d08 ffffffffa0655b38 ffff880321087d10
[178189.207062] Call Trace:
[178189.207804] [<ffffffffa0655b38>] tgt_release_reply_data+0x88/0x180 [ptlrpc]
[178189.208621] [<ffffffffa02183d8>] ? cfs_hash_putref+0x2e8/0x500 [libcfs]
[178189.209388] [<ffffffffa06562e1>] tgt_client_free+0x81/0x360 [ptlrpc]
[178189.210344] [<ffffffffa0cda13a>] mdt_destroy_export+0x5a/0x200 [mdt]
[178189.211100] [<ffffffffa0395815>] class_export_destroy+0xe5/0x490 [obdclass]
[178189.211914] [<ffffffffa0395bd5>] obd_zombie_exp_cull+0x15/0x20 [obdclass]
[178189.212897] [<ffffffff8109adb6>] process_one_work+0x206/0x5b0
[178189.213660] [<ffffffff8109ad4b>] ? process_one_work+0x19b/0x5b0
[178189.214358] [<ffffffff8109b27b>] worker_thread+0x11b/0x3a0
[178189.215037] [<ffffffff8109b160>] ? process_one_work+0x5b0/0x5b0
[178189.215718] [<ffffffff810a2eba>] kthread+0xea/0xf0
[178189.216382] [<ffffffff810a2dd0>] ? kthread_create_on_node+0x140/0x140
[178189.217103] [<ffffffff8170fb98>] ret_from_fork+0x58/0x90
[178189.217805] [<ffffffff810a2dd0>] ? kthread_create_on_node+0x140/0x140
[178189.251576] Code: 41 0f 49 cc c1 fa 1f c1 ea 0c c1 f9 14 41 8d 04 14 25 ff ff 0f 00 29 d0 83 f9 0f 0f 8f b1 02 00 00 49 8b 95 58 04 00 00 48 63 c9 <48> 8b 14 ca 48 85 d2 0f 84 cf 01 00 00 f0 0f b3 02 19 c0 85 c0 |
| Comment by Oleg Drokin [ 24/Apr/18 ] |
|
seems to be still present |
| Comment by Oleg Drokin [ 11/May/20 ] |
|
It seems this has not recurred in my testing since Apr 23, 2018. |