[LU-12011] NULL pointer dereference in osc_io_init in sanity test 411 Created: 25/Feb/19  Updated: 11/Jul/21  Resolved: 10/Jul/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0
Fix Version/s: None

Type: Bug
Priority: Major
Reporter: Oleg Drokin
Assignee: WC Triage
Resolution: Duplicate
Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by LU-12436 Memory allocation failure error dropped Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This seems to be another failure mode of LU-11998, since it only happens in sanity test 411.

[277274.440468] Lustre: DEBUG MARKER: == sanity test 411: Slab allocation error with cgroup does not LBUG ================================== 06:47:26 (1550922446)
[277279.569543] SLAB: Unable to allocate memory on node 0 (gfp=0x100050)
[277279.593403]   cache: kmalloc-512(0:osc_slab_alloc), object size: 4096, order: 0
[277279.595819]   node 0: slabs: 37/37, objs: 37/37, free: 0
[277315.514158] SLAB: Unable to allocate memory on node 0 (gfp=0x100050)
[277315.515355]   cache: kmalloc-512(0:osc_slab_alloc), object size: 4096, order: 0
[277315.551203]   node 0: slabs: 67/67, objs: 67/67, free: 0
[277315.581812] SLAB: Unable to allocate memory on node 0 (gfp=0x100050)
[277315.582961]   cache: kmalloc-512(0:osc_slab_alloc), object size: 4096, order: 0
[277315.585015]   node 0: slabs: 67/67, objs: 67/67, free: 0
[277315.607385] SLAB: Unable to allocate memory on node 0 (gfp=0x100050)
[277315.608453]   cache: kmalloc-512(0:osc_slab_alloc), object size: 4096, order: 0
[277315.610066]   node 0: slabs: 67/67, objs: 67/67, free: 0
[277315.641346] SLAB: Unable to allocate memory on node 0 (gfp=0x100050)
[277315.642642]   cache: kmalloc-512(0:osc_slab_alloc), object size: 4096, order: 0
[277315.645272]   node 0: slabs: 67/67, objs: 67/67, free: 0
[277321.431039] SLAB: Unable to allocate memory on node 0 (gfp=0x100050)
[277321.432413]   cache: osc_session_kmem(0:osc_slab_alloc), object size: 4096, order: 0
[277321.434844]   node 0: slabs: 1/1, objs: 1/1, free: 0
[277321.436257] BUG: unable to handle kernel NULL pointer dereference at 0000000000000024
[277321.438634] IP: [<ffffffffa036a566>] osc_io_init+0x16/0x140 [osc]
[277321.439944] PGD 1b3ea067 PUD 1b3e9067 PMD 0 
[277321.441187] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[277321.442441] Modules linked in: dm_flakey dm_mod lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) brd ext4 loop zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) jbd2 mbcache crc_t10dif crct10dif_generic crct10dif_common virtio_balloon virtio_console pcspkr i2c_piix4 binfmt_misc ip_tables rpcsec_gss_krb5 ata_generic pata_acpi drm_kms_helper ttm drm drm_panel_orientation_quirks ata_piix i2c_core virtio_blk serio_raw libata floppy [last unloaded: obdecho]
[277321.456256] CPU: 0 PID: 10704 Comm: dd Kdump: loaded Tainted: P        W  OE  ------------   3.10.0-7.6-debug #2
[277321.458692] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[277321.459888] task: ffff880081692900 ti: ffff88001732c000 task.ti: ffff88001732c000
[277321.461944] RIP: 0010:[<ffffffffa036a566>]  [<ffffffffa036a566>] osc_io_init+0x16/0x140 [osc]
[277321.463943] RSP: 0018:ffff88001732fa30  EFLAGS: 00010286
[277321.465068] RAX: ffffffffa036a550 RBX: ffff88005e98ce60 RCX: ffff8800692e9348
[277321.466983] RDX: ffff8800723ece80 RSI: ffffffffa038eec0 RDI: fffffffffffffff4
[277321.468993] RBP: ffff88001732fa40 R08: 0000000000000001 R09: ffff8800254a69c8
[277321.470986] R10: ffff8800254a6000 R11: ffff8800254a69c0 R12: ffff88005e98ce60
[277321.472890] R13: fffffffffffffff4 R14: ffff8800723ece80 R15: ffff88009eea8e38
[277321.474888] FS:  00007fa780b3a740(0000) GS:ffff8800bc800000(0000) knlGS:0000000000000000
[277321.476915] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[277321.478684] CR2: 0000000000000024 CR3: 00000000a6b3c000 CR4: 00000000000006f0
[277321.480711] Call Trace:
[277321.481707]  [<ffffffffa0544468>] cl_io_init0.isra.15+0x88/0x160 [obdclass]
[277321.483127]  [<ffffffffa054457a>] cl_io_sub_init+0x3a/0x80 [obdclass]
[277321.484425]  [<ffffffffa048cc52>] lov_sub_get+0x2b2/0x7e0 [lov]
[277321.485664]  [<ffffffffa04a3321>] ? lov_stripe_intersects+0xa1/0x170 [lov]
[277321.487016]  [<ffffffffa048eb1b>] lov_io_iter_init+0x26b/0x950 [lov]
[277321.488314]  [<ffffffffa048f578>] lov_io_rw_iter_init+0x1a8/0x520 [lov]
[277321.489448]  [<ffffffffa054406c>] cl_io_iter_init+0x5c/0x120 [obdclass]
[277321.490537]  [<ffffffffa0546192>] cl_io_loop+0x42/0x1c0 [obdclass]
[277321.491626]  [<ffffffffa14adab0>] ll_file_io_generic+0x590/0xcb0 [lustre]
[277321.492880]  [<ffffffffa14af028>] ll_file_aio_read+0x2c8/0x3e0 [lustre]
[277321.494200]  [<ffffffffa14af1e4>] ll_file_read+0xa4/0x170 [lustre]
[277321.495481]  [<ffffffff8123612c>] vfs_read+0x9c/0x170
[277321.496696]  [<ffffffff81236fcf>] SyS_read+0x7f/0xf0
[277321.497935]  [<ffffffff817c4d61>] ? system_call_after_swapgs+0xae/0x146
[277321.499244]  [<ffffffff817c4e15>] system_call_fastpath+0x1c/0x21
[277321.500512]  [<ffffffff817c4d61>] ? system_call_after_swapgs+0xae/0x146
0x1b596 is in osc_io_init (/home/green/git/lustre-release/lustre/include/lustre_osc.h:747).
742	
743	static inline struct osc_session *osc_env_session(const struct lu_env *env)
744	{
745		struct osc_session *ses;
746	
747		ses = lu_context_key_get(env->le_ses, &osc_session_key);
748		LASSERT(ses != NULL);
749		return ses;
750	}

So it sounds like env is NULL, but I don't readily see how it could be NULL there.
Also, le_ses looks to be at offset 0x30 in my tree, not 0x24. Then again, 0x30 - 0x24 = 12, so if we assume env is -12 (-ENOMEM, i.e. an error pointer rather than NULL), the faulting address of 0x24 matches.

The checks in ll_file_read_iter all seem to be correct, so unless env was substituted somewhere along the path, it's a bit of a mystery.

Hit this today, so it's definitely an ongoing issue.

