[LU-7095] BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff8128a242>] strlen+0x2/0x30 Oops: 0000 [#1] SMP
Created: 03/Sep/15  Updated: 10/Oct/21  Resolved: 10/Oct/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Ashish Purkar (Inactive)
Assignee: WC Triage
Resolution: Fixed
Votes: 0
Labels: None
Environment:

Lustre Version : 2.7.57, 10_2.6.32_431.17.1.x2.0.61.x86_64_g037759b Build Date: Sat Aug 15 19:36:47 2015


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Lustre version :
Version : 2.7.57 Vendor: (none)
Release : 10_2.6.32_431.17.1.x2.0.61.x86_64_g037759b Build Date: Sat Aug 15 19:36:47 2015

After running sanityn test10b, the client crashed with the following oops:

Aug 18 04:49:24 windu07 dcs-collectord[10608]: INFO Finished data collection (successful) for 'dmreport'. Will poll in 150 sec. (dm_report_collection_process.py:164)
Aug 18 04:50:03 windu07 kernel: BUG: unable to handle kernel NULL pointer dereference at (null)
Aug 18 04:50:03 windu07 kernel: IP: [<ffffffff8128a242>] strlen+0x2/0x30
Aug 18 04:50:03 windu07 kernel: PGD 7d1618067 PUD 7a97bf067 PMD 0 
Aug 18 04:50:03 windu07 kernel: Oops: 0000 [#1] SMP 
Aug 18 04:50:03 windu07 kernel: last sysfs file: /sys/devices/system/cpu/online
Aug 18 04:50:03 windu07 kernel: CPU 11 
Aug 18 04:50:03 windu07 kernel: Modules linked in: osc(U) mgc(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic sha256_generic crc32c_intel libcfs(U) ib_ipoib(U) rdma_ucm(U) ib_ucm(U) ib_uverbs(U) ib_umad(U) rdma_cm(U) ib_cm(U) iw_cm(U) mlx4_ib(U) ib_sa(U) ib_mad(U) ib_core(U) ib_addr(U) nf_conntrack_ipv4 nf_defrag_ipv4 xt_state iptable_filter xt_NOTRACK nf_conntrack xt_multiport iptable_raw ip_tables ipmi_devintf acpi_cpufreq freq_table mperf dm_mod wmi iTCO_wdt iTCO_vendor_support isci libsas scsi_transport_sas sb_edac edac_core i2c_i801 ahci lpc_ich mfd_core shpchp nfs lockd fscache auth_rpcgss nfs_acl sunrpc igb dca i2c_algo_bit i2c_core mlx4_en(U) ptp pps_core mlx4_core(U) compat(U) bonding ipv6 8021q garp stp llc [last unloaded: ib_core]
Aug 18 04:50:03 windu07 kernel: 
Aug 18 04:50:03 windu07 kernel: Pid: 47740, comm: mount.lustre Tainted: G        W  ---------------    2.6.32-431.17.1.x2.0.61.x86_64 #1 Intel Corporation S2600JF/S2600JF
Aug 18 04:50:03 windu07 kernel: RIP: 0010:[<ffffffff8128a242>]  [<ffffffff8128a242>] strlen+0x2/0x30
Aug 18 04:50:03 windu07 kernel: RSP: 0018:ffff8807cd423d00  EFLAGS: 00010246
Aug 18 04:50:03 windu07 kernel: RAX: 0000000000000000 RBX: ffff8807b36f8c40 RCX: 00000000f0000544
Aug 18 04:50:03 windu07 kernel: RDX: ffffbf07b0c21de0 RSI: ffff880811229959 RDI: 0000000000000000
Aug 18 04:50:03 windu07 kernel: RBP: ffff8807cd423d68 R08: ffff880837f0b0f8 R09: 0000000000000180
Aug 18 04:50:03 windu07 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff8807e735a600
Aug 18 04:50:03 windu07 kernel: R13: 0000000000000000 R14: ffff8807b0c213e0 R15: ffff8807b0c21de0
Aug 18 04:50:03 windu07 kernel: FS:  00007fa4c86737a0(0000) GS:ffff880044760000(0000) knlGS:0000000000000000
Aug 18 04:50:03 windu07 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug 18 04:50:03 windu07 kernel: CR2: 0000000000000000 CR3: 000000082fbde000 CR4: 00000000000407e0
Aug 18 04:50:03 windu07 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 18 04:50:03 windu07 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug 18 04:50:03 windu07 kernel: Process mount.lustre (pid: 47740, threadinfo ffff8807cd422000, task ffff88082e7be040)
Aug 18 04:50:03 windu07 kernel: Stack:
Aug 18 04:50:03 windu07 kernel:  ffffffffa0b2a86c ffff8807cd423d00 ffff88081137a1c0 ffff8807d25da520
Aug 18 04:50:03 windu07 kernel: <d> ffff8807da61dc00 ffff8807ad9efc00 ffff8807d5c8c000 ffff8807cd423d98
Aug 18 04:50:03 windu07 kernel: <d> 0000000000000000 ffff8807da61dc00 ffff8807cd423de8 ffff880811608b40
Aug 18 04:50:03 windu07 kernel: Call Trace:
Aug 18 04:50:03 windu07 kernel:  [<ffffffffa0b2a86c>] ? ll_fill_super+0x127c/0x16a0 [lustre]
Aug 18 04:50:03 windu07 kernel:  [<ffffffffa0543dad>] lustre_fill_super+0x61d/0x990 [obdclass]
Aug 18 04:50:03 windu07 kernel:  [<ffffffffa0543790>] ? lustre_fill_super+0x0/0x990 [obdclass]
Aug 18 04:50:03 windu07 kernel:  [<ffffffff8118c2af>] get_sb_nodev+0x5f/0xa0
Aug 18 04:50:03 windu07 kernel:  [<ffffffffa053bf95>] lustre_get_sb+0x25/0x30 [obdclass]
Aug 18 04:50:03 windu07 kernel:  [<ffffffff8118b90b>] vfs_kern_mount+0x7b/0x1b0
Aug 18 04:50:03 windu07 kernel:  [<ffffffff8118bab2>] do_kern_mount+0x52/0x130
Aug 18 04:50:03 windu07 kernel:  [<ffffffff811aca8b>] do_mount+0x2fb/0x930
Aug 18 04:50:03 windu07 kernel:  [<ffffffff811ad150>] sys_mount+0x90/0xe0
Aug 18 04:50:03 windu07 kernel:  [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Aug 18 04:50:03 windu07 kernel: Code: 01 00 0f b6 10 f6 82 20 2a af 81 20 74 13 0f 1f 00 48 83 c0 01 0f b6 10 f6 82 20 2a af 81 20 75 f0 c9 c3 66 0f 1f 44 00 00 31 c0 <80> 3f 00 55 48 89 fa 48 89 e5 74 11 66 90 48 83 c2 01 80 3a 00 


 Comments   
Comment by Andreas Dilger [ 04/Sep/15 ]

We need more information about what you are doing here. The failure is right at the start of mount, and my guess is that you are passing a too-long argument for mount options, or have some string that is not NUL-terminated. What line of code does the crash resolve to?

gdb lustre.ko
(gdb) list *(ll_fill_super+0x127c)
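
The symbol and offset for that gdb lookup come straight from the call trace frame "ll_fill_super+0x127c/0x16a0 [lustre]". As a rough illustration only, the sketch below pulls those fields out of a frame string and builds the matching "list" expression. The helper names parse_frame and gdb_list_expr are hypothetical and are not part of Lustre or gdb.

```c
#include <stdio.h>

/*
 * Parse a frame such as "ll_fill_super+0x127c/0x16a0 [lustre]" into
 * its symbol name, offset, and function size.
 * Returns 0 on success, -1 if the text does not have that shape.
 * (Hypothetical helper, for illustration only.)
 */
static int parse_frame(const char *frame, char sym[64],
		       unsigned long *off, unsigned long *size)
{
	/* %63[^+] reads the symbol up to '+'; %lx accepts a 0x prefix. */
	if (sscanf(frame, " %63[^+]+%lx/%lx", sym, off, size) != 3)
		return -1;
	return 0;
}

/* Write the gdb "list" expression for a frame into out. */
static int gdb_list_expr(const char *frame, char *out, size_t outlen)
{
	char sym[64];
	unsigned long off, size;

	if (parse_frame(frame, sym, &off, &size) != 0)
		return -1;
	snprintf(out, outlen, "list *(%s+0x%lx)", sym, off);
	return 0;
}
```

The offset (0x127c) is the position inside the function and the second number (0x16a0) is the function's total size, so only the first is needed for the lookup.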

What arguments are you using for mount? Any extra arguments passed using mkfs.lustre --mountfsoptions?

Comment by Ashish Purkar [ 07/Sep/15 ]

Unfortunately, we don't have a crash dump available; we are working on getting one. A similar bug was fixed on lustre-dev (ORI), but the fix was not ported to the release repo.

Comment by Ashish Purkar [ 08/Sep/15 ]

Changes required to fix the issue:
http://review.whamcloud.com/#/c/3767/1/lustre/obdclass/obd_mount.c

/* Generate data for registration */
static int server_lsi2mti(struct lustre_sb_info *lsi,
			  struct mgs_target_info *mti)
{
.
.
.
	if (lsi->lsi_lmd->lmd_opts != NULL) {
		/*
		 * Verify all the ->lmd_opts can be stored as ->mti_params and
		 * translate them from space to comma delimited for
		 * compatibility.  No effort is made to strip duplicate
		 * characters, however this is not harmful.
		 */
		if (strlen(lsi->lsi_lmd->lmd_opts) >= sizeof(mti->mti_params))
			return -EINVAL;

		strcpy(mti->mti_params, lsi->lsi_lmd->lmd_opts);
		while ((s = strchr(mti->mti_params, ',')) != NULL)
			*s = ' ';
	}

	return 0;
}
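
The oops shows RDI and CR2 both zero, which is consistent with strlen() being handed a NULL pointer (on x86_64, RDI carries the first argument). The guard in the snippet above can be tried outside the kernel with a minimal user-space sketch. Everything here is an assumption for illustration: copy_mount_opts is a hypothetical name, and PARAMS_MAX merely stands in for sizeof(mti->mti_params), whose real value is not given in this ticket.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for sizeof(mti->mti_params). */
#define PARAMS_MAX 64

/*
 * Copy opts into params, turning commas into spaces as the quoted
 * code does. Returns 0 on success (params is left untouched when
 * opts is NULL) or -EINVAL when opts does not fit.
 */
static int copy_mount_opts(char params[PARAMS_MAX], const char *opts)
{
	char *s;

	/* Without this check, strlen(NULL) faults as in the oops. */
	if (opts == NULL)
		return 0;

	/* Reject strings that would overflow the destination buffer. */
	if (strlen(opts) >= PARAMS_MAX)
		return -EINVAL;

	strcpy(params, opts);
	while ((s = strchr(params, ',')) != NULL)
		*s = ' ';

	return 0;
}
```

The NULL check comes before any string function touches the pointer, and the length check comes before strcpy(), so neither a missing nor an oversized option string can reach the copy.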
Generated at Sat Feb 10 02:05:57 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.