[LU-7095] BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff8128a242>] strlen+0x2/0x30 Oops: 0000 [#1] SMP
Created: 03/Sep/15 Updated: 10/Oct/21 Resolved: 10/Oct/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Ashish Purkar (Inactive) | Assignee: | WC Triage |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Lustre Version : 2.7.57, 10_2.6.32_431.17.1.x2.0.61.x86_64_g037759b Build Date: Sat Aug 15 19:36:47 2015 |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
After running sanityn test10b, the client crashed as follows:

Aug 18 04:49:24 windu07 dcs-collectord[10608]: INFO Finished data collection (successful) for 'dmreport'. Will poll in 150 sec. (dm_report_collection_process.py:164)
Aug 18 04:50:03 windu07 kernel: BUG: unable to handle kernel NULL pointer dereference at (null)
Aug 18 04:50:03 windu07 kernel: IP: [<ffffffff8128a242>] strlen+0x2/0x30
Aug 18 04:50:03 windu07 kernel: PGD 7d1618067 PUD 7a97bf067 PMD 0
Aug 18 04:50:03 windu07 kernel: Oops: 0000 [#1] SMP
Aug 18 04:50:03 windu07 kernel: last sysfs file: /sys/devices/system/cpu/online
Aug 18 04:50:03 windu07 kernel: CPU 11
Aug 18 04:50:03 windu07 kernel: Modules linked in: osc(U) mgc(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic sha256_generic crc32c_intel libcfs(U) ib_ipoib(U) rdma_ucm(U) ib_ucm(U) ib_uverbs(U) ib_umad(U) rdma_cm(U) ib_cm(U) iw_cm(U) mlx4_ib(U) ib_sa(U) ib_mad(U) ib_core(U) ib_addr(U) nf_conntrack_ipv4 nf_defrag_ipv4 xt_state iptable_filter xt_NOTRACK nf_conntrack xt_multiport iptable_raw ip_tables ipmi_devintf acpi_cpufreq freq_table mperf dm_mod wmi iTCO_wdt iTCO_vendor_support isci libsas scsi_transport_sas sb_edac edac_core i2c_i801 ahci lpc_ich mfd_core shpchp nfs lockd fscache auth_rpcgss nfs_acl sunrpc igb dca i2c_algo_bit i2c_core mlx4_en(U) ptp pps_core mlx4_core(U) compat(U) bonding ipv6 8021q garp stp llc [last unloaded: ib_core]
Aug 18 04:50:03 windu07 kernel:
Aug 18 04:50:03 windu07 kernel: Pid: 47740, comm: mount.lustre Tainted: G W --------------- 2.6.32-431.17.1.x2.0.61.x86_64 #1 Intel Corporation S2600JF/S2600JF
Aug 18 04:50:03 windu07 kernel: RIP: 0010:[<ffffffff8128a242>] [<ffffffff8128a242>] strlen+0x2/0x30
Aug 18 04:50:03 windu07 kernel: RSP: 0018:ffff8807cd423d00 EFLAGS: 00010246
Aug 18 04:50:03 windu07 kernel: RAX: 0000000000000000 RBX: ffff8807b36f8c40 RCX: 00000000f0000544
Aug 18 04:50:03 windu07 kernel: RDX: ffffbf07b0c21de0 RSI: ffff880811229959 RDI: 0000000000000000
Aug 18 04:50:03 windu07 kernel: RBP: ffff8807cd423d68 R08: ffff880837f0b0f8 R09: 0000000000000180
Aug 18 04:50:03 windu07 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff8807e735a600
Aug 18 04:50:03 windu07 kernel: R13: 0000000000000000 R14: ffff8807b0c213e0 R15: ffff8807b0c21de0
Aug 18 04:50:03 windu07 kernel: FS: 00007fa4c86737a0(0000) GS:ffff880044760000(0000) knlGS:0000000000000000
Aug 18 04:50:03 windu07 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug 18 04:50:03 windu07 kernel: CR2: 0000000000000000 CR3: 000000082fbde000 CR4: 00000000000407e0
Aug 18 04:50:03 windu07 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 18 04:50:03 windu07 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug 18 04:50:03 windu07 kernel: Process mount.lustre (pid: 47740, threadinfo ffff8807cd422000, task ffff88082e7be040)
Aug 18 04:50:03 windu07 kernel: Stack:
Aug 18 04:50:03 windu07 kernel: ffffffffa0b2a86c ffff8807cd423d00 ffff88081137a1c0 ffff8807d25da520
Aug 18 04:50:03 windu07 kernel: <d> ffff8807da61dc00 ffff8807ad9efc00 ffff8807d5c8c000 ffff8807cd423d98
Aug 18 04:50:03 windu07 kernel: <d> 0000000000000000 ffff8807da61dc00 ffff8807cd423de8 ffff880811608b40
Aug 18 04:50:03 windu07 kernel: Call Trace:
Aug 18 04:50:03 windu07 kernel: [<ffffffffa0b2a86c>] ? ll_fill_super+0x127c/0x16a0 [lustre]
Aug 18 04:50:03 windu07 kernel: [<ffffffffa0543dad>] lustre_fill_super+0x61d/0x990 [obdclass]
Aug 18 04:50:03 windu07 kernel: [<ffffffffa0543790>] ? lustre_fill_super+0x0/0x990 [obdclass]
Aug 18 04:50:03 windu07 kernel: [<ffffffff8118c2af>] get_sb_nodev+0x5f/0xa0
Aug 18 04:50:03 windu07 kernel: [<ffffffffa053bf95>] lustre_get_sb+0x25/0x30 [obdclass]
Aug 18 04:50:03 windu07 kernel: [<ffffffff8118b90b>] vfs_kern_mount+0x7b/0x1b0
Aug 18 04:50:03 windu07 kernel: [<ffffffff8118bab2>] do_kern_mount+0x52/0x130
Aug 18 04:50:03 windu07 kernel: [<ffffffff811aca8b>] do_mount+0x2fb/0x930
Aug 18 04:50:03 windu07 kernel: [<ffffffff811ad150>] sys_mount+0x90/0xe0
Aug 18 04:50:03 windu07 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Aug 18 04:50:03 windu07 kernel: Code: 01 00 0f b6 10 f6 82 20 2a af 81 20 74 13 0f 1f 00 48 83 c0 01 0f b6 10 f6 82 20 2a af 81 20 75 f0 c9 c3 66 0f 1f 44 00 00 31 c0 <80> 3f 00 55 48 89 fa 48 89 e5 74 11 66 90 48 83 c2 01 80 3a 00 |
| Comments |
| Comment by Andreas Dilger [ 04/Sep/15 ] |
|
We need more information about what you are doing here. The failure is right at the start of mount, and my guess is that you are passing a too-long argument for mount options, or have some string that is not NUL-terminated. What line of code does the crash resolve to?

gdb lustre.ko
gdb> list *(ll_fill_super+0x127c)

What arguments are you using for mount? Any extra arguments passed using mkfs.lustre --mountfsoptions? |
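The possibilities raised above (an over-long option string or a missing NUL terminator), plus the NULL pointer suggested by RDI and CR2 both being zero in the oops, can be told apart with a bounded scan. The following is only a user-space sketch under assumptions: `check_opts`, the `opt_check` enum, and the `max_len` limit are hypothetical names for illustration and are not part of Lustre.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes; not taken from the Lustre sources. */
enum opt_check { OPT_OK = 0, OPT_NULL, OPT_UNTERMINATED, OPT_TOO_LONG };

/*
 * Classify an option buffer of buf_size bytes.
 * max_len is an assumed maximum string length, excluding the terminator.
 */
enum opt_check check_opts(const char *buf, size_t buf_size, size_t max_len)
{
	const char *nul;

	if (buf == NULL)
		return OPT_NULL;            /* strlen() here would fault at address 0 */

	/* Bounded search: never reads past buf_size bytes. */
	nul = memchr(buf, '\0', buf_size);
	if (nul == NULL)
		return OPT_UNTERMINATED;

	if ((size_t)(nul - buf) > max_len)
		return OPT_TOO_LONG;

	return OPT_OK;
}
```

For example, `check_opts(NULL, 0, 8)` reports `OPT_NULL` without dereferencing the pointer, which is the case an unguarded strlen() cannot survive.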
| Comment by Ashish Purkar [ 07/Sep/15 ] |
|
Unfortunately, we don't have a crash dump available; we are working on getting one. A similar bug was fixed on lustre-dev (ORI) but was not ported to the release repo. |
| Comment by Ashish Purkar [ 08/Sep/15 ] |
|
Changes required to fix the issue:

/* Generate data for registration */
static int server_lsi2mti(struct lustre_sb_info *lsi,
                          struct mgs_target_info *mti)
{
        .
        .
        .
        if (lsi->lsi_lmd->lmd_opts != NULL) {
                /*
                 * Verify all the ->lmd_opts can be stored as ->mti_params and
                 * translate them from comma to space delimited for
                 * compatibility. No effort is made to strip duplicate
                 * characters, however this is not harmful.
                 */
                if (strlen(lsi->lsi_lmd->lmd_opts) >= sizeof(mti->mti_params))
                        return -EINVAL;
                strcpy(mti->mti_params, lsi->lsi_lmd->lmd_opts);
                while ((s = strchr(mti->mti_params, ',')) != NULL)
                        *s = ' ';
        }
        return 0;
}
|
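The effect of the NULL check in the snippet above can be tried outside the kernel. This is a minimal user-space sketch, not the Lustre code itself: `copy_opts` and `PARAMS_MAX` are hypothetical stand-ins for the `->lmd_opts` to `->mti_params` copy and for `sizeof(mti->mti_params)`, and clearing the output buffer up front is an addition made here so the result is defined when no options are given.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for sizeof(mti->mti_params). */
#define PARAMS_MAX 64

/*
 * Copy mount options into a fixed-size parameter buffer, turning commas
 * into spaces. A NULL opts pointer is treated as "no options".
 */
int copy_opts(char params[PARAMS_MAX], const char *opts)
{
	char *s;

	params[0] = '\0';               /* addition in this sketch only */

	if (opts == NULL)               /* the guard: never call strlen(NULL) */
		return 0;

	if (strlen(opts) >= PARAMS_MAX) /* must leave room for the terminator */
		return -EINVAL;

	strcpy(params, opts);
	while ((s = strchr(params, ',')) != NULL)
		*s = ' ';

	return 0;
}
```

Without the `opts == NULL` branch, the first strlen() call would dereference address 0, which matches the faulting address reported in the oops.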