[LU-4098] Client kernel crash due to misconfigured MDT Created: 13/Oct/13  Updated: 24/Feb/14  Resolved: 25/Oct/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: Lustre 2.6.0, Lustre 2.5.1

Type: Bug Priority: Major
Reporter: S. Wendy Cheng (Inactive) Assignee: Dmitry Eremin (Inactive)
Resolution: Fixed Votes: 0
Labels: Xeon-Phi
Environment:

The problem applies to all platforms, but it was encountered during IOzone test runs with:

  • Client: Intel Xeon Phi MPSS 2.1 (mpss_gold_update_3-2.1.6720-16)
  • Server: RHEL 6.3 with OFED 1.5.4.1

Epic/Theme: Xeon-Phi
Severity: 3
Rank (Obsolete): 11008

 Description   

Ref:
http://lists.lustre.org/pipermail/lustre-devel/2013-October/004275.html

The client kernel crashes when the mount command is run.

Steps to reproduce:
server> mkfs.lustre --reformat --fsname=lus1 --mgs --mdt --index=1 /dev/sdd1
server> mkfs.lustre --reformat --ost --fsname=lus1 --mgsnode=192.168.20.46@o2ib0 --index=1 /dev/sde1

client> mount.lustre -o flock 192.168.20.46@o2ib0:/lus1 /mnt/lustre

The panic occurs in lmv_get_info():

<1>[ 215.946538] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
<1>[ 215.946572] IP: [<ffffffffa07445cb>] lmv_get_info+0x32b/0x560 [lmv]
<0>[ 215.947090] Call Trace:
<4>[ 215.947143] [<ffffffffa0655b70>] ll_fill_super+0x1f40/0x4330 [lustre]
<4>[ 215.947214] [<ffffffffa02cf527>] ? lustre_start_mgc+0x227/0x2a90 [obdclass]
<4>[ 215.947275] [<ffffffffa02d3d60>] lustre_fill_super+0xa20/0x22f0 [obdclass]
<4>[ 215.947304] [<ffffffff810de91f>] ? set_anon_super+0x0/0xe0
<4>[ 215.947361] [<ffffffffa02d3340>] ? lustre_fill_super+0x0/0x22f0 [obdclass]
<4>[ 215.947380] [<ffffffff810df601>] mount_nodev+0x50/0x84
<4>[ 215.947437] [<ffffffffa02cc5d9>] lustre_mount+0x29/0x30 [obdclass]
<4>[ 215.947454] [<ffffffff810df009>] vfs_kern_mount+0xa8/0x1f3
<4>[ 215.947471] [<ffffffff810df1bc>] do_kern_mount+0x4d/0xe1
<4>[ 215.947489] [<ffffffff810f54d7>] do_mount+0x67d/0x6d5
<4>[ 215.947507] [<ffffffff810f57cc>] sys_mount+0x84/0xbd
<4>[ 215.947527] [<ffffffff81002aab>] system_call_fastpath+0x16/0x1b



 Comments   
Comment by S. Wendy Cheng (Inactive) [ 13/Oct/13 ]

Tentative patch:

diff --git a/lustre/lmv/lmv_obd.c b/lustre/lmv/lmv_obd.c
index 3091bfb..5f4a18b 100644
--- a/lustre/lmv/lmv_obd.c
+++ b/lustre/lmv/lmv_obd.c
@@ -2443,6 +2443,16 @@ static int lmv_get_info(const struct lu_env *env, struct obd_export *exp,
 		RETURN(rc);
 
 	/*
+	 * In the case of mis-configured OSS, instead of crashing
+	 * the kernel during client mount, give them a warning and
+	 * gracefully back out mount process w/ -ENXIO error.
+	 */
+	if (lmv->tgts[0] == NULL) {
+		CDEBUG(D_IOCTL, "NULL target for MDT0\n");
+		RETURN(-ENXIO);
+	}
+
+	/*
 	 * Forwarding this request to first MDS, it should know LOV
 	 * desc.
 	 */
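
The guard in the patch can be illustrated outside the kernel with a small user-space model. This is only a sketch, not the Lustre code: struct model_tgt, struct model_lmv, MODEL_MAX_TGTS, and model_forward_to_first_mdt() are hypothetical names invented for illustration. It assumes the target table is indexed by MDT index, so that a file system whose only MDT was formatted with --index=1 presumably leaves slot 0 unpopulated.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_MAX_TGTS 4	/* hypothetical table size */

/* Hypothetical stand-in for one metadata target descriptor. */
struct model_tgt {
	int idx;		/* MDT index this slot describes */
};

/* Hypothetical stand-in for the device holding the target table. */
struct model_lmv {
	struct model_tgt *tgts[MODEL_MAX_TGTS];	/* slot i holds MDT index i */
};

/*
 * Model of a request that is always forwarded to the first MDT.
 * Returns the index of the target used (0) on success, or -ENXIO
 * when slot 0 was never populated, instead of dereferencing NULL.
 */
int model_forward_to_first_mdt(const struct model_lmv *lmv)
{
	if (lmv->tgts[0] == NULL)
		return -ENXIO;

	return lmv->tgts[0]->idx;
}
```

With a table that only has slot 1 filled, the call returns -ENXIO and the caller can fail the mount cleanly; with slot 0 filled it returns 0.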
Comment by Peter Jones [ 13/Oct/13 ]

Thanks Wendy. Dmitry, could you please comment?

Comment by Dmitry Eremin (Inactive) [ 14/Oct/13 ]

Thanks. I also found a few other places without proper checking. I'm preparing a patch set.

Comment by Dmitry Eremin (Inactive) [ 14/Oct/13 ]

The patch set is at http://review.whamcloud.com/#/c/7941/

Comment by Bob Glossman (Inactive) [ 21/Feb/14 ]

Backport to b2_5:
http://review.whamcloud.com/9347
