[LU-1411] panic in lmd_parse() on kernels < 2.6.18 Created: 15/May/12  Updated: 05/Dec/14  Resolved: 17/Oct/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Ned Bass Assignee: Zhenyu Xu
Resolution: Duplicate Votes: 0
Labels: bgp, client, ppc
Environment:

IBM BlueGene P
Linux rzdawndevio33 2.6.16.60-304.3llnl #1 SMP Wed May 9 13:15:00 PDT 2012 ppc ppc ppc GNU/Linux


Severity: 3
Rank (Obsolete): 6395

 Description   

We've begun testing the Lustre 2.1 client on our BlueGene P development system.

It immediately panicked on the first mount attempt. This turns out to be related to a patch landed for LU-163. lustre_get_sb() conditionally calls get_sb_nodev() with a struct lustre_mount_data2 * for kernels 2.6.18 and newer, but with the raw void *data pointer (the mount option string) for older kernels:

#if (LINUX_VERSION_CODE < KERNEL_VERSION(2,6,18))
struct super_block * lustre_get_sb(struct file_system_type *fs_type, int flags,
                                   const char *devname, void * data)
{
        return get_sb_nodev(fs_type, flags, data, lustre_fill_super);
}
#else
int lustre_get_sb(struct file_system_type *fs_type, int flags,
                  const char *devname, void * data, struct vfsmount *mnt)
{
        struct lustre_mount_data2 lmd2 = {data, mnt};

        return get_sb_nodev(fs_type, flags, &lmd2, lustre_fill_super, mnt);
}
#endif

However, lustre_fill_super() unconditionally casts its void *data argument to a struct lustre_mount_data2 *. On older kernels data is actually the raw option string, so lmd2->lmd2_data yields a bogus pointer, and the kernel crashes when lmd_parse() dereferences it.

/** Parse mount line options
 * e.g. mount -v -t lustre -o abort_recov uml1:uml2:/lustre-client /mnt/lustre
 * dev is passed as device=uml1:/lustre by mount.lustre
 */
static int lmd_parse(char *options, struct lustre_mount_data *lmd)
{
        char *s1, *s2, *devname = NULL;
        struct lustre_mount_data *raw = (struct lustre_mount_data *)options;
...
        /* Options should be a string - try to detect old lmd data */
        if ((raw->lmd_magic & 0xffffff00) == (LMD_MAGIC & 0xffffff00)) { <--- crashes here

...

int lustre_fill_super(struct super_block *sb, void *data, int silent)
{
        struct lustre_mount_data *lmd;
        struct lustre_mount_data2 *lmd2 = data;
...

        /* Figure out the lmd from the mount options */
        if (lmd_parse((char *)(lmd2->lmd2_data), lmd)) {


 Comments   
Comment by Ned Bass [ 15/May/12 ]

Here is a proof-of-concept patch that at least lets me build and mount on 2.6.16 and 2.6.32 kernels. I'm not sure if passing a null vfsmount pointer to ll_fill_super() is the right thing to do on older kernels.

http://review.whamcloud.com/2795

Comment by Peter Jones [ 16/May/12 ]

Bobijam

Could you please comment on the validity of this suggested approach?

Ned

Wouldn't it be safer to continue to run 1.8.x clients that support this older kernel version?

Peter

Comment by Ned Bass [ 16/May/12 ]

Hi Peter,

Yes, we originally planned to run Lustre 1.8 clients on our BGP systems until they retire. But then our BGP users ran into some bugs, e.g. LU-1378. We have a strong incentive to discontinue maintenance of our 1.8 branch, so we have started some exploratory testing of the 2.1 client on BGP, which runs Linux 2.6.16 on the I/O nodes.

Comment by Zhenyu Xu [ 16/May/12 ]

master branch port at http://review.whamcloud.com/2820

Comment by Peter Jones [ 02/Jun/12 ]

Ned

Is my understanding correct that you have abandoned this approach and are now using 1.8.x clients for your systems running older kernels?

Peter

Comment by Christopher Morrone [ 04/Jun/12 ]

The plan to put 2.1 on BG/P is only delayed, not abandoned.

We need to get 2.1 working reasonably well because I do not have the resources to test and debug 1.8 compatibility with every new 2.X release. The BG/P systems will be around for at least two more years, and 1.8-to-2.1 compatibility is not good enough to last that long. We'll continually have new breakage as servers go beyond 2.1.

Comment by James A Simmons [ 17/Oct/12 ]

This bug is a duplicate of LU-1646. A patch for LU-1646 already exists; it was developed under LU-812. The patch is

http://review.whamcloud.com/#change,3661

Also, NFS with Lustre is broken on older kernels, which is addressed by LU-1718. The patch is

http://review.whamcloud.com/#change,3840

Comment by Peter Jones [ 17/Oct/12 ]

Thanks for the tip James - it's always good to get older tickets cleaned up where possible!

Comment by James A Simmons [ 12/Nov/12 ]

The patch from LU-812, which fixes this bug, has landed. This ticket can be closed now.

Generated at Sat Feb 10 01:16:23 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.