
panic in lmd_parse() on kernels < 2.6.18

Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • None
    • None
    • IBM BlueGene P
      Linux rzdawndevio33 2.6.16.60-304.3llnl #1 SMP Wed May 9 13:15:00 PDT 2012 ppc ppc ppc GNU/Linux
    • 3
    • 6395

    Description

      We've begun testing the Lustre 2.1 client on our BlueGene P development system.

      It immediately panicked on the first mount attempt. This turns out to be related to a patch landed for LU-163. lustre_get_sb() conditionally calls get_sb_nodev() with a struct lustre_mount_data2 * for kernels 2.6.18 and newer, and a void * data pointer for older kernels:

      #if (LINUX_VERSION_CODE < KERNEL_VERSION(2,6,18))
      struct super_block * lustre_get_sb(struct file_system_type *fs_type, int flags,
                                         const char *devname, void * data)
      {
              return get_sb_nodev(fs_type, flags, data, lustre_fill_super);
      }
      #else
      int lustre_get_sb(struct file_system_type *fs_type, int flags,
                        const char *devname, void * data, struct vfsmount *mnt)
      {
              struct lustre_mount_data2 lmd2 = {data, mnt};

              return get_sb_nodev(fs_type, flags, &lmd2, lustre_fill_super, mnt);
      }
      #endif
      

      However, lustre_fill_super() unconditionally casts its void *data argument to a struct lustre_mount_data2 *. On older kernels that argument is the raw options pointer rather than the wrapper structure, so we crash when we dereference through it.

      /** Parse mount line options
       * e.g. mount -v -t lustre -o abort_recov uml1:uml2:/lustre-client /mnt/lustre
       * dev is passed as device=uml1:/lustre by mount.lustre
       */
      static int lmd_parse(char *options, struct lustre_mount_data *lmd)
      {
              char *s1, *s2, *devname = NULL;
              struct lustre_mount_data *raw = (struct lustre_mount_data *)options;
      ...
              /* Options should be a string - try to detect old lmd data */
              if ((raw->lmd_magic & 0xffffff00) == (LMD_MAGIC & 0xffffff00)) { <--- crashes here

      ...

      int lustre_fill_super(struct super_block *sb, void *data, int silent)
      {
              struct lustre_mount_data *lmd;
              struct lustre_mount_data2 *lmd2 = data;
      ...

              /* Figure out the lmd from the mount options */
              if (lmd_parse((char *)(lmd2->lmd2_data), lmd)) {
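To illustrate why this faults, here is a small userspace sketch, not Lustre code. It assumes a simplified two-pointer layout for struct lustre_mount_data2 (the real second member is a struct vfsmount *), and the helper name misread_lmd2_data() is hypothetical. It computes the value that lmd2->lmd2_data would take if data actually pointed at the option string, using memcpy so that nothing is dereferenced through the wrong type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the kernel structure (assumed layout). */
struct lustre_mount_data2 {
        void *lmd2_data;
        void *lmd2_mnt;   /* struct vfsmount * in the kernel */
};

/*
 * Hypothetical helper: return the bits that lmd2->lmd2_data would hold
 * if the "struct" were really the option string itself. lmd2_data is
 * the first member, so it overlays the first sizeof(void *) bytes.
 */
static uintptr_t misread_lmd2_data(const char *options)
{
        uintptr_t bogus = 0;

        /* The string must be long enough to cover one pointer. */
        assert(strlen(options) + 1 >= sizeof(void *));
        memcpy(&bogus, options, sizeof(void *));
        return bogus;
}
```

The returned value is made of the ASCII bytes of the string, not an address. On a 32-bit big-endian machine, for example, a string beginning with "device=" would yield 0x64657669 ("devi"), and lmd_parse() would then read raw->lmd_magic through that bogus pointer.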
      


        Activity

          [LU-1411] panic in lmd_parse() on kernels < 2.6.18

          simmonsja James A Simmons added a comment -

          The patch from LU-812 has been landed which fixes this bug. This ticket can be closed now.
          pjones Peter Jones added a comment -

          Thanks for the tip James - it's always good to get older tickets cleaned up where possible!


          simmonsja James A Simmons added a comment -

          This bug is a duplicate of LU-1646. A patch for it already exists under LU-1646; it was developed under LU-812. The patch is

          http://review.whamcloud.com/#change,3661

          NFS with Lustre is also broken on older kernels, which is addressed by LU-1718. The patch is

          http://review.whamcloud.com/#change,3840

          morrone Christopher Morrone (Inactive) added a comment -

          The plan to put 2.1 on BG/P is only delayed, not abandoned.

          We need to get 2.1 working reasonably well because I do not have the resources to test and debug 1.8 compatibility with every new 2.X release. The BG/P systems will be around for at least two more years, and the 1.8 to 2.1 compatibility is not good enough to last that long. We'll continually have new breakage as servers go beyond 2.1.
          pjones Peter Jones added a comment -

          Ned

          Is my understanding correct that you have abandoned this approach and are now using 1.8.x clients for your systems running older kernels?

          Peter

          bobijam Zhenyu Xu added a comment -

          master branch port at http://review.whamcloud.com/2820


          nedbass Ned Bass (Inactive) added a comment -

          Hi Peter,

          Yes, we originally planned to run Lustre 1.8 clients on our BGP systems until they retire. But then our BGP users ran into some bugs, e.g. LU-1378. We have a strong incentive to discontinue maintenance of our 1.8 branch, so we have started some exploratory testing of the 2.1 client on BGP, which runs Linux 2.6.16 on the I/O nodes.
          pjones Peter Jones added a comment -

          Bobijam

          Could you please comment on the validity of this suggested approach?

          Ned

          Wouldn't it be safer to continue to run 1.8.x clients that support this older kernel version?

          Peter


          nedbass Ned Bass (Inactive) added a comment -

          Here is a proof-of-concept patch that at least lets me build and mount on 2.6.16 and 2.6.32 kernels. I'm not sure if passing a null vfsmount pointer to ll_fill_super() is the right thing to do on older kernels.

          http://review.whamcloud.com/2795
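For readers following the thread, one way to picture the "null vfsmount" idea is the userspace sketch below. It is an assumption-laden illustration, not the content of the patch at review 2795 or of the LU-812 fix: every mock_* name is hypothetical, struct vfsmount is an empty placeholder, and the real get_sb_nodev() plumbing is omitted. The idea shown is that both entry points hand the fill-super routine the same wrapper type, with the old-kernel path leaving the mount pointer NULL.

```c
#include <assert.h>
#include <stddef.h>

struct vfsmount { int unused; };        /* placeholder, not the kernel type */

struct lustre_mount_data2 {
        void            *lmd2_data;
        struct vfsmount *lmd2_mnt;
};

/* Stand-in for lustre_fill_super(): it always receives the wrapper. */
static int mock_fill_super(void *data, const char **opts_out, int *have_mnt)
{
        struct lustre_mount_data2 *lmd2 = data;

        if (lmd2 == NULL || lmd2->lmd2_data == NULL)
                return -1;
        *opts_out = lmd2->lmd2_data;
        /* Callers that need the mount must tolerate its absence. */
        *have_mnt = (lmd2->lmd2_mnt != NULL);
        return 0;
}

/* Old-kernel-style entry point: no vfsmount is available, so pass NULL. */
static int mock_get_sb_old(void *data, const char **opts_out, int *have_mnt)
{
        struct lustre_mount_data2 lmd2 = { data, NULL };

        return mock_fill_super(&lmd2, opts_out, have_mnt);
}

/* Newer-kernel-style entry point: wrap the options and the mount. */
static int mock_get_sb_new(void *data, struct vfsmount *mnt,
                           const char **opts_out, int *have_mnt)
{
        struct lustre_mount_data2 lmd2 = { data, mnt };

        return mock_fill_super(&lmd2, opts_out, have_mnt);
}

/* Usage example: both paths recover the same option string. */
static void mock_demo(void)
{
        char options[] = "device=uml1:/lustre";
        struct vfsmount mnt = { 0 };
        const char *seen = NULL;
        int have_mnt = -1;

        assert(mock_get_sb_old(options, &seen, &have_mnt) == 0);
        assert(seen == options && have_mnt == 0);
        assert(mock_get_sb_new(options, &mnt, &seen, &have_mnt) == 0);
        assert(seen == options && have_mnt == 1);
}
```

Whether a NULL mount pointer is safe depends on what the real code later does with it, which is the open question raised in the comment above.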

          People

            bobijam Zhenyu Xu
            nedbass Ned Bass (Inactive)