
Client kernel crash due to misconfigured MDT

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.6.0, Lustre 2.5.1
    • Affects Version/s: Lustre 2.5.0
    • The problem applies to all platforms, but it was encountered during IOzone test runs with:

      * Client: Intel Xeon Phi MPSS 2.1 (mpss_gold_update_3-2.1.6720-16)
      * Server: RHEL 6.3 with OFED 1.5.4.1
    • 3
    • 11008

    Description

      Ref:
      http://lists.lustre.org/pipermail/lustre-devel/2013-October/004275.html

      The client kernel crashes when the file system is mounted.

      Steps to reproduce:
      server> mkfs.lustre --reformat --fsname=lus1 --mgs --mdt --index=1 /dev/sdd1
      server> mkfs.lustre --reformat --ost --fsname=lus1 --mgsnode=192.168.20.46@o2ib0 --index=1 /dev/sde1

      client> mount.lustre -o flock 192.168.20.46@o2ib0:/lus1 /mnt/lustre

      The panic occurs in lmv_get_info():

      <1>[ 215.946538] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
      <1>[ 215.946572] IP: [<ffffffffa07445cb>] lmv_get_info+0x32b/0x560 [lmv]
      <0>[ 215.947090] Call Trace:
      <4>[ 215.947143] [<ffffffffa0655b70>] ll_fill_super+0x1f40/0x4330 [lustre]
      <4>[ 215.947214] [<ffffffffa02cf527>] ? lustre_start_mgc+0x227/0x2a90 [obdclass]
      <4>[ 215.947275] [<ffffffffa02d3d60>] lustre_fill_super+0xa20/0x22f0 [obdclass]
      <4>[ 215.947304] [<ffffffff810de91f>] ? set_anon_super+0x0/0xe0
      <4>[ 215.947361] [<ffffffffa02d3340>] ? lustre_fill_super+0x0/0x22f0 [obdclass]
      <4>[ 215.947380] [<ffffffff810df601>] mount_nodev+0x50/0x84
      <4>[ 215.947437] [<ffffffffa02cc5d9>] lustre_mount+0x29/0x30 [obdclass]
      <4>[ 215.947454] [<ffffffff810df009>] vfs_kern_mount+0xa8/0x1f3
      <4>[ 215.947471] [<ffffffff810df1bc>] do_kern_mount+0x4d/0xe1
      <4>[ 215.947489] [<ffffffff810f54d7>] do_mount+0x67d/0x6d5
      <4>[ 215.947507] [<ffffffff810f57cc>] sys_mount+0x84/0xbd
      <4>[ 215.947527] [<ffffffff81002aab>] system_call_fastpath+0x16/0x1b
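
      A note on reading the fault address: a NULL pointer dereference reported at a small non-zero address such as 0x28 usually means the code read a structure member through a NULL base pointer, so the faulting address equals that member's offset. The sketch below illustrates the arithmetic only. The struct, its member names, and the helper functions are hypothetical and are not taken from the Lustre sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration only; not a real Lustre struct. */
struct demo_tgt {
	char  pad[0x28];   /* stand-in for the members that precede the one accessed */
	void *member;      /* hypothetical member located at offset 0x28 */
};

/*
 * Address the CPU would try to read for `base->member`.
 * Computed arithmetically, so nothing is actually dereferenced.
 */
static uintptr_t demo_fault_address(uintptr_t base)
{
	return base + offsetof(struct demo_tgt, member);
}

/* With a NULL (zero) base, the faulting address is just the member offset. */
static void demo_check(void)
{
	assert(demo_fault_address(0) == 0x28);
}
```

      Calling demo_check() passes silently, which matches the shape of the oops above: a base pointer of zero plus a member offset of 0x28.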

        Activity

          [LU-4098] Client kernel crash due to misconfigured MDT
          bogl Bob Glossman (Inactive) added a comment - backport to b2_5: http://review.whamcloud.com/9347
          dmiter Dmitry Eremin (Inactive) added a comment - Patch set is http://review.whamcloud.com/#/c/7941/

          dmiter Dmitry Eremin (Inactive) added a comment - Thanks, I also found a few places without proper checking. I'm preparing the patch set.
          pjones Peter Jones added a comment - Thanks Wendy. Dmitry, could you please comment?
          wendyc S. Wendy Cheng (Inactive) added a comment - edited

          Tentative patch:

          diff --git a/lustre/lmv/lmv_obd.c b/lustre/lmv/lmv_obd.c
          index 3091bfb..5f4a18b 100644
          --- a/lustre/lmv/lmv_obd.c
          +++ b/lustre/lmv/lmv_obd.c
          @@ -2443,6 +2443,16 @@ static int lmv_get_info(const struct lu_env *env, struct obd_export *exp,
                                   RETURN(rc);
           
                           /*
          +                 * In the case of mis-configured OSS, instead of crashing
          +                 * the kernel during client mount, give them a warning and
          +                 * gracefully back out mount process w/ -ENXIO error.
          +                 */
          +                if (lmv->tgts[0] == NULL) {
          +                        CDEBUG(D_IOCTL, "NULL target for MDT0\n");
          +                        RETURN(-ENXIO);
          +                }
          +
          +                /*
                            * Forwarding this request to first MDS, it should know LOV
                            * desc.
                            */
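
          The guard in the tentative patch can be illustrated outside the kernel. The following is a minimal user-space sketch, assuming a target table in which slot i holds the target with index i, or NULL when no such target was configured. All types and function names here (demo_tgt, demo_lmv, demo_get_info, demo_usage) are hypothetical stand-ins, not Lustre APIs. Only the check for an empty slot 0 and the -ENXIO return value come from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the LMV structures. */
struct demo_tgt {
	int active;
};

struct demo_lmv {
	struct demo_tgt *tgts[4];   /* slot i: target with index i, or NULL */
};

/*
 * Mimics the guarded path: fail with -ENXIO when slot 0 is empty
 * instead of dereferencing a NULL pointer.
 */
static int demo_get_info(const struct demo_lmv *lmv, int *out)
{
	if (lmv->tgts[0] == NULL)
		return -ENXIO;

	*out = lmv->tgts[0]->active;
	return 0;
}

/* Usage: only index 1 is populated, as in the reproduction steps. */
static void demo_usage(void)
{
	struct demo_tgt mdt1 = { .active = 1 };
	struct demo_lmv lmv = { .tgts = { NULL, &mdt1 } };
	int value = 0;

	assert(demo_get_info(&lmv, &value) == -ENXIO);
	assert(value == 0);   /* output is untouched on the error path */
}
```

          Returning an error code lets the caller unwind the mount cleanly, whereas the unguarded dereference takes down the whole client kernel.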

          People

            Assignee: dmiter Dmitry Eremin (Inactive)
            Reporter: wendyc S. Wendy Cheng (Inactive)