Lustre / LU-19962

don't use MGC with non-initialized export


    • Type: Bug
    • Resolution: Fixed
    • Priority: Medium
    • Lustre 2.18.0

      lustre_start_mgc() may leave the MGC obd with an uninitialized cl_mgc_mgsexp if the connection to the MGS fails.

          rc = obd_connect(NULL, &exp, obd, uuid, data, NULL);
          if (rc) {
              CERROR("connect failed %d\n", rc);
              GOTO(out, rc);
          }
      ...
          obd->u.cli.cl_mgc_mgsexp = exp; /* skipped if rc != 0 */
      out:
          /*
           * Keep the MGC info in the sb. Note that many lsi's can point
           * to the same mgc.
           */
          lsi->lsi_mgc = obd; /* but the OBD is still kept in lsi */
      out_free:
          mutex_unlock(&mgc_start_lock);

      The failing mount exits with an error, but any parallel mount waiting on mgc_start_lock can still find that MGC OBD if it has not been cleaned up yet:

          obd = class_name2obd(mgcname);
          if (obd && !obd->obd_stopping) {
              ...
              /* Re-using an existing MGC */
              atomic_inc(&obd->u.cli.cl_mgc_refcount);

      The parallel mount then takes the reuse path, returns from its own call to lustre_start_mgc() without error, and continues mounting even though the MGC has no export. This can cause crashes like the one below:

      [  184.387310] BUG: unable to handle kernel NULL pointer dereference at 0000000000000190
      ...
      [  184.393875] RIP: 0010:server_lsi2mti+0x118/0x7b0 [obdclass]

      The crash occurs when code accesses cl_mgc_mgsexp on the assumption that it is always set.
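
The sequence described above can be modeled in userspace as a rough sketch. This is not the actual fix for this ticket: every model_* name is hypothetical, the structures are simplified stand-ins for the obd fields quoted above, and the choice of -ENOTCONN and -ENODEV as return codes is an assumption made only for illustration. The idea is that the reuse path refuses an MGC whose export was never set instead of taking a reference on it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an obd export. */
struct model_export {
	int connected;
};

/* Hypothetical stand-in for the MGC obd fields discussed in this ticket. */
struct model_mgc {
	struct model_export *mgsexp; /* models obd->u.cli.cl_mgc_mgsexp */
	int refcount;                /* models cl_mgc_refcount */
	int stopping;                /* models obd->obd_stopping */
};

/*
 * Models the first mount: the MGC is set up, then the connect result
 * decides whether the export pointer is ever assigned.
 */
int model_start_new(struct model_mgc *mgc, struct model_export *exp,
		    int connect_rc)
{
	assert(mgc != NULL);
	mgc->mgsexp = NULL;
	mgc->refcount = 1;
	mgc->stopping = 0;
	if (connect_rc)
		return connect_rc; /* export assignment is skipped */
	mgc->mgsexp = exp;
	return 0;
}

/*
 * Models a parallel mount that finds an existing MGC. The NULL check is
 * the assumed guard: without it the caller would take a reference and
 * continue with an MGC that has no export.
 */
int model_reuse(struct model_mgc *mgc)
{
	assert(mgc != NULL);
	if (mgc->stopping)
		return -ENODEV;
	if (mgc->mgsexp == NULL)
		return -ENOTCONN; /* assumed error code, illustration only */
	mgc->refcount++;
	return 0;
}
```

Calling model_start_new() with a non-zero connect_rc and then model_reuse() on the same structure returns an error and leaves the reference count unchanged, whereas after a successful connect the reuse path increments the count as before.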

            tappro Mikhail Pershin