Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Labels: None
    • Affects Version/s: Lustre 2.2.0

    Description

      We recently experienced two MDS crashes on our Lustre installation.

      I've attached the netconsole output of both crashes (that's all I got: there is nothing in the syslog, and I wasn't able to take a screenshot of the console output because the crashed MDS had already been power-cycled by its failover partner).

      Attachments

        1. mds08.txt
          15 kB
        2. mds14.txt
          15 kB
        3. osd_ldiskfs.ko
          4.18 MB
        4. llog.txt
          47 kB

        Activity

          [LU-2323] mds crash
          pjones Peter Jones added a comment -

          ok thanks Adrian!


          adrian Adrian Ulrich (Inactive) added a comment -

          Hello Peter,

          We will upgrade to 2.3 as soon as the next opportunity arises; you can therefore close this issue.

          Thanks and best regards,
          Adrian

          pjones Peter Jones added a comment -

          Adrian

          Have you decided which approach you will take - to patch 2.2 or upgrade to 2.3?

          Peter

          pjones Peter Jones added a comment -

          Adrian, a build of the change backported to 2.2 already exists - http://build.whamcloud.com/job/lustre-reviews/10853/ - but is still in the automated test queue at the moment. Lustre 2.3 is available now and has been thoroughly tested. It will of course include other content beyond just this one fix (both additional features and many other fixes).


          adrian Adrian Ulrich (Inactive) added a comment -

          Thanks for fixing this issue: We will upgrade our MDS as soon as a new build becomes available – or should we just upgrade to 2.3?


          niu Niu Yawei (Inactive) added a comment -

          backport the memory corruption fix in mdd_declare_attr_set() to b2_2: http://review.whamcloud.com/4703


          niu Niu Yawei (Inactive) added a comment -

          After checking the 2.2 code carefully, I found a culprit which can cause such memory corruption:

          in mdd_declare_attr_set():

          #ifdef CONFIG_FS_POSIX_ACL
                  if (ma->ma_attr.la_valid & LA_MODE) {
                          mdd_read_lock(env, obj, MOR_TGT_CHILD);
                          rc = mdo_xattr_get(env, obj, buf, XATTR_NAME_ACL_ACCESS,
                                             BYPASS_CAPA);
                          mdd_read_unlock(env, obj);
                          if (rc == -EOPNOTSUPP || rc == -ENODATA)
                                  rc = 0;
                          else if (rc < 0)
                                  return rc;
          

          Our intention here is to retrieve the xattr length, but we passed an uninitialized buffer to mdo_xattr_get() (we should pass NULL here)...
          Actually, this bug has already been fixed for 2.3 & 2.4 (see http://review.whamcloud.com/#change,3928 & LU-1823); I think we need to backport it to 2.2.
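
          To make the fix concrete, here is a minimal sketch of what the corrected lookup could look like (assumed from the description above, not copied from the actual patch in http://review.whamcloud.com/#change,3928; the local size_buf is illustrative): a buffer with a NULL data pointer is passed, so mdo_xattr_get() only reports the xattr length instead of writing into uninitialized memory.

          #ifdef CONFIG_FS_POSIX_ACL
                  if (ma->ma_attr.la_valid & LA_MODE) {
                          /* sketch only (assumed fix, not the verbatim patch):
                           * lb_buf == NULL requests the xattr length without
                           * copying any data into an uninitialized buffer */
                          struct lu_buf size_buf = { .lb_buf = NULL, .lb_len = 0 };

                          mdd_read_lock(env, obj, MOR_TGT_CHILD);
                          rc = mdo_xattr_get(env, obj, &size_buf, XATTR_NAME_ACL_ACCESS,
                                             BYPASS_CAPA);
                          mdd_read_unlock(env, obj);
                          if (rc == -EOPNOTSUPP || rc == -ENODATA)
                                  rc = 0;
                          else if (rc < 0)
                                  return rc;
                  }
          #endif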

          bobijam Zhenyu Xu added a comment -

          I think the "LustreError: 31980:0:(llog_cat.c:298:llog_cat_add_rec()) llog_write_rec -28: lh=ffff88042d450240" message is misleading: -28 is -ENOSPC, which here only means the current plain log does not have enough space for the record; a new log will be created for it afterwards.

          int llog_cat_add_rec(struct llog_handle *cathandle, struct llog_rec_hdr *rec,
                               struct llog_cookie *reccookie, void *buf)
          {
                  struct llog_handle *loghandle;
                  int rc;
                  ENTRY;
          
                  LASSERT(rec->lrh_len <= LLOG_CHUNK_SIZE);
                  loghandle = llog_cat_current_log(cathandle, 1);
                  if (IS_ERR(loghandle))
                          RETURN(PTR_ERR(loghandle));
                  /* loghandle is already locked by llog_cat_current_log() for us */
                  rc = llog_write_rec(loghandle, rec, reccookie, 1, buf, -1);
                  if (rc < 0)
                          CERROR("llog_write_rec %d: lh=%p\n", rc, loghandle);
                  cfs_up_write(&loghandle->lgh_lock);
                  if (rc == -ENOSPC) {
                          /* to create a new plain log */
                          loghandle = llog_cat_current_log(cathandle, 1);
                          if (IS_ERR(loghandle))
                                  RETURN(PTR_ERR(loghandle));
                          rc = llog_write_rec(loghandle, rec, reccookie, 1, buf, -1);
                          cfs_up_write(&loghandle->lgh_lock);
                  }
          
                  RETURN(rc);
          }
          
          bobijam Zhenyu Xu added a comment -

          Yes, even if it's a 1.8.x client problem we should fix it. The purpose of the question was to help narrow down which area to look in for the root cause.

          I'm still investigating the llog part issue.


          adrian Adrian Ulrich (Inactive) added a comment -

          Well, the problem is that I cannot reproduce the crash, and I have not seen any new crashes since 14 November.

          (The crash was probably caused by a user job: there are about ~800 users on our cluster and I have no way to figure out which job crashed it.)

          But in any case: even if the crash was triggered by a 1.8.x client, it should get fixed, shouldn't it?

          And do we have any news about the llog_write_rec error? (did the debugfs output help?)


          People

            Assignee: niu Niu Yawei (Inactive)
            Reporter: ethz.support ETHz Support (Inactive)
            Votes: 0
            Watchers: 10
