Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Labels: None
    • Affects Version/s: Lustre 2.2.0

    Description

      We recently experienced two MDS crashes on our Lustre installation.

      I've attached the netconsole output of both crashes (that's all I got: there is nothing in the syslog, and I wasn't able to take a screenshot of the console output because the crashed MDS had already been power-cycled by its failover partner).

      Attachments

        1. mds08.txt
          15 kB
        2. mds14.txt
          15 kB
        3. osd_ldiskfs.ko
          4.18 MB
        4. llog.txt
          47 kB

        Activity

          [LU-2323] mds crash
          pjones Peter Jones added a comment -

          ok thanks Adrian!


          adrian Adrian Ulrich (Inactive) added a comment -

          Hello Peter,

          We will upgrade to 2.3 as soon as the next opportunity arises; you can therefore close this issue.

          Thanks and best regards,
          Adrian

          pjones Peter Jones added a comment -

          Adrian

          Have you decided which approach you will take - to patch 2.2 or upgrade to 2.3?

          Peter

          pjones Peter Jones added a comment -

          Adrian, a build of the change backported to 2.2 already exists - http://build.whamcloud.com/job/lustre-reviews/10853/ - but is still in the automated test queue at the moment. Lustre 2.3 is available now and has been thoroughly tested. It will of course include other content beyond just this one fix (both additional features and many other fixes).


          adrian Adrian Ulrich (Inactive) added a comment -

          Thanks for fixing this issue: We will upgrade our MDS as soon as a new build becomes available – or should we just upgrade to 2.3?


          niu Niu Yawei (Inactive) added a comment -

          backport the memory corruption fix in mdd_declare_attr_set() to b2_2: http://review.whamcloud.com/4703


          niu Niu Yawei (Inactive) added a comment -

          After checking the 2.2 code carefully, I found a culprit which can cause such memory corruption:

          in mdd_declare_attr_set():

          #ifdef CONFIG_FS_POSIX_ACL
                  if (ma->ma_attr.la_valid & LA_MODE) {
                          mdd_read_lock(env, obj, MOR_TGT_CHILD);
                          rc = mdo_xattr_get(env, obj, buf, XATTR_NAME_ACL_ACCESS,
                                             BYPASS_CAPA);
                          mdd_read_unlock(env, obj);
                          if (rc == -EOPNOTSUPP || rc == -ENODATA)
                                  rc = 0;
                          else if (rc < 0)
                                  return rc;
          

          Our intention here is to retrieve the xattr length, but we passed an uninitialized buffer to mdo_xattr_get() (we should pass NULL here)...
          Actually, this bug has already been fixed for 2.3 & 2.4 (see http://review.whamcloud.com/#change,3928 & LU-1823); I think we need to backport it to 2.2.
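
          To make the fix concrete, here is a minimal sketch of what the corrected lookup could look like (assumed from the description above, not copied from the actual patch in http://review.whamcloud.com/#change,3928; the local size_buf is illustrative): a buffer with a NULL data pointer is passed, so mdo_xattr_get() only reports the xattr length instead of writing into uninitialized memory.

          #ifdef CONFIG_FS_POSIX_ACL
                  if (ma->ma_attr.la_valid & LA_MODE) {
                          /* sketch only (assumed fix, not the verbatim patch):
                           * lb_buf == NULL requests the xattr length without
                           * copying any data into an uninitialized buffer */
                          struct lu_buf size_buf = { .lb_buf = NULL, .lb_len = 0 };

                          mdd_read_lock(env, obj, MOR_TGT_CHILD);
                          rc = mdo_xattr_get(env, obj, &size_buf, XATTR_NAME_ACL_ACCESS,
                                             BYPASS_CAPA);
                          mdd_read_unlock(env, obj);
                          if (rc == -EOPNOTSUPP || rc == -ENODATA)
                                  rc = 0;
                          else if (rc < 0)
                                  return rc;
                  }
          #endif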

          bobijam Zhenyu Xu added a comment -

          I think the "LustreError: 31980:0:(llog_cat.c:298:llog_cat_add_rec()) llog_write_rec -28: lh=ffff88042d450240" message is misleading: -28 is -ENOSPC, which here only means the current plain log does not have enough space for the record; a new log will be created for it afterwards.

          int llog_cat_add_rec(struct llog_handle *cathandle, struct llog_rec_hdr *rec,
                               struct llog_cookie *reccookie, void *buf)
          {
                  struct llog_handle *loghandle;
                  int rc;
                  ENTRY;
          
                  LASSERT(rec->lrh_len <= LLOG_CHUNK_SIZE);
                  loghandle = llog_cat_current_log(cathandle, 1);
                  if (IS_ERR(loghandle))
                          RETURN(PTR_ERR(loghandle));
                  /* loghandle is already locked by llog_cat_current_log() for us */
                  rc = llog_write_rec(loghandle, rec, reccookie, 1, buf, -1);
                  if (rc < 0)
                          CERROR("llog_write_rec %d: lh=%p\n", rc, loghandle);
                  cfs_up_write(&loghandle->lgh_lock);
                  if (rc == -ENOSPC) {
                          /* to create a new plain log */
                          loghandle = llog_cat_current_log(cathandle, 1);
                          if (IS_ERR(loghandle))
                                  RETURN(PTR_ERR(loghandle));
                          rc = llog_write_rec(loghandle, rec, reccookie, 1, buf, -1);
                          cfs_up_write(&loghandle->lgh_lock);
                  }
          
                  RETURN(rc);
          }
          
          bobijam Zhenyu Xu added a comment -

          Yes, even if it's a 1.8.x client problem we should fix it. The purpose of the question was to help narrow down which area to look in for the root cause.

          I'm still investigating the llog part issue.


          adrian Adrian Ulrich (Inactive) added a comment -

          Well, the problem is that I cannot reproduce the crash, and I have not seen any new crashes since 14 November.

          (The crash was probably caused by a user job: there are about ~800 users on our cluster and I have no way to figure out which job crashed it.)

          But in any case: even if the crash was triggered by a 1.8.x client, it should get fixed, shouldn't it?

          And do we have any news about the llog_write_rec error? (did the debugfs output help?)


          People

            Assignee: niu Niu Yawei (Inactive)
            Reporter: ethz.support ETHz Support (Inactive)
            Votes: 0
            Watchers: 10
