[LU-2323] mds crash - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: None
Affects Version/s: Lustre 2.2.0
Labels:
- server
Environment:

Hide
[root@n-mds1 ~]# cat /proc/fs/lustre/version
lustre: 2.2.0
kernel: patchless_client
build: 2.2.0-RC2--PRISTINE-2.6.32-220.4.2.el6_lustre.x86_64

[root@n-mds1 ~]# uname -r
2.6.32-220.4.2.el6_lustre.x86_64

[root@n-mds1 ~]# rpm -qa|grep lustre
lustre-ldiskfs-3.3.0-2.6.32_220.4.2.el6_lustre.x86_64.x86_64
lustre-2.2.0-2.6.32_220.4.2.el6_lustre.x86_64.x86_64
kernel-firmware-2.6.32-220.4.2.el6_lustre.x86_64
lustre-modules-2.2.0-2.6.32_220.4.2.el6_lustre.x86_64.x86_64
kernel-headers-2.6.32-220.4.2.el6_lustre.x86_64
kernel-2.6.32-220.4.2.el6_lustre.x86_64
kernel-devel-2.6.32-220.4.2.el6_lustre.x86_64

Show
[ root@n-mds1 ~]# cat /proc/fs/lustre/version lustre: 2.2.0 kernel: patchless_client build: 2.2.0-RC2--PRISTINE-2.6.32-220.4.2.el6_lustre.x86_64 [ root@n-mds1 ~]# uname -r 2.6.32-220.4.2.el6_lustre.x86_64 [ root@n-mds1 ~]# rpm -qa|grep lustre lustre-ldiskfs-3.3.0-2.6.32_220.4.2.el6_lustre.x86_64.x86_64 lustre-2.2.0-2.6.32_220.4.2.el6_lustre.x86_64.x86_64 kernel-firmware-2.6.32-220.4.2.el6_lustre.x86_64 lustre-modules-2.2.0-2.6.32_220.4.2.el6_lustre.x86_64.x86_64 kernel-headers-2.6.32-220.4.2.el6_lustre.x86_64 kernel-2.6.32-220.4.2.el6_lustre.x86_64 kernel-devel-2.6.32-220.4.2.el6_lustre.x86_64

Severity:
1
Epic:
- metadata
- server
Rank (Obsolete):
5550

Description

We recently experienced two MDS crashes on our Lustre installation.

I've attached the netconsole output of both crashes (that's all i got: there is nothing in the syslog and i wasn't able to create a screenshot of the console output as the crashed mds was already powercycled by its failover partner).

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

llog.txt
47 kB
16/Nov/12 7:32 AM
mds08.txt
15 kB
14/Nov/12 4:47 AM
mds14.txt
15 kB
14/Nov/12 4:47 AM
osd_ldiskfs.ko
4.18 MB
15/Nov/12 12:30 PM

Activity

[LU-2323] mds crash

Peter Jones added a comment - 30/Nov/12 8:45 AM

Adrian a build of the change backported to 2.2 already exists - http://build.whamcloud.com/job/lustre-reviews/10853/ - but is still in the automated test queue at the moment. Lustre 2.3 is available now and has been thoroughly tested. It will of course include other content beyond just this one fix (both additional features and many other fixes)

Peter Jones added a comment - 30/Nov/12 8:45 AM Adrian a build of the change backported to 2.2 already exists - http://build.whamcloud.com/job/lustre-reviews/10853/ - but is still in the automated test queue at the moment. Lustre 2.3 is available now and has been thoroughly tested. It will of course include other content beyond just this one fix (both additional features and many other fixes)

Adrian Ulrich (Inactive) added a comment - 30/Nov/12 7:51 AM

Thanks for fixing this issue: We will upgrade our MDS as soon as a new build becomes available – or should we just upgrade to 2.3?

Adrian Ulrich (Inactive) added a comment - 30/Nov/12 7:51 AM Thanks for fixing this issue: We will upgrade our MDS as soon as a new build becomes available – or should we just upgrade to 2.3?

Niu Yawei (Inactive) added a comment - 29/Nov/12 2:47 AM

backport the memory corruption fix in mdd_declare_attr_set() to b2_2: http://review.whamcloud.com/4703

Niu Yawei (Inactive) added a comment - 29/Nov/12 2:47 AM backport the memory corruption fix in mdd_declare_attr_set() to b2_2: http://review.whamcloud.com/4703

Niu Yawei (Inactive) added a comment - 29/Nov/12 2:22 AM

After checking the 2.2 code carefully, I found a culprit which can cause such memory corruption:

in mdd_declare_attr_set():

#ifdef CONFIG_FS_POSIX_ACL
        if (ma->ma_attr.la_valid & LA_MODE) {
                mdd_read_lock(env, obj, MOR_TGT_CHILD);
                rc = mdo_xattr_get(env, obj, buf, XATTR_NAME_ACL_ACCESS,
                                   BYPASS_CAPA);
                mdd_read_unlock(env, obj);
                if (rc == -EOPNOTSUPP || rc == -ENODATA)
                        rc = 0;
                else if (rc < 0)
                        return rc;

Our intention here is to retrieve the xattr length, but we passed an uninitialized buffer to mdo_xattr_get() (we should pass NULL here)...
Actually this bug has been fixed for 2.3 & 2.4 (see http://review.whamcloud.com/#change,3928 & ~~LU-1823~~), I think we need to backport it to 2.2.

Niu Yawei (Inactive) added a comment - 29/Nov/12 2:22 AM After checking the 2.2 code carefully, I found a culprit which can cause such memory corruption: in mdd_declare_attr_set(): #ifdef CONFIG_FS_POSIX_ACL if (ma->ma_attr.la_valid & LA_MODE) { mdd_read_lock(env, obj, MOR_TGT_CHILD); rc = mdo_xattr_get(env, obj, buf, XATTR_NAME_ACL_ACCESS, BYPASS_CAPA); mdd_read_unlock(env, obj); if (rc == -EOPNOTSUPP || rc == -ENODATA) rc = 0; else if (rc < 0) return rc; Our intention here is to retrieve the xattr length, but we passed an uninitialized buffer to mdo_xattr_get() (we should pass NULL here)... Actually this bug has been fixed for 2.3 & 2.4 (see http://review.whamcloud.com/#change,3928 & LU-1823 ), I think we need to backport it to 2.2.

Zhenyu Xu added a comment - 23/Nov/12 4:47 AM

I think the "LustreError: 31980:0:(llog_cat.c:298:llog_cat_add_rec()) llog_write_rec -28: lh=ffff88042d450240" is a misleading message, the message only means the current log does not has enough space for the log record, it will create a new log for it later.

int llog_cat_add_rec(struct llog_handle *cathandle, struct llog_rec_hdr *rec,
                     struct llog_cookie *reccookie, void *buf)
{
        struct llog_handle *loghandle;
        int rc;
        ENTRY;

        LASSERT(rec->lrh_len <= LLOG_CHUNK_SIZE);
        loghandle = llog_cat_current_log(cathandle, 1);
        if (IS_ERR(loghandle))
                RETURN(PTR_ERR(loghandle));
        /* loghandle is already locked by llog_cat_current_log() for us */
        rc = llog_write_rec(loghandle, rec, reccookie, 1, buf, -1);
        if (rc < 0)
                CERROR("llog_write_rec %d: lh=%p\n", rc, loghandle);
        cfs_up_write(&loghandle->lgh_lock);
        if (rc == -ENOSPC) {
                /* to create a new plain log */
                loghandle = llog_cat_current_log(cathandle, 1);
                if (IS_ERR(loghandle))
                        RETURN(PTR_ERR(loghandle));
                rc = llog_write_rec(loghandle, rec, reccookie, 1, buf, -1);
                cfs_up_write(&loghandle->lgh_lock);
        }

        RETURN(rc);
}

Zhenyu Xu added a comment - 23/Nov/12 4:47 AM I think the "LustreError: 31980:0:(llog_cat.c:298:llog_cat_add_rec()) llog_write_rec -28: lh=ffff88042d450240" is a misleading message, the message only means the current log does not has enough space for the log record, it will create a new log for it later. int llog_cat_add_rec(struct llog_handle *cathandle, struct llog_rec_hdr *rec, struct llog_cookie *reccookie, void *buf) { struct llog_handle *loghandle; int rc; ENTRY; LASSERT(rec->lrh_len <= LLOG_CHUNK_SIZE); loghandle = llog_cat_current_log(cathandle, 1); if (IS_ERR(loghandle)) RETURN(PTR_ERR(loghandle)); /* loghandle is already locked by llog_cat_current_log() for us */ rc = llog_write_rec(loghandle, rec, reccookie, 1, buf, -1); if (rc < 0) CERROR( "llog_write_rec %d: lh=%p\n" , rc, loghandle); cfs_up_write(&loghandle->lgh_lock); if (rc == -ENOSPC) { /* to create a new plain log */ loghandle = llog_cat_current_log(cathandle, 1); if (IS_ERR(loghandle)) RETURN(PTR_ERR(loghandle)); rc = llog_write_rec(loghandle, rec, reccookie, 1, buf, -1); cfs_up_write(&loghandle->lgh_lock); } RETURN(rc); }

Zhenyu Xu added a comment - 22/Nov/12 4:35 AM

Yes, even it's 1.8.x client problem we should fix it. The purpose of the question is trying to help to make out which area to find the root cause.

I'm still investigating the llog part issue.

Zhenyu Xu added a comment - 22/Nov/12 4:35 AM Yes, even it's 1.8.x client problem we should fix it. The purpose of the question is trying to help to make out which area to find the root cause. I'm still investigating the llog part issue.

Adrian Ulrich (Inactive) added a comment - 22/Nov/12 4:19 AM

Well, the problem is that i can not reproduce the crash and i did not see any new crashes since 14. November.

(The crash was probably caused by an user job: There are about ~800 users on our cluster and i have no way to figure out what job crashed it).

But in any case: Even if the crash was triggered by an 1.8.x client: It should get fixed, shouldn't it?

And do we have any news about the llog_write_rec error? (did the debugfs output help?)

Adrian Ulrich (Inactive) added a comment - 22/Nov/12 4:19 AM Well, the problem is that i can not reproduce the crash and i did not see any new crashes since 14. November. (The crash was probably caused by an user job: There are about ~800 users on our cluster and i have no way to figure out what job crashed it). But in any case: Even if the crash was triggered by an 1.8.x client: It should get fixed, shouldn't it? And do we have any news about the llog_write_rec error? (did the debugfs output help?)

Zhenyu Xu added a comment - 22/Nov/12 2:33 AM

Adrian,

Did you have opportunity to try excluding 1.8.x clients to check whether the MDS still crashes with only 2.x clients accessing it?

Zhenyu Xu added a comment - 22/Nov/12 2:33 AM Adrian, Did you have opportunity to try excluding 1.8.x clients to check whether the MDS still crashes with only 2.x clients accessing it?

Adrian Ulrich (Inactive) added a comment - 16/Nov/12 7:37 AM

I did this using a snapshot from the MDS (taken at 5. November).
The output of llog_reader is attached to the case (llog.txt)

Output of CONFIGS/ via debugfs:

$ debugfs mds.dump
debugfs 1.41.12 (17-May-2010)
debugfs: ls -l CONFIGS
467550721 40777 (2) 0 0 4096 9-Oct-2012 07:05 .
2 40755 (2) 0 0 4096 3-May-2012 14:58 ..
467550722 100644 (1) 0 0 12288 9-May-2012 09:21 mountdata
467550723 100644 (1) 0 0 0 3-May-2012 14:58 _mgs-sptlrpc
467550724 100644 (1) 0 0 89128 3-May-2012 14:58 nero-client
467550725 100644 (1) 0 0 0 3-May-2012 14:58 nero-sptlrpc
467550726 100644 (1) 0 0 89000 3-May-2012 14:58 nero-MDT0000
467550727 100644 (1) 0 0 0 3-May-2012 14:58 changelog_catalog
467550728 100644 (1) 0 0 0 3-May-2012 14:58 changelog_users
467550730 100644 (1) 0 0 9432 3-May-2012 15:05 nero-OST0000
467550729 100644 (1) 0 0 0 9-Oct-2012 07:05 sptlrpc
467550731 100644 (1) 0 0 9432 3-May-2012 15:08 nero-OST0008
467550732 100644 (1) 0 0 9432 3-May-2012 15:09 nero-OST0010
467550733 100644 (1) 0 0 9432 3-May-2012 15:09 nero-OST0018
467550734 100644 (1) 0 0 9432 3-May-2012 15:58 nero-OST0001
467550735 100644 (1) 0 0 9432 3-May-2012 15:58 nero-OST0009
467550736 100644 (1) 0 0 9432 3-May-2012 15:59 nero-OST0011
467550737 100644 (1) 0 0 9432 3-May-2012 15:59 nero-OST0019
467550738 100644 (1) 0 0 9432 3-May-2012 16:00 nero-OST0002
467550739 100644 (1) 0 0 9432 3-May-2012 16:01 nero-OST000a
467550740 100644 (1) 0 0 9432 3-May-2012 16:04 nero-OST0012
467550741 100644 (1) 0 0 9432 3-May-2012 16:05 nero-OST001a
467550742 100644 (1) 0 0 9432 3-May-2012 16:06 nero-OST0003
467550743 100644 (1) 0 0 9432 3-May-2012 16:06 nero-OST000b
467550744 100644 (1) 0 0 9432 3-May-2012 16:06 nero-OST0013
467550745 100644 (1) 0 0 9432 3-May-2012 16:07 nero-OST001b
467550746 100644 (1) 0 0 9432 3-May-2012 16:11 nero-OST0004
467550747 100644 (1) 0 0 9432 3-May-2012 16:11 nero-OST000c
467550748 100644 (1) 0 0 9432 3-May-2012 16:12 nero-OST0014
467550749 100644 (1) 0 0 9432 3-May-2012 16:12 nero-OST001c
467550750 100644 (1) 0 0 9432 3-May-2012 16:14 nero-OST0005
467550751 100644 (1) 0 0 9432 3-May-2012 16:14 nero-OST000d
467550752 100644 (1) 0 0 9432 3-May-2012 16:14 nero-OST0015
467550753 100644 (1) 0 0 9432 3-May-2012 16:15 nero-OST001d
467550754 100644 (1) 0 0 9432 3-May-2012 16:18 nero-OST0006
467550755 100644 (1) 0 0 9432 3-May-2012 16:18 nero-OST000e
467550756 100644 (1) 0 0 9432 3-May-2012 16:18 nero-OST0016
467550757 100644 (1) 0 0 9432 3-May-2012 16:18 nero-OST001e
467550758 100644 (1) 0 0 9432 3-May-2012 16:21 nero-OST0007
467550759 100644 (1) 0 0 9432 3-May-2012 16:21 nero-OST000f
467550760 100644 (1) 0 0 9432 3-May-2012 16:22 nero-OST0017
467550761 100644 (1) 0 0 9432 3-May-2012 16:22 nero-OST001f

Adrian Ulrich (Inactive) added a comment - 16/Nov/12 7:37 AM I did this using a snapshot from the MDS (taken at 5. November). The output of llog_reader is attached to the case (llog.txt) Output of CONFIGS/ via debugfs: $ debugfs mds.dump debugfs 1.41.12 (17-May-2010) debugfs: ls -l CONFIGS 467550721 40777 (2) 0 0 4096 9-Oct-2012 07:05 . 2 40755 (2) 0 0 4096 3-May-2012 14:58 .. 467550722 100644 (1) 0 0 12288 9-May-2012 09:21 mountdata 467550723 100644 (1) 0 0 0 3-May-2012 14:58 _mgs-sptlrpc 467550724 100644 (1) 0 0 89128 3-May-2012 14:58 nero-client 467550725 100644 (1) 0 0 0 3-May-2012 14:58 nero-sptlrpc 467550726 100644 (1) 0 0 89000 3-May-2012 14:58 nero-MDT0000 467550727 100644 (1) 0 0 0 3-May-2012 14:58 changelog_catalog 467550728 100644 (1) 0 0 0 3-May-2012 14:58 changelog_users 467550730 100644 (1) 0 0 9432 3-May-2012 15:05 nero-OST0000 467550729 100644 (1) 0 0 0 9-Oct-2012 07:05 sptlrpc 467550731 100644 (1) 0 0 9432 3-May-2012 15:08 nero-OST0008 467550732 100644 (1) 0 0 9432 3-May-2012 15:09 nero-OST0010 467550733 100644 (1) 0 0 9432 3-May-2012 15:09 nero-OST0018 467550734 100644 (1) 0 0 9432 3-May-2012 15:58 nero-OST0001 467550735 100644 (1) 0 0 9432 3-May-2012 15:58 nero-OST0009 467550736 100644 (1) 0 0 9432 3-May-2012 15:59 nero-OST0011 467550737 100644 (1) 0 0 9432 3-May-2012 15:59 nero-OST0019 467550738 100644 (1) 0 0 9432 3-May-2012 16:00 nero-OST0002 467550739 100644 (1) 0 0 9432 3-May-2012 16:01 nero-OST000a 467550740 100644 (1) 0 0 9432 3-May-2012 16:04 nero-OST0012 467550741 100644 (1) 0 0 9432 3-May-2012 16:05 nero-OST001a 467550742 100644 (1) 0 0 9432 3-May-2012 16:06 nero-OST0003 467550743 100644 (1) 0 0 9432 3-May-2012 16:06 nero-OST000b 467550744 100644 (1) 0 0 9432 3-May-2012 16:06 nero-OST0013 467550745 100644 (1) 0 0 9432 3-May-2012 16:07 nero-OST001b 467550746 100644 (1) 0 0 9432 3-May-2012 16:11 nero-OST0004 467550747 100644 (1) 0 0 9432 3-May-2012 16:11 nero-OST000c 467550748 100644 (1) 0 0 9432 3-May-2012 16:12 nero-OST0014 467550749 100644 (1) 0 0 9432 3-May-2012 16:12 nero-OST001c 467550750 100644 (1) 0 0 9432 3-May-2012 16:14 nero-OST0005 467550751 100644 (1) 0 0 9432 3-May-2012 16:14 nero-OST000d 467550752 100644 (1) 0 0 9432 3-May-2012 16:14 nero-OST0015 467550753 100644 (1) 0 0 9432 3-May-2012 16:15 nero-OST001d 467550754 100644 (1) 0 0 9432 3-May-2012 16:18 nero-OST0006 467550755 100644 (1) 0 0 9432 3-May-2012 16:18 nero-OST000e 467550756 100644 (1) 0 0 9432 3-May-2012 16:18 nero-OST0016 467550757 100644 (1) 0 0 9432 3-May-2012 16:18 nero-OST001e 467550758 100644 (1) 0 0 9432 3-May-2012 16:21 nero-OST0007 467550759 100644 (1) 0 0 9432 3-May-2012 16:21 nero-OST000f 467550760 100644 (1) 0 0 9432 3-May-2012 16:22 nero-OST0017 467550761 100644 (1) 0 0 9432 3-May-2012 16:22 nero-OST001f

ETHz Support (Inactive) added a comment - 15/Nov/12 3:45 PM

Adrian,
if I remember well you can use debufs with the device mounted. Try:

debugfs -c -R 'dump CONFIGS/ /tmp/config' /dev/<device>
llog_reader /tmp/config

ETHz Support (Inactive) added a comment - 15/Nov/12 3:45 PM Adrian, if I remember well you can use debufs with the device mounted. Try: debugfs -c -R 'dump CONFIGS/ /tmp/config' /dev/<device> llog_reader /tmp/config

Alex Zhuravlev added a comment - 15/Nov/12 1:58 PM

no, i've got what I need, thanks

a bit of analysis here:

0000000000004d26 <osd_trans_stop+0x56> mov 0x50(%rbx),%r12
0000000000004d2a <osd_trans_stop+0x5a> test %r12,%r12
0000000000004d2d <osd_trans_stop+0x5d> je 0000000000004e82 <osd_trans_stop+0x1b2>
0000000000004d33 <osd_trans_stop+0x63> movzbl 0x28(%r12),%eax
0000000000004d39 <osd_trans_stop+0x69> movzbl 0x4c(%rbx),%edx
0000000000004d3d <osd_trans_stop+0x6d> and $0xfffffffe,%eax
0000000000004d40 <osd_trans_stop+0x70> and $0x1,%edx
0000000000004d43 <osd_trans_stop+0x73> or %edx,%eax
0000000000004d45 <osd_trans_stop+0x75> mov %al,0x28(%r12)
0000000000004d4a <osd_trans_stop+0x7a> mov (%r12),%rax

so rbx contains pointer to oh:

(gdb) p/x sizeof(struct thandle)
$2 = 0x50
struct osd_thandle {
struct thandle ot_super;
handle_t *ot_handle;

0000000000004db3 <osd_trans_stop+0xe3> mov (%rbx),%rax
0000000000004db6 <osd_trans_stop+0xe6> test %rax,%rax
0000000000004db9 <osd_trans_stop+0xe9> je 0000000000004dc4 <osd_trans_stop+0xf4>
0000000000004dbb <osd_trans_stop+0xeb> mov 0x8(%rax),%rax
0000000000004dbf <osd_trans_stop+0xef> testb $0x1,(%rax)

these lines implement:
if (lu_device_is_md(&th->th_dev->dd_lu_dev)) {

RAX: 0006000100000002 is supposed to be ld_type (and 0x8(%rax) is ld_type->ldt_tags)

IOW, thandle was broken and pointing to garbage instead of a device.

now the question what broke that..

Alex Zhuravlev added a comment - 15/Nov/12 1:58 PM no, i've got what I need, thanks a bit of analysis here: 0000000000004d26 <osd_trans_stop+0x56> mov 0x50(%rbx),%r12 0000000000004d2a <osd_trans_stop+0x5a> test %r12,%r12 0000000000004d2d <osd_trans_stop+0x5d> je 0000000000004e82 <osd_trans_stop+0x1b2> 0000000000004d33 <osd_trans_stop+0x63> movzbl 0x28(%r12),%eax 0000000000004d39 <osd_trans_stop+0x69> movzbl 0x4c(%rbx),%edx 0000000000004d3d <osd_trans_stop+0x6d> and $0xfffffffe,%eax 0000000000004d40 <osd_trans_stop+0x70> and $0x1,%edx 0000000000004d43 <osd_trans_stop+0x73> or %edx,%eax 0000000000004d45 <osd_trans_stop+0x75> mov %al,0x28(%r12) 0000000000004d4a <osd_trans_stop+0x7a> mov (%r12),%rax so rbx contains pointer to oh: (gdb) p/x sizeof(struct thandle) $2 = 0x50 struct osd_thandle { struct thandle ot_super; handle_t *ot_handle; 0000000000004db3 <osd_trans_stop+0xe3> mov (%rbx),%rax 0000000000004db6 <osd_trans_stop+0xe6> test %rax,%rax 0000000000004db9 <osd_trans_stop+0xe9> je 0000000000004dc4 <osd_trans_stop+0xf4> 0000000000004dbb <osd_trans_stop+0xeb> mov 0x8(%rax),%rax 0000000000004dbf <osd_trans_stop+0xef> testb $0x1,(%rax) these lines implement: if (lu_device_is_md(&th->th_dev->dd_lu_dev)) { RAX: 0006000100000002 is supposed to be ld_type (and 0x8(%rax) is ld_type->ldt_tags) IOW, thandle was broken and pointing to garbage instead of a device. now the question what broke that..

People

Assignee:: Niu Yawei (Inactive)

Reporter:: ETHz Support (Inactive)

Votes:: 0 Vote for this issue

Watchers:: 10 Start watching this issue

Dates

Created:: 14/Nov/12 4:46 AM

Updated:: 06/Dec/12 9:24 AM

Resolved:: 06/Dec/12 9:24 AM