Details

    • Bug
    • Resolution: Fixed
    • Critical
    • None
    • Lustre 2.2.0

    Description

      We recently experienced two MDS crashes on our Lustre installation.

      I've attached the netconsole output of both crashes (that's all I got: there is nothing in the syslog, and I wasn't able to take a screenshot of the console output, as the crashed MDS had already been power-cycled by its failover partner).

      Attachments

        1. llog.txt
          47 kB
        2. mds08.txt
          15 kB
        3. mds14.txt
          15 kB
        4. osd_ldiskfs.ko
          4.18 MB

        Activity

          [LU-2323] mds crash

          Adrian,
          if I remember correctly, you can use debugfs with the device mounted. Try:

          debugfs -c -R 'dump CONFIGS/ /tmp/config' /dev/<device>
          llog_reader /tmp/config

          ethz.support ETHz Support (Inactive) added a comment

          No, I've got what I need, thanks.

          a bit of analysis here:

          0000000000004d26 <osd_trans_stop+0x56> mov 0x50(%rbx),%r12
          0000000000004d2a <osd_trans_stop+0x5a> test %r12,%r12
          0000000000004d2d <osd_trans_stop+0x5d> je 0000000000004e82 <osd_trans_stop+0x1b2>
          0000000000004d33 <osd_trans_stop+0x63> movzbl 0x28(%r12),%eax
          0000000000004d39 <osd_trans_stop+0x69> movzbl 0x4c(%rbx),%edx
          0000000000004d3d <osd_trans_stop+0x6d> and $0xfffffffe,%eax
          0000000000004d40 <osd_trans_stop+0x70> and $0x1,%edx
          0000000000004d43 <osd_trans_stop+0x73> or %edx,%eax
          0000000000004d45 <osd_trans_stop+0x75> mov %al,0x28(%r12)
          0000000000004d4a <osd_trans_stop+0x7a> mov (%r12),%rax

          so %rbx contains the pointer to oh:

          (gdb) p/x sizeof(struct thandle)
          $2 = 0x50
          struct osd_thandle {
          struct thandle ot_super;
          handle_t *ot_handle;

          0000000000004db3 <osd_trans_stop+0xe3> mov (%rbx),%rax
          0000000000004db6 <osd_trans_stop+0xe6> test %rax,%rax
          0000000000004db9 <osd_trans_stop+0xe9> je 0000000000004dc4 <osd_trans_stop+0xf4>
          0000000000004dbb <osd_trans_stop+0xeb> mov 0x8(%rax),%rax
          0000000000004dbf <osd_trans_stop+0xef> testb $0x1,(%rax)

          these lines implement:
          if (lu_device_is_md(&th->th_dev->dd_lu_dev)) {

          RAX: 0006000100000002 is supposed to be ld_type (and 0x8(%rax) is ld_type->ldt_tags)

          IOW, thandle was broken and pointing to garbage instead of a device.

          now the question is what broke that..

          bzzz Alex Zhuravlev added a comment

          Alex: I attached the requested kernel module (or did you mean by 'attach' that I should insmod it?)

          adrian Adrian Ulrich (Inactive) added a comment

          md5sum = 6a07cbbb49f63ea0f6a5e6bc067bc7c9

          requested kernel module

          adrian Adrian Ulrich (Inactive) added a comment

          ok, good to know.. llog_write_rec() returning -28 means the MDS was not able to write (transactionally) a record for some updates to an OST (remove an OST object, update OST object attributes).

          Then could you attach osd_ldiskfs.ko, please?
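
          For reference, -28 here is -ENOSPC ("No space left on device"), which is consistent with the earlier suggestion to check how much space an orphan cleanup could free. On most Linux systems the mapping can be confirmed from the errno headers (the header path may vary by distribution):

          grep -w ENOSPC /usr/include/asm-generic/errno-base.h
          # expected output: #define ENOSPC 28 /* No space left on device */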

          bzzz Alex Zhuravlev added a comment

          Yong Fan:

          • We need the 1.8.x client to copy data from our old Lustre 1.8 installation to the 2.2 installation.
            I will temporarily disable it on the next crash (there are still users moving data around).
          • We didn't do any upgrades recently: we haven't touched the servers in months.
          • We have 2.3 clients because the 2.2 client is too unstable for us and I don't have enough time to backport all the crash fixes from 2.3 to our 2.2 client.

          Alex: We have been getting these llog_write_rec errors for months (since June 2012), while our MDS only started to crash recently (also note that there are ~5 hours between the error and the crash).

          What does the llog_write_rec error mean anyway?

          I'll post the output of 'ls CONFIGS/' as soon as I have a chance to remount the volume via ldiskfs (= on the next crash or on Monday evening).
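
          Roughly, the plan would be to mount the MDT read-only as ldiskfs and list the directory (the device and mount-point names below are placeholders):

          mount -t ldiskfs -o ro /dev/mdtdev /mnt/mdt
          ls -l /mnt/mdt/CONFIGS/
          umount /mnt/mdt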

          adrian Adrian Ulrich (Inactive) added a comment

          Adrian, if you cannot abandon the 1.8.4 clients, then please umount them temporarily and try to reproduce the failure with the other clients. If you cannot reproduce it, then it is quite possibly related to interoperability issues.

          yong.fan nasf (Inactive) added a comment

          In both cases the very first messages were about the inability to add an llog record:

          LustreError: 3809:0:(llog_cat.c:298:llog_cat_add_rec()) llog_write_rec -28: lh=ffff880482b99800

          Probably it's a problem in the code handling this error. I think we should reproduce this locally.

          It makes sense to ls the CONFIGS/ directory using ldiskfs or debugfs to see how much space could be freed after orphan cleanup.

          bzzz Alex Zhuravlev added a comment
          yong.fan nasf (Inactive) added a comment - edited

          I do not think a NULL "name" passed to osd_xattr_set() caused the failure. In this case, the "name" comes from the MDS internally, not from a client, and I do not find any internal callers passing a NULL "name". On the other hand, the other failure instances have different call traces.

          Adrian, did you do any system upgrade recently, around when you hit the failure? There is no evidence that an interoperability issue caused the failure, but we can try to locate the issue step by step. As a first step, please migrate the system load from the non-2.2 clients above to standard 2.2 clients, especially the 1.8.4 client, because it is too old. I do not think we tested interoperation between a Lustre 2.2 server and a Lustre 1.8.4 client when we released Lustre 2.2, so it is the most suspicious.


          Johann:

          Does the crash actually get triggered by a client calling setattr?

          One of our most frequent setattr callers is a 1.8.4 client (10.201.32.32) - could this client be the cause of the crash?

          [root@n-mds1 exports]# grep setattr */stats | awk '{print $2 " " $1}' | sort -n | tail -5
          1694 10.201.38.39@o2ib/stats:setattr
          1841 10.201.38.21@o2ib/stats:setattr
          2062 10.201.38.23@o2ib/stats:setattr <-- 2.2.93 client
          8931 10.201.32.32@o2ib/stats:setattr <-- 1.8.4 client
          17278 10.201.32.31@o2ib/stats:setattr <-- 2.3.0 client
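
          (For context, these per-export stats come from the MDT's exports directory under /proc; the MDT name below is a placeholder.)

          cd /proc/fs/lustre/mdt/lustre-MDT0000/exports
          grep setattr */stats | awk '{print $2 " " $1}' | sort -n | tail -5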

          adrian Adrian Ulrich (Inactive) added a comment

          Adrian,
          I suggest disabling the cluster (to avoid ping-pong failover) and mounting the MDS manually: mount -t lustre -L <MDT name> -o abort_recov <mount point>
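
          For example (the label and mount point here are placeholders; abort_recov aborts client recovery when the target is mounted):

          mount -t lustre -L lustre-MDT0000 -o abort_recov /mnt/mdt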

          ethz.support ETHz Support (Inactive) added a comment

          People

            niu Niu Yawei (Inactive)
            ethz.support ETHz Support (Inactive)
            Votes: 0
            Watchers: 10
