[LU-2323] mds crash Created: 14/Nov/12  Updated: 06/Dec/12  Resolved: 06/Dec/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.2.0
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: ETHz Support (Inactive) Assignee: Niu Yawei (Inactive)
Resolution: Fixed Votes: 0
Labels: server
Environment:

[root@n-mds1 ~]# cat /proc/fs/lustre/version
lustre: 2.2.0
kernel: patchless_client
build: 2.2.0-RC2--PRISTINE-2.6.32-220.4.2.el6_lustre.x86_64

[root@n-mds1 ~]# uname -r
2.6.32-220.4.2.el6_lustre.x86_64

[root@n-mds1 ~]# rpm -qa|grep lustre
lustre-ldiskfs-3.3.0-2.6.32_220.4.2.el6_lustre.x86_64.x86_64
lustre-2.2.0-2.6.32_220.4.2.el6_lustre.x86_64.x86_64
kernel-firmware-2.6.32-220.4.2.el6_lustre.x86_64
lustre-modules-2.2.0-2.6.32_220.4.2.el6_lustre.x86_64.x86_64
kernel-headers-2.6.32-220.4.2.el6_lustre.x86_64
kernel-2.6.32-220.4.2.el6_lustre.x86_64
kernel-devel-2.6.32-220.4.2.el6_lustre.x86_64


Attachments: Text File llog.txt     Text File mds08.txt     Text File mds14.txt     File osd_ldiskfs.ko    
Severity: 1
Epic: metadata, server
Rank (Obsolete): 5550

 Description   

We recently experienced two MDS crashes on our Lustre installation.

I've attached the netconsole output of both crashes (that's all I got: there is nothing in the syslog, and I wasn't able to take a screenshot of the console output because the crashed MDS had already been power-cycled by its failover partner).



 Comments   
Comment by ETHz Support (Inactive) [ 14/Nov/12 ]

netconsole output

Comment by ETHz Support (Inactive) [ 14/Nov/12 ]

In terms of activity during the crashes, the MDS had memory available and the IB network carried little traffic.

In both cases there was free memory available: the crash yesterday didn't even have a full read cache, and the MDS was never swapping (it has a 50GB swap partition).

I also checked the system load and InfiniBand traffic: the MDS was doing almost nothing during the crash (load below 0.5, IB traffic ~100KB/s).

Comment by Niu Yawei (Inactive) [ 14/Nov/12 ]
2012-11-08T20:14:29+01:00 n-mds1 [<ffffffffa0dd4dbb>] osd_trans_stop+0xeb/0x390 [osd_ldiskfs]
2012-11-08T20:14:29+01:00 n-mds1 RSP <ffff880bd116da70>
2012-11-08T20:14:29+01:00 n-mds1 ---[ end trace 69a06040c21c938c ]---
2012-11-08T20:14:29+01:00 n-mds1 Kernel panic - not syncing: Fatal exception
2012-11-08T20:14:29+01:00 n-mds1 Pid: 3845, comm: mdt_52 Tainted: G      D    ----------------   2.6.32-220.4.2.el6_lustre.x86_64 #1
2012-11-08T20:14:29+01:00 n-mds1 Call Trace:
2012-11-08T20:14:29+01:00 n-mds1 [<ffffffff814ec61a>] ? panic+0x78/0x143
2012-11-08T20:14:29+01:00 n-mds1 [<ffffffff814f07a4>] ? oops_end+0xe4/0x100
2012-11-08T20:14:29+01:00 n-mds1 [<ffffffff8100f26b>] ? die+0x5b/0x90
2012-11-08T20:14:29+01:00 n-mds1 [<ffffffff814f0312>] ? do_general_protection+0x152/0x160
2012-11-08T20:14:29+01:00 n-mds1 [<ffffffffa0ddaada>] ? osd_xattr_set+0x14a/0x1d0 [osd_ldiskfs]
2012-11-08T20:14:29+01:00 n-mds1 [<ffffffff814efae5>] ? general_protection+0x25/0x30
2012-11-08T20:14:30+01:00 n-mds1 [<ffffffffa0dd4dbb>] ? osd_trans_stop+0xeb/0x390 [osd_ldiskfs]
2012-11-08T20:14:30+01:00 n-mds1 [<ffffffffa0cde12a>] ? mdd_trans_stop+0x1a/0x20 [mdd]
2012-11-08T20:14:30+01:00 n-mds1 [<ffffffffa0cc1036>] ? mdd_attr_set+0xbf6/0x2030 [mdd]
2012-11-08T20:14:30+01:00 n-mds1 [<ffffffffa0677820>] ? ldlm_completion_ast+0x0/0x6d0 [ptlrpc]
2012-11-08T20:14:30+01:00 n-mds1 [<ffffffffa069ae3c>] ? lustre_msg_get_versions+0x6c/0xb0 [ptlrpc]
2012-11-08T20:14:30+01:00 n-mds1 [<ffffffffa0e12a1c>] ? cml_attr_set+0x6c/0x160 [cmm]
2012-11-08T20:14:30+01:00 n-mds1 [<ffffffffa0d4e578>] ? mdt_attr_set+0x268/0x4b0 [mdt]
2012-11-08T20:14:30+01:00 n-mds1 [<ffffffffa0d4eb0d>] ? mdt_reint_setattr+0x34d/0x1060 [mdt]
2012-11-08T20:14:30+01:00 n-mds1 [<ffffffffa0d48e7b>] ? mdt_reint_rec+0x4b/0xa0 [mdt]
2012-11-08T20:14:30+01:00 n-mds1 [<ffffffffa0d41069>] ? mdt_reint_internal+0x479/0x7b0 [mdt]
2012-11-08T20:14:30+01:00 n-mds1 [<ffffffffa0d413ee>] ? mdt_reint+0x4e/0xb0 [mdt]
2012-11-08T20:14:30+01:00 n-mds1 [<ffffffffa0d37b9d>] ? mdt_handle_common+0x74d/0x1400 [mdt]
2012-11-08T20:14:30+01:00 n-mds1 [<ffffffffa0d38925>] ? mdt_regular_handle+0x15/0x20 [mdt]
2012-11-08T20:14:30+01:00 n-mds1 [<ffffffffa06a6011>] ? ptlrpc_server_handle_request+0x3c1/0xcb0 [ptlrpc]
2012-11-08T20:14:30+01:00 n-mds1 [<ffffffffa04373ee>] ? cfs_timer_arm+0xe/0x10 [libcfs]
2012-11-08T20:14:30+01:00 n-mds1 [<ffffffffa0441e19>] ? lc_watchdog_touch+0x79/0x110 [libcfs]
2012-11-08T20:14:31+01:00 n-mds1 [<ffffffffa06a00e2>] ? ptlrpc_wait_event+0xb2/0x2c0 [ptlrpc]
2012-11-08T20:14:31+01:00 n-mds1 [<ffffffff810519c3>] ? __wake_up+0x53/0x70
2012-11-08T20:14:31+01:00 n-mds1 [<ffffffffa06a701f>] ? ptlrpc_main+0x71f/0x1210 [ptlrpc]
2012-11-08T20:14:31+01:00 n-mds1 [<ffffffffa06a6900>] ? ptlrpc_main+0x0/0x1210 [ptlrpc]
2012-11-08T20:14:31+01:00 n-mds1 [<ffffffff8100c14a>] ? child_rip+0xa/0x20
2012-11-08T20:14:31+01:00 n-mds1 [<ffffffffa06a6900>] ? ptlrpc_main+0x0/0x1210 [ptlrpc]
2012-11-08T20:14:31+01:00 n-mds1 [<ffffffffa06a6900>] ? ptlrpc_main+0x0/0x1210 [ptlrpc]
2012-11-08T20:14:31+01:00 n-mds1 [<ffffffff8100c140>] ? child_rip+0x0/0x20

In the 2.2 code, we added the following code in osd_xattr_set():

        /* version set is not real XATTR */
        if (strcmp(name, XATTR_NAME_VERSION) == 0) {
                /* for version we are just using xattr API but change inode
                 * field instead */
                LASSERT(buf->lb_len == sizeof(dt_obj_version_t));
                osd_object_version_set(env, dt, buf->lb_buf);
                return sizeof(dt_obj_version_t);
        }

we should probably check whether "name" is NULL first. Alex, any thoughts?
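A minimal sketch of the guard being discussed (hypothetical, mirroring the 2.2 snippet above; the buffer struct and the XATTR_NAME_VERSION value here are simplified stand-ins, not the real Lustre definitions):

```c
#include <stddef.h>
#include <string.h>

#define XATTR_NAME_VERSION "trusted.version"  /* stand-in for the real macro */

struct lu_buf {            /* simplified stand-in for the Lustre lu_buf */
        void  *lb_buf;
        size_t lb_len;
};

/* Sketch of the entry check: reject a NULL xattr name before the
 * strcmp() that would otherwise dereference it. */
static int osd_xattr_set_sketch(const char *name, const struct lu_buf *buf)
{
        if (name == NULL)
                return -22;  /* -EINVAL: the defensive variant */
        /* the assertion variant would instead be: LASSERT(name != NULL); */

        if (strcmp(name, XATTR_NAME_VERSION) == 0)
                return (int)buf->lb_len;  /* "version set is not real XATTR" path */
        return 0;
}
```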

Comment by Alex Zhuravlev [ 14/Nov/12 ]

hmm, I'd say we should assert on name=NULL.. and the caller must be fixed.

Comment by Adrian Ulrich (Inactive) [ 14/Nov/12 ]

This has become quite serious for us right now: our filesystem is currently down, stuck in a reboot -> panic -> reboot -> panic cycle.

Our MDS crashes with the same message as soon as we reboot it:
2012-11-14T15:45:07+01:00 n-mds2 [<ffffffffa0cde12a>] mdd_trans_stop+0x1a/0x20 [mdd]
2012-11-14T15:45:07+01:00 n-mds2 [<ffffffffa0cc1036>] mdd_attr_set+0xbf6/0x2030 [mdd]
2012-11-14T15:45:07+01:00 n-mds2 [<ffffffffa0677820>] ? ldlm_completion_ast+0x0/0x6d0 [ptlrpc]
2012-11-14T15:45:08+01:00 n-mds2 [<ffffffffa069ae3c>] ? lustre_msg_get_versions+0x6c/0xb0 [ptlrpc]
2012-11-14T15:45:08+01:00 n-mds2 [<ffffffffa0e12a1c>] cml_attr_set+0x6c/0x160 [cmm]
2012-11-14T15:45:08+01:00 n-mds2 [<ffffffffa0d4e578>] mdt_attr_set+0x268/0x4b0 [mdt]
2012-11-14T15:45:08+01:00 n-mds2 [<ffffffffa0d4eb0d>] mdt_reint_setattr+0x34d/0x1060 [mdt]
2012-11-14T15:45:08+01:00 n-mds2 [<ffffffffa0d48e7b>] mdt_reint_rec+0x4b/0xa0 [mdt]
2012-11-14T15:45:08+01:00 n-mds2 [<ffffffffa0d41069>] mdt_reint_internal+0x479/0x7b0 [mdt]
2012-11-14T15:45:08+01:00 n-mds2 [<ffffffffa0d413ee>] mdt_reint+0x4e/0xb0 [mdt]
2012-11-14T15:45:08+01:00 n-mds2 [<ffffffffa0d37b9d>] mdt_handle_common+0x74d/0x1400 [mdt]
2012-11-14T15:45:08+01:00 n-mds2 [<ffffffffa0d38925>] mdt_regular_handle+0x15/0x20 [mdt]
2012-11-14T15:45:08+01:00 n-mds2 [<ffffffffa06a6011>] ptlrpc_server_handle_request+0x3c1/0xcb0 [ptlrpc]
2012-11-14T15:45:08+01:00 n-mds2 [<ffffffffa04373ee>] ? cfs_timer_arm+0xe/0x10 [libcfs]
2012-11-14T15:45:08+01:00 n-mds2 [<ffffffffa0441e19>] ? lc_watchdog_touch+0x79/0x110 [libcfs]
2012-11-14T15:45:08+01:00 n-mds2 [<ffffffffa06a00e2>] ? ptlrpc_wait_event+0xb2/0x2c0 [ptlrpc]
2012-11-14T15:45:08+01:00 n-mds2 [<ffffffff810519c3>] ? __wake_up+0x53/0x70
2012-11-14T15:45:08+01:00 n-mds2 [<ffffffffa06a701f>] ptlrpc_main+0x71f/0x1210 [ptlrpc]
2012-11-14T15:45:08+01:00 n-mds2 [<ffffffffa06a6900>] ? ptlrpc_main+0x0/0x1210 [ptlrpc]
2012-11-14T15:45:09+01:00 n-mds2 [<ffffffff8100c14a>] child_rip+0xa/0x20
2012-11-14T15:45:09+01:00 n-mds2 [<ffffffffa06a6900>] ? ptlrpc_main+0x0/0x1210 [ptlrpc]
2012-11-14T15:45:09+01:00 n-mds2 [<ffffffffa06a6900>] ? ptlrpc_main+0x0/0x1210 [ptlrpc]
2012-11-14T15:45:09+01:00 n-mds2 [<ffffffff8100c140>] ? child_rip+0x0/0x20
2012-11-14T15:45:09+01:00 n-mds2 Code:

Comment by Johann Lombardi (Inactive) [ 14/Nov/12 ]

Adrian, have you tried to remount the MDS with "-o abort_recov"?

Comment by Adrian Ulrich (Inactive) [ 14/Nov/12 ]

No, but it seems to be 'stable' again.

After the 4th crash I started a 'fsck -n /dev/mapper/...', got impatient, and aborted it after ~10 minutes.

After this I was able to start the MDS without any new crash: it looks like the ~10 minute downtime was enough to time out the 'evil' client/operation!?

Is there any way to see which client causes the crash?

Comment by ETHz Support (Inactive) [ 14/Nov/12 ]

Adrian,
I suggest disabling the cluster (to avoid ping-pong failover) and mounting the MDS manually: mount -t lustre -L <MDT name> -o abort_recov <mount point>

Comment by Adrian Ulrich (Inactive) [ 14/Nov/12 ]

Johann:

Does the crash actually get triggered by a client calling setattr?

One of our most frequent setattr callers is a 1.8.4 client (10.201.32.32) - could this client be the cause of the crash?

[root@n-mds1 exports]# grep setattr */stats | awk '{print $2 " " $1}' | sort -n | tail -5
1694 10.201.38.39@o2ib/stats:setattr
1841 10.201.38.21@o2ib/stats:setattr
2062 10.201.38.23@o2ib/stats:setattr <-- 2.2.93 client
8931 10.201.32.32@o2ib/stats:setattr <-- 1.8.4 client
17278 10.201.32.31@o2ib/stats:setattr <-- 2.3.0 client

Comment by nasf (Inactive) [ 15/Nov/12 ]

I do not think a NULL "name" passed to osd_xattr_set() caused the failure. In this case, the "name" comes from inside the MDS, not from a client, and I cannot find any internal caller passing a NULL "name". On the other hand, the other failure instances have different call traces.

Adrian, had you upgraded anything on the system recently when you hit the failure? There is no hard evidence that an interoperability issue caused the failure, but we can try to locate it step by step. As a first step, please migrate the load from the non-2.2 clients listed above to standard 2.2 clients - especially the 1.8.4 client, because it is very old. I do not think we tested interoperation between a Lustre 2.2 server and a Lustre 1.8.4 client when we released Lustre 2.2, so it is the most suspicious.

Comment by Alex Zhuravlev [ 15/Nov/12 ]

in both cases the very first messages were about an inability to add an llog record:

LustreError: 3809:0:(llog_cat.c:298:llog_cat_add_rec()) llog_write_rec -28: lh=ffff880482b99800

probably it's a problem in the code handling this error. I think we should reproduce this locally.

it also makes sense to ls the CONFIGS/ directory using ldiskfs or debugfs to see how much space can be freed after orphan cleanup.

Comment by nasf (Inactive) [ 15/Nov/12 ]

Adrian, if you cannot abandon the 1.8.4 clients, then please umount them temporarily and try to reproduce the failure with the other clients. If you cannot reproduce it, then it is quite possibly related to an interoperability issue.

Comment by Adrian Ulrich (Inactive) [ 15/Nov/12 ]

Yong Fan:

  • We need the 1.8.x client to copy data from our old Lustre 1.8 installation to the 2.2 installation.
    I will temporarily disable it on the next crash (there are still users moving data around).
  • We didn't do any upgrades recently: we haven't touched the servers in months.
  • We have 2.3 clients because the 2.2 client is too unstable for us and I don't have enough time to backport all crash fixes from 2.3 to our 2.2 client.

Alex: We have been getting these llog_write_rec errors for months (since Jun 2012), while our MDS only started to crash recently (also note that there are ~5 hours between the error and the crash).

What does the llog_write_rec error mean anyway?

I'll post the output of 'ls CONFIGS/' as soon as I have a chance to remount the volume via ldiskfs (= on the next crash or on Monday evening)

Comment by Alex Zhuravlev [ 15/Nov/12 ]

ok, good to know.. llog_write_rec() returning -28 means the MDS was not able to write (transactionally) a record for some update to an OST (remove an OST object, update OST object attributes).

then could you attach osd_ldiskfs.ko please?

Comment by Adrian Ulrich (Inactive) [ 15/Nov/12 ]

md5sum = 6a07cbbb49f63ea0f6a5e6bc067bc7c9

requested kernel module

Comment by Adrian Ulrich (Inactive) [ 15/Nov/12 ]

Alex: I attached the requested kernel module (or did you mean with 'attach' that I should insmod it?)

Comment by Alex Zhuravlev [ 15/Nov/12 ]

no, I've got what I need, thanks

a bit of analysis here:

0000000000004d26 <osd_trans_stop+0x56> mov 0x50(%rbx),%r12
0000000000004d2a <osd_trans_stop+0x5a> test %r12,%r12
0000000000004d2d <osd_trans_stop+0x5d> je 0000000000004e82 <osd_trans_stop+0x1b2>
0000000000004d33 <osd_trans_stop+0x63> movzbl 0x28(%r12),%eax
0000000000004d39 <osd_trans_stop+0x69> movzbl 0x4c(%rbx),%edx
0000000000004d3d <osd_trans_stop+0x6d> and $0xfffffffe,%eax
0000000000004d40 <osd_trans_stop+0x70> and $0x1,%edx
0000000000004d43 <osd_trans_stop+0x73> or %edx,%eax
0000000000004d45 <osd_trans_stop+0x75> mov %al,0x28(%r12)
0000000000004d4a <osd_trans_stop+0x7a> mov (%r12),%rax

so %rbx contains a pointer to the osd_thandle (oh):

(gdb) p/x sizeof(struct thandle)
$2 = 0x50
struct osd_thandle {
struct thandle ot_super;
handle_t *ot_handle;

0000000000004db3 <osd_trans_stop+0xe3> mov (%rbx),%rax
0000000000004db6 <osd_trans_stop+0xe6> test %rax,%rax
0000000000004db9 <osd_trans_stop+0xe9> je 0000000000004dc4 <osd_trans_stop+0xf4>
0000000000004dbb <osd_trans_stop+0xeb> mov 0x8(%rax),%rax
0000000000004dbf <osd_trans_stop+0xef> testb $0x1,(%rax)

these lines implement:
if (lu_device_is_md(&th->th_dev->dd_lu_dev)) {

RAX: 0006000100000002 is supposed to be ld_type (and 0x8(%rax) is ld_type->ldt_tags)

IOW, the thandle was broken, pointing at garbage instead of a device.

now the question is what broke it..
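The offset arithmetic above can be double-checked with a toy layout (the struct definitions here are simplified stand-ins, not the real Lustre ones): with sizeof(struct thandle) == 0x50, as gdb reported, the embedded ot_super places ot_handle at exactly the 0x50(%rbx) the disassembly reads.

```c
#include <stddef.h>

/* Simplified stand-in: padded so sizeof(struct thandle) == 0x50,
 * matching what gdb reported for the real 2.2 build. */
struct thandle {
        char th_pad[0x50];
};

typedef struct journal_handle handle_t;  /* opaque, as in ldiskfs */

struct osd_thandle {
        struct thandle ot_super;   /* embedded base handle at offset 0 */
        handle_t      *ot_handle;  /* journal handle, lands at offset 0x50 */
};
```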

Comment by ETHz Support (Inactive) [ 15/Nov/12 ]

Adrian,
if I remember well you can use debufs with the device mounted. Try:

debugfs -c -R 'dump CONFIGS/ /tmp/config' /dev/<device>
llog_reader /tmp/config

Comment by Adrian Ulrich (Inactive) [ 16/Nov/12 ]

I did this using a snapshot from the MDS (taken on 5 November).
The output of llog_reader is attached to the case (llog.txt)

Output of CONFIGS/ via debugfs:

$ debugfs mds.dump
debugfs 1.41.12 (17-May-2010)
debugfs: ls -l CONFIGS
467550721 40777 (2) 0 0 4096 9-Oct-2012 07:05 .
2 40755 (2) 0 0 4096 3-May-2012 14:58 ..
467550722 100644 (1) 0 0 12288 9-May-2012 09:21 mountdata
467550723 100644 (1) 0 0 0 3-May-2012 14:58 _mgs-sptlrpc
467550724 100644 (1) 0 0 89128 3-May-2012 14:58 nero-client
467550725 100644 (1) 0 0 0 3-May-2012 14:58 nero-sptlrpc
467550726 100644 (1) 0 0 89000 3-May-2012 14:58 nero-MDT0000
467550727 100644 (1) 0 0 0 3-May-2012 14:58 changelog_catalog
467550728 100644 (1) 0 0 0 3-May-2012 14:58 changelog_users
467550730 100644 (1) 0 0 9432 3-May-2012 15:05 nero-OST0000
467550729 100644 (1) 0 0 0 9-Oct-2012 07:05 sptlrpc
467550731 100644 (1) 0 0 9432 3-May-2012 15:08 nero-OST0008
467550732 100644 (1) 0 0 9432 3-May-2012 15:09 nero-OST0010
467550733 100644 (1) 0 0 9432 3-May-2012 15:09 nero-OST0018
467550734 100644 (1) 0 0 9432 3-May-2012 15:58 nero-OST0001
467550735 100644 (1) 0 0 9432 3-May-2012 15:58 nero-OST0009
467550736 100644 (1) 0 0 9432 3-May-2012 15:59 nero-OST0011
467550737 100644 (1) 0 0 9432 3-May-2012 15:59 nero-OST0019
467550738 100644 (1) 0 0 9432 3-May-2012 16:00 nero-OST0002
467550739 100644 (1) 0 0 9432 3-May-2012 16:01 nero-OST000a
467550740 100644 (1) 0 0 9432 3-May-2012 16:04 nero-OST0012
467550741 100644 (1) 0 0 9432 3-May-2012 16:05 nero-OST001a
467550742 100644 (1) 0 0 9432 3-May-2012 16:06 nero-OST0003
467550743 100644 (1) 0 0 9432 3-May-2012 16:06 nero-OST000b
467550744 100644 (1) 0 0 9432 3-May-2012 16:06 nero-OST0013
467550745 100644 (1) 0 0 9432 3-May-2012 16:07 nero-OST001b
467550746 100644 (1) 0 0 9432 3-May-2012 16:11 nero-OST0004
467550747 100644 (1) 0 0 9432 3-May-2012 16:11 nero-OST000c
467550748 100644 (1) 0 0 9432 3-May-2012 16:12 nero-OST0014
467550749 100644 (1) 0 0 9432 3-May-2012 16:12 nero-OST001c
467550750 100644 (1) 0 0 9432 3-May-2012 16:14 nero-OST0005
467550751 100644 (1) 0 0 9432 3-May-2012 16:14 nero-OST000d
467550752 100644 (1) 0 0 9432 3-May-2012 16:14 nero-OST0015
467550753 100644 (1) 0 0 9432 3-May-2012 16:15 nero-OST001d
467550754 100644 (1) 0 0 9432 3-May-2012 16:18 nero-OST0006
467550755 100644 (1) 0 0 9432 3-May-2012 16:18 nero-OST000e
467550756 100644 (1) 0 0 9432 3-May-2012 16:18 nero-OST0016
467550757 100644 (1) 0 0 9432 3-May-2012 16:18 nero-OST001e
467550758 100644 (1) 0 0 9432 3-May-2012 16:21 nero-OST0007
467550759 100644 (1) 0 0 9432 3-May-2012 16:21 nero-OST000f
467550760 100644 (1) 0 0 9432 3-May-2012 16:22 nero-OST0017
467550761 100644 (1) 0 0 9432 3-May-2012 16:22 nero-OST001f

Comment by Zhenyu Xu [ 22/Nov/12 ]

Adrian,

Did you have opportunity to try excluding 1.8.x clients to check whether the MDS still crashes with only 2.x clients accessing it?

Comment by Adrian Ulrich (Inactive) [ 22/Nov/12 ]

Well, the problem is that I cannot reproduce the crash, and I have not seen any new crashes since 14 November.

(The crash was probably caused by a user job: there are ~800 users on our cluster and I have no way to figure out which job crashed it.)

But in any case: even if the crash was triggered by a 1.8.x client, it should still get fixed, shouldn't it?

And do we have any news about the llog_write_rec error? (did the debugfs output help?)

Comment by Zhenyu Xu [ 22/Nov/12 ]

Yes, even if it's a 1.8.x client problem we should fix it. The purpose of the question was to help narrow down which area to search for the root cause.

I'm still investigating the llog part of the issue.

Comment by Zhenyu Xu [ 23/Nov/12 ]

I think the "LustreError: 31980:0:(llog_cat.c:298:llog_cat_add_rec()) llog_write_rec -28: lh=ffff88042d450240" is a misleading message: it only means the current log does not have enough space for the log record, and a new log will be created for it later.

int llog_cat_add_rec(struct llog_handle *cathandle, struct llog_rec_hdr *rec,
                     struct llog_cookie *reccookie, void *buf)
{
        struct llog_handle *loghandle;
        int rc;
        ENTRY;

        LASSERT(rec->lrh_len <= LLOG_CHUNK_SIZE);
        loghandle = llog_cat_current_log(cathandle, 1);
        if (IS_ERR(loghandle))
                RETURN(PTR_ERR(loghandle));
        /* loghandle is already locked by llog_cat_current_log() for us */
        rc = llog_write_rec(loghandle, rec, reccookie, 1, buf, -1);
        if (rc < 0)
                CERROR("llog_write_rec %d: lh=%p\n", rc, loghandle);
        cfs_up_write(&loghandle->lgh_lock);
        if (rc == -ENOSPC) {
                /* to create a new plain log */
                loghandle = llog_cat_current_log(cathandle, 1);
                if (IS_ERR(loghandle))
                        RETURN(PTR_ERR(loghandle));
                rc = llog_write_rec(loghandle, rec, reccookie, 1, buf, -1);
                cfs_up_write(&loghandle->lgh_lock);
        }

        RETURN(rc);
}
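As a side note, the -28 in the "llog_write_rec -28" message is -ENOSPC on Linux, which is why the rc == -ENOSPC branch in llog_cat_add_rec() above creates a new plain log and retries. A trivial check:

```c
#include <errno.h>

/* On Linux, ENOSPC ("No space left on device") is errno 28, so a
 * "llog_write_rec -28" message corresponds to the rc == -ENOSPC path. */
static int llog_rc_is_enospc(int rc)
{
        return rc == -ENOSPC;
}
```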
Comment by Niu Yawei (Inactive) [ 29/Nov/12 ]

After checking the 2.2 code carefully, I found a culprit which can cause such memory corruption:

in mdd_declare_attr_set():

#ifdef CONFIG_FS_POSIX_ACL
        if (ma->ma_attr.la_valid & LA_MODE) {
                mdd_read_lock(env, obj, MOR_TGT_CHILD);
                rc = mdo_xattr_get(env, obj, buf, XATTR_NAME_ACL_ACCESS,
                                   BYPASS_CAPA);
                mdd_read_unlock(env, obj);
                if (rc == -EOPNOTSUPP || rc == -ENODATA)
                        rc = 0;
                else if (rc < 0)
                        return rc;

Our intention here is to retrieve the xattr length, but we passed an uninitialized buffer to mdo_xattr_get() (we should pass NULL here)...
Actually, this bug has already been fixed in 2.3 & 2.4 (see http://review.whamcloud.com/#change,3928 & LU-1823); I think we need to backport it to 2.2.
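The fix relies on the common two-call xattr convention, as in getxattr(2): a NULL buffer means "just report the stored size", so the caller never hands in an uninitialized buffer. A hypothetical sketch of that convention (not the actual mdo_xattr_get() code; the payload is made up):

```c
#include <string.h>

static const char acl_blob[] = "mock-acl-payload";  /* hypothetical stored xattr */

/* With buf == NULL the function only reports the stored length and
 * writes nothing; with a too-small buffer it fails like getxattr(2). */
static int xattr_get_sketch(char *buf, size_t buflen)
{
        size_t len = sizeof(acl_blob);

        if (buf == NULL)             /* size query: nothing is written */
                return (int)len;
        if (buflen < len)
                return -34;          /* -ERANGE, as getxattr(2) would return */
        memcpy(buf, acl_blob, len);
        return (int)len;
}
```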

Comment by Niu Yawei (Inactive) [ 29/Nov/12 ]

backport the memory corruption fix in mdd_declare_attr_set() to b2_2: http://review.whamcloud.com/4703

Comment by Adrian Ulrich (Inactive) [ 30/Nov/12 ]

Thanks for fixing this issue: We will upgrade our MDS as soon as a new build becomes available – or should we just upgrade to 2.3?

Comment by Peter Jones [ 30/Nov/12 ]

Adrian, a build of the change backported to 2.2 already exists - http://build.whamcloud.com/job/lustre-reviews/10853/ - but it is still in the automated test queue at the moment. Lustre 2.3 is available now and has been thoroughly tested. It will of course include other content beyond just this one fix (both additional features and many other fixes).

Comment by Peter Jones [ 06/Dec/12 ]

Adrian

Have you decided which approach you will take - to patch 2.2 or upgrade to 2.3?

Peter

Comment by Adrian Ulrich (Inactive) [ 06/Dec/12 ]

Hello Peter,

We will upgrade to 2.3 as soon as the next opportunity arises, you can therefore close this issue.

Thanks and best regards,
Adrian

Comment by Peter Jones [ 06/Dec/12 ]

ok thanks Adrian!
