[LU-5150] NULL pointer dereference in posix_acl_valid() under mdc_get_lustre_md() - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Fixed
Priority: Blocker
Fix Version/s: Lustre 2.6.0, Lustre 2.7.0, Lustre 2.5.4
Affects Version/s: Lustre 2.4.2, Lustre 2.5.3
Labels:
- llnl
- mn4
- prz

Severity:
2
Rank (Obsolete):
14213

Description

After upgrading our servers from Lustre 2.4.0-28chaos to Lustre 2.4.2-11chaos (see github.com/chaos/lustre), we are seeing many client crashes with a NULL pointer dereference in posix_acl_valid() under mdc_get_lustre_md(). Note that both 2.4.0-19chaos client nodes and 2.4.2-11chaos client nodes are exhibiting this behavior.

The backtrace looks like:

PID: 3690   TASK: ffff880338d69540  CPU: 7   COMMAND: "ll_sa_3689"
 #0 [ffff8802ddf51800] machine_kexec+0x18b at ffffffff810391ab
 #1 [ffff8802ddf51860] crash_kexec+0x72 at ffffffff810c5d52
 #2 [ffff8802ddf51930] oops_end+0xc0 at ffffffff8152e630
 #3 [ffff8802ddf51960] no_context+0xfb at ffffffff8104a00b
 #4 [ffff8802ddf519b0] __bad_area_nosemaphore+0x125 at ffffffff8104a295
 #5 [ffff8802ddf51a00] bad_area_nosemaphore+0x13 at ffffffff8104a363
 #6 [ffff8802ddf51a10] __do_page_fault+0x32f at ffffffff8104aacf
 #7 [ffff8802ddf51b30] do_page_fault+0x3e at ffffffff8153057e
 #8 [ffff8802ddf51b60] page_fault+0x25 at ffffffff8152d935
    [exception RIP: posix_acl_valid+9]
    RIP: ffffffff811ea9b9  RSP: ffff8802ddf51c10  RFLAGS: 00010207
    RAX: 0000000000000000  RBX: ffff8805ec607000  RCX: ffff8805ebddda00
    RDX: 0000000000000004  RSI: 0000000000000004  RDI: 0000000000000000
    RBP: ffff8802ddf51c10   R8: 0000000000000000   R9: ffff8805ec46dc40
    R10: 0000000000000000  R11: 0000000000000000  R12: ffff88063987bc00
    R13: ffff8802ddf51cf0  R14: 0000000000000000  R15: 0000000000000050
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff8802ddf51c18] mdc_get_lustre_md+0xc5a at ffffffffa0af4faa [mdc]
#10 [ffff8802ddf51c78] lmv_get_lustre_md+0x153 at ffffffffa0d668d3 [lmv]
#11 [ffff8802ddf51cc8] ll_prep_inode+0x3f7 at ffffffffa0c7e217 [lustre]
#12 [ffff8802ddf51da8] ll_post_statahead+0x2f7 at ffffffffa0ca0577 [lustre]
#13 [ffff8802ddf51e18] ll_statahead_thread+0xd38 at ffffffffa0ca4ff8 [lustre]
#14 [ffff8802ddf51f48] child_rip+0xa at ffffffff8100c24a

The crash is on this line in posix_acl_valid():

crash> gdb list *(posix_acl_valid+9)
0xffffffff811ea9b9 is in posix_acl_valid (fs/posix_acl.c:88).
83              const struct posix_acl_entry *pa, *pe;
84              int state = ACL_USER_OBJ;
85              unsigned int id = 0;  /* keep gcc happy */
86              int needs_mask = 0;
87     
88              FOREACH_ACL_ENTRY(pa, acl, pe) {
89                      if (pa->e_perm & ~(ACL_READ|ACL_WRITE|ACL_EXECUTE))
90                              return -EINVAL;
91                      switch (pa->e_tag) {
92                              case ACL_USER_OBJ:

The problem is not particular to the statahead thread. That was just one example. Here is another where I ran getfattr on the problem file:

2014-06-05 14:51:29 BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
2014-06-05 14:51:29 IP: [<ffffffff811ea9b9>] posix_acl_valid+0x9/0x120
2014-06-05 14:51:29 PGD 638f93067 PUD 5f1873067 PMD 0 
2014-06-05 14:51:29 Oops: 0000 [#1] SMP 
2014-06-05 14:51:29 last sysfs file: /sys/devices/system/edac/pci/pci_parity_count
2014-06-05 14:51:29 CPU 10 
2014-06-05 14:51:29 Modules linked in: lmv(U) mgc(U) zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate lustre(U) lov(U) osc(U) mdc(U
2014-06-05 14:51:29 
2014-06-05 14:51:29 Pid: 6114, comm: getfattr Tainted: P           ---------------    2.6.32-431.17.2.1chaos.ch5.2.x86_64 #1 Dell     XS23-TY35   /0GW08P
2014-06-05 14:51:29 RIP: 0010:[<ffffffff811ea9b9>]  [<ffffffff811ea9b9>] posix_acl_valid+0x9/0x120
2014-06-05 14:51:29 RSP: 0018:ffff8805f0e698d8  EFLAGS: 00010207
2014-06-05 14:51:29 RAX: 0000000000000000 RBX: ffff880639212000 RCX: ffff8805f1870a00
2014-06-05 14:51:29 RDX: 0000000000000004 RSI: 0000000000000004 RDI: 0000000000000000
2014-06-05 14:51:29 RBP: ffff8805f0e698d8 R08: 0000000000000000 R09: 0000000000000040
2014-06-05 14:51:29 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8805f3b22c00
2014-06-05 14:51:29 R13: ffff8805f0e699b8 R14: 0000000000000000 R15: 0000000000000050
2014-06-05 14:51:29 FS:  00002aaaab266fa0(0000) GS:ffff88034ac80000(0000) knlGS:0000000000000000
2014-06-05 14:51:29 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
2014-06-05 14:51:29 CR2: 0000000000000004 CR3: 0000000638355000 CR4: 00000000000007e0
2014-06-05 14:51:29 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2014-06-05 14:51:29 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
2014-06-05 14:51:29 Process getfattr (pid: 6114, threadinfo ffff8805f0e68000, task ffff88063a007500)
2014-06-05 14:51:29 Stack:
2014-06-05 14:51:29  ffff8805f0e69938 ffffffffa091dfaa ffff8805f1870a00 ffff8806389acc00
2014-06-05 14:51:29 <d> 0000000000000000 0000000000000000 6c0db3630c468718 ffff8806389acc00
2014-06-05 14:51:29 <d> ffff880639212000 ffff880639212400 ffff8805f0e699b8 ffff8805f3b22c00
2014-06-05 14:51:29 Call Trace:
2014-06-05 14:51:29  [<ffffffffa091dfaa>] mdc_get_lustre_md+0xc5a/0x1420 [mdc]
2014-06-05 14:51:29  [<ffffffffa0d6c8d3>] lmv_get_lustre_md+0x153/0x3d0 [lmv]
2014-06-05 14:51:29  [<ffffffffa0a99217>] ll_prep_inode+0x3f7/0xf60 [lustre]
2014-06-05 14:51:29  [<ffffffffa079faa8>] ? req_capsule_server_get+0x18/0x20 [ptlrpc]
2014-06-05 14:51:29  [<ffffffffa0d85e4a>] ? lmv_intent_lookup+0x25a/0x770 [lmv]
2014-06-05 14:51:29  [<ffffffffa0aa53b0>] ? ll_md_blocking_ast+0x0/0x740 [lustre]
2014-06-05 14:51:29  [<ffffffffa0aa89aa>] ll_lookup_it_finish+0x1da/0xe80 [lustre]
2014-06-05 14:51:29  [<ffffffffa0d86fca>] ? lmv_intent_lock+0x32a/0x380 [lmv]
2014-06-05 14:51:29  [<ffffffffa0aa53b0>] ? ll_md_blocking_ast+0x0/0x740 [lustre]
2014-06-05 14:51:29  [<ffffffffa0aa9a3d>] ll_lookup_it+0x3ed/0xbd0 [lustre]
2014-06-05 14:51:29  [<ffffffffa0aa53b0>] ? ll_md_blocking_ast+0x0/0x740 [lustre]
2014-06-05 14:51:29  [<ffffffffa0aaa2ac>] ll_lookup_nd+0x8c/0x430 [lustre]
2014-06-05 14:51:29  [<ffffffff811a457e>] ? d_alloc+0x13e/0x1b0
2014-06-05 14:51:29  [<ffffffff811998a5>] do_lookup+0x1a5/0x230
2014-06-05 14:51:29  [<ffffffff81199fb7>] __link_path_walk+0x587/0x850
2014-06-05 14:51:29  [<ffffffff811680ea>] ? alloc_pages_current+0xaa/0x110
2014-06-05 14:51:29  [<ffffffff8119a97a>] path_walk+0x6a/0xe0
2014-06-05 14:51:29  [<ffffffff8119ab8b>] filename_lookup+0x6b/0xc0
2014-06-05 14:51:29  [<ffffffff8119bcb7>] user_path_at+0x57/0xa0
2014-06-05 14:51:29  [<ffffffff8104a9a4>] ? __do_page_fault+0x204/0x490
2014-06-05 14:51:29  [<ffffffff8128ae05>] ? rb_insert_color+0x125/0x160
2014-06-05 14:51:29  [<ffffffff8114f020>] ? __vma_link_rb+0x30/0x40
2014-06-05 14:51:29  [<ffffffff8118f7a0>] vfs_fstatat+0x50/0xa0
2014-06-05 14:51:29  [<ffffffff8118f85e>] vfs_lstat+0x1e/0x20
2014-06-05 14:51:29  [<ffffffff8118f884>] sys_newlstat+0x24/0x50
2014-06-05 14:51:29  [<ffffffff8153057e>] ? do_page_fault+0x3e/0xa0
2014-06-05 14:51:29  [<ffffffff8152d935>] ? page_fault+0x25/0x30
2014-06-05 14:51:29  [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

vmcore-dmesg.txt
69 kB
09/Jun/14 8:43 AM

Issue Links

is related to

LU-4787 OOPS from null pointer dereference in mdc_unpack_acl

Resolved

is related to

LU-4680 Invalid system.posix_acl_default breaks umask on ZFS MDT

Resolved

mentioned in: Page No Confluence page found with the given URL.

Activity

[LU-5150] NULL pointer dereference in posix_acl_valid() under mdc_get_lustre_md()

Christopher Morrone (Inactive) added a comment - 06/Jun/14 3:31 AM

Oh, I didn't newly mount the client. That is not a useful approach if that is needed.

Most of users will enable "cal" by default, so I want to make sure whether the switch "-o acl/noacl" really works or not.

"acl" is in the list of mdc connect_flags, yes.

BTW, the new mounted client also will crash after the MDT mounted with "-o noacl" ?

If we have to remount the client, we might as well just install my client fix. That isn't going to be useful.

Christopher Morrone (Inactive) added a comment - 06/Jun/14 3:31 AM Oh, I didn't newly mount the client. That is not a useful approach if that is needed. Most of users will enable "cal" by default, so I want to make sure whether the switch "-o acl/noacl" really works or not. "acl" is in the list of mdc connect_flags, yes. BTW, the new mounted client also will crash after the MDT mounted with "-o noacl" ? If we have to remount the client, we might as well just install my client fix. That isn't going to be useful.

nasf (Inactive) added a comment - 06/Jun/14 3:23 AM

Would you please to check what the connect_flags (/proc/fs/lustre/mdc/xxx/connect_flags) are on the crashed client? Most of users will enable "cal" by default, so I want to make sure whether the switch "-o acl/noacl" really works or not. Thanks!

BTW, the new mounted client also will crash after the MDT mounted with "-o noacl" ?

nasf (Inactive) added a comment - 06/Jun/14 3:23 AM Would you please to check what the connect_flags (/proc/fs/lustre/mdc/xxx/connect_flags) are on the crashed client? Most of users will enable "cal" by default, so I want to make sure whether the switch "-o acl/noacl" really works or not. Thanks! BTW, the new mounted client also will crash after the MDT mounted with "-o noacl" ?

Christopher Morrone (Inactive) added a comment - 06/Jun/14 3:07 AM

Mounting the MDT device with "-o noacl" does not seem to help. Clients still crash.

Christopher Morrone (Inactive) added a comment - 06/Jun/14 3:07 AM Mounting the MDT device with "-o noacl" does not seem to help. Clients still crash.

nasf (Inactive) added a comment - 06/Jun/14 3:01 AM

"0" case means that there is only ACL header, then the ACL size is not zero, but the ACL entries count in the header in zero. Such case should be an invalid ACL.

nasf (Inactive) added a comment - 06/Jun/14 3:01 AM "0" case means that there is only ACL header, then the ACL size is not zero, but the ACL entries count in the header in zero. Such case should be an invalid ACL.

Christopher Morrone (Inactive) added a comment - 06/Jun/14 2:58 AM

I'll ask if they are OK with disabling acl, and test if that works.

Option 2 is probably our more preferable one. What do you mean about the "0 case"?

Christopher Morrone (Inactive) added a comment - 06/Jun/14 2:58 AM I'll ask if they are OK with disabling acl, and test if that works. Option 2 is probably our more preferable one. What do you mean about the "0 case"?

nasf (Inactive) added a comment - 06/Jun/14 2:41 AM

Hi Chris,

Under the urgent case, there are two temporary solutions may be helpful for your system:

1) I am not sure how much your system depends on the ACL for permission. If it is not too much, you can mount the MDT device with "-o noacl", then as expected, all ACL related code should be disabled automatically.

2) If option 1) is not suitable for you, and if we cannot figure out why the bad ACL is generated in a short time, then we can filter out the bad ACL on the MDS before it returned to the client via checking the ACL size and the ACL entries count (0 case will also cause client side to interpret it as NULL ACL). There are only two entries for such checking: mdt_finish_open() and mdt_getattr_internal(), or we can add the filter inside the mdd_xattr_get(). Anyway, it is relative easy to make the patch.

nasf (Inactive) added a comment - 06/Jun/14 2:41 AM Hi Chris, Under the urgent case, there are two temporary solutions may be helpful for your system: 1) I am not sure how much your system depends on the ACL for permission. If it is not too much, you can mount the MDT device with "-o noacl", then as expected, all ACL related code should be disabled automatically. 2) If option 1) is not suitable for you, and if we cannot figure out why the bad ACL is generated in a short time, then we can filter out the bad ACL on the MDS before it returned to the client via checking the ACL size and the ACL entries count (0 case will also cause client side to interpret it as NULL ACL). There are only two entries for such checking: mdt_finish_open() and mdt_getattr_internal(), or we can add the filter inside the mdd_xattr_get(). Anyway, it is relative easy to make the patch.

Christopher Morrone (Inactive) added a comment - 06/Jun/14 2:11 AM

This is not to say that the ticket is done. We still would love to have a server fix ASAP. Even a server-side workaround to hide the problem xattr would be good until we decide on the longer term server fix.

Christopher Morrone (Inactive) added a comment - 06/Jun/14 2:11 AM This is not to say that the ticket is done. We still would love to have a server fix ASAP. Even a server-side workaround to hide the problem xattr would be good until we decide on the longer term server fix.

Christopher Morrone (Inactive) added a comment - 06/Jun/14 2:10 AM

Yes, the client side fix works:

http://review.whamcloud.com/10620

Note that in the patch I added an extra check for NULL rather than just changing IS_ERR to IS_ERR_OR_NULL because I don't want to see those CERROR messages constantly in the NULL case.

Christopher Morrone (Inactive) added a comment - 06/Jun/14 2:10 AM Yes, the client side fix works: http://review.whamcloud.com/10620 Note that in the patch I added an extra check for NULL rather than just changing IS_ERR to IS_ERR_OR_NULL because I don't want to see those CERROR messages constantly in the NULL case.

Christopher Morrone (Inactive) added a comment - 06/Jun/14 1:49 AM

Also, upgrading our clients is going to be tough, and this particular problem isn't easy to fix with systemtap on the clients. Is there an easy way to filter out this xattr on the MDS side when the value is bad, so that the client never sees it? A server side patch would get us up an working again quickly.

Christopher Morrone (Inactive) added a comment - 06/Jun/14 1:49 AM Also, upgrading our clients is going to be tough, and this particular problem isn't easy to fix with systemtap on the clients. Is there an easy way to filter out this xattr on the MDS side when the value is bad, so that the client never sees it? A server side patch would get us up an working again quickly.

Christopher Morrone (Inactive) added a comment - 06/Jun/14 1:47 AM

We have had some trouble in this area before. When we do something like this on Lustre:

$ mkdir x
$ cp -rp x y

you wind up with the following xattr set:

$ getfattr -d -m. y
# file: y
system.posix_acl_default=0sAgAAAA==

Now, if we create a file under that directory with our Lustre 2.4.0-28chaos and older version of ZFS, the newly created file has...nothing unusual.

But if we create a file in that directory under Lustre 2.4.0-28chaos with the latest ZFS, we see that it winds up with this acl:

system.posix_acl_access = \002\000\000\000

We needed to use ZDB to get that through the posix mount of the MDT, because both 2.4.0 and 2.4.2 clients crash when trying to access that xattr.

In mdc_unpack_acl(), we have this code:

        acl = posix_acl_from_xattr(buf, body->aclsize);
        if (IS_ERR(acl)) {
                rc = PTR_ERR(acl);
                CERROR("convert xattr to acl: %d\n", rc);
                RETURN(rc);
        }

        rc = posix_acl_valid(acl);
        if (rc) {
                CERROR("validate acl: %d\n", rc);
                posix_acl_release(acl);
                RETURN(rc);
        }

My belief is that bufsize is nonzero, but posix_acl_from_xattr() is returning NULL. We don't check for NULL, only if the returned pointer has an error code set. We then pass the NULL pointer to posix_acl_valid(), which in turn crashes.

I have a patch to handle a NULL pointer that I am about to test, and will push to gerrit if it fixes the client.

However, this still leaves the question of what has changed to allow that almost-empty xattr to be created in the first place, and what we should do about it.

Christopher Morrone (Inactive) added a comment - 06/Jun/14 1:47 AM We have had some trouble in this area before. When we do something like this on Lustre: $ mkdir x $ cp -rp x y you wind up with the following xattr set: $ getfattr -d -m. y # file: y system.posix_acl_default=0sAgAAAA== Now, if we create a file under that directory with our Lustre 2.4.0-28chaos and older version of ZFS, the newly created file has...nothing unusual. But if we create a file in that directory under Lustre 2.4.0-28chaos with the latest ZFS, we see that it winds up with this acl: system.posix_acl_access = \002\000\000\000 We needed to use ZDB to get that through the posix mount of the MDT, because both 2.4.0 and 2.4.2 clients crash when trying to access that xattr. In mdc_unpack_acl(), we have this code: acl = posix_acl_from_xattr(buf, body->aclsize); if (IS_ERR(acl)) { rc = PTR_ERR(acl); CERROR( "convert xattr to acl: %d\n" , rc); RETURN(rc); } rc = posix_acl_valid(acl); if (rc) { CERROR( "validate acl: %d\n" , rc); posix_acl_release(acl); RETURN(rc); } My belief is that bufsize is nonzero, but posix_acl_from_xattr() is returning NULL. We don't check for NULL, only if the returned pointer has an error code set. We then pass the NULL pointer to posix_acl_valid(), which in turn crashes. I have a patch to handle a NULL pointer that I am about to test, and will push to gerrit if it fixes the client. However, this still leaves the question of what has changed to allow that almost-empty xattr to be created in the first place, and what we should do about it.

Peter Jones added a comment - 06/Jun/14 12:16 AM

Lai

Could you please look into this?

Thanks

Peter

Peter Jones added a comment - 06/Jun/14 12:16 AM Lai Could you please look into this? Thanks Peter

People

Assignee:: Lai Siyao

Reporter:: Christopher Morrone (Inactive)

Votes:: 0 Vote for this issue

Watchers:: 14 Start watching this issue

Dates

Created:: 05/Jun/14 10:11 PM

Updated:: 26/Feb/15 10:24 PM

Resolved:: 25/Aug/14 9:22 PM