[LU-2610] sanity test_103: ACL copy failed zfs - umask ignored Created: 12/Jan/13  Updated: 29/Oct/14  Resolved: 29/Oct/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.7.0

Type: Bug Priority: Blocker
Reporter: Maloo Assignee: Nathaniel Clark
Resolution: Fixed Votes: 0
Labels: LB, sequoia, zfs
Environment:

lustre-master #1142 OFED build ZFS


Issue Links:
Duplicate
is duplicated by LU-2672 umask ignored on mkdir Closed
Related
is related to LU-2804 umask in default ACL is not being tra... Resolved
Severity: 3
Rank (Obsolete): 6094

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/e2ede512-5ae4-11e2-8985-52540035b04c.

The sub-test test_103 failed with the following error:

== sanity test 103: acl test =========================================== 16:53:11 (1357692791)
performing cp ...
[3] $ umask 022 -- ok
[4] $ mkdir d -- ok
[5] $ cd d -- ok
[6] $ touch f -- ok
[7] $ setfacl -m u:bin:rw f -- ok
[8] $ ls -l f | awk -- '{ print $1 }' -- failed
-rw-rw-r--+                           ? -rw-rw-rw-+                            
[11] $ cp f g -- ok
[12] $ ls -l g | awk -- '{ print $1 }' -- failed
-rw-r--r--                            ? -rw-rw-rw-                             
[15] $ rm g -- ok
[16] $ cp -p f g -- ok
[17] $ ls -l f | awk -- '{ print $1 }' -- failed
-rw-rw-r--+                           ? -rw-rw-rw-+                            
[20] $ mkdir h -- ok
[21] $ echo blubb > h/x -- ok
[22] $ cp -rp h i -- ok
[23] $ cat i/x -- ok
[26] $ rm -r i -- ok
[31] $ setfacl -R -m u:bin:rwx h -- ok
[32] $ getfacl --omit-header h/x -- failed
user::rw-                             | user::rw-                              
user:bin:rwx                          | user:bin:rwx                           
group::r--                            ? group::rw-                             
mask::rwx                             | mask::rwx                              
other::r--                            ? other::rw-                             
                                      |                                        
[40] $ cp -rp h i -- ok
[41] $ getfacl --omit-header i/x -- failed
user::rw-                             | user::rw-                              
user:bin:rwx                          | user:bin:rwx                           
group::r--                            ? group::rw-                             
mask::rwx                             | mask::rwx                              
other::r--                            ? other::rw-                             
                                      |                                        
[49] $ cd .. -- ok
[50] $ rm -r d -- ok
22 commands (17 passed, 5 failed)
 sanity test_103: @@@@@@ FAIL:  


 Comments   
Comment by Nathaniel Clark [ 21/Jan/13 ]

In test 103 (over the wire) the REINT_CREATE packet exchange:
for both ldiskfs and zfs:
Mode: 040777
Umask: 0022

ldiskfs returned Mode: 040755

ZFS returned Mode: 040777

ZFS appears to be ignoring the passed umask.

Comment by Nathaniel Clark [ 25/Jan/13 ]

http://review.whamcloud.com/5174

Comment by Christopher Morrone [ 25/Jan/13 ]

Interesting. On first glance at your patch I thought "that can't be right", but now I see what is going on. It is surprising to me that the server knows about the umask of the calling thread. Do you have any idea why it is designed this way? Why isn't the umask simply applied to the mode inside the lustre client?

If it wasn't before, it seems like a bad design now that we can support multiple OSD. Making every osd implement the same thing seems error prone, which is what led to this bug.

Comment by Nathaniel Clark [ 27/Jan/13 ]

It's because POSIX ACLs and umask interact and it's up to the filesystem to work out the correct interaction.

I think ACL + umask calculations are done in OSD because of the different way's ACLs are stored (that's my impression from working through the code), but I agree it isn't well encapsulated.

Comment by Lai Siyao [ 28/Jan/13 ]

Chris, it's designed to be so because the creation mode is decided by umask and parent default ACL, and it's not a good idea to hold parent UPDATE lock by client in creation. So finally it's decided to send calling thread umask to MDS and let server backend filesystem to calculate the correct mode.

Linux kernel also lets bottom filesystem to do this instead of in VFS, so that ldiskfs can get the correct mode. I don't find ZFS does this at the first glance of code. Nathaniel, could you test this with ZFS as local filesystem?

Comment by Nathaniel Clark [ 28/Jan/13 ]

ZFS does behave correctly as a local filesystem (in my limited testing). It honoured umask during file creation, I didn't do extensive umask vs ACL testing though.

Comment by Lai Siyao [ 04/Feb/13 ]

Ldiskfs handles ACL and creation mode in low level, see ldiskfs_new_inode(), in contrast, ZFS handles this in ZPL layer, however lustre calls DMU directly to make new node, that's why we see different result in these two OSD backend filesystem.

Nathaniel, now that you want to fix all this in MDD layer, I think you need check the patch for LU-974, because in that patch it's left to be done in OSD.

Comment by Nathaniel Clark [ 04/Feb/13 ]

My patch still has the OSD layer taking care of setting the mode. The mdd change just ensures that if mode+umask is different the the current computed node that the osd layer is called finish setting that mode. I was cognizant of LU-974 patch, and I didn't want to change what layer set the mode.

I could update my patch so that osd-ldiskfs takes advantage of the cr_umask instead of passing it in via current->fs->umask from mdd, which would essentially just move where current->fs->umask gets set from mdd_internal.h::mdo_create_obj to osd_handler.c::__osd_object_create.

Comment by Lai Siyao [ 04/Feb/13 ]

Actually for ldiskfs __mdd_acl_init() is redundant, because ACL is handled internally by ldiskfs in creation already. IMHO this function exists because in original design, ACL initialization should be done in MDD layer, and only the interfaces of get/set xattr are needed for OSD. (this is how ZFS OSD is implemented).

If this is true, __mdd_acl_init() should act the same as ldiskfs_init_acl(), and also note __mdd_acl_init() is now inside #ifdef CONFIG_FS_POSIX_ACL and if (got_def_acl), this will not be true any more, because __mdd_acl_init() should initialise mode even when acl doesn't exist.

Comment by Peter Jones [ 25/Feb/13 ]

Lai advises that this issue will be more completely addressed by LU-2804

Comment by Andreas Dilger [ 01/Oct/14 ]

sanity.sh test_40 is being skipped due to this bug. If the problem was fixed by LU-2804 then a patch is needed to re-enable this test.

Comment by Nathaniel Clark [ 01/Oct/14 ]

Re-enable test_40
http://review.whamcloud.com/12153

Comment by Nathaniel Clark [ 29/Oct/14 ]

Patch landed to master (prior to 2.6.54)

Generated at Sat Feb 10 01:26:40 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.