[LU-2610] sanity test_103: ACL copy failed zfs - umask ignored Created: 12/Jan/13 Updated: 29/Oct/14 Resolved: 29/Oct/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.7.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | Nathaniel Clark |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | LB, sequoia, zfs | ||
| Environment: |
lustre-master #1142 OFED build ZFS |
||
| Issue Links: |
|
||||||||||||||||
| Severity: | 3 | ||||||||||||||||
| Rank (Obsolete): | 6094 | ||||||||||||||||
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com> This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/e2ede512-5ae4-11e2-8985-52540035b04c. The sub-test test_103 failed with the following error: == sanity test 103: acl test =========================================== 16:53:11 (1357692791)
performing cp ...
[3] $ umask 022 -- ok
[4] $ mkdir d -- ok
[5] $ cd d -- ok
[6] $ touch f -- ok
[7] $ setfacl -m u:bin:rw f -- ok
[8] $ ls -l f | awk -- '{ print $1 }' -- failed
-rw-rw-r--+ ? -rw-rw-rw-+
[11] $ cp f g -- ok
[12] $ ls -l g | awk -- '{ print $1 }' -- failed
-rw-r--r-- ? -rw-rw-rw-
[15] $ rm g -- ok
[16] $ cp -p f g -- ok
[17] $ ls -l f | awk -- '{ print $1 }' -- failed
-rw-rw-r--+ ? -rw-rw-rw-+
[20] $ mkdir h -- ok
[21] $ echo blubb > h/x -- ok
[22] $ cp -rp h i -- ok
[23] $ cat i/x -- ok
[26] $ rm -r i -- ok
[31] $ setfacl -R -m u:bin:rwx h -- ok
[32] $ getfacl --omit-header h/x -- failed
user::rw- | user::rw-
user:bin:rwx | user:bin:rwx
group::r-- ? group::rw-
mask::rwx | mask::rwx
other::r-- ? other::rw-
|
[40] $ cp -rp h i -- ok
[41] $ getfacl --omit-header i/x -- failed
user::rw- | user::rw-
user:bin:rwx | user:bin:rwx
group::r-- ? group::rw-
mask::rwx | mask::rwx
other::r-- ? other::rw-
|
[49] $ cd .. -- ok
[50] $ rm -r d -- ok
22 commands (17 passed, 5 failed)
sanity test_103: @@@@@@ FAIL:
|
| Comments |
| Comment by Nathaniel Clark [ 21/Jan/13 ] |
|
In test 103 (over the wire) the REINT_CREATE packet exchange: ldiskfs returned Mode: 040755 ZFS returned Mode: 040777 ZFS appears to be ignoring the passed umask. |
| Comment by Nathaniel Clark [ 25/Jan/13 ] |
| Comment by Christopher Morrone [ 25/Jan/13 ] |
|
Interesting. On first glance at your patch I thought "that can't be right", but now I see what is going on. It is surprising to me that the server knows about the umask of the calling thread. Do you have any idea why it is designed this way? Why isn't the umask simply applied to the mode inside the lustre client? If it wasn't before, it seems like a bad design now that we can support multiple OSD. Making every osd implement the same thing seems error prone, which is what led to this bug. |
| Comment by Nathaniel Clark [ 27/Jan/13 ] |
|
It's because POSIX ACLs and umask interact and it's up to the filesystem to work out the correct interaction. I think ACL + umask calculations are done in OSD because of the different way's ACLs are stored (that's my impression from working through the code), but I agree it isn't well encapsulated. |
| Comment by Lai Siyao [ 28/Jan/13 ] |
|
Chris, it's designed to be so because the creation mode is decided by umask and parent default ACL, and it's not a good idea to hold parent UPDATE lock by client in creation. So finally it's decided to send calling thread umask to MDS and let server backend filesystem to calculate the correct mode. Linux kernel also lets bottom filesystem to do this instead of in VFS, so that ldiskfs can get the correct mode. I don't find ZFS does this at the first glance of code. Nathaniel, could you test this with ZFS as local filesystem? |
| Comment by Nathaniel Clark [ 28/Jan/13 ] |
|
ZFS does behave correctly as a local filesystem (in my limited testing). It honoured umask during file creation, I didn't do extensive umask vs ACL testing though. |
| Comment by Lai Siyao [ 04/Feb/13 ] |
|
Ldiskfs handles ACL and creation mode in low level, see ldiskfs_new_inode(), in contrast, ZFS handles this in ZPL layer, however lustre calls DMU directly to make new node, that's why we see different result in these two OSD backend filesystem. Nathaniel, now that you want to fix all this in MDD layer, I think you need check the patch for |
| Comment by Nathaniel Clark [ 04/Feb/13 ] |
|
My patch still has the OSD layer taking care of setting the mode. The mdd change just ensures that if mode+umask is different the the current computed node that the osd layer is called finish setting that mode. I was cognizant of I could update my patch so that osd-ldiskfs takes advantage of the cr_umask instead of passing it in via current->fs->umask from mdd, which would essentially just move where current->fs->umask gets set from mdd_internal.h::mdo_create_obj to osd_handler.c::__osd_object_create. |
| Comment by Lai Siyao [ 04/Feb/13 ] |
|
Actually for ldiskfs __mdd_acl_init() is redundant, because ACL is handled internally by ldiskfs in creation already. IMHO this function exists because in original design, ACL initialization should be done in MDD layer, and only the interfaces of get/set xattr are needed for OSD. (this is how ZFS OSD is implemented). If this is true, __mdd_acl_init() should act the same as ldiskfs_init_acl(), and also note __mdd_acl_init() is now inside #ifdef CONFIG_FS_POSIX_ACL and if (got_def_acl), this will not be true any more, because __mdd_acl_init() should initialise mode even when acl doesn't exist. |
| Comment by Peter Jones [ 25/Feb/13 ] |
|
Lai advises that this issue will be more completely addressed by |
| Comment by Andreas Dilger [ 01/Oct/14 ] |
|
sanity.sh test_40 is being skipped due to this bug. If the problem was fixed by |
| Comment by Nathaniel Clark [ 01/Oct/14 ] |
|
Re-enable test_40 |
| Comment by Nathaniel Clark [ 29/Oct/14 ] |
|
Patch landed to master (prior to 2.6.54) |