[LU-14430] Amount of default ACLs is limited by 31 for new files Created: 12/Feb/21 Updated: 16/Jul/23 Resolved: 29/Jul/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.5 |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Mikhail Pershin | Assignee: | Mikhail Pershin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
While a directory may have many default ACL entries, a newly created file cannot inherit more than 31 of them. This is an MDD-internal issue caused by a buffer size limitation during ACL processing. |
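To make the size limitation concrete, here is a minimal sketch of the arithmetic behind a POSIX ACL xattr: a 4-byte version header followed by 8-byte entries, which is the layout Linux uses for this xattr. The struct and function names are hypothetical and chosen for illustration. The ticket does not say which buffer imposes the limit, so the 256-byte figure mentioned below is only an assumption that happens to yield 31 entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative mirror of the POSIX ACL xattr layout (names are hypothetical):
 * one 4-byte version header, then a packed array of 8-byte entries. */
struct acl_xattr_header {
	uint32_t a_version;
};

struct acl_xattr_entry {
	uint16_t e_tag;   /* entry type: user, group, mask, ... */
	uint16_t e_perm;  /* permission bits */
	uint32_t e_id;    /* uid or gid, when the tag needs one */
};

/* Bytes needed to store 'count' ACL entries in xattr form. */
static size_t acl_xattr_size(size_t count)
{
	return sizeof(struct acl_xattr_header) +
	       count * sizeof(struct acl_xattr_entry);
}

/* Largest number of entries that fits in a buffer of 'buf_len' bytes. */
static size_t acl_max_entries(size_t buf_len)
{
	if (buf_len < sizeof(struct acl_xattr_header))
		return 0;
	return (buf_len - sizeof(struct acl_xattr_header)) /
	       sizeof(struct acl_xattr_entry);
}
```

Under these assumptions, `acl_max_entries(256)` gives 31, and a 64K buffer (the maximum mentioned later in this ticket) gives `acl_max_entries(65536)`, which is 8191 entries.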
| Comments |
| Comment by Gerrit Updater [ 12/Feb/21 ] |
|
Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41494 |
| Comment by Gerrit Updater [ 22/Feb/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/41494/ |
| Comment by Peter Jones [ 22/Feb/21 ] |
|
Landed for 2.15 |
| Comment by Alex Zhuravlev [ 26/Feb/21 ] |
|
just got this on fresh master:
LustreError: 13185:0:(mdd_dir.c:2322:mdd_acl_init()) ASSERTION( def_acl_buf->lb_len <= acl_buf->lb_len ) failed: in sanity / 103e
Trace:
PID: 13185 TASK: ffff8802066e8000 CPU: 1 COMMAND: "mdt01_001"
#0 [ffff880247133960] panic at ffffffff810af9a3
/home/lustre/linux-4.18.0-32.el8/kernel/panic.c: 265
#1 [ffff8802471339f0] mdd_create at ffffffffa0c2997e [mdd]
/home/lustre/master-mine/libcfs/include/libcfs/libcfs_fail.h: 95
#2 [ffff880247133ab8] mdt_reint_open at ffffffffa0cd02f4 [mdt]
/home/lustre/master-mine/lustre/include/md_object.h: 616
#3 [ffff880247133c10] mdt_intent_open at ffffffffa0ca3dff [mdt]
/home/lustre/master-mine/lustre/mdt/mdt_handler.c: 4469
#4 [ffff880247133c50] mdt_intent_policy at ffffffffa0ca1b89 [mdt]
/home/lustre/master-mine/lustre/mdt/mdt_handler.c: 4616
#5 [ffff880247133cb8] ldlm_lock_enqueue at ffffffffa0553cf8 [ptlrpc]
/home/lustre/master-mine/lustre/ptlrpc/../../lustre/ldlm/ldlm_lock.c: 1776
#6 [ffff880247133d18] ldlm_handle_enqueue0 at ffffffffa0578f98 [ptlrpc]
/home/lustre/master-mine/lustre/ptlrpc/../../lustre/ldlm/ldlm_lockd.c: 1390
#7 [ffff880247133d90] tgt_enqueue at ffffffffa05fcddf [ptlrpc]
/home/lustre/master-mine/lustre/ptlrpc/../../lustre/target/tgt_handler.c: 1393
#8 [ffff880247133da8] tgt_request_handle at ffffffffa0602c70 [ptlrpc]
/home/lustre/master-mine/lustre/include/lu_target.h: 618
#9 [ffff880247133e20] ptlrpc_main at ffffffffa05ae915 [ptlrpc]
/home/lustre/master-mine/lustre/include/lustre_net.h: 2448
#10 [ffff880247133f10] kthread at ffffffff810d0350
/home/lustre/linux-4.18.0-32.el8/kernel/kthread.c: 246
#11 [ffff880247133f50] ret_from_fork at ffffffff818001c4
/home/lustre/linux-4.18.0-32.el8/arch/x86/entry/entry_64.S: 422
|
| Comment by Gerrit Updater [ 26/Feb/21 ] |
|
Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41775 |
| Comment by Mikhail Pershin [ 26/Feb/21 ] |
|
Alex, could you check that the latest patch fixes that assertion? |
| Comment by Mikhail Pershin [ 04/Mar/21 ] |
|
More work is needed here. After allowing the maximum ACL buffer size (64K) to be filled, a simple test that sets as many ACLs as possible fails in several respects. More patches will be added here. |
| Comment by Mikhail Pershin [ 06/Mar/21 ] |
|
While testing the maximum number of ACLs with ldiskfs, I encountered a problem with transaction credits upon file creation. Investigation by Alex showed that ldiskfs_new_inode() also calls ldiskfs_init_acl(), which copies all parent ACLs to the new file and adds transaction credits for that. This causes an LBUG() every time I try to create a file in a directory with the maximum number of default ACLs:
osd_trans_dump_creds()) create: 1/8/16, destroy: 0/0/0
...
[25034.738128] LustreError: 30925:0:(osd_internal.h:1319:osd_trans_exec_check()) LBUG
[25034.738368] Pid: 30925, comm: mdt01_003 3.10.0 #5 SMP Sun Jun 2 15:04:32 EDT 2019
[25034.738369] Call Trace:
[25034.738375] [<ffffffffa00d77ad>] libcfs_call_trace+0x7d/0xa0 [libcfs]
[25034.738384] [<ffffffffa00d784c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[25034.738390] [<ffffffffa0a7d346>] cfs_fail_check_set.part.51.constprop.95+0x0/0x79 [osd_ldiskfs]
[25034.738401] [<ffffffffa0a512a2>] osd_create+0x972/0x13c0 [osd_ldiskfs]
[25034.738410] [<ffffffffa0c949d5>] lod_sub_create+0x1e5/0x470 [lod]
[25034.738422] [<ffffffffa0c85189>] lod_create+0x69/0x360 [lod]
[25034.738430] [<ffffffffa0b391c3>] mdd_create_object_internal+0xc3/0x300 [mdd]
[25034.738440] [<ffffffffa0b2189c>] mdd_create_object+0x5c/0x800 [mdd]
[25034.738447] [<ffffffffa0b2c44d>] mdd_create+0xe6d/0x1600 [mdd]
[25034.738453] [<ffffffffa0bbd3a0>] mdt_reint_open+0x2470/0x32c0 [mdt]
[25034.738468] [<ffffffffa0bb00b3>] mdt_reint_rec+0x83/0x220 [mdt]
[25034.738479] [<ffffffffa0b8c2c1>] mdt_reint_internal+0x6e1/0xb00 [mdt]
[25034.738488] [<ffffffffa0b98eb2>] mdt_intent_open+0x82/0x3a0 [mdt]
[25034.738498] [<ffffffffa0b96fd5>] mdt_intent_policy+0x445/0xd90 [mdt]
[25034.738508] [<ffffffffa04b6636>] ldlm_lock_enqueue+0x366/0x9c0 [ptlrpc]
[25034.738540] [<ffffffffa04ddd26>] ldlm_handle_enqueue0+0xa66/0x1620 [ptlrpc]
[25034.738568] [<ffffffffa0566b12>] tgt_enqueue+0x62/0x210 [ptlrpc]
[25034.738604] [<ffffffffa056fede>] tgt_request_handle+0xade/0x15e0 [ptlrpc]
[25034.738744] [<ffffffffa051171b>] ptlrpc_server_handle_request+0x25b/0xad0 [ptlrpc]
[25034.738776] [<ffffffffa0515aa3>] ptlrpc_main+0xbe3/0x21e0 [ptlrpc]
[25034.738805] [<ffffffff8110aad4>] kthread+0xd4/0xe0
[25034.738810] [<ffffffff81839777>] ret_from_fork_nospec_end+0x0/0x39
[25034.738813] [<ffffffffffffffff>] 0xffffffffffffffff
This happens because ldiskfs adds more credits to the transaction and MDD is not aware of that. Moreover, it also means the ACL EA is always copied first, so the LMA and LOV EAs could go into an extra block if the ACL is big enough. Considering that MDD handles all ACLs by itself because of ZFS, there is no need to use ACL handling in ldiskfs at all, so I tried to disable ACLs in ldiskfs and let MDD work with it as it does with ZFS. Unfortunately, that is not possible without patching ldiskfs: it uses a set of xattr handlers that have internal checks for the ACL mount option and deny any setxattr/getxattr for the ACL EA. At the moment I have no good and simple solution for that LBUG. Reserving twice as many credits in MDD would cause a big overhead with many stripes, and doing double work while handling ACLs with ldiskfs is not good in terms of performance and resources. |
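The failure mode described above can be illustrated with a toy model of declared versus used transaction credits. This is only a sketch: every name here is hypothetical, the credit numbers are invented, and the real osd_trans_exec_check() tracks considerably more state than a single counter pair. The point is just that work done by a lower layer without a matching declaration from the upper layer trips the check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-transaction accounting: credits declared up front
 * versus credits actually consumed during execution. */
struct toy_trans {
	unsigned int declared;
	unsigned int used;
};

static void toy_declare(struct toy_trans *t, unsigned int credits)
{
	t->declared += credits;
}

static void toy_use(struct toy_trans *t, unsigned int credits)
{
	t->used += credits;
}

/* True when execution stayed within the declared budget. */
static bool toy_exec_check(const struct toy_trans *t)
{
	return t->used <= t->declared;
}

/* Simulate a file create. 'acl_credits' is what the lower layer spends
 * copying default ACLs; 'acl_declared' says whether the upper layer knew
 * about that work and declared credits for it. */
static bool toy_create(unsigned int create_credits,
		       unsigned int acl_credits, bool acl_declared)
{
	struct toy_trans t = { 0, 0 };

	toy_declare(&t, create_credits);
	if (acl_declared)
		toy_declare(&t, acl_credits);

	toy_use(&t, create_credits);
	toy_use(&t, acl_credits);  /* spent regardless of the declaration */

	return toy_exec_check(&t);
}
```

In this model `toy_create(8, 4, false)` fails the check, while `toy_create(8, 4, true)` passes, at the cost of reserving the extra credits on every create.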
| Comment by Gerrit Updater [ 11/Mar/21 ] |
|
Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/42013 |
| Comment by Gerrit Updater [ 13/Mar/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/41775/ |
| Comment by Gerrit Updater [ 28/Apr/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/42013/ |
| Comment by Gerrit Updater [ 12/May/21 ] |
|
Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43672 |
| Comment by Andreas Dilger [ 13/May/21 ] |
|
I added some debugging code to the users of mti_big_buf and tracked the double-use problem down to the code in mdd_declare_changelog_store() using it internally while only declaring the operation:
[ 6181.942124] Lustre: testfs-MDD0000: changelog on
[ 6185.985737] Lustre: testfs-MDD0001: changelog on
[ 6201.030925] LustreError: 6461:0:(mdd_dir.c:766:mdd_declare_changelog_store()) ASSERTION( !mdd_env_info(env)->mdi_big_buf_used ) failed: mdi_big_buf used in mdd_dir.c:2630:mdd_create()
[ 6201.043070] LustreError: 6461:0:(mdd_dir.c:766:mdd_declare_changelog_store()) LBUG
[ 6201.049962] Pid: 6461, comm: mdt00_003 3.10.0-1160.21.1.el7_lustre.ddn13.x86_64 #1 SMP Fri Mar 19 20:56:15 UTC 2021
[ 6201.059974] Call Trace:
[ 6201.062767] [<ffffffffc06317cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[ 6201.067814] [<ffffffffc063187c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[ 6201.070500] [<ffffffffc10f189c>] mdd_declare_changelog_store+0x39c/0x410 [mdd]
[ 6201.077335] [<ffffffffc10f1d9d>] mdd_declare_create+0x48d/0xdf0 [mdd]
[ 6201.083021] [<ffffffffc10f55b1>] mdd_create+0x8f1/0x1790 [mdd]
[ 6201.085670] [<ffffffffc1188bf8>] mdt_reint_open+0x2578/0x33d0 [mdt]
[ 6201.090106] [<ffffffffc117b7d3>] mdt_reint_rec+0x83/0x210 [mdt]
[ 6201.092757] [<ffffffffc1157481>] mdt_reint_internal+0x6e1/0xb00 [mdt]
[ 6201.094711] [<ffffffffc11641a2>] mdt_intent_open+0x82/0x3a0 [mdt]
[ 6201.099578] [<ffffffffc11622c5>] mdt_intent_policy+0x435/0xd80 [mdt]
[ 6201.104017] [<ffffffffc0a7a686>] ldlm_lock_enqueue+0x376/0x9b0 [ptlrpc]
[ 6201.107773] [<ffffffffc0aa2236>] ldlm_handle_enqueue0+0xaa6/0x1630 [ptlrpc]
[ 6201.113342] [<ffffffffc0b2c012>] tgt_enqueue+0x62/0x210 [ptlrpc]
[ 6201.118300] [<ffffffffc0b30bee>] tgt_request_handle+0xaee/0x15f0 [ptlrpc]
[ 6201.132348] [<ffffffffc0ad75db>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[ 6201.137484] [<ffffffffc0adaf44>] ptlrpc_main+0xb34/0x1470 [ptlrpc]
That code doesn't need a large lu_buf for the whole changelog record to declare the transaction size, just rec->cr_hdr (struct llog_rec_hdr), which could be allocated directly on the stack. That would also avoid some overhead in that function since it doesn't need to check/allocate mti_big_buf. |
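A rough sketch of the idea in this comment follows: a shared per-thread buffer guarded by an in-use flag, and a declare step that sizes the record from a small header on the stack instead of borrowing that buffer. All names (`toy_env`, `toy_rec_hdr`, and the functions) are hypothetical stand-ins and do not reproduce the actual MDD structures or the merged patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a small log record header. */
struct toy_rec_hdr {
	uint32_t len;
	uint32_t index;
	uint32_t type;
	uint32_t id;
};

/* Hypothetical per-thread environment with one shared scratch buffer. */
struct toy_env {
	char big_buf[4096];
	bool big_buf_used;
};

/* Borrow the shared buffer; returns NULL if someone already holds it.
 * The real code asserts instead of returning NULL. */
static char *toy_big_buf_get(struct toy_env *env)
{
	if (env->big_buf_used)
		return NULL;
	env->big_buf_used = true;
	return env->big_buf;
}

static void toy_big_buf_put(struct toy_env *env)
{
	env->big_buf_used = false;
}

/* Declaring only needs the record length, so a header on the stack is
 * enough and the shared buffer is never touched. */
static size_t toy_declare_changelog(size_t payload_len)
{
	struct toy_rec_hdr hdr = { 0 };

	hdr.len = (uint32_t)(sizeof(hdr) + payload_len);
	return hdr.len;
}
```

With this shape, a caller that already holds the shared buffer (for example, for ACL data during create) can still call `toy_declare_changelog()` safely, because the declare path has no dependency on `big_buf`.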
| Comment by Gerrit Updater [ 13/May/21 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43683 |
| Comment by Gerrit Updater [ 19/May/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43672/ |
| Comment by Gerrit Updater [ 19/May/21 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43738 |
| Comment by Gerrit Updater [ 19/May/21 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43739 |
| Comment by Gerrit Updater [ 19/May/21 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43740 |
| Comment by Peter Jones [ 19/May/21 ] |
|
Still some patches being tracked under this ticket |
| Comment by Gerrit Updater [ 27/May/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43683/ |
| Comment by Gerrit Updater [ 12/Jul/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43738/ |
| Comment by Gerrit Updater [ 12/Jul/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43739/ |
| Comment by Gerrit Updater [ 27/Jul/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43740/ |
| Comment by Peter Jones [ 29/Jul/21 ] |
|
Landed for 2.15 |