[LU-16056] rarely directories got created with the wrong permission
Created: 28/Jul/22  Updated: 18/Jul/23  Resolved: 01/Sep/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0, Lustre 2.15.2

Type: Bug
Priority: Critical
Reporter: Lai Siyao
Assignee: Lai Siyao
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

We've had two instances of this, the latest being:

[root@ec01 ~]# ls -ld /exafs/home/sihara/mdtest.out/test-dir.0-0/mdtest_tree.488.0/file.mdtest.488.30670
---------- 1 sihara users 0 Jul 16 10:10 /exafs/home/sihara/mdtest.out/test-dir.0-0/mdtest_tree.488.0/file.mdtest.488.30670

[root@ec01 ~]# lfs getdirstripe /exafs/home/sihara/mdtest.out/test-dir.0-0/mdtest_tree.488.0
lmv_stripe_count: 0 lmv_stripe_offset: 1 lmv_hash_type: none
[root@ec01 ~]# lfs getdirstripe -D /exafs/home/sihara/mdtest.out/test-dir.0-0/mdtest_tree.488.0
lmv_stripe_count: 1 lmv_stripe_offset: -1 lmv_hash_type: none lmv_max_inherit: -1 lmv_max_inherit_rr: 5
[root@ec01 ~]# lfs getstripe -m /exafs/home/sihara/mdtest.out/test-dir.0-0/mdtest_tree.488.0/file.mdtest.488.30670 
1 

The user/group ownership is correct, but the permissions are supposed to be 755.
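
Because the problem shows up only rarely, it can help to scan a test tree for entries whose permission bits are all zero, like the one listed above. The following is a minimal sketch, not part of the original report; the function name `find_zero_mode_entries` is hypothetical, and it relies only on `lstat()` results as seen from the client.

```python
import os
import stat


def find_zero_mode_entries(root):
    """Yield (path, st_mode) for every entry under root with no permission bits set."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            if stat.S_ISLNK(st.st_mode):
                continue  # symlink permission bits are not meaningful
            if stat.S_IMODE(st.st_mode) == 0:
                yield path, st.st_mode
```

Calling `list(find_zero_mode_entries("/exafs/home/sihara/mdtest.out"))` after an mdtest run would list candidates to inspect with `ls -ld` and `lfs getstripe -m`.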



 Comments   
Comment by Gerrit Updater [ 28/Jul/22 ]

"Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/48071
Subject: LU-16056 mdt: debug patch for wrong dir permission
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7291ac8308437f11fffb4ef7b2a7dded2c59c0fd

Comment by Andreas Dilger [ 29/Jul/22 ]

Lai, I noticed a sanity test_103b failure on the Gerrit Janitor testing of an unrelated patch that might be the same as this issue:
https://testing.whamcloud.com/gerrit-janitor/24464/testresults/sanity2-ldiskfs-DNE-centos7_x86_64-centos7_x86_64-retry2/

== sanity test 103b: umask lfs setstripe ================= 18:43:45 (1659134625)
 sanity test_103b: @@@@@@ FAIL: lfs setstripe /mnt/lustre/f103b.sanity.s0074 '0000' != '0074' 
 sanity test_103b: @@@@@@ FAIL: lfs setstripe -N2 /mnt/lustre/f103b.sanity.m0061 '0000' != '0061' 

It looks like some kind of race condition or bug in umask handling, and full debug logs are available on both the client and server, so this might be easier to investigate than the huge mdtest reproducer.
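
For readers less familiar with umask, the arithmetic involved can be sketched as follows. This is a generic POSIX illustration, not Lustre code and not a statement of the root cause; the helper name `apply_umask` is hypothetical and default ACLs are ignored. It shows that a mask of 0777 would reduce any requested mode to 0000, which matches the shape of the failures above.

```python
def apply_umask(requested_mode, umask):
    """Return the permission bits that remain after clearing the bits set in umask."""
    return requested_mode & ~umask & 0o7777


# Typical case: directories requested with 0777 under umask 022 end up as 0755.
print(format(apply_umask(0o777, 0o022), "04o"))  # 0755
# Regular files requested with 0666 under umask 022 end up as 0644.
print(format(apply_umask(0o666, 0o022), "04o"))  # 0644
# A mask of 0777 leaves no permission bits at all.
print(format(apply_umask(0o666, 0o777), "04o"))  # 0000
```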

Comment by Lai Siyao [ 01/Aug/22 ]

I checked the logs but, strangely, didn't find any messages about these two files. Also, this test has never failed in Maloo testing.

Comment by Andreas Dilger [ 01/Aug/22 ]

If there isn't enough logging enabled for this test, could you push a patch to master that modifies the test, in case the failure is hit again in the future? Maybe include your extra CDEBUG changes.

Comment by Shuichi Ihara [ 02/Aug/22 ]

The latest patch worked and dumped a debug log when a file with no permissions was created.

[root@ai400x2-1-vm1 ~]# lctl debug_file  /scratch/log/lustre-log.1659427201.8549 | grep '##'
00000004:00020000:19.0:1659427201.471371:0:8549:0:(mdt_handler.c:877:mdt_pack_attr2body()) ##### [0x200000483:0xfd91:0x0] with mode 32768 uid 8888 gid 100 valid 1100000080002f8f
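
The mode in that line can be decoded as a quick check. This is a minimal sketch, assuming the debug patch prints the mode in decimal; the regular expression and the helper name `decode_attr_line` are hypothetical and were written only against the single line quoted above.

```python
import re
import stat

# The debug line quoted above, copied verbatim.
LINE = (
    "00000004:00020000:19.0:1659427201.471371:0:8549:0:"
    "(mdt_handler.c:877:mdt_pack_attr2body()) ##### [0x200000483:0xfd91:0x0] "
    "with mode 32768 uid 8888 gid 100 valid 1100000080002f8f"
)

# Hypothetical pattern for the FID, mode, uid and gid fields.
PATTERN = re.compile(
    r"\[(?P<fid>0x[0-9a-f]+:0x[0-9a-f]+:0x[0-9a-f]+)\] "
    r"with mode (?P<mode>\d+) uid (?P<uid>\d+) gid (?P<gid>\d+)"
)


def decode_attr_line(line):
    """Extract the FID and owner, and split the mode into file type and permission bits."""
    match = PATTERN.search(line)
    if match is None:
        return None
    mode = int(match.group("mode"))  # assumption: decimal
    return {
        "fid": match.group("fid"),
        "uid": int(match.group("uid")),
        "gid": int(match.group("gid")),
        "is_regular": stat.S_ISREG(mode),
        "perm": stat.S_IMODE(mode),
        "text": stat.filemode(mode),
    }


print(decode_attr_line(LINE))
```

Under that assumption, 32768 is octal 0100000: a regular file with every permission bit clear, which agrees with the `----------` shown by `ls -l` for the same FID.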

[root@ec01 ~]# lfs fid2path /exafs "[0x200000483:0xfd91:0x0]"
/exafs/home/sihara/mdtest.out/test-dir.0-0/mdtest_tree.162.0/file.mdtest.162.13744

[root@ec01 ~]# ls -l /exafs/home/sihara/mdtest.out/test-dir.0-0/mdtest_tree.162.0/file.mdtest.162.13744
---------- 1 sihara users 0 Aug  2 17:00 /exafs/home/sihara/mdtest.out/test-dir.0-0/mdtest_tree.162.0/file.mdtest.162.13744

Comment by Gerrit Updater [ 16/Aug/22 ]

"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/48233
Subject: LU-16056 libcfs: restore umask handling in kernel threads
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 06fe5984b7ebfe0153155546995576ea35f7ef21

Comment by Gerrit Updater [ 01/Sep/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/48233/
Subject: LU-16056 libcfs: restore umask handling in kernel threads
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: c92bdd97d99cc755a187987f3d5963adeb3ea475

Comment by Peter Jones [ 01/Sep/22 ]

Landed for 2.16

Comment by Gerrit Updater [ 01/Sep/22 ]

"Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/48414
Subject: LU-16056 libcfs: restore umask handling in kernel threads
Project: fs/lustre-release
Branch: b2_15
Current Patch Set: 1
Commit: ac44a6c05afd67d9006c93132c354aef994c4383
