[LU-15978] directory deletion fails on multiple MDS for 64K PAGE_SIZE Created: 28/Jun/22  Updated: 12/Jan/24  Resolved: 19/Aug/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Bug Priority: Major
Reporter: Xinliang Liu Assignee: Xinliang Liu
Resolution: Fixed Votes: 0
Labels: DNE, arm, ppc
Environment:

aarch64 server centos 8
[centos@liuxl-centos-aio lustre-release]$ lsb_release -a
LSB Version: :core-4.1-aarch64:core-4.1-noarch:cxx-4.1-aarch64:cxx-4.1-noarch:desktop-4.1-aarch64:desktop-4.1-noarch:languages-4.1-aarch64:languages-4.1-noarch:printing-4.1-aarch64:printing-4.1-noarch
Distributor ID: CentOSStream
Description: CentOS Stream release 8
Release: 8
Codename: n/a
[centos@liuxl-centos-aio lustre-release]$ uname -a
Linux liuxl-centos-aio.novalocal 4.18.0-348.2.1.el8_lustre.aarch64 #1 SMP Fri Apr 15 07:36:36 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux


Severity: 2
Rank (Obsolete): 9223372036854775807

 Description   

Directory deletion fails with multiple MDSs on aarch64 with a 64K PAGE_SIZE.

It fails in many tests that create directories on multiple MDSs, e.g. sanity test_1:
 
sudo MDSCOUNT=2 PTLDEBUG=-1  RUNAS_ID=1000 ~/work/lustre-release/lustre/tests/auster   -D ~/log-28  -rv  sanity --only 1
...
== sanity test 1: mkdir; remkdir; rmdir ================== 01:44:19 (1656294259)
striped dir -i1 -c2 -H fnv_1a_64 /mnt/lustre/d1.sanity
striped dir -i1 -c2 -H all_char /mnt/lustre/d1.sanity/d2
mkdir: cannot create directory '/mnt/lustre/d1.sanity/d2': File exists
/mnt/lustre/d1.sanity/d2 has type dir OK
rmdir: failed to remove '/mnt/lustre/d1.sanity/d2': Invalid argument
rmdir: failed to remove '/mnt/lustre/d1.sanity': Directory not empty
/mnt/lustre/d1.sanity exists
 sanity test_1: @@@@@@ FAIL: d1.sanity was not removed
  Trace dump:
  = /home/centos/work/lustre-release/lustre/tests/test-framework.sh:6408:error()
  = /home/centos/work/lustre-release/lustre/tests/sanity.sh:280:test_1()
  = /home/centos/work/lustre-release/lustre/tests/test-framework.sh:6743:run_one()
  = /home/centos/work/lustre-release/lustre/tests/test-framework.sh:6790:run_one_logged()
  = /home/centos/work/lustre-release/lustre/tests/test-framework.sh:6616:run_test()
  = /home/centos/work/lustre-release/lustre/tests/sanity.sh:282:main()
Dumping lctl log to /home/centos/log-26/sanity.test_1.*.1656294260.log
Dumping logs only on local client.
FAIL 1 (2s)
resend_count is set to 4 4
resend_count is set to 4 4
resend_count is set to 4 4
resend_count is set to 4 4
resend_count is set to 4 4
== sanity test complete, duration 7 sec ================== 01:44:23 (1656294263)
sanity: FAIL: test_1 d1.sanity was not removed
rm: cannot remove '/mnt/lustre/d1.sanity': Directory not empty
 sanity test_904: @@@@@@ FAIL: remove sub-test dirs failed
  Trace dump:
  = /home/centos/work/lustre-release/lustre/tests/test-framework.sh:6408:error()
  = /home/centos/work/lustre-release/lustre/tests/test-framework.sh:5892:check_and_cleanup_lustre()
  = /home/centos/work/lustre-release/lustre/tests/sanity.sh:28307:main()
Dumping lctl log to /home/centos/log-26/sanity.test_904.*.1656294263.log
Dumping logs only on local client.
sanity returned 1
...
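
Because the failure is reported only for a 64K PAGE_SIZE, it may help to confirm the page size of the node before trying to reproduce it. The following is a minimal sketch, not part of the Lustre test suite; the function names page_size_bytes and is_64k_page_size are hypothetical helpers introduced here for illustration.

```python
import os


def page_size_bytes() -> int:
    """Return the kernel page size of the current node, in bytes."""
    # SC_PAGE_SIZE is the standard sysconf name for the memory page size.
    return os.sysconf("SC_PAGE_SIZE")


def is_64k_page_size(size: int) -> bool:
    """True if the given page size matches the 64K configuration in this report."""
    return size == 64 * 1024


if __name__ == "__main__":
    size = page_size_bytes()
    print(f"PAGE_SIZE = {size} bytes; 64K pages: {is_64k_page_size(size)}")
```

On a node where this reports a 4K page size, the run above is not expected to exercise the configuration described in this ticket.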



 Comments   
Comment by Xinliang Liu [ 28/Jun/22 ]

There is an error in the kernel log:

[100991.850151] LustreError: 381691:0:(osp_object.c:1998:osp_it_next_page()) lustre-MDT0000-osp-MDT0001: invalid magic (0 != 8a6d6b6c) for page 0/1 while read layout orphan index
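
When checking other nodes for the same symptom, the fields of this message can be pulled out of the kernel log mechanically. The sketch below is an assumption-laden helper, not a Lustre tool: the regular expression was written against the single line quoted above, the name parse_invalid_magic is hypothetical, and reading the two numbers after "for page" as a page position and a page count is an assumption.

```python
import re
from typing import Optional

# Hypothetical pattern derived from the one log line in this ticket.
INVALID_MAGIC_RE = re.compile(
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s+"
    r"(?P<device>\S+): invalid magic "
    r"\((?P<found>[0-9a-fA-F]+) != (?P<expected>[0-9a-fA-F]+)\) "
    r"for page (?P<page>\d+)/(?P<npages>\d+)"
)


def parse_invalid_magic(log_line: str) -> Optional[dict]:
    """Extract the fields of an 'invalid magic' message, or None if absent."""
    match = INVALID_MAGIC_RE.search(log_line)
    if match is None:
        return None
    return {
        "function": match.group("func"),
        "device": match.group("device"),
        # Both magic values are printed in hexadecimal without a 0x prefix.
        "found": int(match.group("found"), 16),
        "expected": int(match.group("expected"), 16),
        # Assumed meaning: position of the page and number of pages.
        "page": int(match.group("page")),
        "npages": int(match.group("npages")),
    }


if __name__ == "__main__":
    sample = (
        "[100991.850151] LustreError: 381691:0:"
        "(osp_object.c:1998:osp_it_next_page()) "
        "lustre-MDT0000-osp-MDT0001: invalid magic (0 != 8a6d6b6c) "
        "for page 0/1 while read layout orphan index"
    )
    info = parse_invalid_magic(sample)
    print(info["device"], hex(info["found"]), hex(info["expected"]))
```

A found value of 0 against a non-zero expected magic, as in the line above, means the page that was read did not carry the magic value at all.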
Comment by Gerrit Updater [ 28/Jun/22 ]

"xinliang <xinliang.liu@linaro.org>" uploaded a new patch: https://review.whamcloud.com/47812
Subject: LU-15978 osp: fix directory deletion fails for 64K PAGE_SIZE
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6ee28ced35584e388b552e906bc89028e8f81277

Comment by Gerrit Updater [ 19/Aug/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/47812/
Subject: LU-15978 osp: fix striped directory deletion fails for 64K PAGE_SIZE
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 7576d294582b818b20559138500cf1e58607cfc8

Comment by Peter Jones [ 19/Aug/23 ]

Landed for 2.16

Comment by Gerrit Updater [ 12/Jan/24 ]

"xinliang <xinliang.liu@linaro.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53654
Subject: LU-15978 osp: fix striped directory deletion fails for 64K PAGE_SIZE
Project: fs/lustre-release
Branch: b2_15
Current Patch Set: 1
Commit: 3701855140356c5d663f9d266246e81fc8b46892
