[LU-16673] sanity test_125: failures with aarch64 servers Created: 27/Mar/23  Updated: 21/Aug/23  Resolved: 19/Aug/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Xinliang Liu
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-16541 sanity test_64f: buffered io, not wri... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

This issue relates to the following test suite run with aarch64 servers:
https://testing.whamcloud.com/test_sets/38335ace-3096-4f6a-a18a-f252f4d7bf6c

test_125 failed with the following error:

setfacl /mnt/lustre/d125.sanity failed

Test session details:
clients: 2.15.54.79 - 4.18.0-348.23.1.el8_lustre331.aarch64
servers: 2.15.54.79 - 4.18.0-348.23.1.el8_lustre331.aarch64

Also failed the following subtests (listing all here for searching):
test_150f
test_150g
test_151
test_154a
test_154b
test_156

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_125 - setfacl /mnt/lustre/d125.sanity failed



 Comments   
Comment by Andreas Dilger [ 27/Mar/23 ]

It looks like there are a handful of consistent sanity subtest failures when running with aarch64 servers.

You are likely already aware of this, since the test results are coming from a linaro test cluster, but I filed the ticket to allow tracking the test failures, and for submitting patches to fix them.

Comment by Xinliang Liu [ 28/Mar/23 ]

Yeah, we are aware of this. We are tracking the failed sanity.sh tests in this ticket: https://linaro.atlassian.net/browse/STOR-123

I have been looking into this recently.

Comment by Xinliang Liu [ 30/Mar/23 ]

It looks like the user USER0 used for the setfacl operation does not exist. When a valid user is specified, test 125 passes.

USER0=${USER0:-"sanityusr"}

setfacl -R -m u:$USER0:rwx $DIR/$tdir

$ sudo USER0=openeuler RUNAS_ID="1000" ~/lustre-release/lustre/tests/auster  -rv sanity --only 125

== sanity test 125: don't return EPROTO when a dir has a non-default striping and ACLs ========================================================== 09:03:07 (1680166987)
setfacl -R -m u:openeuler:rwx /mnt/lustre/d125.sanity
drwxrwxr-x+ 2 root root 4096 Mar 30 09:03 /mnt/lustre/d125.sanity
PASS 125 (1s)
 
Comment by Xinliang Liu [ 03/Apr/23 ]

We should add a check for the existence of user USER0 in the test.
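
A minimal sketch of what such a check could look like, assuming the test is simply skipped when the account is missing. The helper name user_exists is hypothetical and is not taken from the patch; only the standard id utility is used.

```shell
# Hypothetical helper: succeeds only if the named account can be resolved.
user_exists() {
	id -u "$1" >/dev/null 2>&1
}

# Same default as in the test script quoted above.
USER0=${USER0:-"sanityusr"}

if user_exists "$USER0"; then
	echo "user $USER0 exists, setfacl can be run"
else
	echo "user $USER0 not found, test 125 should be skipped"
fi
```

With a check of this kind, a missing sanityusr account would show up as a skipped test rather than as a setfacl failure.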

Comment by Gerrit Updater [ 03/Apr/23 ]

"xinliang <xinliang.liu@linaro.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50501
Subject: LU-16673 tests: add checking for user USER0
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7559cd4ae840828720b5ae5f1ccf927ca96280e2

Comment by Xinliang Liu [ 06/Apr/23 ]

As for tests 150f, 150g, 151, and 156, the failures might be related to the 64K page size, because these tests pass on an aarch64 cluster with 4K page size.

Comment by Andreas Dilger [ 06/Apr/23 ]

xinliang, out of curiosity, are you formatting the OSTs with 64KB blocksize, or still 4KB blocksize (assuming ldiskfs)? There are definitely some interesting benefits from 64KB blocksize:

  • larger directories with fewer htree levels
  • 256x fewer block groups due to 16x bits per bitmap x 16x larger blocks
  • more efficient allocation for large files

This would mean less space efficiency for files < 64KB in size and for the fraction of a block at the end of each file, but for files <= 64KB it would be possible to use DoM to store them on the MDT.
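
The 256x figure can be checked with a little shell arithmetic. This is only a sketch, assuming the usual ldiskfs/ext4 layout in which one bitmap block describes one block group; the function name group_bytes is made up for illustration.

```shell
# Bytes covered by one block group: a single bitmap block holds
# blocksize * 8 bits, and each bit tracks one block of blocksize bytes.
group_bytes() {
	echo $(( $1 * 8 * $1 ))
}

small=$(group_bytes 4096)     # 4KB blocksize
large=$(group_bytes 65536)    # 64KB blocksize

echo "4KB blocksize:  $(( small >> 20 )) MiB per group"
echo "64KB blocksize: $(( large >> 30 )) GiB per group"
echo "ratio: $(( large / small ))x"
```

This gives 128 MiB per group for 4KB blocks and 32 GiB per group for 64KB blocks, so a filesystem of the same size needs 256x fewer groups.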

Comment by Xinliang Liu [ 07/Apr/23 ]

Our aarch64 CI clusters run on CentOS 8, which uses 64KB PAGE_SIZE, and thus use 64KB blocksize for ldiskfs.

It sounds like a good idea to use DoM to store files < 64KB on the MDT for clusters with 64KB PAGE_SIZE.

We also test locally on openEuler, which uses 4K PAGE_SIZE.

 

Comment by Xinliang Liu [ 07/Apr/23 ]

BTW, RHEL 9 uses 4K PAGE_SIZE.
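
Since the page size differs between these distributions, it can help to confirm it on a node before comparing test results. A small sketch using the standard getconf utility; the variable name page_kb is just for illustration.

```shell
# Report the kernel page size of the current node in KB.
page_kb=$(( $(getconf PAGESIZE) / 1024 ))
echo "PAGE_SIZE is ${page_kb}KB"

if [ "$page_kb" -eq 64 ]; then
	echo "64KB pages: tests 150f, 150g, 151, 156 may be affected"
fi
```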

Comment by Gerrit Updater [ 19/Aug/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50501/
Subject: LU-16673 tests: add checking for user USER0
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: b7b55534b9b5d4d58025082ae5403938853b168c

Comment by Peter Jones [ 19/Aug/23 ]

Landed for 2.16
