[LU-16673] sanity test_125: failures with aarch64 servers Created: 27/Mar/23 Updated: 21/Aug/23 Resolved: 19/Aug/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Xinliang Liu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>
This issue relates to the following test suite run with aarch64 servers:
test_125 failed with the following error:
setfacl /mnt/lustre/d125.sanity failed
Test session details:
Also failed the following subtests (listing all here for searching):
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV |
| Comments |
| Comment by Andreas Dilger [ 27/Mar/23 ] |
|
It looks like there are a handful of consistent sanity subtest failures when running with aarch64 servers. You are likely already aware of this, since the test results are coming from a linaro test cluster, but I filed the ticket to allow tracking the test failures, and for submitting patches to fix them. |
| Comment by Xinliang Liu [ 28/Mar/23 ] |
|
Yeah, we are aware of this. We are tracking the failed sanity.sh tests in this ticket: https://linaro.atlassian.net/browse/STOR-123 I have been looking at this recently. |
| Comment by Xinliang Liu [ 30/Mar/23 ] |
|
It looks like the user USER0 used for the setfacl operation does not exist. When a valid user is specified, test 125 passes.
USER0=${USER0:-"sanityusr"}
setfacl -R -m u:$USER0:rwx $DIR/$tdir
$ sudo USER0=openeuler RUNAS_ID="1000" ~/lustre-release/lustre/tests/auster -rv sanity --only 125
== sanity test 125: don't return EPROTO when a dir has a non-default striping and ACLs ========================================================== 09:03:07 (1680166987)
setfacl -R -m u:openeuler:rwx /mnt/lustre/d125.sanity
drwxrwxr-x+ 2 root root 4096 Mar 30 09:03 /mnt/lustre/d125.sanity
PASS 125 (1s) |
| Comment by Xinliang Liu [ 03/Apr/23 ] |
|
The test should check that user USER0 exists before running setfacl. |
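A minimal sketch of what such a check could look like, for illustration only. This is not the code from the patch that was later uploaded; the helper name user_exists and the SKIP message are hypothetical, and only the USER0 default is taken from the test itself.

```shell
#!/bin/bash
# Hypothetical sketch, not the landed patch.
# Same default as quoted from the test above.
USER0=${USER0:-"sanityusr"}

# Return 0 if the named account resolves in the local user database.
# (user_exists is an illustrative helper name, not part of the test framework.)
user_exists() {
	id -u "$1" >/dev/null 2>&1
}

if user_exists "$USER0"; then
	echo "user $USER0 exists, setfacl -m u:$USER0:rwx can resolve it"
else
	# A real test would skip here instead of failing in setfacl.
	echo "SKIP: user $USER0 does not exist"
fi
```

Running it with USER0 set to an existing account (as in the openeuler run above) takes the first branch, while the default sanityusr takes the skip branch on a node where that account was never created.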
| Comment by Gerrit Updater [ 03/Apr/23 ] |
|
"xinliang <xinliang.liu@linaro.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50501 |
| Comment by Xinliang Liu [ 06/Apr/23 ] |
|
As for tests 150f, 150g, 151, and 156, the failures might be related to the 64K page size, because these tests pass on an aarch64 cluster with a 4K page size. |
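For anyone reproducing this, a quick way to confirm which page size a node is running with is getconf. This is a small sketch; the variable name page_size is only for illustration.

```shell
#!/bin/bash
# Query the kernel page size in bytes (4096 on 4K kernels, 65536 on 64K kernels).
page_size=$(getconf PAGESIZE)
echo "PAGE_SIZE: ${page_size} bytes ($((page_size / 1024)) KiB)"
```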
| Comment by Andreas Dilger [ 06/Apr/23 ] |
|
xinliang, out of curiosity, are you formatting the OSTs with 64KB blocksize, or still 4KB blocksize (assuming ldiskfs)? There are definitely some interesting benefits from 64KB blocksize:
This would mean less space efficiency for files < 64KB in size and for the fraction of a block at the end of each file, but for files <= 64KB it would be possible to use DoM to store them on the MDT. |
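To make the space-efficiency point concrete, here is a rough sketch of the arithmetic. It assumes a file's data is simply rounded up to whole blocks and ignores metadata, inline data, and any other filesystem detail; the 5000-byte file size and the helper name allocated_bytes are made up for illustration.

```shell
#!/bin/bash
# Round a file size up to a whole number of blocks.
# (allocated_bytes is an illustrative helper, not a Lustre or ldiskfs tool.)
allocated_bytes() {
	local size=$1 blocksize=$2
	echo $(( (size + blocksize - 1) / blocksize * blocksize ))
}

size=5000	# hypothetical small file, in bytes
for bs in 4096 65536; do
	alloc=$(allocated_bytes "$size" "$bs")
	echo "blocksize=$bs allocated=$alloc wasted=$((alloc - size))"
done
```

Under these assumptions the same 5000-byte file occupies 8192 bytes with 4KB blocks and 65536 bytes with 64KB blocks, which is why keeping such small files on the MDT with DoM is attractive.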
| Comment by Xinliang Liu [ 07/Apr/23 ] |
|
Our aarch64 CI clusters run on CentOS 8, which uses a 64KB PAGE_SIZE and thus a 64KB blocksize for ldiskfs. It sounds like a good idea to store files < 64KB on the MDT with DoM for a 64KB PAGE_SIZE cluster.
We also test locally on openEuler, which uses a 4K PAGE_SIZE.
|
| Comment by Xinliang Liu [ 07/Apr/23 ] |
|
BTW, RHEL 9 uses 4K PAGE_SIZE. |
| Comment by Gerrit Updater [ 19/Aug/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50501/ |
| Comment by Peter Jones [ 19/Aug/23 ] |
|
Landed for 2.16 |