[LU-11785] conf-sanity test_98 fails with 'Buffer overflow check failed' Created: 14/Dec/18 Updated: 02/Aug/23 Resolved: 11/Apr/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0, Lustre 2.13.0, Lustre 2.12.2, Lustre 2.12.3, Lustre 2.14.0, Lustre 2.12.4, Lustre 2.12.5, Lustre 2.15.0, Lustre 2.15.3 |
| Fix Version/s: | Lustre 2.16.0, Lustre 2.15.4 |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | Kevin Zhao |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | arm, ppc | ||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
conf-sanity test_98 has been failing with 'Buffer overflow check failed' since October 26, 2018. So far, this error has only been seen in ARM architecture testing. In the logs of a recent failure (https://testing.whamcloud.com/test_sets/f2d62dae-fdef-11e8-b837-52540065bddc), the last lines of test output in the client test_log are:

Started lustre-OST0000
mount lustre on /mnt/lustre.....
Starting client: trevis-79vm11.trevis.whamcloud.com: -o user_xattr,flock trevis-24vm4@tcp:/lustre /mnt/lustre
CMD: trevis-79vm11.trevis.whamcloud.com mkdir -p /mnt/lustre
CMD: trevis-79vm11.trevis.whamcloud.com mount -t lustre -o user_xattr,flock trevis-24vm4@tcp:/lustre /mnt/lustre
CMD: trevis-79vm11.trevis.whamcloud.com cp /etc/passwd /mnt/lustre/a
CMD: trevis-79vm11.trevis.whamcloud.com rm /mnt/lustre/a
CMD: trevis-79vm11.trevis.whamcloud.com grep /mnt/lustre' ' /proc/mounts > /dev/null
setup single mount lustre success
 conf-sanity test_98: @@@@@@ FAIL: Buffer overflow check failed

The test is fairly simple:
test_98()
{
	local mountopt
	local temp=$MDS_MOUNT_OPTS

	setup
	check_mount || error "mount failed"
	mountopt="user_xattr"
	for ((x = 1; x <= 400; x++)); do
		mountopt="$mountopt,user_xattr"
	done
	remount_client $mountopt $MOUNT 2>&1 | grep "too long" ||
		error "Buffer overflow check failed"
	cleanup || error "cleanup failed"
}
run_test 98 "Buffer-overflow check while parsing mount_opts"
In the console log of client 1 (vm11), we see the long mount option:

[45496.444538] Lustre: DEBUG MARKER: mount -t lustre -o remount,user_xattr,user_xattr,user_xattr,user_xattr,user_xattr,user_xattr,user_xattr,user_xattr,user_xattr,user_xattr,user_xattr,user_xattr,user_xattr,user_xattr,user_xattr,user_xattr,user_xattr,user_xattr,user_xattr,user_xattr,user_xat
[45497.132159] Lustre: DEBUG MARKER: /usr/sbin/lctl mark conf-sanity test_98: @@@@@@ FAIL: Buffer overflow check failed

However, there are no obvious errors in the console logs. Logs for other failures are at |
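A minimal sketch that rebuilds the option string test_98 generates and compares its length with 4 KiB and 64 KiB page sizes. The helper name `build_fixed_opts` is hypothetical and not part of the Lustre test framework. The assumption that the "too long" rejection is tied to PAGE_SIZE comes from the analysis in the comments on this ticket, not from reading the mount code.

```shell
#!/bin/bash
# Hypothetical helper: rebuild the option string test_98 creates,
# i.e. "user_xattr" followed by COUNT copies of ",user_xattr".
# Assumption (from the ticket analysis): options longer than one page
# are rejected with a "too long" message.
build_fixed_opts() {
	local count=$1
	local mountopt="user_xattr"
	local x
	for ((x = 1; x <= count; x++)); do
		mountopt="$mountopt,user_xattr"
	done
	printf '%s' "$mountopt"
}

opts=$(build_fixed_opts 400)
len=${#opts}
# "user_xattr" is 10 bytes and each ",user_xattr" adds 11 bytes
echo "option string length: $len bytes"

for page_size in 4096 65536; do
	if ((len > page_size)); then
		echo "PAGE_SIZE=$page_size: longer than a page, 'too long' expected"
	else
		echo "PAGE_SIZE=$page_size: fits in a page, test_98 would fail"
	fi
done
```

The string is 10 + 400 * 11 = 4410 bytes: longer than a 4096-byte page, but far shorter than a 65536-byte page, which matches the failures being limited to the 64 KiB page architectures.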
| Comments |
| Comment by Peter Jones [ 17/Dec/18 ] |
|
Jian, could you please investigate? Peter |
| Comment by James A Simmons [ 14/Jun/22 ] |
|
Is this still true? |
| Comment by Kevin Zhao [ 25/Jul/22 ] |
|
It still fails in our test infrastructure: https://testing.whamcloud.com/sub_tests/85914d66-28c0-4eee-9b38-5b4156a49dd4
I will work on a fix. |
| Comment by Jian Yu [ 25/Jul/22 ] |
|
Thank you, Kevin. |
| Comment by Andreas Dilger [ 27/Oct/22 ] |
|
It looks like the test itself will always fail for PAGE_SIZE=65536 (aarch64 and ppc64le) because the test is trying to create a "too long" mount option string using 400 times ",user_xattr". That is 400x11=4400 bytes, which is only long enough to cause a failure on x86 with PAGE_SIZE=4096. The test can trivially be updated to run the loop longer when needed:
for ((x = 0; x <= PAGE_SIZE/11; x++)); do
mountopt="$mountopt,user_xattr"
done
|
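To check the arithmetic of the suggested loop, a minimal sketch that wraps it in a hypothetical helper (`build_scaled_opts`, not part of the Lustre test framework). The page size is passed in as an argument; how the real test obtains the client's PAGE_SIZE is not shown in this ticket, so that part is left out.

```shell
#!/bin/bash
# Hypothetical helper wrapping the page-size-scaled loop suggested above.
# PAGE_SIZE is supplied by the caller instead of being read from the client.
build_scaled_opts() {
	local PAGE_SIZE=$1
	local mountopt="user_xattr"
	local x
	# x runs from 0 to PAGE_SIZE/11 inclusive: PAGE_SIZE/11 + 1 entries
	# of 11 bytes each, so the total always exceeds PAGE_SIZE
	for ((x = 0; x <= PAGE_SIZE / 11; x++)); do
		mountopt="$mountopt,user_xattr"
	done
	printf '%s' "$mountopt"
}

for page_size in 4096 65536; do
	opts=$(build_scaled_opts "$page_size")
	echo "PAGE_SIZE=$page_size: ${#opts} bytes"
done
```

For PAGE_SIZE=4096 this yields 4113 bytes and for PAGE_SIZE=65536 it yields 65548 bytes, so the string ends up just over one page on both architectures instead of relying on a fixed iteration count.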
| Comment by Andreas Dilger [ 27/Oct/22 ] |
kevin.zhao or xinliang, any chance you can submit the trivial patch to fix this, and verify it passes on aarch64? |
| Comment by Kevin Zhao [ 28/Oct/22 ] |
|
Hi Andreas, Thanks for the comment, I will submit the trivial fix. |
| Comment by Andreas Dilger [ 28/Oct/22 ] |
|
Strange. There was an existing patch for this ticket from Kevin, but it was not linked here. |
| Comment by Andreas Dilger [ 28/Oct/22 ] |
|
"Kevin Zhao <kevin.zhao@linaro.org>" uploaded a new patch: https://review.whamcloud.com/48177 |
| Comment by Xinliang Liu [ 31/Oct/22 ] |
|
Verified that the fix makes conf-sanity test_98 pass in an aarch64 single-node test environment. Thanks to Andreas Dilger and Kevin for posting the fix. |
| Comment by Gerrit Updater [ 11/Apr/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48177/ |
| Comment by Peter Jones [ 11/Apr/23 ] |
|
Landed for 2.16 |
| Comment by Gerrit Updater [ 12/Jun/23 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51288 |
| Comment by Gerrit Updater [ 02/Aug/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/51288/ |