[LU-11785] conf-sanity test_98 fails with 'Buffer overflow check failed' Created: 14/Dec/18  Updated: 02/Aug/23  Resolved: 11/Apr/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0, Lustre 2.13.0, Lustre 2.12.2, Lustre 2.12.3, Lustre 2.14.0, Lustre 2.12.4, Lustre 2.12.5, Lustre 2.15.0, Lustre 2.15.3
Fix Version/s: Lustre 2.16.0, Lustre 2.15.4

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: Kevin Zhao
Resolution: Fixed Votes: 0
Labels: arm, ppc

Issue Links:
Related
is related to LU-7919 Buffer overflow in mount_lustre: pars... Resolved
is related to LU-10300 Can the Lustre 2.10.x clients support... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

conf-sanity test_98 has been failing with 'Buffer overflow check failed' since October 26, 2018. So far, the test fails with this error only in ARM architecture testing.

Looking at the logs of a recent failure, https://testing.whamcloud.com/test_sets/f2d62dae-fdef-11e8-b837-52540065bddc , the last lines of test output in the client test_log are:

Started lustre-OST0000
mount lustre on /mnt/lustre.....
Starting client: trevis-79vm11.trevis.whamcloud.com:  -o user_xattr,flock trevis-24vm4@tcp:/lustre /mnt/lustre
CMD: trevis-79vm11.trevis.whamcloud.com mkdir -p /mnt/lustre
CMD: trevis-79vm11.trevis.whamcloud.com mount -t lustre -o user_xattr,flock trevis-24vm4@tcp:/lustre /mnt/lustre
CMD: trevis-79vm11.trevis.whamcloud.com cp /etc/passwd /mnt/lustre/a
CMD: trevis-79vm11.trevis.whamcloud.com rm /mnt/lustre/a
CMD: trevis-79vm11.trevis.whamcloud.com grep /mnt/lustre' ' /proc/mounts > /dev/null
setup single mount lustre success
 conf-sanity test_98: @@@@@@ FAIL: Buffer overflow check failed

The test is fairly simple:

test_98()
{
        local mountopt
        local temp=$MDS_MOUNT_OPTS

        setup
        check_mount || error "mount failed"
        mountopt="user_xattr"
        for ((x = 1; x <= 400; x++)); do
                mountopt="$mountopt,user_xattr"
        done
        remount_client $mountopt $MOUNT  2>&1 | grep "too long" ||
                error "Buffer overflow check failed"
        cleanup || error "cleanup failed"
}
run_test 98 "Buffer-overflow check while parsing mount_opts"

In the console log of client 1 (vm11), we see the long mount option

[45496.444538] Lustre: DEBUG MARKER: mount -t lustre -o remount,user_xattr,user_xattr,user_xattr,user_xattr,user_xattr,user_xattr,user_xattr,user_xattr,user_xattr,user_xattr,user_xattr,user_xattr,user_xattr,user_xattr,user_xattr,user_xattr,user_xattr,user_xattr,user_xattr,user_xattr,user_xat
[45497.132159] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  conf-sanity test_98: @@@@@@ FAIL: Buffer overflow check failed 

but there are no obvious errors in the console logs.
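
A quiet console is consistent with how the check is written: the test fails whenever the remount output lacks the words "too long", whether or not the remount itself reported a problem. The sketch below illustrates that idiom with a hypothetical `fake_remount` stub standing in for `remount_client`. The stub, its `limit` argument, and its message text (apart from the words "too long") are assumptions made for illustration, and the sketch assumes `pipefail` is not set.

```shell
#!/bin/bash
# fake_remount is a hypothetical stand-in for remount_client. It complains
# only when the option string is longer than the limit given as $2.
# The message text is made up, apart from the words "too long".
fake_remount() {
	local opts=$1 limit=$2

	if (( ${#opts} > limit )); then
		echo "option string too long" >&2
		return 1
	fi
	return 0
}

check_overflow() {
	local opts=$1 limit=$2

	# Without pipefail, the pipeline status is grep's status:
	# non-zero whenever "too long" never shows up in the output.
	fake_remount "$opts" "$limit" 2>&1 | grep -q "too long" ||
		{ echo "FAIL: Buffer overflow check failed"; return 1; }
	echo "PASS"
}

# Same string test_98 builds: 10 + 400 * 11 = 4410 bytes.
opts="user_xattr"
for ((x = 1; x <= 400; x++)); do
	opts="$opts,user_xattr"
done

check_overflow "$opts" 4096
check_overflow "$opts" 65536 || true
```

With a 4096-byte limit the stub rejects the string and the check passes. With a 65536-byte limit the stub accepts the string silently, so the check reports a failure even though nothing went wrong.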

Logs for other failures are at
https://testing.whamcloud.com/test_sets/8ca4b49e-d9d0-11e8-b46b-52540065bddc
https://testing.whamcloud.com/test_sets/876f284e-df5f-11e8-a871-52540065bddc
https://testing.whamcloud.com/test_sets/e7d91f16-f7a9-11e8-86c0-52540065bddc



 Comments   
Comment by Peter Jones [ 17/Dec/18 ]

Jian

Could you please investigate?

Peter

Comment by James A Simmons [ 14/Jun/22 ]

Is this still true?

Comment by Kevin Zhao [ 25/Jul/22 ]

Still fails in our test infrastructure: https://testing.whamcloud.com/sub_tests/85914d66-28c0-4eee-9b38-5b4156a49dd4

I will work on a fix.

Comment by Jian Yu [ 25/Jul/22 ]

Thank you, Kevin.

Comment by Andreas Dilger [ 27/Oct/22 ]

It looks like the test itself will always fail for PAGE_SIZE=65536 (aarch64 and ppc64le) because the test is trying to create a "too long" mount option string by appending ",user_xattr" 400 times. That is 400x11=4400 bytes, which is only long enough to cause a failure on x86 with PAGE_SIZE=4096.
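
The arithmetic can be checked with a short sketch. It assumes, as the comment above suggests, that the option string must exceed one page to be rejected. `opt_string_len` is a hypothetical helper written for this illustration, not part of the Lustre test framework.

```shell
#!/bin/bash
# Length of the string test_98 builds: "user_xattr" followed by
# N copies of ",user_xattr" (11 bytes each).
# opt_string_len is a hypothetical helper, used here only for illustration.
opt_string_len() {
	local iterations=$1
	local first="user_xattr"
	local repeat=",user_xattr"

	echo $(( ${#first} + iterations * ${#repeat} ))
}

len=$(opt_string_len 400)

# Assumption: the limit on the option string is one page.
for page_size in 4096 65536; do
	if (( len > page_size )); then
		echo "PAGE_SIZE=$page_size: $len bytes exceeds one page"
	else
		echo "PAGE_SIZE=$page_size: $len bytes fits in one page"
	fi
done
```

Counting the initial "user_xattr", the string is 4410 bytes: more than a 4096-byte page, but far less than a 65536-byte page.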

The test can trivially be updated to run the loop longer when needed:

        for ((x = 0; x <= PAGE_SIZE/11; x++)); do
                mountopt="$mountopt,user_xattr"
        done

Comment by Andreas Dilger [ 27/Oct/22 ]

kevin.zhao or xinliang, any chance you can submit the trivial patch to fix this, and verify it passes on aarch64?
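
One way to make the loop page-size aware is sketched below. It assumes the page size is read with `getconf PAGE_SIZE`, since `PAGE_SIZE` is not a shell variable unless something sets it. `build_long_mountopt` is a hypothetical helper, and this is not necessarily what the merged patch does.

```shell
#!/bin/bash
# Build an option string that is longer than one page.
# build_long_mountopt is a hypothetical helper, not part of the
# Lustre test framework. Optional $1 overrides the detected page size.
build_long_mountopt() {
	local page_size=${1:-$(getconf PAGE_SIZE)}
	local mountopt="user_xattr"
	local x

	# Each iteration appends 11 bytes, and page_size / 11 + 1 iterations
	# add at least page_size bytes, so the result always exceeds one page.
	for ((x = 0; x <= page_size / 11; x++)); do
		mountopt="$mountopt,user_xattr"
	done
	echo "$mountopt"
}

opts=$(build_long_mountopt)
echo "built ${#opts} bytes for PAGE_SIZE=$(getconf PAGE_SIZE)"
```

Passing the page size as an optional argument keeps the helper testable on any host, for example `build_long_mountopt 65536` on an x86 machine.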

Comment by Kevin Zhao [ 28/Oct/22 ]

Hi Andreas,

Thanks for the comment, I will submit the trivial fix.

Comment by Andreas Dilger [ 28/Oct/22 ]

Strange. There was an existing patch for this ticket from Kevin, but it was not linked here.

Comment by Andreas Dilger [ 28/Oct/22 ]

"Kevin Zhao <kevin.zhao@linaro.org>" uploaded a new patch: https://review.whamcloud.com/48177
Subject: LU-11785 tests: fix conf-sanity/98 mount check on 64K page
Project: fs/lustre-release
Branch: master
Current Patch Set: 6
Commit: 2dcbc53a685c12824de020d6b4567360ef996f90

Comment by Xinliang Liu [ 31/Oct/22 ]

Verified that the fix makes conf-sanity test_98 pass in a single-node aarch64 test environment. Thanks to Andreas Dilger and Kevin for posting the fix.

Comment by Gerrit Updater [ 11/Apr/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48177/
Subject: LU-11785 tests: fix conf-sanity/98 mount check on 64K page
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 4068ca725954db2a1fc42bf8d184f4672c2ed113

Comment by Peter Jones [ 11/Apr/23 ]

Landed for 2.16

Comment by Gerrit Updater [ 12/Jun/23 ]

"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51288
Subject: LU-11785 tests: fix conf-sanity/98 mount check on 64K page
Project: fs/lustre-release
Branch: b2_15
Current Patch Set: 1
Commit: dc6dffbd28e7ac81f279ebfeb80531444e5425c5

Comment by Gerrit Updater [ 02/Aug/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/51288/
Subject: LU-11785 tests: fix conf-sanity/98 mount check on 64K page
Project: fs/lustre-release
Branch: b2_15
Current Patch Set:
Commit: 3e9a06398a168a52f72a65450e5249ed502a86cd
