[LU-6345] sanity-hsm test_30c: Binary overwritten during exec Created: 06/Mar/15  Updated: 23/Dec/15  Resolved: 23/Mar/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Emoly Liu
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 17760

 Description   

This issue was created by maloo for John Hammond <john.hammond@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/e5fd8970-7648-11e4-ad19-5254006e85c2.

The sub-test test_30c failed with the following error:

Binary overwritten during exec

The open for write fails, but cmp is still indicating that the file has changed.

CMD: onyx-33vm5 /mnt/lustre/d30c.sanity-hsm/SLEEP 10
/usr/lib64/lustre/tests/sanity-hsm.sh: line 2141: /mnt/lustre/d30c.sanity-hsm/SLEEP: Text file busy
/bin/sleep /mnt/lustre/d30c.sanity-hsm/SLEEP differ: byte 41, line 1
 sanity-hsm test_30c: @@@@@@ FAIL: Binary overwritten during exec
...

Info required for matching: sanity-hsm 30c



 Comments   
Comment by John Hammond [ 06/Mar/15 ]

DATE        GERRIT  PARENT                 SESSION
2014-11-27  12810   v2_6_90_0-31-g648c73b  https://testing.hpdd.intel.com/sub_tests/ec700076-7648-11e4-ad19-5254006e85c2
2014-12-11  12961   v2_6_91_0-4-gfeaeafe   https://testing.hpdd.intel.com/sub_tests/1cb013da-8106-11e4-b2c2-5254006e85c2
2015-02-21  13832   v2_6_94_0-14-g5bae33a  https://testing.hpdd.intel.com/sub_tests/a8049d26-b9c7-11e4-8278-5254006e85c2
2015-03-05  13126   v2_7_50_0-10-g56875fd  https://testing.hpdd.intel.com/sub_tests/b94b144e-c349-11e4-b384-5254006e85c2

Comment by John Hammond [ 06/Mar/15 ]

The times of day of the failures

== sanity-hsm test 30c: Update during exec of released file must fail == 04:17:55 (1425557875)
== sanity-hsm test 30c: Update during exec of released file must fail == 07:31:28 (1424503888)
== sanity-hsm test 30c: Update during exec of released file must fail == 04:17:50 (1418271470)
== sanity-hsm test 30c: Update during exec of released file must fail == 04:35:26 (1417091726)

along with the consistent offset reported by cmp

/bin/sleep /mnt/lustre/d30c.sanity-hsm/SLEEP differ: byte 41, line 1
/bin/sleep /mnt/lustre/d30c.sanity-hsm/SLEEP differ: byte 41, line 1
/bin/sleep /mnt/lustre/d30c.sanity-hsm/SLEEP differ: byte 41, line 1
/bin/sleep /mnt/lustre/d30c.sanity-hsm/SLEEP differ: byte 41, line 1

suggest that this may be due to /usr/sbin/prelink being run on /bin/sleep from /etc/cron.daily/prelink. I got a similar error by running 'prelink -af' and 30c in two simultaneous loops.

Comment by Jodi Levi (Inactive) [ 06/Mar/15 ]

Emoly,
could you please have a look at this one?
Thank you!

Comment by Andreas Dilger [ 06/Mar/15 ]

Oleg suggests that the root problem is that the source /bin/sleep binary is being modified while the test is running, since we always run on a newly-installed system. One solution is probably to make a copy of /bin/sleep to some temporary location first, then copy from that temporary file into Lustre. Another solution is to do a checksum of /bin/sleep before the test and then not mark the test a failure if the checksum has changed since the test started.
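
The two options above could be sketched roughly as follows. This is only an illustration, not the merged patch: the helper names snapshot_source, source_unchanged and classify_mismatch are hypothetical, and md5sum is assumed as the checksum tool.

```shell
#!/bin/bash
# Hypothetical helpers; none of these names come from sanity-hsm.sh.

# Option 1: copy the source binary to a private temporary file so that
# later changes to the original (e.g. by prelink) cannot affect the test.
# Prints the path of the snapshot on success.
snapshot_source() {
    local src=$1
    local tmp
    tmp=$(mktemp) || return 1
    cp "$src" "$tmp" || { rm -f "$tmp"; return 1; }
    echo "$tmp"
}

# Option 2: return 0 if the file still has the checksum recorded earlier.
source_unchanged() {
    local path=$1
    local expected=$2
    local actual
    actual=$(md5sum "$path" | cut -d' ' -f1)
    [ "$actual" = "$expected" ]
}

# Decide what a cmp mismatch means: "fail" if the source is unchanged
# (the Lustre copy really was modified), "skip" if the source itself
# changed while the test was running.
classify_mismatch() {
    local src=$1
    local sum_before=$2
    if source_unchanged "$src" "$sum_before"; then
        echo "fail"
    else
        echo "skip"
    fi
}
```

With the first option the test would compare the Lustre copy against the snapshot instead of /bin/sleep. With the second it would record the checksum of /bin/sleep before the exec and only report "Binary overwritten during exec" when classify_mismatch prints "fail".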

Comment by Gerrit Updater [ 10/Mar/15 ]

Emoly Liu (emoly.liu@intel.com) uploaded a new patch: http://review.whamcloud.com/14025
Subject: LU-6345 test: compare /bin/sleep in sanity-hsm.sh test_30c
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4236054300f2fe8cad18dcd24338c7096b41a8d8

Comment by Gerrit Updater [ 18/Mar/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14025/
Subject: LU-6345 test: compare /bin/sleep in sanity-hsm.sh test_30c
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 79020798bdcc09477b0b4d05b1d35e2432909aab
