[LU-6937] sanity-hsm test_106 test_403: error opening lhsmtool_posix: No such file or directory Created: 31/Jul/15 Updated: 04/Aug/17 Resolved: 11/May/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0, Lustre 2.9.0 |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Bruno Faccini (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
client and server: lustre-master build # 3120 RHEL7 DNE |
||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
This issue was created by maloo for sarah_lw <wei3.liu@intel.com>
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/d07a7a8c-3799-11e5-9d53-5254006e85c2.
The sub-test test_106 failed with the following error:
uuid 0 not found in agent list on mds1
test log
== sanity-hsm test 106: Copytool register/unregister ================================================= 13:35:57 (1438202157)
CMD: onyx-30vm5 pkill -CONT -x lhsmtool_posix
Purging archive on onyx-30vm5
CMD: onyx-30vm5 rm -rf /tmp/arc1/*
Starting copytool agt1 on onyx-30vm5
CMD: onyx-30vm5 mkdir -p /tmp/arc1
CMD: onyx-30vm5 lhsmtool_posix --daemon --hsm-root /tmp/arc1 --bandwidth 1 /mnt/lustre2 < /dev/null > /logdir/test_logs/2015-07-29/lustre-master-el7-x86_64--full--1_7_1__3120__-70238999807760-034047/sanity-hsm.test_106.copytool_log.onyx-30vm5.log 2>&1
CMD: onyx-30vm5 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests//usr/lib64/lustre/tests/utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/openmpi/bin:/usr/bin:/bin:/usr/sbin:/sbin::/sbin:/usr/sbin:/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh pgrep -fl lhsmtool_posix
CMD: onyx-30vm5 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests//usr/lib64/lustre/tests/utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/openmpi/bin:/usr/bin:/bin:/usr/sbin:/sbin::/sbin:/usr/sbin:/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh get_client_uuid lhsmtool_posix bash bash sh
onyx-30vm5: error opening lhsmtool_posix: No such file or directory (2)
onyx-30vm5: cannot get name for `lhsmtool_posix': No such file or directory
onyx-30vm5: error: get_param: llite//uuid: Found no match
CMD: onyx-30vm3 /usr/sbin/lctl get_param -n mdt.lustre-MDT000.hsm.agents | grep 0
onyx-30vm3: error: get_param: mdt/lustre-MDT000/hsm/agents: Found no match
sanity-hsm test_106: @@@@@@ FAIL: uuid 0 not found in agent list on mds1
Info required for matching: sanity-hsm 106 |
| Comments |
| Comment by Saurabh Tandan (Inactive) [ 11/Dec/15 ] |
|
master, build# 3264, 2.7.64 tag |
| Comment by Saurabh Tandan (Inactive) [ 16/Dec/15 ] |
|
Server: 2.5.5, b2_5_fe/62 |
| Comment by Saurabh Tandan (Inactive) [ 18/Dec/15 ] |
|
Another instance for EL7.1 Server/EL7.1 Client - ZFS |
| Comment by Saurabh Tandan (Inactive) [ 18/Dec/15 ] |
|
Another instance for EL7.1 Server/EL7.1 Client - DNE |
| Comment by Saurabh Tandan (Inactive) [ 23/Dec/15 ] |
|
Another instance found for : |
| Comment by Saurabh Tandan (Inactive) [ 19/Jan/16 ] |
|
Another instance found for interop: 2.7.1 Server/EL7 Client |
| Comment by Saurabh Tandan (Inactive) [ 20/Jan/16 ] |
|
Another instance found for interop: 2.5.5 Server/EL7 Client |
| Comment by Saurabh Tandan (Inactive) [ 03/Feb/16 ] |
|
Encountered another instance for tag 2.7.66 for FULL - EL7.1 Server/EL7.1 Client, master, build# 3314. |
| Comment by Saurabh Tandan (Inactive) [ 04/Feb/16 ] |
|
Another instance for FULL - EL7.1 Server/EL7.1 Client - DNE, master, build# 3314
Another instance on master for FULL - EL7.1 Server/EL7.1 Client - ZFS, build# 3314 |
| Comment by Saurabh Tandan (Inactive) [ 10/Feb/16 ] |
|
Another instance found for interop tag 2.7.66 - 2.7.1 Server/EL7 Client, build# 3316
Another instance found for interop tag 2.7.66 - 2.5.5 Server/EL7 Client, build# 3316
Another instance found for Full tag 2.7.66 - EL7.1 Server/EL7.1 Client, build# 3314
Another instance found for Full tag 2.7.66 - EL7.1 Server/EL7.1 Client - ZFS, build# 3314
Another instance found for Full tag 2.7.66 - EL7.1 Server/EL7.1 Client - DNE, build# 3314 |
| Comment by Saurabh Tandan (Inactive) [ 24/Feb/16 ] |
|
Another instance found for interop - 2.7.1 Server/EL7 Client, tag 2.7.90. |
| Comment by Niu Yawei (Inactive) [ 11/Apr/16 ] |
|
Another instance: https://testing.hpdd.intel.com/test_sets/8f52b0de-fe26-11e5-b1c9-5254006e85c2
lhsmtool_posix isn't found; it looks like a testing environment problem. |
| Comment by Andreas Dilger [ 29/Apr/16 ] |
|
This test is failing quite regularly, about 100 times in the past 4 weeks. Many of the failures are in "full" test runs, so I'm not sure if there is a different test configuration there (RHEL7 vs. RHEL6) or some other potential problem. |
| Comment by Andreas Dilger [ 29/Apr/16 ] |
|
Both sanity-hsm test_106 and test_403 also appear to be failing much more frequently on the PFL patches. I'm not sure why that is, but there are as many failures on PFL patches as on all other patches put together, so there may be something in the PFL patches that is triggering this problem more frequently. |
| Comment by Sarah Liu [ 04/May/16 ] |
|
From the previous comments, it looks like this issue only affects RHEL7. |
| Comment by John Hammond [ 10/May/16 ] |
|
This is likely due to a change in the output of pgrep. With the pgrep from CentOS 6.7:
# pgrep -V
pgrep (procps version 3.2.8)
# pgrep -fl lhsmtool_posix
24726 lhsmtool_posix --daemon --hsm-root /tmp/arc1/shsm --bandwidth 1 /mnt/lustre2
With a newer pgrep:
# pgrep -V
pgrep from procps-ng 3.3.9
# pgrep -fl lhsmtool_posix
24735 lhsmtool_posix |
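To illustrate why the changed pgrep output could produce the "error opening lhsmtool_posix" message, here is a minimal sketch. It assumes, without confirmation from the test framework source, that the caller takes the last field of the `pgrep -fl` line as the Lustre mount point. The helper name `last_field` is hypothetical and is not part of the Lustre test framework.

```shell
#!/bin/sh
# Hypothetical helper (not from test-framework.sh): print the last
# whitespace-separated field of a single line of text.
last_field() {
    printf '%s\n' "$1" | awk '{ print $NF }'
}

# Sample lines copied from the two pgrep versions quoted in this comment.
old_line='24726 lhsmtool_posix --daemon --hsm-root /tmp/arc1/shsm --bandwidth 1 /mnt/lustre2'
new_line='24735 lhsmtool_posix'

# Old procps: the last field is the mount point.
last_field "$old_line"   # prints /mnt/lustre2

# procps-ng: the command line is no longer listed, so the last field
# is the process name, which is then mistaken for a path.
last_field "$new_line"   # prints lhsmtool_posix
```

Under that assumption, the process name would be passed on as if it were a mount point, which is consistent with the `get_client_uuid lhsmtool_posix` call in the test log. procps-ng documents a separate `-a` (`--list-full`) option for listing the full command line; whether the merged patch uses it is not stated in this ticket.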
| Comment by Gerrit Updater [ 10/May/16 ] |
|
John L. Hammond (john.hammond@intel.com) uploaded a new patch: http://review.whamcloud.com/20093 |
| Comment by Gerrit Updater [ 10/May/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) uploaded a new patch: http://review.whamcloud.com/20096 |
| Comment by Gerrit Updater [ 11/May/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20093/ |
| Comment by Andreas Dilger [ 11/May/16 ] |
|
It's too bad that this was failing for so long on RHEL7 but we didn't fix it until it broke master testing... At least it is now fixed. |
| Comment by Gerrit Updater [ 20/May/16 ] |
|
Doug Oucharek (doug.s.oucharek@intel.com) uploaded a new patch: http://review.whamcloud.com/20341 |
| Comment by Gerrit Updater [ 30/May/16 ] |
|
Amir Shehata (amir.shehata@intel.com) merged in patch http://review.whamcloud.com/20341/ |
| Comment by Gerrit Updater [ 27/Jul/16 ] |
|
Jinshan Xiong (jinshan.xiong@intel.com) merged in patch http://review.whamcloud.com/20295/ |
| Comment by nasf (Inactive) [ 05/Aug/16 ] |
|
Hit it on b2_8: |
| Comment by Saurabh Tandan (Inactive) [ 06/Sep/16 ] |
|
Still hit it on master: |