[LU-6186] EL7 client sanity-hsm test_70: Failed to start copytool monitor on
Created: 30/Jan/15  Updated: 04/Feb/16  Resolved: 22/Jul/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0, Lustre 2.8.0
Fix Version/s: Lustre 2.8.0

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: Bruno Faccini (Inactive)
Resolution: Fixed
Votes: 0
Labels: None
Environment:

server: lustre-master build # 2830 RHEL6
client: EL7


Issue Links:
Related
Severity: 3
Rank (Obsolete): 17307

Description

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/659ff066-a496-11e4-a1ef-5254006e85c2.

The sub-test test_70 failed with the following error:

test failed to respond and timed out

test log

== sanity-hsm test 70: Copytool logs JSON register/unregister events to FIFO ========================= 00:43:52 (1422089032)
CMD: onyx-43vm5 pkill -INT -x lhsmtool_posix
CMD: onyx-43vm5 mktemp --tmpdir=/tmp -d sanity-hsm.test_70.XXXX
CMD: onyx-43vm5 mkfifo -m 0644 /tmp/sanity-hsm.test_70.74FK/fifo
CMD: onyx-43vm5 cat /tmp/sanity-hsm.test_70.74FK/fifo > /tmp/sanity-hsm.test_70.74FK/events & echo \$! > /tmp/sanity-hsm.test_70.74FK/monitor_pid
ps: write error: Bad file descriptor
 sanity-hsm test_70: @@@@@@ FAIL: Failed to start copytool monitor on onyx-43vm5 


Comments
Comment by Andreas Dilger [ 02/Feb/15 ]

It looks like this might be a bug in sanity-hsm.sh:

copytool_monitor_setup() {
        :
        :
                ps -p $HSMTOOL_MONITOR_PDSH >&- ||
                        error "Failed to start copytool monitor on $agent"

According to bash(1) the use of >&- means:

       Each redirection that may be preceded by a file descriptor number may
       instead be preceded by a word of the form {varname}.  In this case, for
       each redirection operator except >&- and <&-, the shell will allocate a
       file descriptor greater than or equal to 10 and assign it to varname.
       If  >&- or <&- is preceded by {varname}, the value of varname defines
       the file descriptor to close.

Since the ps -p option is given $HSMTOOL_MONITOR_PDSH, there is no {varname} for the >&- operator. I don't know what this test is doing, but I suspect that the RHEL7 bash is being stricter about handling this error due to recent security bugs in bash.
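
For reference, the effect of >&- on a single command can be reproduced outside the test suite. The following is a minimal sketch, assuming bash; it is not taken from sanity-hsm.sh, and pid_is_running is a hypothetical helper introduced only for illustration. It shows that >&- closes stdout for the command, so the write fails with "Bad file descriptor" (the same message ps prints in the test log), whereas redirecting to /dev/null keeps the descriptor open and discards the output. Redirecting to /dev/null is only one possible workaround and is not necessarily what the landed patch does.

```shell
#!/bin/bash
# Sketch only: shows why a command run with ">&-" can report failure even
# when the thing it checks for is fine.

# ">&-" closes stdout for this one command, so the write fails with EBADF
# ("Bad file descriptor") and echo returns non-zero.
closed_rc=0
echo "probe" >&- 2>/dev/null || closed_rc=$?

# Redirecting to /dev/null keeps stdout open and simply discards the
# output, so the same command succeeds.
devnull_rc=0
echo "probe" > /dev/null 2>&1 || devnull_rc=$?

# Hypothetical helper (not in sanity-hsm.sh): a quiet "is this PID alive?"
# check that does not depend on how ps reacts to a closed stdout.
pid_is_running() {
	ps -p "$1" > /dev/null 2>&1
}

echo "closed stdout rc=$closed_rc, /dev/null rc=$devnull_rc"
```

Whether ps itself exits non-zero when its stdout is closed depends on the ps implementation and version, which would be consistent with the failure showing up only on the EL7 client.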

Comment by Bruno Faccini (Inactive) [ 24/Feb/15 ]

It seems that RHEL7 bash may now require the file descriptor number/variable to be specified.
So

"ps -p $HSMTOOL_MONITOR_PDSH 1>&-"

should do it.

Comment by Gerrit Updater [ 24/Feb/15 ]

Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: http://review.whamcloud.com/13857
Subject: LU-6186 tests: specify fd in bash close syntax
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4bec1be8287d5f68b6a85253a29ae66b0e170b90

Comment by Gerrit Updater [ 15/Mar/15 ]

Bob Glossman (bob.glossman@intel.com) uploaded a new patch: http://review.whamcloud.com/14084
Subject: LU-6186 tests: force test of LU-6186 on el7
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a083876f739bc20eaafa7960c413cd6ec3cdaace

Comment by Sarah Liu [ 29/Jun/15 ]

another instance: https://testing.hpdd.intel.com/test_sets/9dd2d66c-1311-11e5-8d21-5254006e85c2

Comment by Gerrit Updater [ 22/Jul/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13857/
Subject: LU-6186 tests: avoid errors in using >&- bash close syntax
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 68e6eb752f0bbc6e400403a8be4ed671938af075

Comment by Peter Jones [ 22/Jul/15 ]

Landed for 2.8
