[LU-11906] "ambiguous redirect" from yml.sh when running in multinode setup Created: 30/Jan/19  Updated: 31/Jan/19

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0, Lustre 2.13.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Oleg Drokin Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

When running any Lustre tests with bash as the shell (or via auster) with ost_HOST and/or mds_HOST set to a different node, the scripts fail at yaml.sh line 11:

/home/green/git/lustre-release/lustre/tests/yaml.sh: line 11: $logdir/node.$host.yml: ambiguous redirect

The line in question is

        echo "$line" | sed "s/^${host}: //" | sed "s/^${host}://" \
            >> $logdir/node.$host.yml;

If I put quotes around "$logdir/node.$host.yml", the error goes away and everything works as intended.

Is this the proper way to fix it?

Also, this error does not show up if I use "sh" instead of "bash" to run the scripts, but I must use bash because otherwise the flakey logic does not work.
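A minimal sketch of the behaviour described above, assuming a host value that contains a space (the "- tcp" value is borrowed from the debug output later in this ticket; the directory is a throwaway temp dir). Unquoted, bash word-splits the redirect target into two words and refuses it; quoted, it is a single word. In POSIX mode, which is how bash behaves when invoked as sh, a redirection target is not word-split at all, and POSIX shells such as dash likewise do not field-split it, which would explain why the error disappears under "sh".

```shell
#!/bin/bash
# Reproduce "ambiguous redirect" with a host value that contains a space.
logdir=$(mktemp -d)      # throwaway directory for the demo
host="- tcp"             # value borrowed from the debug output in this ticket

# Unquoted: bash word-splits the target into "$logdir/node.-" and "tcp.yml",
# rejects the redirect, and echo never runs (nothing is created).
echo data >> $logdir/node.$host.yml || echo "unquoted redirect failed" >&2

# Quoted: a single word, so a file literally named "node.- tcp.yml" is created.
echo data >> "$logdir/node.$host.yml"

# POSIX mode (bash invoked as sh, or with --posix) does not word-split a
# redirection target, so the unquoted form also succeeds there.
bash --posix -c 'echo data >> $1/node.$2.yml' _ "$logdir" "$host"

ls "$logdir"             # only: node.- tcp.yml
```

Note that quoting only silences the error: the garbage "host" still ends up as a file name.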



 Comments   
Comment by Oleg Drokin [ 30/Jan/19 ]

Also, the reason for the warning is that the yaml file does not exist on the server node at the time this is executed.
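One way to check this hypothesis is a minimal sketch like the one below (throwaway temp directory, illustrative host name): `>>` creates a missing file on its own, and a missing parent directory fails with "No such file or directory" rather than "ambiguous redirect", so a nonexistent node.*.yml by itself should not produce this particular message.

```shell
#!/bin/bash
# Sketch: can a missing output file by itself cause "ambiguous redirect"?
logdir=$(mktemp -d)
host=oleg1-server            # a host value without whitespace

# The file does not exist yet; ">>" simply creates it.
echo data >> $logdir/node.$host.yml

# A missing parent directory fails too, but with
# "No such file or directory", not "ambiguous redirect".
echo data >> $logdir/missing/node.$host.yml || echo "append into missing dir failed" >&2
```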

Comment by Andreas Dilger [ 30/Jan/19 ]

It looks like the code in question is:

split_output() {
    while read line; do
        host=${line%%:*};
        echo "$line" | sed "s/^${host}: //" | sed "s/^${host}://" \
            >> $logdir/node.$host.yml;
    done
}

So it reads "$line" from standard input, then strips everything from the first ":" onward to get "$host"; that is, it expects output from pdsh or similar.

The value of "$host" depends entirely on what is in "$line" when the function is called. If there is a space before the first ":", then "$host" will contain a space, the redirect becomes ">> $logdir/node.foo bar.yml" or similar, and bash word-splits that into two possible output files, which is of course not allowed. Quoting the target gives a single output filename with a space in it, which is probably why the error goes away.
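To make this concrete, here is a minimal sketch of what the expansion host=${line%%:*} yields for a few lines. host_of is a hypothetical helper and the input lines are illustrative. A line with no ":" at all comes back unchanged, so every such line becomes its own "host", and an empty line gives an empty one.

```shell
#!/bin/bash
# host_of: hypothetical helper applying the same expansion as split_output().
# ${line%%:*} deletes the longest suffix matching ":*", i.e. everything from
# the first ":" onward; without a ":" the line is returned unchanged.
host_of() {
    local line=$1
    printf '%s\n' "${line%%:*}"
}

# Illustrative input lines: a pdsh-style prefixed line, then bare YAML.
for line in "oleg1-server.localnet: Node:" "Build:" "- tcp" ""; do
    printf '[%s] -> [%s]\n' "$line" "$(host_of "$line")"
done
```

This prints "[oleg1-server.localnet: Node:] -> [oleg1-server.localnet]", "[Build:] -> [Build]", "[- tcp] -> [- tcp]" and "[] -> []".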

Comment by Oleg Drokin [ 30/Jan/19 ]

OK, something really strange is going on here. I added a debug print of the host, and this is what I got:

            >> "$logdir/node.$host.yml";
echo we got host "*${host}*" >&2

oleg1-server.localnet: executing yml_node
we got host *Build*
we got host *Build*
we got host *lbats_build_id*
we got host *lbats_build_name*
we got host *architecture*
we got host *os*
we got host *os_distribution*
we got host *lustre_version*
we got host *lustre_build*
we got host *lustre_branch*
we got host *lustre_revision*
we got host *kernel_version*
we got host *file_system*
we got host **
we got host *Node*
we got host *lbats_build_id*
we got host *lbats_build_name*
we got host *architecture*
we got host *os*
we got host *os_distribution*
we got host *lustre_version*
we got host *lustre_build*
we got host *lustre_branch*
we got host *lustre_revision*
we got host *kernel_version*
we got host *file_system*
we got host **
we got host *Node*
we got host *node_name*
we got host *mem_size*
we got host *architecture*
we got host *networks*
we got host *node_name*
we got host *mem_size*
we got host *architecture*
we got host *networks*
we got host *- tcp*
we got host **
we got host *LustreEntities*
we got host *- tcp*
we got host **
we got host *LustreEntities*
Comment by Oleg Drokin [ 30/Jan/19 ]

OK, so after some digging with James, it looks like the code depends on pdsh not being invoked with the -N option, which suppresses the "host:" prefix. I do run it with -N, so the code gets utterly confused. I'm not sure when that changed, because in the past -N was mandatory, and in fact there are still checks for it in test-framework.sh.
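The dependency on the prefix can be seen with a minimal sketch that runs split_output(), copied verbatim from yaml.sh, on the same illustrative node output twice: once with every line prefixed by "oleg1-server: " (emulating pdsh without -N, which prefixes each output line with "host: "), and once bare, as with -N. node_output and its sample lines are hypothetical stand-ins for yml_node.

```shell
#!/bin/bash
# split_output() copied verbatim from yaml.sh.
split_output() {
    while read line; do
        host=${line%%:*};
        echo "$line" | sed "s/^${host}: //" | sed "s/^${host}://" \
            >> $logdir/node.$host.yml;
    done
}

# Hypothetical stand-in for one node's yml_node output (illustrative lines).
node_output() {
    printf '%s\n' "Node:" "    node_name: oleg1-server" "        - tcp"
}

# Without -N: each line carries "host: " (emulated here with sed).
logdir=$(mktemp -d)
node_output | sed 's/^/oleg1-server: /' | split_output
with_prefix=$logdir

# With -N: no prefix, so YAML keys are taken as host names. split_output
# returns non-zero here because of the ambiguous redirect on "- tcp".
logdir=$(mktemp -d)
node_output | split_output || true
without_prefix=$logdir
```

In the prefixed case everything lands in node.oleg1-server.yml with its indentation intact; in the bare case the keys become file names (node.Node.yml, node.node_name.yml) and the "- tcp" line triggers the ambiguous redirect, matching the debug output above.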

Comment by James Nunez (Inactive) [ 31/Jan/19 ]

As Oleg said, the issue here was due to using the '-N' option to pdsh. The problem is that some routines in yaml.sh assume that all output from do_rpc_nodes() has the host name prepended to each line. Of course, using pdsh/ssh with the '-N' option removes the host name from each line. In yaml.sh, yml_nodes_file() sends data to split_output(), which assumes that every line it parses starts with the host name:

   8 split_output() {
   9     while read line; do
  10         host=${line%%:*};
  11         echo "$line" | sed "s/^${host}: //" | sed "s/^${host}://" \
  12             >> $logdir/node.$host.yml;
  13     done
  14 }
  15 
  16 yml_nodes_file() {
  17     export logdir=$1
  18 
  19     if [ -f $logdir/shared ]; then
  20         do_rpc_nodes $(comma_list $(all_nodes)) \
  21             "yml_node >> $logdir/node.\\\$(hostname -s).yml"
  22     else
  23         do_rpc_nodes $(comma_list $(all_nodes)) yml_node | split_output
  24     fi
  25     yml_entities
  26 }

Using the '-N' option for pdsh is not mandatory, but we should be able to write out results when it is specified.
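One possible direction, sketched under loud assumptions and not the upstream fix: a hypothetical split_output_safe() that takes the list of node names it queried, only treats the text before the first ":" as a host when it matches one of them, quotes the redirect target, and sends unprefixed lines (as produced with -N) to a catch-all file. The function name, its argument convention and the "unprefixed" file name are all inventions for illustration.

```shell
#!/bin/bash
# Hypothetical hardened variant of split_output() (a sketch only).
# Assumption: callers pass the node names they queried as arguments.
split_output_safe() {
    local known=" $* " line host
    while IFS= read -r line; do
        host=${line%%:*}
        if [[ -n $host && $line == *:* && $known == *" $host "* ]]; then
            line=${line#"$host":}   # drop the "host:" prefix
            line=${line#" "}        # and the single space pdsh adds after it
        else
            host=unprefixed         # e.g. pdsh -N output: no host to trust
        fi
        # Quoted target: always one word, so never an ambiguous redirect.
        printf '%s\n' "$line" >> "$logdir/node.$host.yml"
    done
}

# Illustrative usage: two prefixed lines and one bare line.
logdir=$(mktemp -d)
printf '%s\n' "oleg1-server: Node:" "oleg1-server:     node_name: oleg1-server" \
    "        - tcp" | split_output_safe oleg1-server
```

With pdsh -N there is no host information left in the stream at all, so this only avoids the ambiguous redirect and the spurious node.Build.yml-style files. Keeping per-node output separate in that case would still need each node to write its own file, as the shared-logdir branch of yml_nodes_file() already does.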

Generated at Sat Feb 10 02:47:57 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.