[LU-11906] "ambiguous redirect" from yml.sh when running in multinode setup Created: 30/Jan/19 Updated: 31/Jan/19 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0, Lustre 2.13.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Oleg Drokin | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
When running any Lustre test with bash as the shell (or via auster) and ost_HOST and/or mds_HOST set to a different node, the scripts fail in yaml.sh line 11:
/home/green/git/lustre-release/lustre/tests/yaml.sh: line 11: $logdir/node.$host.yml: ambiguous redirect
The line in question is:
echo "$line" | sed "s/^${host}: //" | sed "s/^${host}://" \
>> $logdir/node.$host.yml;
If I put quotes around the redirect target, "$logdir/node.$host.yml", the error goes away and everything works as intended. I guess this is the proper way to fix it? The error also does not show up if I use "sh" instead of "bash" to run the scripts, but I must use bash because otherwise the flakey logic does not work. |
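The quoting behaviour described above can be reproduced outside the test framework. The following is a minimal sketch, not taken from yaml.sh: the temporary directory and the host value "foo bar" are made up for illustration, and it assumes the unquoted target fails because the expanded value contains whitespace.

```shell
# Hypothetical reproduction; logdir and host are illustrative values only.
logdir=$(mktemp -d)
host="foo bar"          # assumed: a "host" value that contains a space
export logdir host

# Unquoted target: bash splits the expansion into two words and rejects the redirect.
unquoted_err=$(bash -c 'echo data >> $logdir/node.$host.yml' 2>&1)
unquoted_rc=$?

# Quoted target: the expansion stays one word, so the redirect succeeds
# and creates a file whose name contains a space.
bash -c 'echo data >> "$logdir/node.$host.yml"'
quoted_rc=$?

echo "unquoted: rc=$unquoted_rc ($unquoted_err)"
echo "quoted:   rc=$quoted_rc"
ls "$logdir"
```

Quoting makes the error disappear, but it only hides the symptom if the value in $host is not a real host name.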
| Comments |
| Comment by Oleg Drokin [ 30/Jan/19 ] |
|
Also, the warning appears because the yaml file does not yet exist on the server node at the time this is executed. |
| Comment by Andreas Dilger [ 30/Jan/19 ] |
|
It looks like the code in question is:
split_output() {
	while read line; do
		host=${line%%:*};
		echo "$line" | sed "s/^${host}: //" | sed "s/^${host}://" \
			>> $logdir/node.$host.yml;
	done
}
So it reads "$line" from standard input, then drops everything from the first ":" onward to get "$host", which means it expects output from pdsh or similar. What ends up in "$host" depends on what is in "$line" when the function is called. If there is a space before the first ":" then "$host" will contain a space, and the redirect becomes ">> $logdir/node.foo bar.yml" or similar. The target then expands to two words instead of one filename, which bash rejects as ambiguous. Quoting the target gives you a single output filename with a space in it, which is probably why the error goes away. |
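To make the behaviour of ${line%%:*} concrete, here is a small sketch. The helper name host_of and the sample lines are hypothetical; only the parameter expansion itself comes from split_output().

```shell
# host_of is a hypothetical helper wrapping the expansion used in split_output().
host_of() {
    line=$1
    # Remove the longest suffix that starts at the first ":".
    printf '%s' "${line%%:*}"
}

# Illustrative inputs: the first has a "host:" prefix, the others do not.
for sample in "oleg1-server: node_name: oleg1-server" \
              "node_name: oleg1-server" \
              "- tcp" \
              ""; do
    printf '[%s] -> host=[%s]\n' "$sample" "$(host_of "$sample")"
done
# prints:
# [oleg1-server: node_name: oleg1-server] -> host=[oleg1-server]
# [node_name: oleg1-server] -> host=[node_name]
# [- tcp] -> host=[- tcp]
# [] -> host=[]
```

A line without any ":" is returned unchanged, so a value such as "- tcp" carries its space straight into the redirect target.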
| Comment by Oleg Drokin [ 30/Jan/19 ] |
|
Ok, something really strange is going on here. I added printing of the host and that's what I got:
>> "$logdir/node.$host.yml";
echo we got host "*${host}*" >&2
oleg1-server.localnet: executing yml_node
we got host *Build*
we got host *Build*
we got host *lbats_build_id*
we got host *lbats_build_name*
we got host *architecture*
we got host *os*
we got host *os_distribution*
we got host *lustre_version*
we got host *lustre_build*
we got host *lustre_branch*
we got host *lustre_revision*
we got host *kernel_version*
we got host *file_system*
we got host **
we got host *Node*
we got host *lbats_build_id*
we got host *lbats_build_name*
we got host *architecture*
we got host *os*
we got host *os_distribution*
we got host *lustre_version*
we got host *lustre_build*
we got host *lustre_branch*
we got host *lustre_revision*
we got host *kernel_version*
we got host *file_system*
we got host **
we got host *Node*
we got host *node_name*
we got host *mem_size*
we got host *architecture*
we got host *networks*
we got host *node_name*
we got host *mem_size*
we got host *architecture*
we got host *networks*
we got host *- tcp*
we got host **
we got host *LustreEntities*
we got host *- tcp*
we got host **
we got host *LustreEntities* |
| Comment by Oleg Drokin [ 30/Jan/19 ] |
|
Ok, so after some digging with JJames, it looks like the code depends on pdsh not being invoked with the -N option, which suppresses the "host:" prefix. I do run it with -N, so the code gets utterly confused. I am not sure when that changed, because in the past -N was mandatory, and in fact there are still checks for it in test-framework.sh. |
| Comment by James Nunez (Inactive) [ 31/Jan/19 ] |
|
As Oleg said, the issue here was due to using the '-N' option to pdsh. The problem is that some routines in yaml.sh assume that all output from do_rpc_nodes() has the host name prepended to each line. Of course, using pdsh/ssh with the '-N' option removes the host name from each line. In yaml.sh, yml_nodes_file() sends data to split_output(), which assumes that every line it parses starts with the host name:
8 split_output() {
9 while read line; do
10 host=${line%%:*};
11 echo "$line" | sed "s/^${host}: //" | sed "s/^${host}://" \
12 >> $logdir/node.$host.yml;
13 done
14 }
15
16 yml_nodes_file() {
17 export logdir=$1
18
19 if [ -f $logdir/shared ]; then
20 do_rpc_nodes $(comma_list $(all_nodes)) \
21 "yml_node >> $logdir/node.\\\$(hostname -s).yml"
22 else
23 do_rpc_nodes $(comma_list $(all_nodes)) yml_node | split_output
24 fi
25 yml_entities
26 }
Using the '-N' option for pdsh is not mandatory, but we should be able to write out results when it is specified. |
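One possible direction, shown here only as a sketch and not as the actual fix: have the splitter accept the list of known node names and treat a prefix as a host only when it matches one of them. The function name split_output_safe, its argument convention, and the fallback file name node.unknown.yml are all hypothetical. It also assumes the names passed in match the prefixes pdsh prints.

```shell
# Hypothetical variant of split_output(); arguments are the known host names.
split_output_safe() {
    while IFS= read -r line; do
        host=${line%%:*}
        target=unknown      # assumed fallback for lines without a "host:" prefix
        for known in "$@"; do
            # Require a real "host:" prefix, not just a line equal to a host name.
            if [ "$host" = "$known" ] && [ "$line" != "$host" ]; then
                target=$host
                line=${line#"$host":}   # drop the "host:" prefix
                line=${line# }          # and one optional space, like the two sed calls
                break
            fi
        done
        # Quoted target, so an unexpected value can never split into two words.
        printf '%s\n' "$line" >> "$logdir/node.$target.yml"
    done
}
```

A caller might pipe into it as `... yml_node | split_output_safe $(all_nodes)`, assuming all_nodes returns names in the same form as the pdsh prefixes. With '-N' and several nodes, lines cannot be attributed to a node at all, so in this sketch they all land in the fallback file.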