[LU-10628] /usr/bin/obdfilter-survey case=network exits with non zero status, No device found for name echotmp: Invalid argument Created: 07/Feb/18  Updated: 26/Feb/18  Resolved: 26/Feb/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Elena Gryaznova Assignee: WC Triage
Resolution: Duplicate Votes: 0
Labels: None

Epic/Theme: patch
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

It has been observed that whenever /usr/bin/obdfilter-survey is run with case=network, the script exits with a non-zero exit status:

[root@fre0111 ~]# NETTYPE=tcp thrlo=2 nobjhi=1 thrhi=4 size=477 case=network rslt_loc=/tmp targets="192.168.101.10" /usr/bin/obdfilter-survey
No device found for name echotmp: Invalid argument

https://testing.hpdd.intel.com/test_logs/f1123de0-0b9d-11e8-a6ad-52540065bddc/show_text

+ NETTYPE=tcp thrlo=8 nobjhi=1 thrhi=16 size=1024 case=network rslt_loc=/tmp targets="10.9.4.213" /usr/bin/obdfilter-survey
No device found for name echotmp: Invalid argument

The reason for this behavior:
/usr/bin/obdfilter-survey contains "set -e", which causes the script to exit because the pipeline "lctl dl | grep osc | grep -v mdt | grep UP" matches nothing and therefore returns 1.
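
The interaction between "set -e" and an empty grep match can be reproduced without a Lustre node. The sketch below is illustrative only: the device_list text is a made-up stand-in for "lctl dl" output, run_probe is a hypothetical helper, and the "|| true" guard is just one common way to tolerate an empty match, not necessarily what the patch under review does.

```shell
# Hypothetical stand-in for `lctl dl` output on a client with no OSC devices.
device_list='  0 UP mgc MGC192.168.101.10@tcp example-uuid 5'

# Run the assignment in a fresh shell with `set -e` enabled.
# $1 is text appended to the assignment: empty, or "|| true".
run_probe() {
    DEVICE_LIST=$device_list sh -c '
        set -e
        osc_names_str=$(printf "%s\n" "$DEVICE_LIST" | grep osc | grep -v mdt | grep UP) '"$1"'
        echo "reached client setup"
    '
}

# Unguarded: the failed pipeline makes the assignment fail, so `set -e` aborts.
strict_status=0
strict_out=$(run_probe '') || strict_status=$?

# Guarded: `|| true` turns the empty match into a non-fatal result.
tolerant_status=0
tolerant_out=$(run_probe '|| true') || tolerant_status=$?

echo "strict:   status=$strict_status output=[$strict_out]"
echo "tolerant: status=$tolerant_status output=[$tolerant_out]"
```

In the unguarded run the child shell stops at the assignment and never prints the "reached client setup" line, which mirrors the trace where osc_names_str is empty and cleanup starts immediately afterwards.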

xtrace:

+ '[' network == network ']'
+ server_nid=192.168.101.10
+ '[' -z 192.168.101.10 ']'
     # check for obdecho module on server
+ dsh 192.168.101.10 root 'lsmod | grep obdecho > /dev/null'
+ local node=192.168.101.10
+ local user=root
+ shift 2
+ local 'command=lsmod | grep obdecho > /dev/null'
+ command='export PATH=/sbin:/usr/sbin:$PATH; lsmod | grep obdecho > /dev/null'
+ case $DSH in
+ '[' -n root ']'
+ user=root@
...
+ local 'command=lctl dl | grep obdecho > /dev/null 2>&1'
+ command='export PATH=/sbin:/usr/sbin:$PATH; lctl dl | grep obdecho > /dev/null 2>&1'
+ case $DSH in
+ '[' -n root ']'
+ user=root@
+ ssh root@192.168.101.10 'export PATH=/sbin:/usr/sbin:$PATH; lctl dl | grep obdecho > /dev/null 2>&1'
+ dsh 192.168.101.10 root 'lctl dl | grep ost > /dev/null 2>&1'
+ local node=192.168.101.10
+ local user=root
+ shift 2
+ local 'command=lctl dl | grep ost > /dev/null 2>&1'
+ command='export PATH=/sbin:/usr/sbin:$PATH; lctl dl | grep ost > /dev/null 2>&1'
+ case $DSH in
+ '[' -n root ']'
+ user=root@
+ ssh root@192.168.101.10 'export PATH=/sbin:/usr/sbin:$PATH; lctl dl | grep ost > /dev/null 2>&1'

        ######## Now start client setup #########

++ lctl dl
++ grep -v mdt
++ grep osc
++ grep UP

+ osc_names_str=
+ cleanup 0 1
+ trap 0
...
+ local exit_status=0
+ local host
+ case=network
+ shift
+ (( i = 0 ))
+ (( i < 1 ))
+ host=localhost
+ [[ -n '' ]]
+ (( i++ ))
+ (( i < 1 ))
+ pidcount=0
+ (( i = 0 ))
+ (( i < 0 ))
+ '[' network == network ']'
+ cleanup_network 1
+ local clean_srv_OSS=1
+ lctl  
No device found for name echotmp: Invalid argument


 Comments   
Comment by Gerrit Updater [ 07/Feb/18 ]

Elena Gryaznova (c17455@cray.com) uploaded a new patch: https://review.whamcloud.com/31196
Subject: LU-10628 tests: obdfilter-survey case=network fail
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 67c85ff734170df272524e09480a755ec223b247

Comment by Elena Gryaznova [ 25/Feb/18 ]

The proposed fix is not useful without the LU-7420 fix.
The LU-7420 patch https://review.whamcloud.com/#/c/18443/ contains the fix for the described defect.
Should we close this ticket? Or should we wait for LU-7420 first?

Thanks.

Comment by Peter Jones [ 26/Feb/18 ]

If I understand correctly, then I think that this should be marked as a duplicate of LU-7420.
