[LU-12488] strange breakage in sanityn test 93 Created: 28/Jun/19  Updated: 14/Dec/19  Resolved: 14/Dec/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0, Lustre 2.12.3
Fix Version/s: Lustre 2.14.0

Type: Bug Priority: Minor
Reporter: Oleg Drokin Assignee: WC Triage
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Looking at the test logs of sanityn test 93, we can see that it tries to run an unknown command under unclear circumstances.

I don't see anything obviously wrong though.

== sanityn test 93: alloc_rr should not allocate on same ost ========================================= 11:49:54 (1561736994)
fail_loc=0x00000163
fail_loc=0x0
oleg58-server: sh: line 1: 17: command not found
pdsh@oleg58-client: oleg58-server: ssh exited with exit code 127
/mnt/lustre/f93.sanityn-1/file1
lmm_stripe_count:  2
lmm_stripe_size:   1048576
lmm_pattern:       raid0
lmm_layout_gen:    0
lmm_stripe_offset: 1
	obdidx		 objid		 objid		 group
	     1	          6319	       0x18af	             0
	     0	          6336	       0x18c0	             0

This appears to happen only in the ldiskfs-DNE configuration for me, but I am not sure whether ldiskfs or DNE is the requirement.

http://testing.linuxhacker.ru:3333/lustre-reports/774/testresults/sanityn-ldiskfs-DNE-centos7_x86_64-centos7_x86_64-retry1/sanityn.test_93.test_log.oleg58-client.log



 Comments   
Comment by Patrick Farrell (Inactive) [ 28/Jun/19 ]

The problem seems to be in this line:

        do_facet $SINGLEMDS "lctl set_param -n \
                'lod.lustre-MDT*/qos_threshold_rr' $old_rr" 

Note that 17 is the default value of qos_threshold_rr, so somehow $old_rr is being run as a command of its own.

If you compare to the previous set_param in this function:

        do_facet $SINGLEMDS lctl set_param -n \
                'lod.lustre-MDT*/qos_threshold_rr' 100 

What stands out in the problematic line is the single quotes nested inside double quotes.

Presumably that is the problem, and it confuses some version of bash.

I did a quick patch.

Comment by Patrick Farrell (Inactive) [ 28/Jun/19 ]

Ah, never mind.

It's this:

        local old_rr=$(do_facet $SINGLEMDS $LCTL get_param -n \
                lod.lustre-MDT*/qos_threshold_rr | sed -e 's/%//')

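If you have more than one MDT, old_rr isn't a single value; it's several values separated by newlines. When $old_rr is expanded into the set_param command string, everything after the first newline ends up on a separate line for the remote shell, which then tries to run the next value (17) as a command.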
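The failure mode can be reproduced without a Lustre setup. This is a minimal sketch under stated assumptions: fake_get_param, fake_do_facet, restore_buggy and restore_fixed are hypothetical names; fake_get_param stands in for lctl get_param on a two-MDT system, a local sh -c stands in for the remote shell that do_facet reaches through pdsh/ssh, and echo replaces lctl set_param so nothing is changed. The buggy variant fails with exit status 127, the same code ssh reports in the log. The "fixed" variant is only one possible approach and is not necessarily what patch 35366 does.

```shell
#!/bin/bash
# Hypothetical stand-in for: lctl get_param -n 'lod.lustre-MDT*/qos_threshold_rr'
# on a DNE setup with two MDTs; one line (value) is printed per matching device.
fake_get_param() {
	printf '17%%\n17%%\n'
}

# Hypothetical stand-in for do_facet: the command string is handed to a shell.
# Assumption: a local "sh -c" plays the role of the remote shell reached via pdsh/ssh.
fake_do_facet() {
	sh -c "$1"
}

# Pattern from the test: the value keeps its embedded newline, so the
# shell sees "echo set_param qos_threshold_rr 17" and then a line "17".
restore_buggy() {
	local old_rr=$(fake_get_param | sed -e 's/%//')
	# echo replaces "lctl set_param -n" so nothing is actually changed
	fake_do_facet "echo set_param qos_threshold_rr $old_rr"
}

# One possible fix (a sketch only): keep just the first value,
# assuming every MDT holds the same qos_threshold_rr setting.
restore_fixed() {
	local old_rr=$(fake_get_param | sed -e 's/%//' | head -n 1)
	fake_do_facet "echo set_param qos_threshold_rr $old_rr"
}

restore_buggy
echo "buggy exit status: $?"    # 127: "17" was run as a command
restore_fixed
echo "fixed exit status: $?"
```

Taking the first line assumes all MDTs share one setting; a more thorough approach would save and restore the value for each MDT separately.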

Comment by Gerrit Updater [ 28/Jun/19 ]

Patrick Farrell (pfarrell@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35366
Subject: LU-12488 tests: Fix sanityn 93 for dne configs
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 9c0890b5557830851ba42a7707877062160b2a69

Comment by Gerrit Updater [ 14/Dec/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35366/
Subject: LU-12488 tests: Fix sanityn 93 for DNE configs
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 98e7614b9f422d45f5d1789eb550d1b7947522b1

Comment by Peter Jones [ 14/Dec/19 ]

Landed for 2.14

Generated at Sat Feb 10 02:53:01 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.