[LU-6921] sanityn 77f test failed Lustre: DEBUG MARKER: sanityn test_77f: @@@@@@ FAIL: failed to operate on TBF rules
Created: 28/Jul/15  Updated: 18/May/16  Resolved: 20/Oct/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.8.0

Type: Bug
Priority: Major
Reporter: Vinayak (Inactive)
Assignee: WC Triage
Resolution: Fixed
Votes: 0
Labels: patch

Attachments: File sanityn test_77f.tar    
Issue Links:
Related
is related to LU-6668 Add tests for TBF Resolved
is related to LU-6939 nrs_tbf.c:155:nrs_tbf_cli_reset()) AS... Resolved
is related to LU-7044 Interop 2.5.3<->master sanityn test_7... Resolved
Severity: 3
Epic: test

 Description   

stdout.log

== sanityn test 77f: check TBF JobID nrs policy == 15:43:56 (1438011836)
ost.OSS.ost_io.nrs_policies=tbf jobid
ost.OSS.ost_io.nrs_policies=tbf jobid
error: set_param: ost/OSS/ost_io/nrs_tbf_rule: Found no match
 sanityn test_77f: @@@@@@ FAIL: failed to operate on TBF rules 
  Trace dump:
  = /usr/lib64/lustre/tests/../tests/test-framework.sh:4732:error_noexit()
  = /usr/lib64/lustre/tests/../tests/test-framework.sh:4763:error()
  = /usr/lib64/lustre/tests/sanityn.sh:2943:tbf_rule_operate()
  = /usr/lib64/lustre/tests/sanityn.sh:3008:test_77f()
  = /usr/lib64/lustre/tests/../tests/test-framework.sh:5010:run_one()
  = /usr/lib64/lustre/tests/../tests/test-framework.sh:5047:run_one_logged()
  = /usr/lib64/lustre/tests/../tests/test-framework.sh:4864:run_test()
  = /usr/lib64/lustre/tests/sanityn.sh:3043:main()
Dumping lctl log to /tmp/test_logs/1438011830/sanityn.test_77f.*.1438011837.log
FAIL 77f (1s)
sanityn: FAIL: test_77f failed to operate on TBF rules
Stopping clients: fre0107,fre0108 /mnt/lustre2 (opts:)
Stopping client fre0108 /mnt/lustre2 opts:


 Comments   
Comment by Vinayak (Inactive) [ 28/Jul/15 ]

sanityn test_77e and sanityn test_77g failed for the same reason:

== sanityn test 77e: check TBF NID nrs policy == 14:29:25 (1438073965)
ost.OSS.ost_io.nrs_policies=tbf nid
ost.OSS.ost_io.nrs_policies=tbf nid
error: set_param: ost/OSS/ost_io/nrs_tbf_rule: Found no match
 sanityn test_77e: @@@@@@ FAIL: failed to operate on TBF rules 
  Trace dump:
  = /home/build/lustre-xx/lustre/tests/../tests/test-framework.sh:4732:error_noexit()
  = /home/build/lustre-xx/lustre/tests/../tests/test-framework.sh:4763:error()
  = /home/build/lustre-xx/lustre/tests/sanityn.sh:2943:tbf_rule_operate()
  = /home/build/lustre-xx/lustre/tests/sanityn.sh:2957:test_77e()
  = /home/build/lustre-xx/lustre/tests/../tests/test-framework.sh:5010:run_one()
  = /home/build/lustre-xx/lustre/tests/../tests/test-framework.sh:5047:run_one_logged()
  = /home/build/lustre-xx/lustre/tests/../tests/test-framework.sh:4864:run_test()
  = /home/build/lustre-xx/lustre/tests/sanityn.sh:2987:main()
Dumping lctl log to /tmp/test_logs/1438073937/sanityn.test_77e.*.1438073966.log
FAIL 77e (2s)
cleanup: ======================================================
== sanityn test complete, duration 30 sec == 14:29:27 (1438073967)
sanityn: FAIL: test_77e failed to operate on TBF rules
cli-1: warning: 'lctl conf_param' is deprecated, use 'lctl set_param -P' instead
cli-1: warning: 'lctl conf_param' is deprecated, use 'lctl set_param -P' instead

== sanityn test 77g: Change TBF type directly == 15:02:47 (1438075967)
ost.OSS.ost_io.nrs_policies=tbf nid
ost.OSS.ost_io.nrs_policies=tbf nid
ost.OSS.ost_io.nrs_policies=tbf jobid
ost.OSS.ost_io.nrs_policies=tbf jobid
error: set_param: ost/OSS/ost_io/nrs_tbf_rule: Found no match
 sanityn test_77g: @@@@@@ FAIL: failed to operate on TBF rules 
  Trace dump:
  = /home/build/lustre-xx/lustre/tests/../tests/test-framework.sh:4732:error_noexit()
  = /home/build/lustre-xx/lustre/tests/../tests/test-framework.sh:4763:error()
  = /home/build/lustre-xx/lustre/tests/sanityn.sh:2943:tbf_rule_operate()
  = /home/build/lustre-xx/lustre/tests/sanityn.sh:3064:test_77g()
  = /home/build/lustre-xx/lustre/tests/../tests/test-framework.sh:5010:run_one()
  = /home/build/lustre-xx/lustre/tests/../tests/test-framework.sh:5047:run_one_logged()
  = /home/build/lustre-xx/lustre/tests/../tests/test-framework.sh:4864:run_test()
  = /home/build/lustre-xx/lustre/tests/sanityn.sh:3076:main()
Dumping lctl log to /tmp/test_logs/1438075942/sanityn.test_77g.*.1438075968.log
FAIL 77g (2s)

Comment by Andreas Dilger [ 06/Aug/15 ]

Hi Li Xi, Wang Shilong,
Could you please take a look at this failure in the new TBF tests?

Comment by Li Xi (Inactive) [ 07/Aug/15 ]

Strange. It looks like ost/OSS/ost_io/nrs_tbf_rule is missing.

Comment by Li Xi (Inactive) [ 13/Aug/15 ]

Hi Vinayak, which branch did you test? Was it a branch with TBF NRS policy?

Comment by Vinayak (Inactive) [ 13/Aug/15 ]

Hi Li Xi
>> which branch did you test?
latest master

>> Was it a branch with TBF NRS policy?
Do we know for sure whether the complete TBF NRS policy is present in master? I don't know how we can find that out.

I ran the following to find TBF-related changes:

[root@cli-1 lustre-release]# git branch -a | grep master
* master
  remotes/origin/HEAD -> origin/master
  remotes/origin/master

[root@cli-1 lustre-release]# git log --oneline | grep -i TBF
fb14b7b LU-6668 test: regression tests for NRS TBF policy
e7ab554 LU-5580 ptlrpc: policy switch directly in tbf
75752e9 LU-3319 procfs: Move NRS TBF proc handling to seq_files
0539dc5 LU-4832 ptlrpc: fix incorrect name string in nrs_tbf
33e35c0 LU-3558 ptlrpc: Add the NRS TBF policy

Please let me know if you need any other information or want me to check anything on my side, and please correct me if I am missing anything.
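
The search above can also be done by git itself. The following is a minimal sketch; `tbf_commits` is a hypothetical helper name, not part of the Lustre tree. Note one difference: `git log --oneline | grep -i TBF` matches only the subject line, while `--grep` matches anywhere in the commit message.

```shell
# Hypothetical helper: list commits whose message mentions TBF
# (case-insensitive) in the repository at $1, or in the current
# directory when no argument is given.
tbf_commits() {
	git -C "${1:-.}" log --oneline -i --grep=TBF
}

# Usage, from inside a lustre-release checkout:
#   tbf_commits
```

This only shows which TBF-related commits are in the checked-out history; it does not show whether the running server was built from that history.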

Comment by Gerrit Updater [ 08/Sep/15 ]

Vinayak (vinayakswami.hariharmath@seagate.com) uploaded a new patch: http://review.whamcloud.com/16305
Subject: LU-6921 test: failed to operate on TBF rules
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: e9064c1edc56a21a7687002894d044a3f8f9f1c5

Comment by Vinayak (Inactive) [ 08/Sep/15 ]

Can anyone please explain the expected behavior of this kind of call in sanityn.sh (test_77e, test_77f, test_77g)?

        tbf_rule_operate ost0 "start\ localhost\ {0@lo}\ 1000"

It fails with
"error: set_param: ost/OSS/ost_io/nrs_tbf_rule: Found no match" on my local setup.

Is the behavior the same on your side? It looks like ost0 is not interpreted correctly on my side.

If it passes for you, do we need to specify anything explicitly to make the test work for ost0 (changes to the framework, an environment variable that needs to be set, etc.)?

I am using a 4-node setup (2 OSTs, 1 MDS, 2 clients).
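
On the quoting in that call, the sketch below shows why each space in the rule is backslash-escaped. It assumes (this is not stated in the ticket) that the command string is parsed by a shell a second time before it reaches lctl; `eval` stands in for that second parse, and `count_args` is a hypothetical helper used only to count words.

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

# Inside double quotes, "\ " stays a literal backslash followed by a space.
escaped="start\ localhost\ {0@lo}\ 1000"
plain="start localhost {0@lo} 1000"

# eval simulates the second round of shell parsing (an assumption about
# how the command reaches the target node).
eval "count_args ost.OSS.ost_io.nrs_tbf_rule=$escaped"   # prints 1
eval "count_args ost.OSS.ost_io.nrs_tbf_rule=$plain"     # prints 4
```

With the escapes, the whole rule stays attached to the parameter name as a single word. Without them it is split into four words. This covers only the quoting and does not explain the "Found no match" error itself.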

Comment by Vinayak (Inactive) [ 28/Sep/15 ]

Hello Andreas,

I have rebased the patch.

http://review.whamcloud.com/#/c/16305/. Please let me know if anything else needs to be done.

Comment by Saurabh Tandan (Inactive) [ 29/Sep/15 ]

Encountered the same issue with sanityn test_77g.

20:26:40:CMD: onyx-38vm4 lctl set_param ost.OSS.ost_io.nrs_tbf_rule=start\ dd_runas\ {dd.500}\ 50
20:26:40:onyx-38vm4: error: set_param: setting /proc/fs/lustre/ost/OSS/ost_io/nrs_tbf_rule=start dd_runas {dd.500} 50: Invalid argument
20:26:41:ost.OSS.ost_io.nrs_tbf_rule=start dd_runas {dd.500} 50
20:26:42: sanityn test_77g: @@@@@@ FAIL: failed to operate on TBF rules 

Comment by Vinayak (Inactive) [ 29/Sep/15 ]

Hello Saurabh,

Can you please try the patch http://review.whamcloud.com/#/c/16305/ and check if it fixes the problem?

Thanks,

Comment by Kalpak Shah (Inactive) [ 20/Oct/15 ]

http://review.whamcloud.com/#/c/16305/ is ready to be merged - Andreas and Li have given positive reviews.

Comment by Gerrit Updater [ 20/Oct/15 ]

Andreas Dilger (andreas.dilger@intel.com) merged in patch http://review.whamcloud.com/16305/
Subject: LU-6921 test: failed to operate on TBF rules
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: abbef8759e93c31a2c88ba650a04ae9076600afa

Comment by Andreas Dilger [ 20/Oct/15 ]

I finally figured out why this test wasn't failing in our testing: facet_host() uses $ost_HOST for any facet named ostX if there isn't an explicit $ost0_HOST set in the configuration.

In any case, the patch has been landed to master for 2.8.0.
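
A minimal sketch of the fallback described above, for illustration only. `resolve_facet_host` is a hypothetical, simplified stand-in and not the actual facet_host() code from test-framework.sh; the host names are made up.

```shell
# Hypothetical, simplified model of the fallback. NOT the real facet_host().
resolve_facet_host() {
	local facet=$1
	local host

	# An explicit per-facet variable (e.g. ost1_HOST) wins.
	eval "host=\${${facet}_HOST:-}"
	if [ -n "$host" ]; then
		echo "$host"
		return 0
	fi

	# Any facet named ostX falls back to the generic ost_HOST.
	case $facet in
	ost*)
		if [ -n "${ost_HOST:-}" ]; then
			echo "$ost_HOST"
			return 0
		fi
		;;
	esac

	# What the real framework does when nothing is set is not modelled here.
	return 1
}

# Example values, chosen only for illustration.
ost_HOST=oss-default
ost1_HOST=oss-a
unset ost0_HOST

resolve_facet_host ost1   # prints oss-a
resolve_facet_host ost0   # prints oss-default, although ost0_HOST is unset
```

Under this reading, a configuration that sets $ost_HOST lets a facet name like ost0 resolve to a real OSS, while a configuration without it gives ost0 nothing to fall back to, which may be why the failure appeared only on some setups.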
