[LU-2848] Failure on test suite sanity test_151: NOT IN CACHE before: , after:
Created: 21/Feb/13  Updated: 19/Mar/13  Resolved: 19/Mar/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: Li Wei (Inactive)
Resolution: Duplicate
Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by LU-2902 sanity test_156: NOT IN CACHE: before... Resolved
Related
is related to LU-2902 sanity test_156: NOT IN CACHE: before... Resolved
is related to LU-2735 sanity.sh test_151: NOT IN CACHE: bef... Resolved
Severity: 3
Rank (Obsolete): 6892

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/b63e5d08-7bf0-11e2-897d-52540035b04c.

The sub-test test_151 failed with the following error:

NOT IN CACHE: before: , after:

The following error occurred while testing the LNet network hash with a router. The servers (MDS and OSS) are on IB and the clients use TCP. Module parameter: rnet_htable_size=1070

== sanity test 151: test cache on oss and controls ================================= 20:42:07 (1361421727)
3+0 records in
3+0 records out
12288 bytes (12 kB) copied, 0.00840027 s, 1.5 MB/s
 sanity test_151: @@@@@@ FAIL: NOT IN CACHE: before: , after:  
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:3971:error_noexit()
  = /usr/lib64/lustre/tests/test-framework.sh:3994:error()
  = /usr/lib64/lustre/tests/sanity.sh:8373:test_151()
  = /usr/lib64/lustre/tests/test-framework.sh:4234:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:4267:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:4137:run_test()
  = /usr/lib64/lustre/tests/sanity.sh:8393:main()
Dumping lctl log to /tmp/test_logs/2013-02-20/190622/sanity.test_151.*.1361421729.log
client-15: Host key verification failed.
client-15: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
client-15: rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
fat-amd-1-ib: Host key verification failed.
fat-amd-1-ib: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
fat-amd-1-ib: rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
fat-amd-3-ib: Host key verification failed.
fat-amd-3-ib: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
fat-amd-3-ib: rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
client-5: Host key verification failed.
client-5: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
client-5: rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]


 Comments   
Comment by Andreas Dilger [ 11/Mar/13 ]

Looks like this is a problem in the /proc parameters, possibly related to get_osd_param() not finding the stats. The "$BEFORE" and "$AFTER" variables are empty, so roc_hit() is not finding the "cache_hit" statistic in "{obdfilter,osd-*}.$FSNAME-OST*.stats".
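As an illustration of how an empty "$BEFORE"/"$AFTER" can arise, here is a minimal sketch, not the actual roc_hit() code. The helper names sum_cache_hit and checked_cache_hit are hypothetical, and the assumption that the sample count is the second field of a "cache_hit" line is a guess about the stats layout. When no line matches, the summing step prints nothing, so a wrapper that checks for an empty result can report the missing statistic directly.

```shell
# Hypothetical helper: sum the second field of every "cache_hit" line read
# from stdin. The field position is an assumption about the stats layout.
# Prints nothing at all when no "cache_hit" line is present.
sum_cache_hit() {
	awk '$1 == "cache_hit" { sum += $2; found = 1 }
	     END { if (found) print sum }'
}

# Hypothetical wrapper: report a missing statistic instead of silently
# handing an empty string back to the caller.
checked_cache_hit() {
	local hits
	hits=$(sum_cache_hit)
	if [ -z "$hits" ]; then
		echo "cache_hit statistic not found" >&2
		return 1
	fi
	echo "$hits"
}
```

Fed two sample lines such as "cache_hit 3 samples [pages]" and "cache_hit 4 samples [pages]" on stdin, checked_cache_hit prints 7. Fed input with no "cache_hit" line, it prints an error to stderr and returns non-zero.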

Some other minor cleanups needed in this test:

  • use $(roc_hit) instead of `roc_hit`
  • use 'skip "not cache-capable obdfilter"' and 'skip "oss cache is disabled"' instead of 'echo'
  • wrap at 80 columns
  • tabs for indentation
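
The first cleanup can be sketched as follows, under stated assumptions: fake_roc_hit is a hypothetical stand-in for the real roc_hit(), and is_count and cache_delta are illustrative helpers that do not exist in test-framework.sh. The sketch uses the $(...) form of command substitution and validates both counters before comparing them, so an empty value is reported as such rather than as a cache miss.

```shell
# Hypothetical stand-in for roc_hit(): echoes whatever FAKE_HITS holds,
# or an empty string when the variable is unset.
fake_roc_hit() {
	echo "${FAKE_HITS:-}"
}

# Succeeds only if the argument is a non-empty string of digits.
is_count() {
	case "$1" in
	''|*[!0-9]*) return 1 ;;
	*) return 0 ;;
	esac
}

# Print the difference between two counters, or return 2 if either one
# is empty or non-numeric.
cache_delta() {
	local before=$1 after=$2
	if ! is_count "$before" || ! is_count "$after"; then
		echo "invalid counters: before: $before, after: $after" >&2
		return 2
	fi
	echo $((after - before))
}
```

A caller would capture the counters with BEFORE=$(fake_roc_hit) and AFTER=$(fake_roc_hit) around the read, then pass both to cache_delta. The $(...) form nests without backslash escaping, which is the usual reason to prefer it over backticks.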
Comment by Andreas Dilger [ 11/Mar/13 ]

This might also be a test environment issue, or is this because the remote node was down?

client-15: Host key verification failed.
client-15: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
client-15: rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
fat-amd-1-ib: Host key verification failed.
fat-amd-1-ib: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
fat-amd-1-ib: rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
fat-amd-3-ib: Host key verification failed.
fat-amd-3-ib: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
fat-amd-3-ib: rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
Comment by Li Wei (Inactive) [ 11/Mar/13 ]

http://review.whamcloud.com/5680 (Diagnostic patch)

Comment by Keith Mannthey (Inactive) [ 11/Mar/13 ]

I am working on LU-2902, which is the same basic issue in test_156 of sanity.

`roc_hit` and the data we get back from it are having issues.

I have local debug ongoing to see if I can trigger it and a debug patch at http://review.whamcloud.com/5648.

I just thought I would cross-reference this, as it looks to be the same basic issue.

Comment by Li Wei (Inactive) [ 11/Mar/13 ]

Thanks for letting me know; I'll leave it to you then.

Comment by Peter Jones [ 19/Mar/13 ]

duplicate of LU-2902
