[LU-2848] Failure on test suite sanity test_151: NOT IN CACHE before: , after: Created: 21/Feb/13 Updated: 19/Mar/13 Resolved: 19/Mar/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Li Wei (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 6892 |
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com>.
This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/b63e5d08-7bf0-11e2-897d-52540035b04c.
The sub-test test_151 failed with the following error:
Got the following error when testing the LNet network hash with a router. Servers (MDS and OSS) are on IB and clients are using TCP. Module parameter: rnet_htable_size=1070
== sanity test 151: test cache on oss and controls ================================= 20:42:07 (1361421727)
3+0 records in
3+0 records out
12288 bytes (12 kB) copied, 0.00840027 s, 1.5 MB/s
 sanity test_151: @@@@@@ FAIL: NOT IN CACHE: before: , after:
 Trace dump:
 = /usr/lib64/lustre/tests/test-framework.sh:3971:error_noexit()
 = /usr/lib64/lustre/tests/test-framework.sh:3994:error()
 = /usr/lib64/lustre/tests/sanity.sh:8373:test_151()
 = /usr/lib64/lustre/tests/test-framework.sh:4234:run_one()
 = /usr/lib64/lustre/tests/test-framework.sh:4267:run_one_logged()
 = /usr/lib64/lustre/tests/test-framework.sh:4137:run_test()
 = /usr/lib64/lustre/tests/sanity.sh:8393:main()
Dumping lctl log to /tmp/test_logs/2013-02-20/190622/sanity.test_151.*.1361421729.log
client-15: Host key verification failed.
client-15: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
client-15: rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
fat-amd-1-ib: Host key verification failed.
fat-amd-1-ib: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
fat-amd-1-ib: rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
fat-amd-3-ib: Host key verification failed.
fat-amd-3-ib: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
fat-amd-3-ib: rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
client-5: Host key verification failed.
client-5: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
client-5: rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6] |
| Comments |
| Comment by Andreas Dilger [ 11/Mar/13 ] |
|
Looks like this is a problem in the /proc parameters, possibly related to get_osd_param() not finding the stats. The "$BEFORE" and "$AFTER" variables are empty, so roc_hit() is not finding the "cache_hit" statistic in "{obdfilter,osd-*}.$FSNAME-OST*.stats". Some other minor cleanups needed in this test:
|
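To illustrate the failure mode described in the comment above (empty "$BEFORE" and "$AFTER" values reported as NOT IN CACHE), here is a minimal shell sketch. It is not the code in sanity.sh or test-framework.sh. The helper names sum_cache_hits and check_cache_hits are hypothetical, and the assumed stats line layout ("cache_hit <count> samples [pages] ...", with the count in the second field) is an assumption made for illustration. The idea is only to show how a missing statistic could be reported separately from a genuine cache miss.

```shell
# Hypothetical helper: sum the "cache_hit" counters read from stdin.
# Assumption: each stats line looks like "cache_hit <count> samples [pages] ...".
# Prints the sum and returns 0; prints nothing and returns 1 if no
# cache_hit line was seen at all (the situation suspected in this ticket).
sum_cache_hits() {
    awk '$1 == "cache_hit" { sum += $2; found = 1 }
         END { if (!found) exit 1; printf("%d\n", sum) }'
}

# Hypothetical check: compare two samples and tell "statistic missing"
# (return 2) apart from "pages were not served from cache" (return 1).
check_cache_hits() {
    local before=$1 after=$2 expected=$3

    if [ -z "$before" ] || [ -z "$after" ]; then
        echo "STATS MISSING: before: '$before', after: '$after'"
        return 2
    fi
    if [ $((after - before)) -ne "$expected" ]; then
        echo "NOT IN CACHE: before: $before, after: $after"
        return 1
    fi
    echo "OK"
}
```

With a guard of this kind, an empty sample would surface as a statistics lookup problem rather than as a cache miss, which may make logs like the one in this ticket easier to triage.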
| Comment by Andreas Dilger [ 11/Mar/13 ] |
|
This might also be a test environment issue, or is this because the remote node was down?
client-15: Host key verification failed.
client-15: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
client-15: rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
fat-amd-1-ib: Host key verification failed.
fat-amd-1-ib: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
fat-amd-1-ib: rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
fat-amd-3-ib: Host key verification failed.
fat-amd-3-ib: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
fat-amd-3-ib: rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6] |
| Comment by Li Wei (Inactive) [ 11/Mar/13 ] |
|
http://review.whamcloud.com/5680 (Diagnostic patch) |
| Comment by Keith Mannthey (Inactive) [ 11/Mar/13 ] |
|
I am working on `roc_hit`, and the data we get back from it is having issues. I have local debugging ongoing to see if I can trigger it, and a debug patch at http://review.whamcloud.com/5648. I just thought I would cross-reference this, as it looks to be the same basic issue. |
| Comment by Li Wei (Inactive) [ 11/Mar/13 ] |
|
Thanks for letting me know; I'll leave it to you then. |
| Comment by Peter Jones [ 19/Mar/13 ] |
|
duplicate of |