[LU-9438] sanity-lfsck test_17: (1.2) f1 (wrong) size should be 1048576, but got Created: 02/May/17 Updated: 19/Dec/17 Resolved: 19/Dec/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.0 |
| Fix Version/s: | Lustre 2.11.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Casper | Assignee: | Bob Glossman (Inactive) |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Environment: |
trevis-35, full, SLES12 clients |
||
| Severity: | 3 | ||||||||
| Description |
|
https://testing.hpdd.intel.com/test_sessions/20ddc92f-b9fe-482d-ac1b-1602a513c824
From test_log:
CMD: trevis-35vm7 /usr/sbin/lctl set_param fail_val=0 fail_loc=0x1614
fail_val=0
fail_loc=0x1614
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00496945 s, 211 MB/s
total: 1 open/close in 0.00 seconds: 479.18 ops/second
error: set_param: setting /sys/fs/lustre/ldlm/namespaces/lustre-MDT0000-mdc-ffff88002c485000/lru_size=clear: Invalid argument
ldlm.namespaces.lustre-MDT0000-mdc-ffff88002c485000.lock_unused_count=2
CMD: trevis-35vm7 /usr/sbin/lctl set_param fail_loc=0 fail_val=0
fail_loc=0
fail_val=0
/mnt/lustre/d17.sanity-lfsck/f0 and /mnt/lustre/d17.sanity-lfsck/guard use the same OST-objects
/mnt/lustre/d17.sanity-lfsck/f1 and /mnt/lustre/d17.sanity-lfsck/guard use the same OST-objects
ls: cannot access '/mnt/lustre/d17.sanity-lfsck/f1': Input/output error
/usr/lib64/lustre/tests/sanity-lfsck.sh: line 1906: [: -eq: unary operator expected
sanity-lfsck test_17: @@@@@@ FAIL: (1.2) f1 (wrong) size should be 1048576, but got
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:4931:error()
= /usr/lib64/lustre/tests/sanity-lfsck.sh:1907:test_17()
= /usr/lib64/lustre/tests/test-framework.sh:5207:run_one()
= /usr/lib64/lustre/tests/test-framework.sh:5246:run_one_logged()
= /usr/lib64/lustre/tests/test-framework.sh:5093:run_test()
= /usr/lib64/lustre/tests/sanity-lfsck.sh:1940:main() |
| Comments |
| Comment by James Nunez (Inactive) [ 05/May/17 ] |
|
sanity-lfsck test 17 started failing with this error on April 13, 2017. In all cases of this test failure, the client is SLES11SP* or SLES12SP*. There are no failures of this test with this error for RHEL/CentOS clients. The logs for the earliest failures are at: |
| Comment by Peter Jones [ 08/May/17 ] |
|
Bob, could you please look into this one? Thanks, Peter |
| Comment by Bob Glossman (Inactive) [ 08/May/17 ] |
|
Looking at the specific code in the test, it looks like a missing $DIR/$tdir/f1 would produce precisely the effects captured in the failure logs. I'm unable to determine why this file would exist when running on RHEL but not on SLES. The failure output might make more sense if the test checked for and correctly reported the absence of files like f0 and f1, rather than assuming that an ls command on such files will print a size on stdout that can be parsed. |
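A minimal sketch of the kind of conditional check suggested above, assuming GNU `stat` is available on the client. The helper names `file_size_or_fail` and `check_size` are hypothetical and are not part of sanity-lfsck.sh or test-framework.sh; the actual test code at line 1906 is not shown in this ticket. The point is only that an unquoted, empty size variable in a `[ ... -eq ... ]` test yields "unary operator expected", whereas checking the exit status first lets the test report the underlying error text.

```shell
#!/bin/bash
# Hypothetical helper (not from the Lustre test suite): print the size of a
# file in bytes, or fail with the underlying error message if the file
# cannot be stat'ed (missing file, I/O error, and so on).
file_size_or_fail() {
    local file=$1
    local out

    # "stat -c %s" is the GNU coreutils form; on failure, stderr carries the
    # errno text (e.g. "No such file or directory" or "Input/output error").
    if ! out=$(stat -c %s "$file" 2>&1); then
        echo "cannot get size of '$file': $out" >&2
        return 1
    fi
    echo "$out"
}

# Hypothetical helper: compare the file size against an expected value.
# The size is only compared after it was obtained successfully, and the
# variable is quoted, so the test expression never sees an empty operand.
check_size() {
    local file=$1
    local expected=$2
    local size

    size=$(file_size_or_fail "$file") || return 1
    if [ "$size" -ne "$expected" ]; then
        echo "$file size should be $expected, but got $size" >&2
        return 1
    fi
}
```

With a check like this, a failing stat on f1 would be reported along with its error text, rather than as an empty "but got" value.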
| Comment by Andreas Dilger [ 12/May/17 ] |
|
Well, if f1 doesn't exist it would return ENOENT instead of EIO, so the problem isn't that the file is missing. John pointed out the earlier error message:
error: set_param: setting /sys/fs/lustre/ldlm/namespaces/lustre-MDT0000-mdc-ffff88002c485000/lru_size=clear: Invalid argument
ldlm.namespaces.lustre-MDT0000-mdc-ffff88002c485000.lock_unused_count=2
So it isn't clear why the "clear" failed to cancel the locks. That is something that could easily be attributed to a change in /proc handling for SLES12 and needs to be investigated. |
| Comment by Bob Glossman (Inactive) [ 23/May/17 ] |
|
No progress. Low priority. Most likely this is a test-only problem. I can make the 'clear' operation fail, but the test passes anyway, so I don't think those two effects are directly related. |
| Comment by Bob Glossman (Inactive) [ 19/Dec/17 ] |
|
No instances of this failure have been seen since June 2017. |