[LU-171] Test failure on test suite runtests Created: 28/Mar/11 Updated: 06/Apr/11 Resolved: 06/Apr/11 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Jian Yu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 10271 |
| Description |
|
This issue was created by maloo for Prakash Surya <surya1@llnl.gov>. This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/30252906-594f-11e0-a272-52540025f9af. The cache is not being flushed to disk before the Lustre filesystem is unmounted. After the Lustre filesystem is unmounted and remounted, the files on it are all empty, which is what causes the file diffs to fail. If I add the command 'echo 3 > /proc/sys/vm/drop_caches' just before the command to unmount Lustre, the tests pass. |
| Comments |
| Comment by Peter Jones [ 30/Mar/11 ] |
|
Yu Jian, could you please look into this one? Thanks, Peter |
| Comment by Jian Yu [ 01/Apr/11 ] |
|
Hi Prakash, did you run 'sync' before 'echo 3 > /proc/sys/vm/drop_caches'? The latter command only drops clean pagecache, dentries and inodes from memory. It does not free dirty objects, and it does not flush data out to disk. In addition, I saw this message in the test output: |
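The ordering described above can be sketched in Python as follows. The function name `sync_and_drop_caches` is hypothetical, and the `path` parameter is an assumption added only so the sketch can be exercised without root. The sketch is equivalent to running `sync; echo 3 > /proc/sys/vm/drop_caches`. First `os.sync()` writes dirty data back, then the kernel is asked to discard the now-clean caches (1 = pagecache, 2 = dentries and inodes, 3 = both).

```python
import os

# Standard Linux control file; writing to it requires root.
DROP_CACHES = "/proc/sys/vm/drop_caches"


def sync_and_drop_caches(level: int = 3, path: str = DROP_CACHES) -> None:
    """Hypothetical helper: flush dirty data, then drop clean caches.

    `path` is an illustrative parameter (an assumption) so the function can
    be pointed at a scratch file when not running as root.
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    # Step 1: write dirty pages back to storage. drop_caches alone never does this.
    os.sync()
    # Step 2: ask the kernel to discard clean pagecache and/or dentries and inodes.
    with open(path, "w") as f:
        f.write(f"{level}\n")
```

On a real node this would be called as root with the defaults, right before unmounting Lustre.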
| Comment by Prakash Surya (Inactive) [ 01/Apr/11 ] |
|
Yu Jian, as long as 'echo 3 > /proc/sys/vm/drop_caches' is called, the test passes whether or not sync is called. I have tried both calling sync beforehand and not calling it at all, and it makes no difference. Also, the test does not pass without 'echo 3 > /proc/sys/vm/drop_caches', even if 'sync' is still called. Just in case this proves useful, here are the results of the four tests: As for the 'cp: cannot create regular file `/sbin/./mount.lustre': Permission denied' error, I believe it is caused by the way our node is configured. I am running these tests on a diskless node, and the /sbin directory is mounted read-only, which causes the copy to fail. However, the mount.lustre binary is already installed in /sbin on the diskless image (which explains the 'Permission denied' error rather than a 'Read-only file system' error). |
| Comment by Jian Yu [ 02/Apr/11 ] |
|
Thanks for the tests, Prakash. From the Maloo reports, I found that the kernel version was 2.6.32-14chaos. The failure reported in this ticket is very similar to the one in bug 23064. There are two patches for this bug: The first patch was pushed to the master branch on Nov. 4, 2010. The second one was ported to the master branch and landed on Mar. 24, 2011 together with the other patches in http://review.whamcloud.com/307. Could you please check whether the Lustre code you used includes these patches? If it does, could you please set "PTLDEBUG=-1", reproduce the issue again, and upload the lctl debug log file (gathered by the test script right after the test failed) to this ticket? FYI, the issue could not be reproduced on RHEL6/x86_64 with kernel 2.6.32-71.18.2.el6 against the latest master code on our test node. Here is the successful report: |
| Comment by Prakash Surya (Inactive) [ 06/Apr/11 ] |
|
Thanks for the info, Yu Jian. I pulled down the latest master branch this morning and have not been able to reproduce the issue. Both patches you referred to in your previous comment are definitely in this tree. I believe the tree I was working on previously did not have the patches from http://review.whamcloud.com/307, which is the likely reason for the failed test. Here are the new successful test results: Thanks for the help! I imagine this ticket can be marked as resolved. |
| Comment by Peter Jones [ 06/Apr/11 ] |
|
Thanks for letting us know Prakash! |