[LU-8969] sanity test_56x: read failed: Invalid argument Created: 23/Dec/16 Updated: 28/Mar/17 Resolved: 09/Jan/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.10.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Yang Sheng |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 | ||||
| Description |
|
This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>.

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/3f9ace90-c8ca-11e6-8911-5254006e85c2

The sub-test test_56x failed with the following error:

migrate failed rc = 22

This fails in the same test as the previous bug.

Info required for matching: sanity 56x |
| Comments |
| Comment by Bob Glossman (Inactive) [ 27/Dec/16 ] |
|
Another failure on master, sles12sp2: |
| Comment by Bob Glossman (Inactive) [ 28/Dec/16 ] |
|
I have discovered by trial and error that eliminating the use of O_DIRECT in lfs.c makes this problem go away. I am not entirely sure why. |
| Comment by Bob Glossman (Inactive) [ 30/Dec/16 ] |
|
This appears to be due to a difference in argument checking in Linux 4.4 vs. earlier Linux versions. I constructed a simple test program that just opens an existing file with O_DIRECT and reads through it 1MB at a time. This is similar to migrate_copy_data().

On earlier versions of Linux the read at EOF returns 0 as expected, regardless of the length of the file. On Linux 4.4, the sles12sp2 kernel version, the read at EOF returns 0 only if the file length is a multiple of page_size. If it isn't, the read fails, returning -1 with an errno of EINVAL. This is true of all fs types, not just Lustre.

I strongly suspect that in Linux 4.4 the kernel does more careful and complete checking of the read args for O_DIRECT reads and returns EINVAL on an EOF read when EOF isn't on a page_size boundary. This makes sense, since such a read violates the O_DIRECT requirement that I/O be done only at restricted file offsets. I speculate that in earlier Linux versions the check for EOF on reads was done before the checking of args, and in Linux 4.4 the checking of args is done first.

Again, I emphasize that this difference in behavior is seen on all fs types, not just Lustre. This is strong evidence that it isn't primarily a Lustre bug but is due to the I/O logic in the underlying kernel. This being the case, it makes sense that turning off O_DIRECT makes the problem go away. The only question is whether that solution is permissible in the lfs_migrate() code. |
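The test program described in the comment above is not included in this ticket. A minimal sketch of what such a program might look like follows, purely as an illustration. The function name `read_through`, the `extra_flags` parameter, and the 4096-byte buffer alignment are assumptions made for this sketch; they are not taken from lfs.c or from either patch.

```c
/* Must come before any #include so that glibc exposes O_DIRECT to callers. */
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define CHUNK_SIZE (1024 * 1024) /* 1MB per read, as described in the comment */
#define BUF_ALIGN  4096          /* assumed buffer alignment for O_DIRECT */

/*
 * Hypothetical helper (not from lfs.c): read `path` to EOF in CHUNK_SIZE
 * pieces. Pass O_DIRECT in extra_flags for direct I/O, or 0 for buffered I/O.
 * Returns 0 if read() reports EOF cleanly, otherwise the errno value of the
 * failing call. *total receives the number of bytes read before returning.
 */
int read_through(const char *path, int extra_flags, size_t *total)
{
	void *buf = NULL;
	ssize_t n;
	int fd, rc;

	assert(path != NULL && total != NULL);
	*total = 0;

	/* O_DIRECT needs an aligned buffer; posix_memalign returns an errno value. */
	rc = posix_memalign(&buf, BUF_ALIGN, CHUNK_SIZE);
	if (rc != 0)
		return rc;

	fd = open(path, O_RDONLY | extra_flags);
	if (fd < 0) {
		rc = errno;
		free(buf);
		return rc;
	}

	/* The last read() issued by this loop is the read at EOF. */
	while ((n = read(fd, buf, CHUNK_SIZE)) > 0)
		*total += (size_t)n;
	rc = (n < 0) ? errno : 0;

	close(fd);
	free(buf);
	return rc;
}
```

Calling `read_through(path, O_DIRECT, &total)` on a file whose length is not a multiple of the page size is the case described above: on the sles12sp2 kernel the final read is reported to fail with EINVAL (errno 22 on Linux, matching "rc = 22" in the description), while earlier kernels are reported to return 0. Passing 0 for `extra_flags` gives the buffered comparison case. Some filesystems reject O_DIRECT at open() time, in which case the helper returns that error instead.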
| Comment by James A Simmons [ 30/Dec/16 ] |
|
I have seen this failure for some time in my upstream client testing. I always assumed it was some faulty port. Thanks for tracking this down. |
| Comment by Bob Glossman (Inactive) [ 30/Dec/16 ] |
|
I will push a patch turning off O_DIRECT. I expect reviewers will tell me if that's not allowed. |
| Comment by Gerrit Updater [ 30/Dec/16 ] |
|
Bob Glossman (bob.glossman@intel.com) uploaded a new patch: https://review.whamcloud.com/24549 |
| Comment by Gerrit Updater [ 30/Dec/16 ] |
|
Yang Sheng (yang.sheng@intel.com) uploaded a new patch: https://review.whamcloud.com/24552 |
| Comment by James A Simmons [ 04/Jan/17 ] |
|
Are both patches needed? |
| Comment by Bob Glossman (Inactive) [ 04/Jan/17 ] |
|
No. My patch is Abandoned. Only https://review.whamcloud.com/24552 is needed. |
| Comment by Gerrit Updater [ 09/Jan/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24552/ |
| Comment by Yang Sheng [ 09/Jan/17 ] |
|
Patch landed. Close this ticket. |