[LU-8969] sanity test_56x: read failed: Invalid argument Created: 23/Dec/16  Updated: 28/Mar/17  Resolved: 09/Jan/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.10.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Yang Sheng
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/3f9ace90-c8ca-11e6-8911-5254006e85c2.

The sub-test test_56x failed with the following error:

migrate failed rc = 22

Fails in the same test as the earlier bug LU-8559, but with a different error.

Info required for matching: sanity 56x



 Comments   
Comment by Bob Glossman (Inactive) [ 27/Dec/16 ]

Another one on master, sles12sp2:
https://testing.hpdd.intel.com/test_sets/ef072448-cbed-11e6-8580-5254006e85c2

Comment by Bob Glossman (Inactive) [ 28/Dec/16 ]

I have discovered by trial and error that eliminating the use of O_DIRECT in lfs.c makes this problem go away. I'm not entirely sure why.

Comment by Bob Glossman (Inactive) [ 30/Dec/16 ]

This appears to be due to a difference in argument checking between Linux 4.4 and earlier Linux versions. I constructed a simple test program that opens an existing file with O_DIRECT and reads through it 1MB at a time, similar to migrate_copy_data().
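A minimal sketch of such a reproducer in C follows. It is not the program used in this ticket. The names `odirect_read_all`, `CHUNK_SIZE` and `BUF_ALIGN` are hypothetical, and the 4096-byte buffer alignment is an assumption about the page size. It opens the file with O_DIRECT, reads 1MB per call until read() returns 0 or -1, and reports the failing call's errno.

```c
#define _GNU_SOURCE /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define CHUNK_SIZE (1024 * 1024) /* 1MB per read() call */
#define BUF_ALIGN 4096           /* assumed page size for buffer alignment */

/*
 * Hypothetical reproducer: read an existing file to EOF with O_DIRECT in
 * 1MB chunks. Returns the total number of bytes read, or -1 with errno set
 * by the call that failed (e.g. EINVAL from the read at EOF).
 */
long long odirect_read_all(const char *path)
{
	long long total = 0;
	void *buf = NULL;
	ssize_t n;
	int fd, rc, saved_errno;

	assert(path != NULL);

	/* O_DIRECT needs an aligned user buffer, not just aligned offsets. */
	rc = posix_memalign(&buf, BUF_ALIGN, CHUNK_SIZE);
	if (rc != 0) {
		errno = rc; /* posix_memalign returns the error, it does not set errno */
		return -1;
	}

	fd = open(path, O_RDONLY | O_DIRECT);
	if (fd < 0) {
		saved_errno = errno;
		free(buf);
		errno = saved_errno;
		return -1;
	}

	/* Keep reading until read() returns 0 (EOF) or -1 (error). */
	while ((n = read(fd, buf, CHUNK_SIZE)) > 0)
		total += n;

	saved_errno = errno; /* only meaningful when n < 0 */
	close(fd);
	free(buf);
	if (n < 0) {
		errno = saved_errno;
		return -1;
	}
	return total;
}
```

Suppose a small main() calls `odirect_read_all()` on a file whose length is not a multiple of the page size and prints `strerror(errno)` on a -1 return. On the affected kernels it would be expected to report EINVAL. The sketch cannot tell an EINVAL from open() apart from one returned by read(). Some filesystems reject O_DIRECT outright at open time.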

On earlier Linux versions, the read at EOF returns 0 as expected, regardless of the file length.

On Linux 4.4, the sles12sp2 kernel version, the read at EOF returns 0 only if the file length is a multiple of page_size. Otherwise the read fails, returning -1 with errno set to EINVAL. This happens on all fs types, not just Lustre.

I strongly suspect that Linux 4.4 checks the arguments of O_DIRECT reads more carefully and completely. It then returns EINVAL on an EOF read when EOF isn't on a page_size boundary. This makes sense, because such a read violates the O_DIRECT requirement that file offsets be aligned. After the data has been read, the file offset equals the file length, so the final read at EOF starts at an unaligned offset. I speculate that earlier Linux versions checked for EOF before checking the arguments, while Linux 4.4 checks the arguments first. Again, I emphasize that this behavior is seen on all fs types, not just Lustre. This is strong evidence that it isn't primarily a Lustre bug but a consequence of the I/O logic in the underlying kernel.
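The alignment argument can be made concrete with a small arithmetic check. This is a simplified model, not kernel code. It assumes that both the offset and the length must be multiples of `align`, which is page_size in this ticket; the real requirement depends on the filesystem and device. In a loop that reads sequentially in fixed-size chunks, the read that should return 0 is issued at an offset equal to the file length. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Simplified model of the O_DIRECT offset/length rule, assuming both must
 * be multiples of `align`. Buffer alignment is not modeled here.
 */
bool odirect_args_aligned(off_t offset, size_t len, size_t align)
{
	assert(align > 0 && offset >= 0);
	return offset % (off_t)align == 0 && len % align == 0;
}

/*
 * In a loop that reads a file sequentially in `chunk`-byte pieces, the
 * offset equals file_size once all data has been read, whether or not the
 * last data read was short. The read that should return 0 (EOF) is
 * therefore issued at offset file_size with length `chunk`.
 */
bool eof_read_is_aligned(off_t file_size, size_t chunk, size_t align)
{
	return odirect_args_aligned(file_size, chunk, align);
}
```

Take a 4096-byte page and 1MB chunks. A 5000-byte file leads to an EOF read at offset 5000, which fails the check, while an 8192-byte file passes. This is consistent with the Linux 4.4 pattern reported above, where only files whose length is a multiple of page_size read cleanly to EOF.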

Given this, it makes sense that turning off O_DIRECT makes the problem go away. The only question is whether that solution is acceptable in the lfs_migrate() code.

Comment by James A Simmons [ 30/Dec/16 ]

I have seen this failure for some time in my upstream client testing. I always assumed it was due to a faulty port. Thanks for tracking this down.

Comment by Bob Glossman (Inactive) [ 30/Dec/16 ]

I will push a patch turning off O_DIRECT. I expect reviewers will tell me if that's not allowed.

Comment by Gerrit Updater [ 30/Dec/16 ]

Bob Glossman (bob.glossman@intel.com) uploaded a new patch: https://review.whamcloud.com/24549
Subject: LU-8969 utils: avoid use of O_DIRECT in lfs_migrate
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 95843e1f6d8f46a2e305cbfdcb19dc0bfb3d5ca6

Comment by Gerrit Updater [ 30/Dec/16 ]

Yang Sheng (yang.sheng@intel.com) uploaded a new patch: https://review.whamcloud.com/24552
Subject: LU-8969 llite: sanity test_56x read failed: Invalid argument
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 23c9778cc006ea75cd2f01d7baa4561c4d5219a7

Comment by James A Simmons [ 04/Jan/17 ]

Are both patches needed?

Comment by Bob Glossman (Inactive) [ 04/Jan/17 ]

No. My patch is Abandoned. Only https://review.whamcloud.com/24552 is needed.

Comment by Gerrit Updater [ 09/Jan/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24552/
Subject: LU-8969 llite: sanity test_56x read failed: Invalid argument
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 2730656aa4d4326e67615e9652c576545403ed15

Comment by Yang Sheng [ 09/Jan/17 ]

Patch landed. Closing this ticket.

Generated at Sat Feb 10 02:22:07 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.