[LU-16101] sanity test_27J: read should fail Created: 23/Aug/22 Updated: 08/Feb/24 Resolved: 17/Jul/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.15.2, Lustre 2.15.3, Lustre 2.15.4 |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Andreas Dilger |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
SLES15 SP4 client |
||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
This issue was created by maloo for jianyu <yujian@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/4751c6e1-6efd-4bec-8150-035008e531bb

test_27J failed with the following error:

lov_foreign_magic: 0x0BD70BD0
lov_xattr_size: 89
lov_foreign_size: 73
lov_foreign_type: 1
lov_foreign_flags: 0x0000DA08
lfm_magic: 0x0BD70BD0
lfm_length: 73
lfm_type: 0x00000000 (none)
lfm_flags: 0x0000DA08
lfm_value: '138822a8-8810-46e8-9f71-e80e38c85596@4921d343-f166-41b7-83de-3ada0c94dfbd'
lfs setstripe: setstripe error for '/mnt/lustre/d27J.sanity/f27J.sanity': stripe already set
lfs setstripe: setstripe error for '/mnt/lustre/d27J.sanity/f27J.sanity2': stripe already set
sanity test_27J: @@@@@@ FAIL: /mnt/lustre/d27J.sanity/f27J.sanity: read should fail
Trace dump:

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV |
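The sizes in the dump above are self-consistent, which can be checked with a little arithmetic. The sketch below assumes the foreign layout xattr is a header of four 32-bit fields (magic, length, type, flags) followed by the value bytes; that layout and the little-endian format string are assumptions for illustration, not taken from this ticket.

```python
import struct

# Assumed header layout: four 32-bit fields (magic, length, type, flags).
HEADER_FMT = "<4I"

# lfm_value exactly as printed in the failure output.
lfm_value = "138822a8-8810-46e8-9f71-e80e38c85596@4921d343-f166-41b7-83de-3ada0c94dfbd"

header_size = struct.calcsize(HEADER_FMT)        # bytes before the value
value_size = len(lfm_value.encode("ascii"))      # compare with lfm_length
xattr_size = header_size + value_size            # compare with lov_xattr_size

print(header_size, value_size, xattr_size)
```

Under that assumption the value is 73 bytes (matching lfm_length) and the total is 89 bytes (matching lov_xattr_size), so the foreign layout itself was set as the test intended and the failure is in the later read step.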
| Comments |
| Comment by Minh Diep [ 20/Sep/22 ] |
|
It seems this started failing on master after landing |
| Comment by Peter Jones [ 29/Oct/22 ] |
| Comment by James A Simmons [ 29/Oct/22 ] |
|
The only change was in the vvp layer due to an export issue. I doubt it's due to this patch. Instead it's a regression showing up on this platform. |
| Comment by Neil Brown [ 06/Dec/22 ] |
|
This test failure is due to upstream commit 8c8387ee3f55 ("mm: stop filemap_read() from grabbing a superfluous page"). This landed in v5.16, and SUSE has backported it to our SLE-15-SP4 kernels.

In earlier kernels the read will fail because filemap_read() will call the ->readpage method, which detects the problem and reports -ENODATA.

In later kernels filemap_read() doesn't bother calling ->readpage because the size of the file is recorded in the inode as zero. As ->readpage is not called, -ENODATA is not reported - there is no error.

To trigger a read error, we would need to make the file appear to be larger than 0. If the file size isn't conveniently available, we might have to bypass generic_file_read_iter() - for foreign files at least. Or maybe make the i_size of foreign files MAX_INT. |
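The behaviour change described above can be illustrated with a toy user-space model. This is only a sketch of the control flow as the comment describes it, not kernel code; `ForeignInode`, `read_before_change` and `read_after_change` are hypothetical names invented for the illustration.

```python
import errno


class ForeignInode:
    """Hypothetical stand-in for an inode with a foreign (unmapped) layout."""

    def __init__(self, i_size=0):
        self.i_size = i_size

    def readpage(self):
        # Models ->readpage detecting the unusable layout.
        raise OSError(errno.ENODATA, "foreign layout: no data available")


def read_before_change(inode, pos, count):
    # Older behaviour as described: the page is requested first,
    # so ->readpage runs even when i_size is 0.
    inode.readpage()
    return b""


def read_after_change(inode, pos, count):
    # Newer behaviour as described: a read at or beyond i_size
    # returns 0 bytes without calling ->readpage at all.
    if pos >= inode.i_size:
        return b""
    inode.readpage()
    return b""
```

With `i_size=0`, `read_before_change` raises ENODATA while `read_after_change` quietly returns an empty result, which is the difference the test trips over. Giving the model inode a non-zero `i_size` makes the newer path reach `readpage` again, which is the idea behind making the file appear larger than 0.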
| Comment by Sarah Liu [ 21/Dec/22 ] |
|
similar in sanity-lfsck on 2.15.2-rc1 https://testing.whamcloud.com/test_sets/2b0561f0-3a4a-4646-81af-8f3307966170 |
| Comment by Peter Jones [ 11/Feb/23 ] |
|
Can we add this test to the always_except list for SLES15 SP4 while we are working on the proper fix? It seems to be causing quite a bit of disruption... |
| Comment by Gerrit Updater [ 11/Feb/23 ] |
|
"Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49970 |
| Comment by Gerrit Updater [ 11/Feb/23 ] |
|
"Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49971 |
| Comment by Andreas Dilger [ 12/Feb/23 ] |
Jian's patch will skip this subtest for SLES15sp4 so that it doesn't always fail, but it doesn't fix the problem. Presumably there is something that needs to be done in llite to update the inode with the actual file size, instead of it being zero? |
| Comment by Gerrit Updater [ 17/Feb/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49970/ |
| Comment by Gerrit Updater [ 11/Apr/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49971/ |
| Comment by Jian Yu [ 28/Apr/23 ] |
RHEL 9.2 Beta release with kernel 5.14.0-283.el9 also has this commit:

kernel.spec
* Mon Oct 24 2022 Frantisek Hrbata <fhrbata@redhat.com> [5.14.0-179.el9]
<~snip~>
- mm: stop filemap_read() from grabbing a superfluous page (Chris von Recklinghausen) [2120352] |
| Comment by Andreas Dilger [ 28/Apr/23 ] |
|
I was wondering if the test could be changed so that, rather than expecting the read to return an error, it checks that the read returns 0 bytes. However, it seems reasonable that the DAOS code (and user applications) should receive an error if it reads from a layout that is not available. Otherwise, applications may assume the file is corrupted instead of just not mapped in correctly.

Consider the behavior for files that are HSM released - we don't want the clients/applications to "successfully" return 0 bytes when reading such a file, but (normally) block until the file is restored, or in the worst case (some kind of client bug or copytool error) return an error because the file is inaccessible. |
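The two outcomes being weighed here look different to a caller, and a check can tell them apart. A minimal sketch, assuming a plain POSIX read from user space; `classify_read` is a hypothetical helper for illustration and is not part of sanity.sh.

```python
def classify_read(path, count=4096):
    """Return ("error", errno), ("empty", 0) or ("data", nbytes) for one read."""
    try:
        with open(path, "rb") as f:
            data = f.read(count)
    except OSError as e:
        # The open or the read failed, e.g. with ENODATA.
        return ("error", e.errno)
    if not data:
        # Read succeeded but returned 0 bytes.
        return ("empty", 0)
    return ("data", len(data))
```

A test that accepts the "empty" outcome would pass on the newer kernels, but an application seeing that result cannot distinguish an unavailable layout from a genuinely empty file, which is the concern raised in this comment.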
| Comment by Gerrit Updater [ 04/Jul/23 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51567 |
| Comment by Gerrit Updater [ 14/Jul/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/51567/ |
| Comment by Gerrit Updater [ 17/Jul/23 ] |
|
|
| Comment by Patrick Farrell [ 17/Jul/23 ] |
|
No we didn't - sorry, sloppy reading on my part. 5.14 is included in the skip range... |
| Comment by Andreas Dilger [ 17/Jul/23 ] |
|
The current patch already skips this subtest for all kernels between 5.12.0 and 6.2.0, which is from when this change was first introduced until Yingjin's fix landed, so I don't think anything else is needed here. |
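The skip range can be expressed as a simple version comparison. The sketch below is an illustration only: the real check lives in the bash test scripts, the helper names are hypothetical, and treating the upper bound as exclusive is an assumption.

```python
import re

SKIP_FROM = (5, 12, 0)   # lower bound of the skip range quoted above
SKIP_TO = (6, 2, 0)      # upper bound; exclusive here by assumption


def kernel_version(release):
    """Parse the leading major.minor[.patch] of a `uname -r` style string."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) if part else 0 for part in m.groups())


def should_skip(release):
    # True when the kernel falls inside the skip range.
    return SKIP_FROM <= kernel_version(release) < SKIP_TO
```

For example, `should_skip("5.14.0-283.el9")` is true, consistent with the note above that 5.14 is included in the skip range.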
| Comment by Bruno Faccini (Inactive) [ 26/Sep/23 ] |
|
> Sarah Liu added a comment - 21/Dec/22 7:15 PM - edited

Right, sanity-lfsck/test_38() needs the same change for the same reason. I have opened LU-17146 to address this. |
| Comment by Guillaume Courrier [ 08/Feb/24 ] |
|
This issue was hit in this patch: https://review.whamcloud.com/c/fs/lustre-release/+/49236 |