[LU-16541] sanity test_64f: buffered io, not write rpc: grants mismatch: 12656640, expected 4218880 Created: 09/Feb/23 Updated: 20/Dec/23 Resolved: 25/Aug/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.16.0 |
| Fix Version/s: | Lustre 2.16.0, Lustre 2.15.4 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Maloo | Assignee: | Patrick Farrell |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | arm | ||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
This issue was created by maloo for Arshad <arshad.hussain@aeoncomputing.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/a0c04b66-87b8-4b06-aee8-8ae97f9e229d
test_64f failed with the following error:
buffered io, not write rpc: grants mismatch: 12656640, expected 4218880 |
| Comments |
| Comment by Chris Horn [ 02/Mar/23 ] |
|
+1 on master https://testing.whamcloud.com/test_sets/3612a72a-bde4-4fcd-9cb9-4eb4ed7ceab8 |
| Comment by Nikitas Angelinas [ 16/Aug/23 ] |
|
+1 on master: https://testing.whamcloud.com/test_sets/6649c5ad-a4db-40ba-abe4-67822a7227c2 |
| Comment by Aurelien Degremont [ 17/Aug/23 ] |
|
+1 on master: https://testing.whamcloud.com/test_sets/e4028aaf-757e-46c5-9ea9-440a7bda4e21 |
| Comment by James A Simmons [ 21/Aug/23 ] |
|
Sadly, it's not just ARM. Searching for "grants mismatch", you will find a bunch of duplicate tickets. |
| Comment by Andreas Dilger [ 21/Aug/23 ] |
|
Patrick, could you please take a look at this? This subtest is now the top cause of failures, whereas previously it was only failing on aarch64. Since it runs as part of sanity, this subtest is run 9x per patch review test (unless run with 'trivial'), so with a failure rate of around 1 in 16 runs it is almost guaranteed to affect every patch.

It would make sense to run test result searches on a per-week basis to see if you can identify when the subtest first started failing on x86, and then use that to identify culprit patches that landed in that time period.

A quick search showed that all of the failures on master in the week of 2023-04-16 were for your LU-13805 unaligned DIO patch series at that time, but none of those patches have landed, unless something was split out into a separate patch. However, it may be possible to do some differential analysis between the start of the failures on master vs. b_es6_0 to see when particular patches landed on each branch. |
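For a rough sense of scale, the figures quoted above (9 runs of the subtest per patch review session, roughly 1 failure in 16 runs) can be turned into a per-session failure probability. This is only a back-of-the-envelope sketch: it assumes each run fails independently at a constant rate, which the ticket does not establish, and the function name is made up for illustration.

```python
def p_at_least_one_failure(per_run_failure_rate: float, runs: int) -> float:
    """Probability that at least one of `runs` independent runs fails.

    Assumes independent runs with a constant failure rate (an assumption,
    not something the test results in this ticket confirm).
    """
    return 1.0 - (1.0 - per_run_failure_rate) ** runs


# Figures taken from the comment above: ~1/16 failure rate, 9 runs per session.
FAILURE_RATE = 1 / 16
RUNS_PER_SESSION = 9

per_session = p_at_least_one_failure(FAILURE_RATE, RUNS_PER_SESSION)
print(f"one review session: {per_session:.1%} chance of hitting test_64f")

# A patch that is re-tested (new patch sets, rebases, retests) compounds this.
for sessions in (2, 3, 5):
    total_runs = RUNS_PER_SESSION * sessions
    p = p_at_least_one_failure(FAILURE_RATE, total_runs)
    print(f"{sessions} review sessions: {p:.1%}")
```

Under these assumptions a single review session hits the failure a little under half the time, and the odds climb quickly once a patch goes through several sessions.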
| Comment by Patrick Farrell [ 21/Aug/23 ] |
|
Yeah, I actually started running this locally and couldn't reproduce, but I'm happy to give this a try. I will see if I can figure something out from the landings, but I may also just go directly at the bug as well. |
| Comment by Andreas Dilger [ 21/Aug/23 ] |
|
It looks like the spike in subtest failures on master started on 2023-08-02. |
| Comment by Patrick Farrell [ 21/Aug/23 ] |
|
Ah, thank you. |
| Comment by Gerrit Updater [ 21/Aug/23 ] |
|
"Patrick Farrell <pfarrell@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52022 |
| Comment by Gerrit Updater [ 21/Aug/23 ] |
|
"Patrick Farrell <pfarrell@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52023 |
| Comment by Oleg Drokin [ 21/Aug/23 ] |
|
The other annoyance with this test failure, by the way, is that when it hits, it takes 1 hour for the test to finish, which makes it a timeout on janitor:
    sanity.test_64f.test_log.oleg346-client.log    2023-08-19 04:46    1.6K
    sanity.test_64g.test_log.oleg346-client.log    2023-08-19 05:46    1.6K
Example here, but they are all like this. I wonder if the unbounded wait just waits for something else that happens to run for an hour? Nothing obvious in the test output: the one hour duration also holds for the maloo |
| Comment by Gerrit Updater [ 22/Aug/23 ] |
|
"Patrick Farrell <pfarrell@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52040 |
| Comment by Gerrit Updater [ 25/Aug/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52040/ |
| Comment by Peter Jones [ 25/Aug/23 ] |
|
Merged for 2.16 |
| Comment by Gerrit Updater [ 25/Aug/23 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52096 |
| Comment by Gerrit Updater [ 20/Dec/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52096/ |