[LU-17450] sanity: interop test failures with master+2.15 Created: 20/Jan/24  Updated: 23/Jan/24

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Bug Priority: Critical
Reporter: Maloo Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-13805 i/o path: Unaligned direct i/o Open
is related to LU-14361 Add support for statahead pattern wit... Open
is related to LU-17216 enable_health_write, health_check imp... Open
is related to LU-16194 Define negative PFL extent start/end ... Reopened
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/dc77145c-b7d3-4010-a7a2-f8435f9353ff

Test session details:
clients: https://build.whamcloud.com/job/lustre-reviews/101478 - 4.18.0-477.27.1.el8_8.x86_64
servers: https://build.whamcloud.com/job/lustre-b2_15/81 - 4.18.0-513.9.1.el8_lustre.x86_64

There are a number of sanity interop test failures with 2.15.4 servers.

Please review the test failures to determine for each one:

  • when did the failure first start happening?
  • is this a new test added since 2.15.50 was forked from b2_15?
  • should the test be skipped because of an older MDS or OSS version?
  • is this a legitimate regression?
  • if it is not a clear case of a new test running against an old server, ensure an LU ticket with details is open for it and add the subtest to always_except
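
The version-based skip decision from the checklist above could be sketched roughly as follows. This is an illustration in Python, not the actual test framework code: the helper names `version_code` and `should_skip`, the byte-packing layout, and the 2.15.55 threshold are all assumptions made for the example, and it only handles purely numeric dotted versions.

```python
def version_code(version: str) -> int:
    """Pack a dotted version such as "2.15.4" into one comparable integer.

    Assumption: up to four numeric components, each fitting in one byte.
    """
    parts = [int(p) for p in version.split(".")[:4]]
    parts += [0] * (4 - len(parts))  # pad missing components with zeros
    major, minor, patch, fix = parts
    return (major << 24) | (minor << 16) | (patch << 8) | fix


def should_skip(server_version: str, min_version: str) -> bool:
    """Return True when the server is older than the version a test needs."""
    return version_code(server_version) < version_code(min_version)


# Hypothetical threshold: a subtest that needs a server feature from 2.15.55.
print(should_skip("2.15.4", "2.15.55"))   # True: 2.15.4 server is too old
print(should_skip("2.15.60", "2.15.55"))  # False: server is new enough
```

Packing the components into a single integer avoids the string-comparison trap where "2.15.4" would sort after "2.15.55".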


 Comments   
Comment by Andreas Dilger [ 21/Jan/24 ]

Make this a blocker for 2.16.0 since we can't release it until we are sure it is not introducing any interop issues, which would be much harder to fix afterward.

At a minimum, it looks like something is wrong with file migrate, but I haven't looked at all of the failures in this interop session yet. There is a small chance that one or two subtest failures relate to the patch that was being tested (which is why I requested interop testing in the first place), but many of the failures have been present for weeks and are not isolated to a single subtest.

Comment by Andreas Dilger [ 23/Jan/24 ]

test_56x, test_56xa, test_56xc - source patch not identified (they are old tests)
test_65p - from LU-16194 lod: define negative extent offset as invalid
test_70a - from LU-17216 ofd: make enable_health_write tunable, requested interop check there
test_119h, test_119i, test_398d, test_398o - from LU-13805 clio: bounce buffer for unaligned DIO
test_123g, test_123h, test_123i - from LU-14361 statahead: Add test for statahead advise

Comment by Andreas Dilger [ 23/Jan/24 ]

I've requested patches to add interop checks for the identified source patches, but I'm not sure why test_56x is failing. Those tests look related to lfs_migrate, but more investigation is needed to determine why they are failing.

Generated at Sat Feb 10 03:35:32 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.