[LU-14312] Interop: sanity test 272b fails with 'failed to migrate to the new composite layout' Created: 08/Jan/21  Updated: 23/Jan/21  Resolved: 23/Jan/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.14.0
Fix Version/s: Lustre 2.14.0

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: Mikhail Pershin
Resolution: Fixed Votes: 0
Labels: interop

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

sanity test_272b fails in interop testing between a master (future 2.14.0) client and older (< 2.13.56) servers. The failures started on 01 NOV 2020 with Lustre 2.13.56.63.

Looking at a recent failure at https://testing.whamcloud.com/test_sets/6e4f8226-6269-4979-a03f-af7e98a714be, the suite_log shows a problem getting a lock for a DoM file:

== sanity test 272b: DoM migration: DOM file to the OST-striped file (plain) ========================= 03:42:54 (1608781374)
CMD: trevis-19vm4 lctl get_param -n osd*.*MDT0000.kbytesfree
1+0 records in
1+0 records out
2097152 bytes (2.1 MB, 2.0 MiB) copied, 0.0255055 s, 82.2 MB/s
CMD: trevis-19vm4 lctl get_param -n osd*.*MDT0000.kbytesfree
error: lfs migrate: /mnt/lustre/d272b.sanity/dom: data copy failed: No locks available
 sanity test_272b: @@@@@@ FAIL: failed to migrate to the new composite layout 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:6273:error()
  = /usr/lib64/lustre/tests/sanity.sh:20792:test_272b()
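
To spot the same signature in the other suite_logs, a small script can pair each FAIL line with the closest preceding "error:" line. This is only a minimal sketch, assuming the log format matches the excerpt above; the function name find_failures and the returned dictionary keys are hypothetical and are not part of the Lustre test framework.

```python
import re

# Matches lines such as:
#  sanity test_272b: @@@@@@ FAIL: failed to migrate to the new composite layout
# The pattern is an assumption based on the log excerpt in this ticket.
FAIL_RE = re.compile(
    r"^\s*(?P<suite>\S+) (?P<test>test_\S+): @@@@@@ FAIL: (?P<reason>.*?)\s*$"
)


def find_failures(log_text):
    """Return one dict per FAIL line, with the last 'error:' line seen before it."""
    failures = []
    last_error = None
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("error:"):
            # Remember the most recent tool error, e.g. from lfs migrate.
            last_error = stripped
            continue
        match = FAIL_RE.match(line)
        if match:
            failures.append(
                {
                    "suite": match.group("suite"),
                    "test": match.group("test"),
                    "reason": match.group("reason"),
                    "error": last_error,
                }
            )
            last_error = None  # do not attribute one error to two failures
    return failures


if __name__ == "__main__":
    sample = (
        "error: lfs migrate: /mnt/lustre/d272b.sanity/dom: "
        "data copy failed: No locks available\n"
        " sanity test_272b: @@@@@@ FAIL: failed to migrate to the new composite layout \n"
    )
    for failure in find_failures(sample):
        print(failure["test"], "|", failure["reason"], "|", failure["error"])
```

For this ticket, the interesting case is a test_272b failure whose paired error ends with "No locks available".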

Looking at patches that landed during this time related to DoM and locks, we see two related patches:
0a3c72f13045 LU-13645 ldlm: extra checks for DOM locks
067404403634 LU-13645 ldlm: group locks for DOM IBIT lock

Logs for more failures are at
https://testing.whamcloud.com/test_sets/87ee1f60-3a95-411f-af39-9ce5095c85bb
https://testing.whamcloud.com/test_sets/dab54121-a23a-4809-9a3c-93f05b40ecad
https://testing.whamcloud.com/test_sets/9bcc5049-96f5-40c9-9cc1-55ef00f350d7



 Comments   
Comment by Peter Jones [ 09/Jan/21 ]

Mike

Thoughts on this one?

Peter

Comment by Mikhail Pershin [ 11/Jan/21 ]

This is a compatibility issue with old servers. I will check it locally to get more details.

Comment by Gerrit Updater [ 19/Jan/21 ]

Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41268
Subject: LU-14312 ldlm: don't change GROUP lock GID on client
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6a25e5d71d3e2110ff10a63524c3e7ca7e3ac3dd

Comment by Mikhail Pershin [ 19/Jan/21 ]

James, is it possible to check the patch somehow, if we have a configuration where the problem occurs always or very often? I wasn't able to reproduce the problem locally, so I need to confirm that the patch solves it.

Comment by James Nunez (Inactive) [ 19/Jan/21 ]

Mike, it looks like this test fails 100% of the time for master clients and 2.12.5/6 and 2.13.0 servers. Let me add a test parameters line to the patch to mimic this setup.

Comment by Mikhail Pershin [ 20/Jan/21 ]

James, from the test results it looks like the patch does the job. There are still failed interop tests in sanity that, at first sight, are not related to the patch. Could you check and confirm that they are expected and not something new?

Comment by Gerrit Updater [ 23/Jan/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/41268/
Subject: LU-14312 ldlm: don't change GROUP lock GID on client
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: da18ad5628556cb8eb0bcba786743b2d205a39d9

Comment by Peter Jones [ 23/Jan/21 ]

Landed for 2.14
