[LU-11642] Data lost after migrate striped dir Created: 08/Nov/18  Updated: 21/Nov/18  Resolved: 21/Nov/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0
Fix Version/s: Lustre 2.12.0

Type: Bug Priority: Critical
Reporter: Sarah Liu Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: None
Environment:

Server and client are both at 2.11.56 (tag-2.11.56_55_g4afee32).


Issue Links:
Related
is related to LU-11520 separate tests and bugfixes Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

System setup: 2 MDSs with 2 MDTs each, 1 OSS with 1 OST, ZFS as the underlying file system, and 1 client.

Test steps:
1. Create a striped directory across MDT0 and MDT1 on MDS1.
2. Create a file and write some data to it.
3. Migrate the directory to MDT1 and MDT3; the migration succeeds.
4. Check the file: its data has been lost.

[root@trevis-60vm7 lustre]# lfs mkdir -c 2 -i 0 test
[root@trevis-60vm7 lustre]# lfs getdirstripe test
lmv_stripe_count: 2 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64
mdtidx		 FID[seq:oid:ver]
     0		 [0x200000400:0x6:0x0]		
     1		 [0x240000401:0x6:0x0]		
[root@trevis-60vm7 lustre]# cd test/
[root@trevis-60vm7 test]# touch foo
[root@trevis-60vm7 test]# echo abcd > foo
[root@trevis-60vm7 test]# cat foo
abcd
[root@trevis-60vm7 test]# cd ..
[root@trevis-60vm7 lustre]# ls -al test/
total 34
drwxr-xr-x 2 root root 22528 Nov  8 01:29 .
drwxr-xr-x 4 root root 10752 Nov  8 01:29 ..
-rw-r--r-- 1 root root     5 Nov  8 01:29 foo
[root@trevis-60vm7 lustre]# lfs migrate -m 1,3 test/
[root@trevis-60vm7 lustre]# lfs getdirstripe test
lmv_stripe_count: 2 lmv_stripe_offset: 1 lmv_hash_type: fnv_1a_64
mdtidx		 FID[seq:oid:ver]
     1		 [0x240000400:0x6:0x0]		
     3		 [0x2c0000402:0x6:0x0]		
[root@trevis-60vm7 lustre]# ls -al test/
total 33
drwxr-xr-x 2 root root 22528 Nov  8 01:29 .
drwxr-xr-x 4 root root 10752 Nov  8 01:30 ..
-rw-r--r-- 1 root root     0 Nov  8 01:29 foo
[root@trevis-60vm7 lustre]# cat test/foo 
[root@trevis-60vm7 lustre]# 
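
The manual before/after comparison above can be automated. The following is a minimal sketch, not part of Lustre or its test suite: `verify_unchanged` is a hypothetical helper that records a file's checksum and size, runs an arbitrary command, and reports whether the file content changed. The `lfs migrate` invocation in the trailing comment is taken from the transcript and needs a mounted Lustre client.

```shell
#!/bin/bash
# Hypothetical helper (not a Lustre tool): run a command and verify that a
# file's checksum and size are the same before and after it.
# Usage: verify_unchanged FILE COMMAND [ARGS...]
# Returns 0 if unchanged, 1 if the content changed, 2 on any other error.
verify_unchanged() {
    local file=$1
    shift
    local before after
    # cksum reading from stdin prints "<crc> <bytes>", so a single string
    # comparison covers both the content and the size.
    before=$(cksum < "$file") || return 2
    "$@" || return 2
    after=$(cksum < "$file") || return 2
    if [ "$before" != "$after" ]; then
        echo "DATA CHANGED: $file (crc/size before: $before, after: $after)" >&2
        return 1
    fi
    return 0
}

# Example mirroring steps 3 and 4 (requires a mounted Lustre client):
#   verify_unchanged test/foo lfs migrate -m 1,3 test/
```

With the bug present, the check would be expected to return 1 for `test/foo`, because the file size drops from 5 to 0 after the migration.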

5. Create another file with data under the test directory, which now spans MDT1 and MDT3.
6. Migrate the directory back to MDT0 and MDT1.
7. Check the new file: its data is still there.

[root@trevis-60vm7 lustre]# cd test/
[root@trevis-60vm7 test]# touch foo-2
[root@trevis-60vm7 test]# echo 1234 > foo-2 
[root@trevis-60vm7 test]# cat foo-2 
1234
[root@trevis-60vm7 test]# ls -al
total 34
drwxr-xr-x 2 root root 22528 Nov  8 01:31 .
drwxr-xr-x 4 root root 10752 Nov  8 01:30 ..
-rw-r--r-- 1 root root     0 Nov  8 01:29 foo
-rw-r--r-- 1 root root     5 Nov  8 01:31 foo-2
[root@trevis-60vm7 test]# cd ..
[root@trevis-60vm7 lustre]# lfs migrate -m 0,1 test/
[root@trevis-60vm7 lustre]# lfs getdirstripe test/
lmv_stripe_count: 2 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64
mdtidx		 FID[seq:oid:ver]
     0		 [0x200000400:0x7:0x0]		
     1		 [0x240000401:0x7:0x0]		
[root@trevis-60vm7 lustre]# ls -al test/
total 39
drwxr-xr-x 2 root root 22528 Nov  8 01:31 .
drwxr-xr-x 4 root root 10752 Nov  8 01:31 ..
-rw-r--r-- 1 root root     0 Nov  8 01:29 foo
-rw-r--r-- 1 root root     5 Nov  8 01:31 foo-2
[root@trevis-60vm7 lustre]# cat foo-2
cat: foo-2: No such file or directory
[root@trevis-60vm7 lustre]# cat test/foo-2 
1234
[root@trevis-60vm7 lustre]# 
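
The `lfs getdirstripe` checks in the transcripts can also be scripted. The sketch below assumes the output format shown in this ticket (one stripe per line, with the MDT index followed by a bracketed FID); `stripe_mdts` is a hypothetical helper name, not an `lfs` subcommand.

```shell
#!/bin/bash
# Hypothetical helper: read `lfs getdirstripe` output on stdin and print the
# MDT indices of the stripes as a comma-separated list.
# Assumes stripe lines look like "     1    [0x240000400:0x6:0x0]".
stripe_mdts() {
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^\[0x/ {
             printf "%s%s", sep, $1
             sep = ","
         }
         END { print "" }'
}

# Example (requires a mounted Lustre client):
#   [ "$(lfs getdirstripe test | stripe_mdts)" = "1,3" ] || echo "unexpected layout"
```

Comparing the result against the argument given to `lfs migrate -m` confirms that the directory landed on the requested MDTs before the file contents are checked.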


 Comments   
Comment by Sarah Liu [ 08/Nov/18 ]

The same issue also occurs on ldiskfs.

I also tried migrating the directory from MDT 0,1 to MDT 2,3; in that case the data was still there.

Comment by Andreas Dilger [ 08/Nov/18 ]

It would be useful to put the above testing into a sanity test that is expected to fail, as described in LU-11520. Then we have a reproducer for the problem, which makes it easier for developers to be sure they have a fix.
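
As an illustration of the "expected to fail" idea, here is a generic shell sketch. It does not use Lustre's test framework, and it does not reproduce the exact mechanism described in LU-11520; `expect_fail` and `repro_lu11642` are hypothetical names introduced only for this example.

```shell
#!/bin/bash
# Hypothetical wrapper, independent of Lustre's test framework: run a
# reproducer that is known to fail and invert its result.
# Returns 0 while the bug still reproduces, 1 once the reproducer passes.
expect_fail() {
    local name=$1
    shift
    if "$@"; then
        echo "UNEXPECTED PASS: $name (bug may be fixed; remove the expect-fail wrapper)" >&2
        return 1
    fi
    echo "EXPECTED FAIL: $name"
    return 0
}

# Example, where repro_lu11642 stands for a hypothetical function that
# performs the steps from the description and returns non-zero on data loss:
#   expect_fail LU-11642 repro_lu11642
```

Once a fix lands, the wrapper starts reporting an unexpected pass, which signals that the reproducer can be promoted to a regular test.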

Comment by Joseph Gmitter (Inactive) [ 09/Nov/18 ]

Hi Lai,

Can you please look into this as a priority, since it involves data corruption?

Thanks.

Joe

Comment by Gerrit Updater [ 12/Nov/18 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33640
Subject: LU-11642 mdt: revoke remote LOOKUP lock in dir layout shrink
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 9af7de4dde63782028440c3b5862136bff4ff60d

Comment by Gerrit Updater [ 12/Nov/18 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33641
Subject: LU-11642 lmv: allocate fid on parent MDT in migrate
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5dac67893433c2882470aed210eabef048909760

Comment by Lai Siyao [ 12/Nov/18 ]

Hi Sarah, I just uploaded two patches. Could you apply them and test again?

Comment by Sarah Liu [ 12/Nov/18 ]

Sure, I will test it again and update the ticket, thank you.

Comment by Sarah Liu [ 13/Nov/18 ]

I have tested with lustre-reviews build 59952; the issue is fixed.

Comment by Gerrit Updater [ 21/Nov/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33640/
Subject: LU-11642 mdt: revoke remote LOOKUP lock in dir layout shrink
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 640ed6104453e912a5c7766d265a36a30a31761d

Comment by Gerrit Updater [ 21/Nov/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33641/
Subject: LU-11642 lmv: allocate fid on parent MDT in migrate
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: a857446dc6480841ab1e832970d3958f3962a885

Comment by Peter Jones [ 21/Nov/18 ]

Landed for 2.12
