Details
Type: Bug
Resolution: Unresolved
Priority: Major
Affects Version/s: Lustre 2.12.4
Environment: CentOS 7.6 Kernel 3.10.0-957.27.2.el7_lustre.pl2.x86_64
Severity: 3
Description
Hello!
When using lfs migrate -m to migrate directories across MDTs, we sometimes hit LU-13298 ("lfs migrate does not work yet with DoM files"), for which we do have a workaround (i.e. we first restripe the files to a layout without DoM; see the example below). However, this time we are running into a different problem, I think.
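For reference, our LU-13298 workaround looks roughly like this (the path below is just an example): we first rewrite each DoM file with a plain layout, then run the cross-MDT directory migration:

[root@fir-rbh01 storage]# lfs migrate -c 1 /fir/users/apatel6/some-dom-file   # restripe: rewrites the file with a plain 1-stripe layout, dropping the DoM component
[root@fir-rbh01 storage]# lfs migrate -m 1 /fir/users/apatel6                 # then migrate the directory to MDT0001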
We're trying to migrate files from MDT0003 to MDT0001. While running a migration of a full user directory as follows:
lfs migrate -m 1 /fir/users/apatel6
we hit "operation not permitted" errors on multiple directories, and even retrying the migration is leading to the same error:
[root@fir-rbh01 storage]# lfs migrate -m 1 /fir/users/apatel6/data/10-scalingNEB/01-relaxwater/02-N
/fir/users/apatel6/data/10-scalingNEB/01-relaxwater/02-N migrate failed: Operation not permitted (-1)
[root@fir-rbh01 storage]# lfs getdirstripe /fir/users/apatel6/data/10-scalingNEB/01-relaxwater/02-N
lmv_stripe_count: 2 lmv_stripe_offset: 3 lmv_hash_type: fnv_1a_64,migrating
mdtidx           FID[seq:oid:ver]
     3           [0x2800394ad:0x3c7c:0x0]
     3           [0x280038894:0x124ee:0x0]
I also noticed while writing this ticket that something looks wrong here: both mdtidx entries are "3". Usually, when a directory is migrating from MDT3 to MDT1, we see mdtidx 1 and 3.
Quick check of the FIDs above:
[root@fir-rbh01 storage]# lfs fid2path /fir 0x2800394ad:0x3c7c:0x0
/fir/users/apatel6/data/10-scalingNEB/01-relaxwater/02-N
[root@fir-rbh01 storage]# lfs fid2path /fir 0x280038894:0x124ee:0x0
/fir/users/apatel6/data/10-scalingNEB/01-relaxwater/02-N
MDT0001 (not MDT0003!) shows this log message when attempting the failed command:
Apr 29 08:35:06 fir-md1-s2 kernel: LustreError: 22437:0:(mdd_dir.c:4496:mdd_migrate()) fir-MDD0001: '02-N' migration was interrupted, run 'lfs migrate -m 3 -c 1 -H 2 02-N' to finish migration.
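If we read that message correctly, the suggested recovery command would be run from the parent directory of 02-N, roughly like this (the working directory is an assumption on our part; the command itself is just copied from the log):

[root@fir-rbh01 storage]# cd /fir/users/apatel6/data/10-scalingNEB/01-relaxwater
[root@fir-rbh01 01-relaxwater]# lfs migrate -m 3 -c 1 -H 2 02-N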
I don't see anything else, but maybe there are debug flags that would be worth enabling?
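In case it helps, this is roughly how we would capture a debug log on the MDS around a failing attempt (the flag choice here is just a guess on our part):

[root@fir-md1-s2 ~]# lctl set_param debug=+info      # add the "info" debug flag to the current mask
[root@fir-md1-s2 ~]# lctl clear                      # empty the kernel debug buffer
(reproduce the failing lfs migrate from the client)
[root@fir-md1-s2 ~]# lctl dk /tmp/lustre-mdt-debug.log   # dump the debug buffer to a file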
In any case, let me know how we can help troubleshoot this issue. We're running Lustre 2.12.4 everywhere, including on the client that performs the lfs migrate. Thanks!
Hi Hongchao,
Since my last message, we have upgraded to 2.12.5 and I cannot reproduce the problem with the empty directory. It has now been successfully migrated to MDT1.
However, we are still hitting EPERM errors, even on 2.12.5.
For example, I tried again today, and it still doesn't work for this directory:
It looks like you spotted the problem (a previous migration was still running). Is there a way to fix it so that we can migrate this directory to MDT3, for example?
Thanks!
Stephane