[LU-13298] lfs migrate -m "migrate failed: Operation not supported (-95)" on DoM files Created: 26/Feb/20  Updated: 04/Oct/23  Resolved: 06/Jan/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.3
Fix Version/s: None

Type: Bug
Priority: Major
Reporter: Stephane Thiell
Assignee: Lai Siyao
Resolution: Duplicate
Votes: 0
Labels: None
Environment:

CentOS 7.6


Issue Links:
Related
is related to LU-13492 lfs migrate -m returns Operation not ... Open
is related to LU-13691 Allow for lfs migrate between MDTs to... Open
Severity: 3

 Description   

During an lfs migrate -m of a full directory from MDT0002 to MDT0003, we see many -95 errors:

lfs migrate -m 3 /fir/users/apatel6
...
/fir/users/apatel6/data/600_N_dec_carbons/08-EXX-sensitivity/02-MN4graphene/00-PBE-noNUPDOWN/05-OOH/07-Pd/spincheck/03/01-EXX-10/INCAR migrate failed: Operation not supported (-95)

It seems to be a regular file:

# stat /fir/users/apatel6/data/600_N_dec_carbons/08-EXX-sensitivity/02-MN4graphene/00-PBE-noNUPDOWN/05-OOH/07-Pd/spincheck/03/01-EXX-10/INCAR
  File: ‘/fir/users/apatel6/data/600_N_dec_carbons/08-EXX-sensitivity/02-MN4graphene/00-PBE-noNUPDOWN/05-OOH/07-Pd/spincheck/03/01-EXX-10/INCAR’
  Size: 708       	Blocks: 8          IO Block: 4194304 regular file
Device: e64e03a8h/3863872424d	Inode: 198161414156736433  Links: 1
Access: (0644/-rw-r--r--)  Uid: (298250/ apatel6)   Gid: (32269/ norskov)
Access: 2020-01-17 14:32:11.000000000 -0800
Modify: 2019-08-24 15:41:31.000000000 -0700
Change: 2019-08-24 15:41:31.000000000 -0700
 Birth: -

in a regular directory:

# stat /fir/users/apatel6/data/600_N_dec_carbons/08-EXX-sensitivity/02-MN4graphene/00-PBE-noNUPDOWN/05-OOH/07-Pd/spincheck/03/01-EXX-10
  File: ‘/fir/users/apatel6/data/600_N_dec_carbons/08-EXX-sensitivity/02-MN4graphene/00-PBE-noNUPDOWN/05-OOH/07-Pd/spincheck/03/01-EXX-10’
  Size: 8192      	Blocks: 16         IO Block: 4096   directory
Device: e64e03a8h/3863872424d	Inode: 180147914721584182  Links: 2
Access: (0755/drwxr-xr-x)  Uid: (298250/ apatel6)   Gid: (32269/ norskov)
Access: 2020-02-26 08:32:37.000000000 -0800
Modify: 2019-08-24 19:16:56.000000000 -0700
Change: 2019-08-24 19:16:56.000000000 -0700
 Birth: - 

so we're wondering why the migration fails.

Checking the file's parent directory shows that it has been flagged for migration:

# lfs getdirstripe /fir/users/apatel6/data/600_N_dec_carbons/08-EXX-sensitivity/02-MN4graphene/00-PBE-noNUPDOWN/05-OOH/07-Pd/spincheck/03/01-EXX-10
lmv_stripe_count: 2 lmv_stripe_offset: 3 lmv_hash_type: fnv_1a_64,migrating
mdtidx		 FID[seq:oid:ver]
     3		 [0x280038894:0xf2d0:0x0]		
     2		 [0x2c002cafe:0xf400:0x0]

The grandparent directory is flagged too:

# lfs getdirstripe /fir/users/apatel6/data/600_N_dec_carbons/08-EXX-sensitivity/02-MN4graphene/00-PBE-noNUPDOWN/05-OOH/07-Pd/spincheck/03
lmv_stripe_count: 2 lmv_stripe_offset: 3 lmv_hash_type: fnv_1a_64,migrating
mdtidx		 FID[seq:oid:ver]
     3		 [0x280038894:0xf2cd:0x0]		
     2		 [0x2c002cafe:0xd455:0x0]

But the great-grandparent directory has already been fully migrated (is that normal?):

# lfs getdirstripe /fir/users/apatel6/data/600_N_dec_carbons/08-EXX-sensitivity/02-MN4graphene/00-PBE-noNUPDOWN/05-OOH/07-Pd/spincheck
lmv_stripe_count: 0 lmv_stripe_offset: 3 lmv_hash_type: none

We have 4 MDTs on this filesystem. We haven't upgraded to 2.12.4 yet but it is planned within the next month.
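
The getdirstripe output above tells the two states apart by the lmv_hash_type field: directories still in progress carry a "migrating" flag, finished ones do not. A minimal sketch of that check follows. The function name is_migrating is hypothetical, and the parsing assumes the output format shown in this ticket (Lustre 2.12.3); other versions may print it differently.

```shell
# is_migrating (hypothetical helper): reads `lfs getdirstripe` output on stdin
# and succeeds (exit 0) if lmv_hash_type carries the "migrating" flag.
# Assumes the single-line "lmv_hash_type: <type>[,flags]" format shown above.
is_migrating() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "lmv_hash_type:" && $(i + 1) ~ /(^|,)migrating(,|$)/)
                found = 1
    }
    END { exit !found }'
}

# Example use on a Lustre client (not executed here):
#   lfs getdirstripe "$dir" | is_migrating && echo "$dir: migration incomplete"
```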



 Comments   
Comment by Lai Siyao [ 27/Feb/20 ]

Do you have DoM enabled? DoM file migration is not supported yet. If you do, it is normal that some subdirectories are not fully migrated.

Directory migration can be run again after a failure, and in your case you can run 'lfs migrate -m 3 /fir/users/apatel6/data/600_N_dec_carbons/08-EXX-sensitivity/02-MN4graphene/00-PBE-noNUPDOWN/05-OOH/07-Pd/spincheck/03'.

A directory in 'migrating' state can still be accessed.

Comment by Stephane Thiell [ 27/Feb/20 ]

Hi Lai,
Thanks for the quick reply. Yes, we used to have DoM enabled by default on this filesystem, so millions of files have a DoM component (we no longer use DoM by default due to performance issues with shared files). I had completely forgotten that migration of DoM files is not supported yet. Indeed, this specific file has a DoM component:

# lfs getstripe /fir/users/apatel6/data/600_N_dec_carbons/08-EXX-sensitivity/02-MN4graphene/00-PBE-noNUPDOWN/05-OOH/07-Pd/spincheck/03/01-EXX-10/INCAR
/fir/users/apatel6/data/600_N_dec_carbons/08-EXX-sensitivity/02-MN4graphene/00-PBE-noNUPDOWN/05-OOH/07-Pd/spincheck/03/01-EXX-10/INCAR
  lcm_layout_gen:    6
  lcm_mirror_count:  1
  lcm_entry_count:   6
    lcme_id:             1
    lcme_mirror_id:      0
    lcme_flags:          init
    lcme_extent.e_start: 0
    lcme_extent.e_end:   131072
      lmm_stripe_count:  0
      lmm_stripe_size:   131072
      lmm_pattern:       mdt
      lmm_layout_gen:    0
      lmm_stripe_offset: 0

    lcme_id:             2
    lcme_mirror_id:      0
    lcme_flags:          0
    lcme_extent.e_start: 131072
    lcme_extent.e_end:   16777216
      lmm_stripe_count:  1
      lmm_stripe_size:   4194304
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: -1

    lcme_id:             3
    lcme_mirror_id:      0
    lcme_flags:          0
    lcme_extent.e_start: 16777216
    lcme_extent.e_end:   1073741824
      lmm_stripe_count:  2
      lmm_stripe_size:   4194304
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: -1

    lcme_id:             4
    lcme_mirror_id:      0
    lcme_flags:          0
    lcme_extent.e_start: 1073741824
    lcme_extent.e_end:   34359738368
      lmm_stripe_count:  4
      lmm_stripe_size:   4194304
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: -1

    lcme_id:             5
    lcme_mirror_id:      0
    lcme_flags:          0
    lcme_extent.e_start: 34359738368
    lcme_extent.e_end:   274877906944
      lmm_stripe_count:  8
      lmm_stripe_size:   4194304
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: -1

    lcme_id:             6
    lcme_mirror_id:      0
    lcme_flags:          0
    lcme_extent.e_start: 274877906944
    lcme_extent.e_end:   EOF
      lmm_stripe_count:  16
      lmm_stripe_size:   4194304
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: -1

We'll see what we can do with these files and then I'll run lfs migrate again.
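
In the layout above, the DoM component is the one whose lmm_pattern is "mdt" (component 1). A small sketch of how that could be detected from lfs getstripe output follows. The function name has_dom_component is hypothetical, and the parsing assumes the "lmm_pattern: <value>" lines appear as printed in this comment.

```shell
# has_dom_component (hypothetical helper): reads `lfs getstripe` output on
# stdin and succeeds (exit 0) if any component has "lmm_pattern: mdt".
# Assumes the field layout shown in the comment above.
has_dom_component() {
    awk '$1 == "lmm_pattern:" && $2 == "mdt" { found = 1 }
         END { exit !found }'
}

# Example use on a Lustre client (not executed here):
#   lfs getstripe "$file" | has_dom_component && echo "$file has a DoM component"
```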

Comment by Stephane Thiell [ 28/Feb/20 ]

To work around this issue, we restriped the DoM files using lfs migrate -c 1 and were then able to migrate the directory to another MDT without errors.
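
The workaround could be scripted roughly as below. This is a sketch under assumptions, not the exact procedure used on this filesystem: the function name restripe_dom_then_migrate is hypothetical, it assumes that lfs find with --layout mdt selects the DoM files on the installed Lustre version, and it reads file names line by line, so names containing newlines are not handled.

```shell
# restripe_dom_then_migrate DIR MDT_INDEX   (hypothetical helper)
# 1. Restripe every DoM file under DIR to a plain single-stripe layout.
# 2. Retry the directory migration to the target MDT.
restripe_dom_then_migrate() {
    dir=$1
    mdt=$2
    # Assumption: `lfs find --layout mdt` lists files with a DoM component.
    lfs find "$dir" --type f --layout mdt |
    while IFS= read -r file; do
        # Same restripe command as in the comment above.
        lfs migrate -c 1 "$file" || echo "restripe failed: $file" >&2
    done
    # With the DoM files restriped, run the directory migration again.
    lfs migrate -m "$mdt" "$dir"
}

# Example use on a Lustre client (not executed here):
#   restripe_dom_then_migrate /fir/users/apatel6 3
```

Restriping rewrites file data, so on a tree with millions of DoM files it may be worth trying this on a small subdirectory first.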

Comment by Andreas Dilger [ 06/Jan/23 ]

Closing this issue; the problem with "lfs migrate -m" on DoM files is tracked under LU-13691.
