[LU-17403] lfs migrate: cannot get group lock: No space left on device Created: 08/Jan/24 Updated: 08/Jan/24 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.15.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Stephane Thiell | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: | CentOS 7.9 (3.10.0-1160.90.1.el7_lustre.pl1.x86_64) |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Happy New Year! We are seeing a new problem on our Fir filesystem (full 2.15.3) when running lfs migrate on some files. The symptom is ENOSPC when trying to lfs migrate:

[root@fir-rbh03 ~]# lfs migrate -c 1 /fir/users/anovosel/Seisbench_DATA/stead_mem.csv
lfs migrate: cannot get group lock: No space left on device (28)
error: lfs migrate: /fir/users/anovosel/Seisbench_DATA/stead_mem.csv: cannot get group lock: No space left on device

These files are using PFL, and a common point between them is that both the first and second components are initialized but NOT the last one. For example:

[root@fir-rbh03 ~]# lfs getstripe /fir/users/anovosel/Seisbench_DATA/stead_mem.csv
/fir/users/anovosel/Seisbench_DATA/stead_mem.csv
lcm_layout_gen: 6
lcm_mirror_count: 1
lcm_entry_count: 3
lcme_id: 1
lcme_mirror_id: 0
lcme_flags: init
lcme_extent.e_start: 0
lcme_extent.e_end: 4194304
lmm_stripe_count: 1
lmm_stripe_size: 4194304
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 125
lmm_pool: ssd
lmm_objects:
- 0: { l_ost_idx: 125, l_fid: [0x1007d0000:0x3e14ca6:0x0] }
lcme_id: 2
lcme_mirror_id: 0
lcme_flags: init
lcme_extent.e_start: 4194304
lcme_extent.e_end: 17179869184
lmm_stripe_count: 2
lmm_stripe_size: 4194304
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 74
lmm_pool: hdd
lmm_objects:
- 0: { l_ost_idx: 74, l_fid: [0x1004a0000:0x778f9c9:0x0] }
- 1: { l_ost_idx: 75, l_fid: [0x1004b0000:0x73f371a:0x0] }
lcme_id: 3
lcme_mirror_id: 0
lcme_flags: 0
lcme_extent.e_start: 17179869184
lcme_extent.e_end: EOF
lmm_stripe_count: 16
lmm_stripe_size: 4194304
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: -1
lmm_pool: hdd
We have four ldiskfs MDTs, and I have examples of files like that on MDTs 0, 2 and 3. We don't have the ea_inode feature set, but our inode size is 1KB:

[root@fir-md1-s1 Seisbench_DATA]# dumpe2fs -h /dev/mapper/md1-rbod1-mdt0
dumpe2fs 1.47.0-wc2 (25-May-2023)
Filesystem volume name: fir-MDT0000
Last mounted on: /
Filesystem UUID: 2f44ac0b-e931-4a58-90a4-d4f1765176bb
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr dir_index filetype needs_recovery extent 64bit mmp flex_bg dirdata large_dir sparse_super large_file huge_file uninit_bg dir_nlink quota project
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 3745217760
Block count: 4681213440
Reserved block count: 234060672
Free blocks: 3721821762
Free inodes: 3623118029
First block: 0
Block size: 4096
Fragment size: 4096
Group descriptor size: 64
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 26216
Inode blocks per group: 6554
Flex block group size: 16
Filesystem created: Tue Dec 1 09:29:39 2020
Last mount time: Wed Jul 5 22:09:02 2023
Last write time: Wed Jul 5 22:09:02 2023
Mount count: 26
Maximum mount count: -1
Last checked: Tue Dec 1 09:29:39 2020
Check interval: 0 (<none>)
Lifetime writes: 35 TB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 1024
Required extra isize: 32
Desired extra isize: 32
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: b8d9b0f5-1004-482d-83a0-44b8305a24cd
Journal backup: inode blocks
MMP block number: 28487
MMP update interval: 5
User quota inode: 3
Group quota inode: 4
Project quota inode: 12
Journal features: journal_incompat_revoke journal_64bit
Total journal size: 4096M
Total journal blocks: 1048576
Max transaction length: 1048576
Fast commit length: 0
Journal sequence: 0x0e6dad3b
Journal start: 356385
MMP_block:
mmp_magic: 0x4d4d50
mmp_check_interval: 10
mmp_sequence: 0x3131f5
mmp_update_date: Mon Jan 8 11:02:45 2024
mmp_update_time: 1704740565
mmp_node_name: fir-md1-s1
mmp_device_name: dm-0
Under ldiskfs:

[root@fir-md1-s1 Seisbench_DATA]# pwd
/mnt/fir/ldiskfs/mdt/0/ROOT/users/[0x200000400:0x5:0x0]:0/anovosel/Seisbench_DATA
[root@fir-md1-s1 Seisbench_DATA]# stat stead_mem.csv
  File: ‘stead_mem.csv’
  Size: 0          Blocks: 0          IO Block: 4096   regular empty file
Device: fd00h/64768d    Inode: 419466      Links: 1
Access: (0644/-rw-r--r--)  Uid: (419500/anovosel)   Gid: (18036/ beroza)
Access: 2023-10-12 17:49:40.000000000 -0700
Modify: 2023-10-11 15:45:05.000000000 -0700
Change: 2023-10-30 04:12:08.000000000 -0700
 Birth: -
[root@fir-md1-s1 Seisbench_DATA]# getfattr -m '.*' -d stead_mem.csv
# file: stead_mem.csv
trusted.link=0s3/HqEQEAAAA3AAAAAAAAAAAAAAAAAAAAAB8AAAACAAU7gAAAQQIAAAAAc3RlYWRfbWVtLmNzdg==
trusted.lma=0sAAAAAAAAAADBnAUAAgAAABcFAAAAAAAA
trusted.lov=0s0AvWC6ABAAAGAAAAAAADAAAAAAAAAAAAAAAAAAAAAAABAAAAEAAAAAAAAAAAAAAAAABAAAAAAACwAAAASAAAAAAAAABrAAAAAAAAAAAAAAACAAAAEAAAAAAAQAAAAAAAAAAAAAQAAAD4AAAAYAAAAAAAAAAAAAAAAAAAAAAAAAADAAAAAAAAAAAAAAAEAAAA//////////9YAQAASAAAAAAAAAAAAAAAAAAAAAAAAADQC9MLAQAAABcFAAAAAAAAwZwFAAIAAAAAAEAAAQAAAHNzZAAAAAAAAAAAAAAAAACmTOEDAAAAAAAAAAAAAAAAAAAAAH0AAADQC9MLAQAAABcFAAAAAAAAwZwFAAIAAAAAAEAAAgAAAGhkZAAQAP//AAAAAAAAAADJ+XgHAAAAAAAAAAAAAAAAAAAAAEoAAAAaNz8HAAAAAAAAAAAAAAAAAAAAAEsAAADQC9MLAQAAABcFAAAAAAAAwZwFAAIAAAAAAEAAEAD//2hkZAD/////IGiiAv////8AAAAAAAAAAAAAAAB1AAAAN8UpBv////8=
trusted.projid="419500"
trusted.som=0sBAAAAAAAAADUpL4WAAAAAGhfCwAAAAAA

Note: the hdd pool (last component) only has OSTs with max_create_count=0, but this PFL setting is very common and worked on many other files. |
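To find other files in the same state, something along these lines may help. This is a rough sketch, assuming the --pool and --comp-flags filters of lfs find behave as documented in lfs-find(1):

# lfs find /fir/users --pool hdd --comp-flags ^init

It can over-match, since it selects files with any component in the hdd pool and any uninitialized component, so each hit still needs an lfs getstripe to confirm that it is specifically the last component that is uninitialized.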
| Comments |
| Comment by Andreas Dilger [ 08/Jan/24 ] |
|
I think your analysis is correct that the missing PFL "eof" component is the cause of the migration error. When testing a "short" PFL layout I see something similar, though with a different error code:

# lfs migrate -c 1 /mnt/testfs/pfl-short
lfs migrate: cannot get group lock: Invalid argument (22)
error: lfs migrate: /mnt/testfs/pfl-short: cannot get group lock: Invalid argument

(the test file here was created with "lfs setstripe -E 1M -c 1 -E 1G -c 4 /mnt/testfs/pfl-short")

It is possible to add the last component to an existing file to resolve this error:

# lfs setstripe --component-add -E eof -c 30 /mnt/testfs/pfl-short
# lfs migrate -c 1 /mnt/testfs/pfl-short

At its core, this issue relates to the same problem as the O_APPEND issue - we depend on holding DLM locks on the whole file to ensure that nobody else is modifying it while the migration is in progress. That could be fixed by ensuring the layout doesn't change, but it hasn't been a priority since very few files actually exist without the last component. |
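A quick way to check whether a given file is in this state before attempting a migrate is to look at the per-component flags in the plain lfs getstripe output (a small sketch using only the fields shown elsewhere in this ticket; the path is the test file from the comment above):

# lfs getstripe /mnt/testfs/pfl-short | grep -E 'lcme_id|lcme_flags|lcme_extent.e_end'

A component whose lcme_extent.e_end is EOF but whose lcme_flags does not include init (or the absence of any EOF component at all) indicates the condition that triggers the "cannot get group lock" failure described here.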
| Comment by Stephane Thiell [ 08/Jan/24 ] |
|
Thanks Andreas, very interesting! I tried to delete the last component to test this:

[root@fir-rbh03 ~]# lfs setstripe --comp-del -I 3 /fir/users/anovosel/Seisbench_DATA/stead_mem.csv
[root@fir-rbh03 ~]# lfs getstripe /fir/users/anovosel/Seisbench_DATA/stead_mem.csv
/fir/users/anovosel/Seisbench_DATA/stead_mem.csv
lcm_layout_gen: 7
lcm_mirror_count: 1
lcm_entry_count: 2
lcme_id: 1
lcme_mirror_id: 0
lcme_flags: init
lcme_extent.e_start: 0
lcme_extent.e_end: 4194304
lmm_stripe_count: 1
lmm_stripe_size: 4194304
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 125
lmm_pool: ssd
lmm_objects:
- 0: { l_ost_idx: 125, l_fid: [0x1007d0000:0x3e14ca6:0x0] }
lcme_id: 2
lcme_mirror_id: 0
lcme_flags: init
lcme_extent.e_start: 4194304
lcme_extent.e_end: 17179869184
lmm_stripe_count: 2
lmm_stripe_size: 4194304
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 74
lmm_pool: hdd
lmm_objects:
- 0: { l_ost_idx: 74, l_fid: [0x1004a0000:0x778f9c9:0x0] }
- 1: { l_ost_idx: 75, l_fid: [0x1004b0000:0x73f371a:0x0] }
But then I got the same Invalid argument error you see:

[root@fir-rbh03 ~]# lfs migrate -c 1 /fir/users/anovosel/Seisbench_DATA/stead_mem.csv
lfs migrate: cannot get group lock: Invalid argument (22)
error: lfs migrate: /fir/users/anovosel/Seisbench_DATA/stead_mem.csv: cannot get group lock: Invalid argument

And your workaround to add the last component then worked:

[root@fir-rbh03 ~]# lfs setstripe --component-add -E eof -c 16 /fir/users/anovosel/Seisbench_DATA/stead_mem.csv
[root@fir-rbh03 ~]# lfs getstripe /fir/users/anovosel/Seisbench_DATA/stead_mem.csv
/fir/users/anovosel/Seisbench_DATA/stead_mem.csv
lcm_layout_gen: 8
lcm_mirror_count: 1
lcm_entry_count: 3
lcme_id: 1
lcme_mirror_id: 0
lcme_flags: init
lcme_extent.e_start: 0
lcme_extent.e_end: 4194304
lmm_stripe_count: 1
lmm_stripe_size: 4194304
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 125
lmm_pool: ssd
lmm_objects:
- 0: { l_ost_idx: 125, l_fid: [0x1007d0000:0x3e14ca6:0x0] }
lcme_id: 2
lcme_mirror_id: 0
lcme_flags: init
lcme_extent.e_start: 4194304
lcme_extent.e_end: 17179869184
lmm_stripe_count: 2
lmm_stripe_size: 4194304
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 74
lmm_pool: hdd
lmm_objects:
- 0: { l_ost_idx: 74, l_fid: [0x1004a0000:0x778f9c9:0x0] }
- 1: { l_ost_idx: 75, l_fid: [0x1004b0000:0x73f371a:0x0] }
lcme_id: 8
lcme_mirror_id: 0
lcme_flags: 0
lcme_extent.e_start: 17179869184
lcme_extent.e_end: EOF
lmm_stripe_count: 16
lmm_stripe_size: 1048576
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: -1
[root@fir-rbh03 ~]# lfs migrate -c 1 /fir/users/anovosel/Seisbench_DATA/stead_mem.csv
[root@fir-rbh03 ~]# lfs getstripe /fir/users/anovosel/Seisbench_DATA/stead_mem.csv
/fir/users/anovosel/Seisbench_DATA/stead_mem.csv
lmm_stripe_count: 1
lmm_stripe_size: 1048576
lmm_pattern: raid0
lmm_layout_gen: 10
lmm_stripe_offset: 134
lmm_pool: ssd
obdidx objid objid group
134 25241290 0x18126ca 0
|
| Comment by Stephane Thiell [ 08/Jan/24 ] |
|
Andreas, we noticed something: when the last component is uninitialized, lfs migrate only works when no lmm_pool is assigned. Demonstration with another file in that situation:

# lfs getstripe /fir/users/anovosel/.ipynb_checkpoints/11_PhaseHunter_SCEDC-checkpoint.ipynb
/fir/users/anovosel/.ipynb_checkpoints/11_PhaseHunter_SCEDC-checkpoint.ipynb
lcm_layout_gen: 6
lcm_mirror_count: 1
lcm_entry_count: 3
lcme_id: 1
lcme_mirror_id: 0
lcme_flags: init
lcme_extent.e_start: 0
lcme_extent.e_end: 4194304
lmm_stripe_count: 1
lmm_stripe_size: 4194304
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 117
lmm_pool: ssd
lmm_objects:
- 0: { l_ost_idx: 117, l_fid: [0x100750000:0x62aaa56:0x0] }
lcme_id: 2
lcme_mirror_id: 0
lcme_flags: init
lcme_extent.e_start: 4194304
lcme_extent.e_end: 17179869184
lmm_stripe_count: 2
lmm_stripe_size: 4194304
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 78
lmm_pool: hdd
lmm_objects:
- 0: { l_ost_idx: 78, l_fid: [0x1004e0000:0x76d69de:0x0] }
- 1: { l_ost_idx: 76, l_fid: [0x1004c0000:0x77418bb:0x0] }
lcme_id: 3
lcme_mirror_id: 0
lcme_flags: 0
lcme_extent.e_start: 17179869184
lcme_extent.e_end: EOF
lmm_stripe_count: 16
lmm_stripe_size: 4194304
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: -1
lmm_pool: hdd <<<
# lfs migrate -c 1 /fir/users/anovosel/.ipynb_checkpoints/11_PhaseHunter_SCEDC-checkpoint.ipynb
lfs migrate: cannot get group lock: No space left on device (28)
error: lfs migrate: /fir/users/anovosel/.ipynb_checkpoints/11_PhaseHunter_SCEDC-checkpoint.ipynb: cannot get group lock: No space left on device
Delete the last component and add it back in the hdd pool:

# lfs setstripe --comp-del -I 3 /fir/users/anovosel/.ipynb_checkpoints/11_PhaseHunter_SCEDC-checkpoint.ipynb
# lfs setstripe --component-add -E eof -c 16 --pool hdd /fir/users/anovosel/.ipynb_checkpoints/11_PhaseHunter_SCEDC-checkpoint.ipynb
# lfs getstripe /fir/users/anovosel/.ipynb_checkpoints/11_PhaseHunter_SCEDC-checkpoint.ipynb
/fir/users/anovosel/.ipynb_checkpoints/11_PhaseHunter_SCEDC-checkpoint.ipynb
lcm_layout_gen: 8
lcm_mirror_count: 1
lcm_entry_count: 3
lcme_id: 1
lcme_mirror_id: 0
lcme_flags: init
lcme_extent.e_start: 0
lcme_extent.e_end: 4194304
lmm_stripe_count: 1
lmm_stripe_size: 4194304
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 117
lmm_pool: ssd
lmm_objects:
- 0: { l_ost_idx: 117, l_fid: [0x100750000:0x62aaa56:0x0] }
lcme_id: 2
lcme_mirror_id: 0
lcme_flags: init
lcme_extent.e_start: 4194304
lcme_extent.e_end: 17179869184
lmm_stripe_count: 2
lmm_stripe_size: 4194304
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 78
lmm_pool: hdd
lmm_objects:
- 0: { l_ost_idx: 78, l_fid: [0x1004e0000:0x76d69de:0x0] }
- 1: { l_ost_idx: 76, l_fid: [0x1004c0000:0x77418bb:0x0] }
lcme_id: 8
lcme_mirror_id: 0
lcme_flags: 0
lcme_extent.e_start: 17179869184
lcme_extent.e_end: EOF
lmm_stripe_count: 16
lmm_stripe_size: 1048576
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: -1
lmm_pool: hdd
Confirmation of failure:

[root@fir-rbh03 Seisbench_DATA]# lfs migrate -c 1 /fir/users/anovosel/.ipynb_checkpoints/11_PhaseHunter_SCEDC-checkpoint.ipynb
lfs migrate: cannot get group lock: No space left on device (28)
error: lfs migrate: /fir/users/anovosel/.ipynb_checkpoints/11_PhaseHunter_SCEDC-checkpoint.ipynb: cannot get group lock: No space left on device

Remove the last component and add it back without the pool:

# lfs setstripe --comp-del -I 8 /fir/users/anovosel/.ipynb_checkpoints/11_PhaseHunter_SCEDC-checkpoint.ipynb
# lfs setstripe --component-add -E eof -c 16 /fir/users/anovosel/.ipynb_checkpoints/11_PhaseHunter_SCEDC-checkpoint.ipynb
# lfs getstripe /fir/users/anovosel/.ipynb_checkpoints/11_PhaseHunter_SCEDC-checkpoint.ipynb
/fir/users/anovosel/.ipynb_checkpoints/11_PhaseHunter_SCEDC-checkpoint.ipynb
lcm_layout_gen: 10
lcm_mirror_count: 1
lcm_entry_count: 3
lcme_id: 1
lcme_mirror_id: 0
lcme_flags: init
lcme_extent.e_start: 0
lcme_extent.e_end: 4194304
lmm_stripe_count: 1
lmm_stripe_size: 4194304
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 117
lmm_pool: ssd
lmm_objects:
- 0: { l_ost_idx: 117, l_fid: [0x100750000:0x62aaa56:0x0] }
lcme_id: 2
lcme_mirror_id: 0
lcme_flags: init
lcme_extent.e_start: 4194304
lcme_extent.e_end: 17179869184
lmm_stripe_count: 2
lmm_stripe_size: 4194304
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 78
lmm_pool: hdd
lmm_objects:
- 0: { l_ost_idx: 78, l_fid: [0x1004e0000:0x76d69de:0x0] }
- 1: { l_ost_idx: 76, l_fid: [0x1004c0000:0x77418bb:0x0] }
lcme_id: 10
lcme_mirror_id: 0
lcme_flags: 0
lcme_extent.e_start: 17179869184
lcme_extent.e_end: EOF
lmm_stripe_count: 16
lmm_stripe_size: 1048576
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: -1
Now, lfs migrate works:

[root@fir-rbh03 Seisbench_DATA]# lfs migrate -c 1 /fir/users/anovosel/.ipynb_checkpoints/11_PhaseHunter_SCEDC-checkpoint.ipynb
[root@fir-rbh03 Seisbench_DATA]#

I believe this might be because we have max_create_count=0 set on all the OSTs of this hdd OST pool, as we're in the process of decommissioning all its OSTs. |
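For reference, that is the state OSTs end up in when object creation is stopped on the MDS-side OSP devices while draining them for removal, along these lines (the OST index is only an example; commands run on the MDS nodes):

mds# lctl set_param osp.fir-OST0024*.max_create_count=0
mds# lctl get_param osp.fir-OST00*.max_create_count

With every OST in the hdd pool in that state, no objects can be allocated from the pool, which would explain the ENOSPC when the uninitialized last component still points at it.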
| Comment by Stephane Thiell [ 08/Jan/24 ] |
|
Steps to reproduce:
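A sketch of steps that should reproduce this, based on the behaviour described above (the pool name, OST indices, layout boundaries and write size are illustrative, and the max_create_count commands assume the usual MDS-side OSP parameter used when draining OSTs):

# on the MGS: create a pool and add a couple of OSTs
mgs# lctl pool_new elm.hdd
mgs# lctl pool_add elm.hdd elm-OST[0000-0001]_UUID

# on the MDS: stop object creation on those OSTs
mds# lctl set_param osp.elm-OST0000*.max_create_count=0 osp.elm-OST0001*.max_create_count=0

# on a client: create a PFL file whose last component targets that pool,
# write only into the earlier components, then try to migrate it
client# lfs setstripe -E 4M -c 1 -E 16G -c 2 -E eof -c 16 --pool hdd /elm/pfl-test
client# dd if=/dev/zero of=/elm/pfl-test bs=1M count=8
client# lfs migrate -c 1 /elm/pfl-test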
Result:

# lfs migrate -c 1 /elm/pfl-test
lfs migrate: cannot get group lock: No space left on device (28)
error: lfs migrate: /elm/pfl-test: cannot get group lock: No space left on device

Tested with 2.15.59. |
| Comment by Stephane Thiell [ 08/Jan/24 ] |
|
By emptying the hdd OST pool, we are able to migrate those files again. No more ENOSPC errors!

# lctl pool_remove fir.hdd fir-OST[0024-005f]_UUID
# lctl pool_list fir.hdd
Pool: fir.hdd
#

It's a bit of a corner case, but glad that we were able to understand it. |
| Comment by Andreas Dilger [ 08/Jan/24 ] |
I think this is mostly working as expected? If you are trying to allocate from a pool that is "full" (or otherwise unusable), then 28 = ENOSPC should be returned when no objects can be allocated from that pool. "lfs migrate" should be using the pool (if any) from the last in-use component, rather than from the last "existing" component, but that doesn't quite seem to be the case here.
It would be interesting to get the "lfs getstripe" output for this file after the migration. Is it using pool "ssd" because that is the one used on the first component, or is it inheriting from the parent directory because of "-c 1"?

Note that you can configure "pool spilling" so that allocations that target one pool ("hdd" in this case) are redirected to a different pool ("ssd" in this case) when the pool usage is above a usage threshold:

lctl set_param lov.*.pool.hdd.spill_target=ssd lov.*.pool.hdd.spill_threshold_pct=1

Note that it isn't possible to set spill_threshold_pct=0, since that also means "disabled", but I don't think that matters here. It looks like pool spilling properly handles the case of all OSTs in the pool being marked with max_create_count=0, or at least it does with patch https://review.whamcloud.com/50250. |
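If useful when evaluating that, the current spilling settings should be readable back with the matching get_param on the same nodes (a sketch, using the same parameter names as the set_param example above):

lctl get_param lov.*.pool.hdd.spill_target lov.*.pool.hdd.spill_threshold_pct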
| Comment by Stephane Thiell [ 08/Jan/24 ] |
Yes, I think it's working as expected in the end. I was confused at first, but it now makes sense; I also missed that the pool might be preserved if not explicitly specified, so in that case ENOSPC makes sense!
It is using the pool "ssd":

# lfs getstripe /fir/users/anovosel/.ipynb_checkpoints/11_PhaseHunter_SCEDC-checkpoint.ipynb
/fir/users/anovosel/.ipynb_checkpoints/11_PhaseHunter_SCEDC-checkpoint.ipynb
lmm_stripe_count: 1
lmm_stripe_size: 1048576
lmm_pattern: raid0
lmm_layout_gen: 14
lmm_stripe_offset: 129
lmm_pool: ssd
obdidx objid objid group
129 25344189 0x182b8bd 0

As for inheriting from the parent directory: our default striping is set at the root level, which indeed has a first component with the pool "ssd":

# lfs getstripe -d /fir
lcm_layout_gen: 0
lcm_mirror_count: 1
lcm_entry_count: 3
lcme_id: N/A
lcme_mirror_id: N/A
lcme_flags: 0
lcme_extent.e_start: 0
lcme_extent.e_end: 4194304
stripe_count: 1 stripe_size: 4194304 pattern: raid0 stripe_offset: -1 pool: ssd
lcme_id: N/A
lcme_mirror_id: N/A
lcme_flags: 0
lcme_extent.e_start: 4194304
lcme_extent.e_end: 17179869184
stripe_count: 2 stripe_size: 4194304 pattern: raid0 stripe_offset: -1
lcme_id: N/A
lcme_mirror_id: N/A
lcme_flags: 0
lcme_extent.e_start: 17179869184
lcme_extent.e_end: EOF
stripe_count: 16 stripe_size: 4194304 pattern: raid0 stripe_offset: -1
Noted about the "pool spilling" feature. Thanks Andreas! |