[LU-15072] Pool spill is activated just by changing threshold Created: 07/Oct/21 Updated: 22/Apr/23 Resolved: 22/Apr/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.15.0 |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | James Nunez (Inactive) | Assignee: | Alex Zhuravlev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
I have an empty pool that consists of one OST:

# lfs df -p scratch.pool1
UUID                   1K-blocks        Used   Available Use% Mounted on
scratch-MDT0000_UUID    17423180        3536    15848428   1% /lustre/scratch[MDT:0]
scratch-MDT0001_UUID    17423180        2860    15849104   1% /lustre/scratch[MDT:1]
scratch-MDT0002_UUID    27541652        3680    25055796   1% /lustre/scratch[MDT:2]
scratch-MDT0003_UUID    27541652        3692    25055784   1% /lustre/scratch[MDT:3]
scratch-OST0002_UUID    35055368        1672    33202304   1% /lustre/scratch[OST:2]

filesystem_summary:     35055368        1672    33202304   1% /lustre/scratch

On the MDSs, I set spill_threshold_pct to 6% and spilling is not active:

# lctl set_param lod.scratch-MDT*-mdtlov.pool.pool1.spill_threshold_pct=6
lod.scratch-MDT0000-mdtlov.pool.pool1.spill_threshold_pct=6
lod.scratch-MDT0002-mdtlov.pool.pool1.spill_threshold_pct=6
# lctl get_param lod.scratch-MDT*.pool.pool1.*
lod.scratch-MDT0000-mdtlov.pool.pool1.spill_is_active=0
lod.scratch-MDT0000-mdtlov.pool.pool1.spill_target=pool2
lod.scratch-MDT0000-mdtlov.pool.pool1.spill_threshold_pct=6
lod.scratch-MDT0002-mdtlov.pool.pool1.spill_is_active=0
lod.scratch-MDT0002-mdtlov.pool.pool1.spill_target=pool2
lod.scratch-MDT0002-mdtlov.pool.pool1.spill_threshold_pct=6

I then change spill_threshold_pct to 5% and spilling is activated. Note that no data was written to pool1 nor to the OST:

# lctl set_param lod.scratch-MDT*-mdtlov.pool.pool1.spill_threshold_pct=5
lod.scratch-MDT0000-mdtlov.pool.pool1.spill_threshold_pct=5
lod.scratch-MDT0002-mdtlov.pool.pool1.spill_threshold_pct=5
# lctl get_param lod.scratch-MDT*.pool.pool1.*
lod.scratch-MDT0000-mdtlov.pool.pool1.spill_is_active=1
lod.scratch-MDT0000-mdtlov.pool.pool1.spill_target=pool2
lod.scratch-MDT0000-mdtlov.pool.pool1.spill_threshold_pct=5
lod.scratch-MDT0002-mdtlov.pool.pool1.spill_is_active=1
lod.scratch-MDT0002-mdtlov.pool.pool1.spill_target=pool2
lod.scratch-MDT0002-mdtlov.pool.pool1.spill_threshold_pct=5

Change the threshold back to 6% and spilling is not active again:

# lctl set_param lod.scratch-MDT*-mdtlov.pool.pool1.spill_threshold_pct=6
lod.scratch-MDT0000-mdtlov.pool.pool1.spill_threshold_pct=6
lod.scratch-MDT0002-mdtlov.pool.pool1.spill_threshold_pct=6
# lctl get_param lod.scratch-MDT*.pool.pool1.*
lod.scratch-MDT0000-mdtlov.pool.pool1.spill_is_active=0
lod.scratch-MDT0000-mdtlov.pool.pool1.spill_target=pool2
lod.scratch-MDT0000-mdtlov.pool.pool1.spill_threshold_pct=6
lod.scratch-MDT0002-mdtlov.pool.pool1.spill_is_active=0
lod.scratch-MDT0002-mdtlov.pool.pool1.spill_target=pool2
lod.scratch-MDT0002-mdtlov.pool.pool1.spill_threshold_pct=6

With an empty OST pool, pool spilling should not be activated merely because the threshold is changed when the values are all above 1%. |
| Comments |
| Comment by Alex Zhuravlev [ 07/Oct/21 ] |
|
I'm adding another status file to procfs to show the statfs data that the spilling code sees. This may be related to grants. |
| Comment by Gerrit Updater [ 07/Oct/21 ] |
|
"Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/45153 |
| Comment by Alex Zhuravlev [ 07/Oct/21 ] |
|
James, thanks for the report. Could you please apply the patch above and dump the internal spill info via lctl get_param lod.*.pool.*.spill_status? |
| Comment by James Nunez (Inactive) [ 07/Oct/21 ] |
|
Just created pools pool0, pool1, pool2 with assignment OST0000, OST0001 and OST0002, respectively. No data is written to any pool and nothing written to the file system:

# lctl get_param lod.scratch-MDT*.pool.*.spill*
lod.scratch-MDT0000-mdtlov.pool.pool0.spill_is_active=0
lod.scratch-MDT0000-mdtlov.pool.pool0.spill_status= 0: 17771696 18786160 18786160 - 17771696 = 1014464 ?? 0 -- off 0% expired -14015
lod.scratch-MDT0000-mdtlov.pool.pool0.spill_target=
lod.scratch-MDT0000-mdtlov.pool.pool0.spill_threshold_pct=0
lod.scratch-MDT0000-mdtlov.pool.pool1.spill_is_active=0
lod.scratch-MDT0000-mdtlov.pool.pool1.spill_status= 1: 17771692 18786160 18786160 - 17771692 = 1014468 ?? 0 -- off 0% expired -14015
lod.scratch-MDT0000-mdtlov.pool.pool1.spill_target=
lod.scratch-MDT0000-mdtlov.pool.pool1.spill_threshold_pct=0
lod.scratch-MDT0000-mdtlov.pool.pool2.spill_is_active=0
lod.scratch-MDT0000-mdtlov.pool.pool2.spill_status= 2: 17771692 18786160 18786160 - 17771692 = 1014468 ?? 0 -- off 0% expired -14015
lod.scratch-MDT0000-mdtlov.pool.pool2.spill_target=
lod.scratch-MDT0000-mdtlov.pool.pool2.spill_threshold_pct=0
lod.scratch-MDT0002-mdtlov.pool.pool0.spill_is_active=0
lod.scratch-MDT0002-mdtlov.pool.pool0.spill_status= 0: 0 0 0 - 0 = 0 ?? 0 -- off 0% expired -14015
lod.scratch-MDT0002-mdtlov.pool.pool0.spill_target=
lod.scratch-MDT0002-mdtlov.pool.pool0.spill_threshold_pct=0
lod.scratch-MDT0002-mdtlov.pool.pool1.spill_is_active=0
lod.scratch-MDT0002-mdtlov.pool.pool1.spill_status= 1: 0 0 0 - 0 = 0 ?? 0 -- off 0% expired -14015
lod.scratch-MDT0002-mdtlov.pool.pool1.spill_target=
lod.scratch-MDT0002-mdtlov.pool.pool1.spill_threshold_pct=0
lod.scratch-MDT0002-mdtlov.pool.pool2.spill_is_active=0
lod.scratch-MDT0002-mdtlov.pool.pool2.spill_status= 2: 0 0 0 - 0 = 0 ?? 0 -- off 0% expired -14015
lod.scratch-MDT0002-mdtlov.pool.pool2.spill_target=
lod.scratch-MDT0002-mdtlov.pool.pool2.spill_threshold_pct=0

Set poolN.spill_target=pool(N+1) for N = 0, 1 and set spill_threshold_pct=5:

# lctl get_param lod.scratch-MDT*.pool.*.spill*
lod.scratch-MDT0000-mdtlov.pool.pool0.spill_is_active=1
lod.scratch-MDT0000-mdtlov.pool.pool0.spill_status= 0: 17771696 18786160 18786160 - 17771696 = 1014464 ?? 939308 -- on 5% expired 5
lod.scratch-MDT0000-mdtlov.pool.pool0.spill_target=pool1
lod.scratch-MDT0000-mdtlov.pool.pool0.spill_threshold_pct=5
lod.scratch-MDT0000-mdtlov.pool.pool1.spill_is_active=1
lod.scratch-MDT0000-mdtlov.pool.pool1.spill_status= 1: 17771692 18786160 18786160 - 17771692 = 1014468 ?? 939308 -- on 5% expired 5
lod.scratch-MDT0000-mdtlov.pool.pool1.spill_target=pool2
lod.scratch-MDT0000-mdtlov.pool.pool1.spill_threshold_pct=5
lod.scratch-MDT0000-mdtlov.pool.pool2.spill_is_active=1
lod.scratch-MDT0000-mdtlov.pool.pool2.spill_status= 2: 17771692 18786160 18786160 - 17771692 = 1014468 ?? 939308 -- on 5% expired 5
lod.scratch-MDT0000-mdtlov.pool.pool2.spill_target=
lod.scratch-MDT0000-mdtlov.pool.pool2.spill_threshold_pct=5
lod.scratch-MDT0002-mdtlov.pool.pool0.spill_is_active=1
lod.scratch-MDT0002-mdtlov.pool.pool0.spill_status= 0: 17771696 18786160 18786160 - 17771696 = 1014464 ?? 939308 -- on 5% expired 5
lod.scratch-MDT0002-mdtlov.pool.pool0.spill_target=pool1
lod.scratch-MDT0002-mdtlov.pool.pool0.spill_threshold_pct=5
lod.scratch-MDT0002-mdtlov.pool.pool1.spill_is_active=1
lod.scratch-MDT0002-mdtlov.pool.pool1.spill_status= 1: 17771692 18786160 18786160 - 17771692 = 1014468 ?? 939308 -- on 5% expired 5
lod.scratch-MDT0002-mdtlov.pool.pool1.spill_target=pool2
lod.scratch-MDT0002-mdtlov.pool.pool1.spill_threshold_pct=5
lod.scratch-MDT0002-mdtlov.pool.pool2.spill_is_active=1
lod.scratch-MDT0002-mdtlov.pool.pool2.spill_status= 2: 17771692 18786160 18786160 - 17771692 = 1014468 ?? 939308 -- on 5% expired 5
lod.scratch-MDT0002-mdtlov.pool.pool2.spill_target=
lod.scratch-MDT0002-mdtlov.pool.pool2.spill_threshold_pct=5

Set spill_threshold_pct=6:

# lctl get_param lod.scratch-MDT*.pool.*.spill*
lod.scratch-MDT0000-mdtlov.pool.pool0.spill_is_active=0
lod.scratch-MDT0000-mdtlov.pool.pool0.spill_status= 0: 17771696 18786160 18786160 - 17771696 = 1014464 ?? 1127169 -- off 6% expired 5
lod.scratch-MDT0000-mdtlov.pool.pool0.spill_target=pool1
lod.scratch-MDT0000-mdtlov.pool.pool0.spill_threshold_pct=6
lod.scratch-MDT0000-mdtlov.pool.pool1.spill_is_active=0
lod.scratch-MDT0000-mdtlov.pool.pool1.spill_status= 1: 17771692 18786160 18786160 - 17771692 = 1014468 ?? 1127169 -- off 6% expired 5
lod.scratch-MDT0000-mdtlov.pool.pool1.spill_target=pool2
lod.scratch-MDT0000-mdtlov.pool.pool1.spill_threshold_pct=6
lod.scratch-MDT0000-mdtlov.pool.pool2.spill_is_active=0
lod.scratch-MDT0000-mdtlov.pool.pool2.spill_status= 2: 17771692 18786160 18786160 - 17771692 = 1014468 ?? 1127169 -- off 6% expired 5
lod.scratch-MDT0000-mdtlov.pool.pool2.spill_target=
lod.scratch-MDT0000-mdtlov.pool.pool2.spill_threshold_pct=6
lod.scratch-MDT0002-mdtlov.pool.pool0.spill_is_active=0
lod.scratch-MDT0002-mdtlov.pool.pool0.spill_status= 0: 17771696 18786160 18786160 - 17771696 = 1014464 ?? 1127169 -- off 6% expired 5
lod.scratch-MDT0002-mdtlov.pool.pool0.spill_target=pool1
lod.scratch-MDT0002-mdtlov.pool.pool0.spill_threshold_pct=6
lod.scratch-MDT0002-mdtlov.pool.pool1.spill_is_active=0
lod.scratch-MDT0002-mdtlov.pool.pool1.spill_status= 1: 17771692 18786160 18786160 - 17771692 = 1014468 ?? 1127169 -- off 6% expired 5
lod.scratch-MDT0002-mdtlov.pool.pool1.spill_target=pool2
lod.scratch-MDT0002-mdtlov.pool.pool1.spill_threshold_pct=6
lod.scratch-MDT0002-mdtlov.pool.pool2.spill_is_active=0
lod.scratch-MDT0002-mdtlov.pool.pool2.spill_status= 2: 17771692 18786160 18786160 - 17771692 = 1014468 ?? 1127169 -- off 6% expired 5
lod.scratch-MDT0002-mdtlov.pool.pool2.spill_target=
lod.scratch-MDT0002-mdtlov.pool.pool2.spill_threshold_pct=6 |
| Comment by Alex Zhuravlev [ 08/Oct/21 ] |
0: 17771696 18786160 18786160 - 17771696 = 1014464 ?? 939308 -- on 5% expired 5

The total is 18345 MB and 17355 MB is available, thus used is 990 MB, or 5.4%. |
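The arithmetic in this comment can be reproduced with a short sketch. It assumes an interpretation of the debug output that the ticket does not spell out: the spill_status fields are read as available KB, total KB, then `total - avail = used`, with the value after `??` being the threshold in KB. The function name `spill_decision` and the strict `>` comparison are hypothetical choices for illustration.

```python
# Assumed reading of one spill_status line (not confirmed by the ticket):
#   <idx>: <avail> <total> <total> - <avail> = <used> ?? <threshold> -- on|off <pct>% ...
KB_PER_MB = 1024


def spill_decision(avail_kb: int, total_kb: int, threshold_pct: int):
    """Return (used_kb, threshold_kb, active), taking used = total - avail."""
    used_kb = total_kb - avail_kb
    # Threshold expressed in KB, truncated to an integer.
    threshold_kb = total_kb * threshold_pct // 100
    # The strict comparison is an assumption; the ticket does not state it.
    return used_kb, threshold_kb, used_kb > threshold_kb


if __name__ == "__main__":
    # pool0 values copied from the spill_status line quoted in the comment.
    avail_kb, total_kb = 17771696, 18786160
    for pct in (5, 6):
        used, thr, active = spill_decision(avail_kb, total_kb, pct)
        print(f"{pct}%: used={used} KB ({used // KB_PER_MB} MB, "
              f"{100 * used / total_kb:.1f}%) threshold={thr} KB "
              f"-> {'on' if active else 'off'}")
```

Under these assumptions the computed thresholds match the 939308 and 1127169 values in the reported output, and the used figure of 1014464 KB sits between them, which is why 5% turns spilling on and 6% turns it off.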
| Comment by Alex Zhuravlev [ 08/Oct/21 ] |
# lfs df -h
UUID                       bytes        Used   Available Use% Mounted on
lustre-MDT0000_UUID       122.1M        1.6M      109.5M   2% /mnt/lustre[MDT:0]
lustre-OST0000_UUID       305.8M        1.2M      278.0M   1% /mnt/lustre[OST:0]
lustre-OST0001_UUID       305.8M        1.2M      278.0M   1% /mnt/lustre[OST:1]

This is on a tiny local setup where we do not create a huge journal. |
| Comment by Alex Zhuravlev [ 08/Oct/21 ] |
|
Important note: the status is not recalculated whenever space usage changes. It is recalculated upon object allocation (e.g. when you create a new file or mirror) and when the spilling configuration is changed. |
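The behaviour described in this comment can be pictured with a toy model. This is only a sketch of the stated rule (recalculate on object allocation and on configuration change, not on usage change); the class `PoolSpillModel`, its method names, and the treatment of a zero threshold as "disabled" are hypothetical and are not taken from the Lustre code.

```python
class PoolSpillModel:
    """Toy model: a cached spill flag that is refreshed only on specific events."""

    def __init__(self, total_kb: int, avail_kb: int, threshold_pct: int = 0):
        self.total_kb = total_kb
        self.avail_kb = avail_kb
        self.threshold_pct = threshold_pct
        self.spill_is_active = False

    def _recalculate(self) -> None:
        used_kb = self.total_kb - self.avail_kb
        limit_kb = self.total_kb * self.threshold_pct // 100
        # A threshold of 0 is treated as "spilling disabled" in this model.
        self.spill_is_active = self.threshold_pct > 0 and used_kb > limit_kb

    def update_usage(self, avail_kb: int) -> None:
        # Space usage changed: the cached flag is deliberately left alone.
        self.avail_kb = avail_kb

    def set_threshold(self, pct: int) -> None:
        # Configuration change: triggers a recalculation.
        self.threshold_pct = pct
        self._recalculate()

    def allocate_object(self) -> None:
        # Object allocation (new file or mirror): triggers a recalculation.
        self._recalculate()


if __name__ == "__main__":
    pool = PoolSpillModel(total_kb=18786160, avail_kb=17771696)
    pool.set_threshold(6)
    print(pool.spill_is_active)           # flag after the config change
    pool.update_usage(avail_kb=17000000)  # hypothetical lower free space
    print(pool.spill_is_active)           # unchanged: no recalculation yet
    pool.allocate_object()
    print(pool.spill_is_active)           # refreshed on allocation
```

In this model, changing the threshold alone is enough to flip the flag, with no write to the pool required, which is the effect the description reports.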
| Comment by Andreas Dilger [ 08/Oct/21 ] |
|
Alex, I think it makes sense to compute the used percentage in the same way as "df" and "lfs df" do, so that users are not confused by this. |
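To illustrate how far the two views can diverge, here is a sketch that computes the used percentage for scratch-OST0002 (numbers from the description) in two ways. Both formulas are assumptions about the tools rather than quotes from the Lustre source: the df-style figure is taken as used / (used + avail), rounded up as GNU df does, and the spill-style figure as (total - avail) / total, following the spill_status output. The function names are hypothetical.

```python
import math


def df_style_pct(used_kb: int, avail_kb: int) -> int:
    """Use% as df is assumed to report it: used / (used + avail), rounded up."""
    denom = used_kb + avail_kb
    return math.ceil(100 * used_kb / denom) if denom else 0


def spill_style_pct(total_kb: int, avail_kb: int) -> float:
    """Percentage as (total - avail) / total; any non-available space counts as used."""
    return 100 * (total_kb - avail_kb) / total_kb if total_kb else 0.0


if __name__ == "__main__":
    # scratch-OST0002 row from the "lfs df -p scratch.pool1" output.
    total_kb, used_kb, avail_kb = 35055368, 1672, 33202304
    print(f"df-style:    {df_style_pct(used_kb, avail_kb)}%")
    print(f"spill-style: {spill_style_pct(total_kb, avail_kb):.2f}%")
```

With these inputs the df-style figure is 1%, while the second figure falls between 5% and 6%, which is consistent with spilling turning on at a threshold of 5 and off at 6 on an OST that "lfs df" shows as 1% used.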
| Comment by Alex Zhuravlev [ 08/Oct/21 ] |
|
OK, I'll check the details |
| Comment by Alex Zhuravlev [ 12/Oct/21 ] |
|
jamesanunez, could you please show the lfs df output for this case? |
| Comment by Gerrit Updater [ 22/Apr/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/45153/ |
| Comment by Peter Jones [ 22/Apr/23 ] |
|
Landed for 2.16 |