[LU-16500] "lfs migrate <file>" preserves specific layout too much Created: 21/Jan/23  Updated: 08/Feb/23  Resolved: 08/Feb/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.16.0
Fix Version/s: Lustre 2.16.0

Type: Bug Priority: Minor
Reporter: Andreas Dilger Assignee: Jian Yu
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-16522 "lfs setstripe -i N" with deactivated... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Running "lfs migrate <file>" without any SETSTRIPE arguments keeps the PFL file layout, as intended, but it also preserves the OST selection "too well". The "lfs migrate" command is meant to be used for balancing space usage across OSTs, so it would be expected to change the starting OST and the other OSTs in use to improve the balance. Instead, the OSTs are preserved exactly, which makes the migration virtually useless in this case:

# lfs df
UUID                   1K-blocks        Used   Available Use% Mounted on
testfs-MDT0000_UUID       125056        2268      111552   2% /mnt/testfs[MDT:0]
testfs-OST0000_UUID       313104        1664      284280   1% /mnt/testfs[OST:0]
testfs-OST0001_UUID       313104      206464       79480  73% /mnt/testfs[OST:1]
testfs-OST0002_UUID       313104        1668      284276   1% /mnt/testfs[OST:2]
testfs-OST0003_UUID       313104        1668      284276   1% /mnt/testfs[OST:3]

filesystem_summary:      1252416      211464      932312  19% /mnt/testfs

# lfs setstripe -E 1M -c 1 -E 16M -c 1 -E eof -c 1 /mnt/testfs/junk3
# fallocate -l 17M /mnt/testfs/junk3
# lfs getstripe /mnt/testfs/junk3 | grep l_ost_idx
      - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x5d:0x0] }
      - 0: { l_ost_idx: 1, l_fid: [0x100010000:0xc1:0x0] }
      - 0: { l_ost_idx: 2, l_fid: [0x100020000:0x22:0x0] }
# lfs migrate /mnt/testfs/junk3
# lfs getstripe /mnt/testfs/junk3 | grep l_ost_idx
      - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x5e:0x0] }
      - 0: { l_ost_idx: 1, l_fid: [0x100010000:0xc2:0x0] }
      - 0: { l_ost_idx: 2, l_fid: [0x100020000:0x23:0x0] }
# lfs migrate /mnt/testfs/junk3
# lfs getstripe /mnt/testfs/junk3 | grep l_ost_idx
      - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x5f:0x0] }
      - 0: { l_ost_idx: 1, l_fid: [0x100010000:0xc3:0x0] }
      - 0: { l_ost_idx: 2, l_fid: [0x100020000:0x24:0x0] }
# lfs migrate /mnt/testfs/junk3
# lfs getstripe /mnt/testfs/junk3 | grep l_ost_idx
      - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x60:0x0] }
      - 0: { l_ost_idx: 1, l_fid: [0x100010000:0xc4:0x0] }
      - 0: { l_ost_idx: 2, l_fid: [0x100020000:0x25:0x0] }

Given the imbalance on OST0001, one would expect it to be skipped for allocation, but the "lfs migrate" command must be using a specific layout. Instead, it should clear the specific OST indices from the layout and use the layout only as a template.

Creating a new file correctly avoids the full OST, so this isn't a problem with the QOS space balancing:

# lfs setstripe -E 1M -c 1 -E 16M -c 1 -E eof -c 1 /mnt/testfs/junk3.2
# fallocate -l 17M /mnt/testfs/junk3.2
# lfs getstripe /mnt/testfs/junk3.2 | grep l_ost_idx
      - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x62:0x0] }
      - 0: { l_ost_idx: 3, l_fid: [0x100030000:0x5f:0x0] }
      - 0: { l_ost_idx: 2, l_fid: [0x100020000:0x27:0x0] }
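The difference between the two cases above can be illustrated with a toy model. This is only a sketch, not Lustre code: toy_pick_ost() and TOY_OST_ANY are hypothetical names, and the "most available space wins" rule is a deliberate simplification of the real QOS allocator. It shows that a layout carrying a specific OST index is honoured as-is, while a layout without one leaves the choice to the allocator, which can then avoid the full OST.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel meaning "no specific OST requested". It only mirrors
 * the role that LLAPI_LAYOUT_DEFAULT plays in the real API. */
#define TOY_OST_ANY (-1)

/*
 * Toy placement rule (an assumption for illustration only):
 *  - if the layout names a specific OST index, that index is used as-is;
 *  - otherwise the OST with the most available space is chosen.
 */
int toy_pick_ost(int requested, const long *avail_kb, size_t n_osts)
{
	size_t i, best = 0;

	assert(avail_kb != NULL && n_osts > 0);

	if (requested != TOY_OST_ANY)
		return requested;

	for (i = 1; i < n_osts; i++)
		if (avail_kb[i] > avail_kb[best])
			best = i;

	return (int)best;
}
```

With the "Available" column from the "lfs df" output above (284280, 79480, 284276, 284276), a request for index 1 still lands on the nearly full OST0001, while a request with TOY_OST_ANY does not.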


 Comments   
Comment by Jian Yu [ 27/Jan/23 ]

In my testing, specifying an actual layout like "lfs migrate -c2 <file>" will avoid the "specific OST" problem.

This is because from_copy in lfs_setstripe_internal() remains false if any of the SETSTRIPE arguments is specified:

lfs_setstripe_internal()
        // ......
        bool from_copy = false;
        // ......

        /* lfs migrate $filename should keep the file's layout by default */
        if (migrate_mode && !layout && !from_yaml &&
            !setstripe_args_specified(&lsa) && !lsa.lsa_pool_name)
                from_copy = true;
        // ......

        for (fname = argv[optind]; fname != NULL; fname = argv[++optind]) {
                if (from_copy) {
                        layout = llapi_layout_get_by_path(template ?: fname, 0);
                        if (!layout) {
                                fprintf(stderr,
                                        "%s: can't create composite layout from file %s: %s\n",
                                        progname, template ?: fname,
                                        strerror(errno));
                                result = -errno;
                                goto error;
                        }
                }

                if (migrate_mdt_mode) {
                        result = llapi_migrate_mdt(fname, &migrate_mdt_param);
                } else if (migrate_mode) {
                        result = lfs_migrate(fname, migration_flags, param,
                                             layout);
        // ......

I'm adding llapi_layout_ost_index_set(layout, 0, LLAPI_LAYOUT_DEFAULT) before calling lfs_migrate() to strip the source layout of specific OST object/index values before using it to create the volatile file in lfs_migrate()->migrate_open_files()->llapi_layout_file_open().
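The idea of stripping the OST indices while keeping the rest of the layout as a template can be sketched in isolation. The block below is a minimal, self-contained model and not the landed patch: struct sketch_component, SKETCH_OST_DEFAULT, SKETCH_EOF and sketch_strip_ost_indices() are hypothetical names, and the sentinel value is arbitrary rather than the real LLAPI_LAYOUT_DEFAULT. It assumes that, for a composite layout, every component's index needs to be reset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Arbitrary stand-ins chosen for this sketch, not the real liblustreapi values. */
#define SKETCH_OST_DEFAULT UINT64_MAX	/* "let the allocator choose" */
#define SKETCH_EOF         UINT64_MAX	/* component extends to end of file */

/* Hypothetical, simplified view of one component of a composite layout. */
struct sketch_component {
	uint64_t extent_end;	/* end offset of the component, in bytes */
	uint64_t stripe_count;	/* number of stripes in the component */
	uint64_t ost_index;	/* starting OST, or SKETCH_OST_DEFAULT */
};

/*
 * Reset the starting OST of every component to the default sentinel.
 * Extents and stripe counts are left untouched, so the layout can still
 * serve as a template for the new (volatile) file.
 */
void sketch_strip_ost_indices(struct sketch_component *comps, size_t n)
{
	size_t i;

	assert(comps != NULL || n == 0);

	for (i = 0; i < n; i++)
		comps[i].ost_index = SKETCH_OST_DEFAULT;
}
```

Applied to the three-component layout from the description (-E 1M, -E 16M, -E eof on OSTs 0, 1 and 2), the extents and stripe counts survive while all three OST indices become the default value.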

Comment by Gerrit Updater [ 30/Jan/23 ]

"Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49819
Subject: LU-16500 utils: set default ost index for lfs migrate
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 419038e0015e9eb51c294910a167967f5a77acea

Comment by Gerrit Updater [ 08/Feb/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49819/
Subject: LU-16500 utils: set default ost index for lfs migrate
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 0568f4ca253049d324956e3d89ece0cbd2ff2155

Comment by Peter Jones [ 08/Feb/23 ]

Landed for 2.16
