[LU-15658] Interop sanity-flr test_0b test_0c test_0e test_0f: verify pool failed != flash Created: 17/Mar/22  Updated: 30/Nov/23  Resolved: 30/May/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.15.0, Lustre 2.15.3
Fix Version/s: Lustre 2.16.0

Type: Bug Priority: Major
Reporter: Maloo Assignee: Vitaly Fertman
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-14480 Setting specific OST's under pool fai... Resolved
is related to LUDOC-511 Add documentation for special/reserve... Resolved
is related to LU-15707 Unable to create file without a pool ... Resolved
is related to LU-16894 The MDS should not limit the stripe c... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/0e8656f6-a118-4d42-8ce4-e179ce9c7e5c

test_0b failed with the following error:

verify pool failed on /mnt/lustre/d0b.sanity-flr/f0b.sanity-flr:  != flash

env
server: 2.15
client: 2.12.7 or 2.14.0

CMD: trevis-80vm1.trevis.whamcloud.com lctl get_param -n lov.lustre-*.pools.test_0b 			| sort -u | tr '\n' ' ' 
Waiting 90 secs for update
CMD: trevis-80vm1.trevis.whamcloud.com lctl get_param -n lov.lustre-*.pools.test_0b 			| sort -u | tr '\n' ' ' 
CMD: trevis-80vm1.trevis.whamcloud.com lctl get_param -n lov.lustre-*.pools.test_0b 			| sort -u | tr '\n' ' ' 
Updated after 2s: wanted 'lustre-OST0000_UUID lustre-OST0001_UUID lustre-OST0002_UUID lustre-OST0003_UUID lustre-OST0004_UUID lustre-OST0005_UUID lustre-OST0006_UUID ' got 'lustre-OST0000_UUID lustre-OST0001_UUID lustre-OST0002_UUID lustre-OST0003_UUID lustre-OST0004_UUID lustre-OST0005_UUID lustre-OST0006_UUID '
/mnt/lustre/d0b.sanity-flr/f0b.sanity-flr
composite_header:
  lcm_magic:         0x0BD60BD0
  lcm_size:          1216
  lcm_flags:         ro
  lcm_layout_gen:    6
  lcm_mirror_count:  6
  lcm_entry_count:   6
components:
  - lcme_id:             131074
    lcme_mirror_id:      2
    lcme_flags:          init
    lcme_extent.e_start: 0
    lcme_extent.e_end:   EOF
    lcme_offset:         392
    lcme_size:           80
    sub_layout:
      lmm_magic:         0x0BD10BD0
      lmm_seq:           0x200039de1
      lmm_object_id:     0x2d72
      lmm_fid:           [0x200039de1:0x2d72:0x0]
      lmm_stripe_count:  2
      lmm_stripe_size:   4194304
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 2
      lmm_objects:
      - 0: { l_ost_idx: 2, l_fid: [0x100020000:0x27071:0x0] }
      - 1: { l_ost_idx: 3, l_fid: [0x100030000:0x2716f:0x0] }

 sanity-flr test_0b: @@@@@@ FAIL: verify pool failed on /mnt/lustre/d0b.sanity-flr/f0b.sanity-flr:  != flash 

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity-flr test_0b - verify pool failed on /mnt/lustre/d0b.sanity-flr/f0b.sanity-flr: != flash



 Comments   
Comment by Andreas Dilger [ 17/Mar/22 ]

It looks like this has been failing since 2021-03-02, and the failure appears to have started with patch https://review.whamcloud.com/41815 "LU-14480 pool: wrong usage with ost list" (commit b384ea39e593cda1ac4d6fb8b955d0c7d1a1f67b).

There were two identical failures on the patch itself before it landed, and in full testing starting on the day it landed on master:
https://testing.whamcloud.com/search?server_branch_type_id=24a6947e-04a9-11e1-bb5f-52540025f9af&status%5B%5D=FAIL&test_set_script_id=38984ba6-84ac-11e7-b48c-5254006e85c2&sub_test_script_id=ea109e68-934f-11e7-b9b0-5254006e85c2&start_date=2020-03-01&end_date=2021-03-31&source=sub_tests#redirect

What is interesting is that the component shown in the error case is "mirror 2" of the file (stripe_size=4MB, stripe_count=2, stripe_index=2), so this may be a problem with the test script assuming that the components/mirrors in the file exactly match the order of mirrors specified by "lfs setstripe". That may hold most of the time, but perhaps in some cases the mirrors are created in a different order?

The problem is that the older releases check for the "flash" pool on mirror 2, but the LU-14480 patch always removes the pool from a component when OSTs are explicitly specified, even if the OSTs are part of the pool. Since the pool name stored on a component may be used for other reasons after a file is created, it makes sense to keep it on the file unless it actually conflicts with the OST list.
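For reference, a rough sketch of the kind of check test_0b makes. This is not the actual sanity-flr script; the pool membership, the two-mirror layout, and the use of "lfs getstripe --pool" are illustrative assumptions:

    # create the "flash" pool and populate it (OST range illustrative)
    lctl pool_new lustre.flash
    lctl pool_add lustre.flash lustre-OST[0-6]
    # create a mirrored file where one mirror names both the pool and an
    # explicit OST list that lies inside that pool
    lfs mirror create -N -p flash -o 2,3 -N /mnt/lustre/d0b.sanity-flr/f0b.sanity-flr
    # the test then reads the pool name back from that component and expects
    # "flash"; with the LU-14480 behavior the pool field comes back empty,
    # which is the "verify pool failed ...:  != flash" message above
    lfs getstripe --pool /mnt/lustre/d0b.sanity-flr/f0b.sanity-flr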

Comment by Andreas Dilger [ 17/Mar/22 ]

Hi Vitaly, could you please take a look at this test failure? Per my previous comment, it looks like this has been failing 100% of the time in interop testing since patch https://review.whamcloud.com/41815 landed on 2021-03-22.

Comment by Gerrit Updater [ 23/Mar/22 ]

"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46913
Subject: LU-15658 lod: revert "LU-14480 pool: wrong usage with ost list"
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c0157b0ed7b4c07ed906598e3c75828751171da5

Comment by Andreas Dilger [ 23/Mar/22 ]

As mentioned in the commit message of the revert patch, I think there are a couple of things wrong with the previous patch:

  • it unconditionally removes the pool name from the layout, even when the OST index is in the pool, or if the pool is explicitly specified. This is causing interop test failures, and I think it is the wrong behavior:
    # lctl pool_new testfs.p1
    Pool testfs.p1 created
    # lctl pool_add testfs.p1 OST0000 OST0001
    OST testfs-OST0000_UUID added to pool testfs.p1
    OST testfs-OST0001_UUID added to pool testfs.p1
    # lfs setstripe -i1 /mnt/testfs/p1/f1i
    # lfs setstripe --pool p1 -o1 /mnt/testfs/p1/f1o
    # lfs getstripe /mnt/testfs/p1/f1i
    /mnt/testfs/p1/f1i
    lmm_stripe_count:  1
    lmm_stripe_offset: 1
    lmm_pool:          p1
            obdidx           objid           objid           group
                 1               3            0x3                0
    # lfs getstripe /mnt/testfs/p1/f1o
    /mnt/testfs/p1/f1o
    lmm_stripe_count:  1
    lmm_stripe_offset: 1
            obdidx           objid           objid           group
                 1               4            0x4                0
    
  • it introduces an inconsistency between "lfs setstripe -o M" and "lfs setstripe -i M" when creating a file in a directory, where "-o M" is now accepted to mean "create the file on OST M and ignore the pool", while "-i M" still returns an error:
    # lfs setstripe -i2 /mnt/testfs/p1/f2i
    lfs setstripe: setstripe error for '/mnt/testfs/p1/f2i': Invalid argument
    # lfs setstripe -o2 /mnt/testfs/p1/f2o
    # lfs getstripe /mnt/testfs/p1/f2o
    /mnt/testfs/p1/f2o
    lmm_stripe_count:  1
    lmm_stripe_offset: 2
            obdidx           objid           objid           group
                 2               3            0x3                0
    

The latter inconsistency became very apparent when I was looking into how the first issue might be fixed properly, by moving the pool check lower down in lod_qos_parse_config() to where the existing lod_check_index_in_pool() is called. It seems odd that the single-OST-index check ("-i") returns an error, while the OST-list check ("-o") succeeds.

I'd also be OK with fixing this in a different way than the 46913 revert patch, but there needs to be some clear reason why this new approach (ignoring the default pool on the parent directory) is better than consistently returning an error for any invalid OST index and making the "lfs setstripe" error message clearer to the end user, or alternately accepting that an "out-of-pool" OST index overrides a default pool name. LUS-9579 was referenced in the commit message, but I can't see that ticket. Is this a problem reported by an end user, or something found in internal testing, or what was the original motivation for this change?
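To make the expected outcome concrete, here is a hedged sketch of the behavior being argued for above (not what any particular patch implements), reusing pool p1 containing OST0000 and OST0001 from the example in the first bullet; the file names are illustrative:

    # OST 1 is inside pool p1: the file should be created and keep the pool name
    lfs setstripe --pool p1 -o1 /mnt/testfs/p1/f1o
    lfs getstripe --pool /mnt/testfs/p1/f1o      # should print "p1"
    # OST 2 is not in pool p1: both forms should fail the same way, ideally
    # with an error message that points at the pool/OST-index conflict
    lfs setstripe --pool p1 -i2 /mnt/testfs/p1/f2i
    lfs setstripe --pool p1 -o2 /mnt/testfs/p1/f2o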

Comment by Etienne Aujames [ 24/Mar/22 ]

I backported https://review.whamcloud.com/41815 to b2_12 because the previous behavior could introduce inconsistencies when the pool is inherited: a file could end up in a pool while using OSTs that are not in that pool. CEA relies on the pool name for OST space balancing between flash and HDD (via Robinhood) and for different types of user projects.
So they would prefer to remove the pool from the file and manage those special cases separately: "-o" usage is not common (for them) and really specific; it assumes the user knows what they are doing.

I agree that returning an error only when the OST list is incompatible with the pool is a better solution. And it would be great to have a proper error message returned to the user by "lfs setstripe -o/-i" when the OSTs are incompatible with the pool.

Currently, we cannot override pool inheritance with "lfs setstripe": we can force the file to use another pool, but we cannot create a file without a pool set, because "-p none" forces inheritance from the parent.
So if a user wants to force the use of specific OSTs (-o) that do not match any pool, they have to create a pool with every OST in it.
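A minimal sketch of the inheritance limitation described above, assuming a directory with a default pool (the directory and pool names are made up for illustration):

    # set a default pool on the directory
    lfs setstripe -p flash /mnt/testfs/dir
    # try to opt out of the pool for a single file
    lfs setstripe -p none /mnt/testfs/dir/nopool
    # still prints "flash": "-p none" only re-inherits the pool from the
    # parent/root, so there is no way to create the file with no pool at all
    lfs getstripe --pool /mnt/testfs/dir/nopool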

Comment by Andreas Dilger [ 25/Mar/22 ]

Currently, we cannot override pool inheritance with "lfs setstripe": we can force the file to use another pool, but we cannot create a file without a pool set.

This sounds like a bug that should be fixed. Could you please file a separate LU ticket for this, and look into when this problem was introduced? At least "lfs setstripe -p none" should allow creating a file without a pool.

Comment by Andreas Dilger [ 28/Mar/22 ]

I tested with master, 2.14+patches (EXAScaler), and 2.12.8, and wasn't able to create a file without a pool if the parent or root had a pool specified. This seems like a major problem if specific OSTs cannot be selected, but this should be consistent with other uses.

Comment by Etienne Aujames [ 28/Mar/22 ]

Hi Andreas,

I will create a ticket for this.
But it seems this behavior was intended for "--pool=none":

man lfs-setstripe

-p, --pool <pool_name>
    Allocate objects from the predefined OST pool pool_name for the layout of this file or component. The stripe_count, stripe_size, and start_ost_index can be used to select a subset of the OSTs within the pool; the start_ost_index must be part of the pool or an error will be returned. It is possible to specify a different pool for each component of a file. If no pool is specified, it will be inherited from the previous component (for later components of a composite layout) or the parent or root directory (for plain raid0 layouts, or the first component of a composite file). Use pool_name='', or pool_name=none (since Lustre 2.11) to force a component to inherit the pool from the parent or root directory instead of the previous component.

I tried reading some of the 2.10 code, and it seems this issue was already present there.
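Read literally, the quoted text means "-p none" only selects where the pool is inherited from rather than clearing it. A hedged sketch with a two-component file (the layout and names are illustrative):

    lfs setstripe -E 1M -p flash -E eof -p none /mnt/testfs/dir/comp
    # the first component uses pool "flash"; the second, with pool_name=none,
    # inherits the pool from the parent/root directory instead of from the
    # previous ("flash") component, so if the parent has a default pool then
    # "-p none" can never yield a component with no pool at all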

Comment by Andreas Dilger [ 28/Mar/22 ]

There has to be some way to create files on specific OSTs, even if there is a default pool specified on the root directory.

Comment by Gerrit Updater [ 31/Mar/22 ]

"Vitaly Fertman <vitaly.fertman@hpe.com>" uploaded a new patch: https://review.whamcloud.com/46967
Subject: LU-15658 lod: ost list and pool name conflict
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 07e4c402e3881408a491e464db5e785890619aad

Comment by Gerrit Updater [ 30/May/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46967/
Subject: LU-15658 lod: ost list and pool name conflict
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 06dd5a4638dd36640b146d4388c09a322873760b

Comment by Peter Jones [ 30/May/22 ]

Landed for 2.16

Comment by Gerrit Updater [ 21/Dec/22 ]

"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49475
Subject: LU-15658 lod: ost list and pool name conflict
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 12e294b2a466ebf3103e1d8029b2eabc9f70a0d6

Comment by Gerrit Updater [ 12/Jun/23 ]

"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51285
Subject: LU-15658 lod: ost list and pool name conflict
Project: fs/lustre-release
Branch: b2_15
Current Patch Set: 1
Commit: def45fbaf68108eacd3493c7cfb771143c2680c0
