So this is fascinating.
Each MDT does not know about itself in the pool code: it is the local device, so it is handled differently.
The list of targets (in the QOS/RR pool code) on an MDT contains only the other MDTs in the system.
The practical result is that the local MDT, which holds the first stripe, is never selected by the allocation
code running on that MDT, so it only gets one stripe.
e.g., with 2 MDTs:
lmv_stripe_count: 8 lmv_stripe_offset: 1 lmv_hash_type: crush,overstriped
mdtidx FID[seq:oid:ver]
1 [0x240000400:0x2:0x0]
0 [0x200000401:0x2:0x0]
0 [0x200000401:0x3:0x0]
0 [0x200000401:0x4:0x0]
0 [0x200000401:0x5:0x0]
0 [0x200000401:0x6:0x0]
0 [0x200000401:0x7:0x0]
0 [0x200000401:0x8:0x0]
Or, with 4 MDTs, it can look like this:
lmv_stripe_count: 16 lmv_stripe_offset: 3 lmv_hash_type: crush,overstriped
mdtidx FID[seq:oid:ver]
3 [0x2c0000400:0x6:0x0]
0 [0x200000403:0x10:0x0]
1 [0x240000402:0x11:0x0]
2 [0x280000401:0x11:0x0]
0 [0x200000403:0x11:0x0]
1 [0x240000402:0x12:0x0]
2 [0x280000401:0x12:0x0]
0 [0x200000403:0x12:0x0]
1 [0x240000402:0x13:0x0]
2 [0x280000401:0x13:0x0]
0 [0x200000403:0x13:0x0]
1 [0x240000402:0x14:0x0]
2 [0x280000401:0x14:0x0]
0 [0x200000403:0x14:0x0]
1 [0x240000402:0x15:0x0]
2 [0x280000401:0x15:0x0]
Notice that MDT 3 is only used once.
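The two listings above are consistent with a very simple model, sketched here as a toy simulation rather than actual Lustre code. It assumes stripe 0 always goes to the local MDT and the remaining stripes rotate in ascending index order over every MDT except the local one. The function name `model_current_layout` and its parameters are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <string.h>

/*
 * Toy model, not Lustre code: stripe 0 is placed on the local MDT and
 * the remaining stripes rotate over a target list that excludes it.
 * counts[] must have room for n_mdts entries; on return counts[i] is
 * the number of stripes placed on MDT i.
 */
void model_current_layout(int n_mdts, int local_idx, int stripe_count,
			  int *counts)
{
	int next = 0;
	int i;

	assert(n_mdts > 1 && local_idx >= 0 && local_idx < n_mdts);
	assert(stripe_count > 0);
	memset(counts, 0, n_mdts * sizeof(*counts));

	/* first stripe: always local, no pool lookup */
	counts[local_idx]++;

	/* remaining stripes: round-robin over the other MDTs only */
	for (i = 1; i < stripe_count; i++) {
		if (next == local_idx)
			next = (next + 1) % n_mdts;
		counts[next]++;
		next = (next + 1) % n_mdts;
	}
}
```

Calling it with (2 MDTs, local index 1, 8 stripes) gives 7 stripes on MDT 0 and 1 on MDT 1; with (4 MDTs, local index 3, 16 stripes) it gives 5 stripes each on MDTs 0 to 2 and 1 on MDT 3, matching both listings.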
Allocation of the first stripe is handled like this, without reference to the pool:
	/* Allocate the first stripe locally */
	rc = dt_fid_alloc(env, lod->lod_child, &fid, NULL, NULL);
	if (rc < 0)
		GOTO(out, rc);

	stripes[0] = dt_locate_at(env, lod->lod_child, &fid,
				  dt->do_lu.lo_dev->ld_site->ls_top_dev, &conf);
Then the QOS/RR alloc code is called to allocate the rest of the stripes.
I'm not sure what to do about this. The device init process doesn't really seem like something to mess with.
I'm thinking the right thing to do is to special-case this for RR + overstriping.
Basically, add one more to the range of indices that can be selected during RR, and if that extra index is
picked, do a local allocation. It complicates the code slightly, but it's the only solution that seems sane.
I'll do that if there are no objections.
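To make the idea concrete, here is a rough sketch of the "one extra index" rotation as a standalone toy model. It is not the patch itself and uses none of the real LOD structures. `SLOT_LOCAL`, `rr_pick` and `model_proposed_layout` are hypothetical names, and the placement of the extra slot at the end of the range is an assumption.

```c
#include <assert.h>
#include <string.h>

#define SLOT_LOCAL (-1)	/* hypothetical marker: "allocate on the local MDT" */

/*
 * Map a rotation position onto a target. The other MDTs occupy slots
 * 0..n_others-1; the one extra slot at n_others stands for the local MDT.
 */
int rr_pick(const int *others, int n_others, int pos)
{
	int slot = pos % (n_others + 1);

	return slot == n_others ? SLOT_LOCAL : others[slot];
}

/*
 * Toy model of the proposed behaviour: stripe 0 is still local, but the
 * rotation for the remaining stripes may now land on the local MDT too.
 * counts[] must have room for n_others + 1 entries.
 */
void model_proposed_layout(const int *others, int n_others, int local_idx,
			   int stripe_count, int *counts)
{
	int i;

	assert(n_others > 0 && stripe_count > 0);
	assert(local_idx >= 0 && local_idx <= n_others);
	memset(counts, 0, (n_others + 1) * sizeof(*counts));

	/* first stripe: always local, as today */
	counts[local_idx]++;

	for (i = 1; i < stripe_count; i++) {
		int pick = rr_pick(others, n_others, i - 1);

		if (pick == SLOT_LOCAL)
			counts[local_idx]++;	/* would be a local allocation */
		else
			counts[pick]++;		/* would be a remote allocation */
	}
}
```

Under this model, the 4-MDT case above (local index 3, 16 stripes) ends up with 4 stripes on every MDT, and the 2-MDT case (local index 1, 8 stripes) with 4 on each.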
Landed for 2.16