I found the root cause.
In lod_qos_ost_in_use_clear(), the ost_in_use array is initialised to 0, and in lod_qos_prep_create()->lod_alloc_specific(), the ost_idx is obtained as follows:
    for (i = 0; i < ost_count;
         i++, array_idx = (array_idx + 1) % ost_count) {
            ost_idx = osts->op_array[array_idx];
The ost_idx is then checked against the ost_in_use array:
            if (lod_qos_is_ost_used(env, ost_idx, stripe_num))
                    continue;
If the stripe_offset is 0, then in the first iteration stripe_num is also 0, so lod_qos_is_ost_used() returns false and the object is allocated on the first OST device.
If the file's stripes start from an offset other than 0, then when the loop reaches ost_idx 0, lod_qos_is_ost_used(env, 0, stripe_num) returns true, because the zero-initialised array makes OST 0 look as if it were already in use, and the first OST device is skipped.
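The effect described above can be sketched with a small stand-alone model. This is not the Lustre code: in_use, NSLOTS, in_use_clear(), is_ost_used() and ost0_looks_used() are hypothetical names, and the model assumes the scan can reach slots that have not been filled yet.

```c
#include <string.h>

#define NSLOTS 4                /* hypothetical stripe count */

/* Hypothetical stand-in for the ost_in_use array. */
static int in_use[NSLOTS];

/* Clear the array byte-wise with the given fill value (0 or -1). */
static void in_use_clear(int fill)
{
        memset(in_use, fill, sizeof(in_use));
}

/* Return 1 if ost appears in the first n slots, 0 otherwise. */
static int is_ost_used(int ost, int n)
{
        int j;

        for (j = 0; j < n; j++)
                if (in_use[j] == ost)
                        return 1;
        return 0;
}

/*
 * Clear the array, record one stripe on OST 2 (a non-zero start),
 * then ask whether OST 0 looks used when all slots are scanned.
 */
static int ost0_looks_used(int fill)
{
        in_use_clear(fill);
        in_use[0] = 2;
        return is_ost_used(0, NSLOTS);
}
```

In this model ost0_looks_used(0) reports OST 0 as used although nothing was placed on it, while ost0_looks_used(-1) does not, since -1 is never a valid OST index.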
The fix should be in lod_qos_ost_in_use_clear(). With the following patch, object stripe allocation is correct:
diff --git a/lustre/lod/lod_qos.c b/lustre/lod/lod_qos.c
index 2b81ad8..2f46e7c 100644
--- a/lustre/lod/lod_qos.c
+++ b/lustre/lod/lod_qos.c
@@ -629,7 +629,7 @@ static inline int lod_qos_ost_in_use_clear(const struct lu_env *env, int stripes
                 CERROR("can't allocate memory for ost-in-use array\n");
                 return -ENOMEM;
         }
-        memset(info->lti_ea_store, 0, sizeof(int) * stripes);
+        memset(info->lti_ea_store, -1, sizeof(int) * stripes);
         return 0;
 }
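A note on why memset() with -1 works here: memset() writes bytes, not ints, and -1 converted to unsigned char is 0xFF, so every int in the array ends up with all bits set, which is -1 on two's-complement machines. The helper below, first_after_memset(), is a hypothetical name used only to illustrate this.

```c
#include <string.h>

/*
 * Fill a small int array byte-wise with 'value' and return the
 * first element, to show what memset() leaves in each int.
 */
static int first_after_memset(int value)
{
        int a[4];

        memset(a, value, sizeof(a));
        return a[0];
}
```

Only fill values whose bytes are all identical in the target int, such as 0 and -1, give the "expected" result. first_after_memset(1), for example, does not return 1, because each byte of the int is set to 0x01.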
Landed for 2.4