Details
Type: Improvement
Resolution: Fixed
Priority: Minor
Description
Recently, at the CEA, we hit an issue re-creating pools after a --writeconf on a big configuration (many targets and pools).
Errors were returned when OSTs were added to a pool too quickly (using separate commands). The workaround is to add a delay between each command.
This was hit with Lustre 2.12.9 on a standalone MGS, with a mounted client (I am not 100% sure).
Since 2.12, there are several patches that could help:
- LU-17182 utils: pool_add send OSTs in one batch
- LU-15706 llog: deal with "SKIP" pool llog records correctly
- LU-14516 mgc: configurable wait-to-reprocess time
- LU-13686 utils: pool_add/remove error code fix
But I found some issues when I tried to understand the "lctl pool_*" commands:
- with a mounted client (MDT and MGT sharing the same node), the sanity check before touching the MGS configuration is done in userspace by checking the client's lov pool parameters. But nothing guarantees those parameters are in sync with the MGS. Only the MGS configuration should be trusted; otherwise this can lead to inconsistencies (e.g. adding an OST to a non-existent pool). This kind of behavior is more likely to be hit when executing several commands in a row (clients have to cancel their config lock and re-read their configuration for each command).
- on a separate MGS (without a mounted client), the MGS configuration is checked in userspace, but with a lot of overhead. e.g: to add one OST, the MGS client configuration (fsname-client) is read 5 times (sanity checks x3 + kernel x1 + result check x1). So when the configuration is big, this takes time. And this use case is not documented.
- "lctl pool_add" and "lctl pool_remove" do not check the ioctl return code from the kernel.
- check_pool_cmd_result() does not re-compute the client wait delay using the mgc_requeue_timeout_min parameter.
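The first issue (validating against the client's cached lov parameters instead of the MGS) can be sketched with a toy model. This is not Lustre code: the classes, the pool name and the OST name are all illustrative, and the real commands are C utilities, but it shows how a stale client-side view produces spurious errors that go away once the client re-reads its configuration (which is why adding delays between commands works around the problem):

```python
class MGS:
    """Stands in for the authoritative MGS configuration."""
    def __init__(self):
        self.pools = {}
    def pool_new(self, name):
        self.pools[name] = set()
    def pool_add(self, name, ost):
        self.pools[name].add(ost)

class Client:
    """Holds a cached copy of the configuration, refreshed lazily."""
    def __init__(self, mgs):
        self.mgs = mgs
        self.cached_pools = {p: set(s) for p, s in mgs.pools.items()}
    def refresh(self):
        # models the client cancelling its config lock and re-reading
        self.cached_pools = {p: set(s) for p, s in self.mgs.pools.items()}

def userspace_pool_add(client, pool, ost):
    # Flawed sanity check: trusts the client's cached parameters,
    # which may lag behind the authoritative MGS configuration.
    if pool not in client.cached_pools:
        raise RuntimeError(f"pool {pool} does not exist (stale client view)")
    client.mgs.pool_add(pool, ost)

mgs = MGS()
client = Client(mgs)      # client caches the config: no pools yet
mgs.pool_new("mypool")    # the pool now exists on the MGS

try:
    userspace_pool_add(client, "mypool", "OST0000")
except RuntimeError as e:
    print(e)              # spurious error: the pool does exist on the MGS

client.refresh()          # after the client re-reads the configuration...
userspace_pool_add(client, "mypool", "OST0000")  # ...the same command works
```

The inverse inconsistency (the cached view still shows a pool the MGS has removed, letting an invalid command through) follows from the same stale snapshot.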
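The second issue (five full reads of fsname-client per OST added) can be counted with another toy model. Again, this is not Lustre code: the read counts come from the description above, and the "share one read across the sanity checks" variant is only an illustrative sketch of a possible reduction, assuming the kernel pass and the result verification still each need a fresh read:

```python
class ConfigSource:
    """Stands in for the fsname-client configuration llog on the MGS."""
    def __init__(self, records):
        self.records = records
        self.reads = 0
    def read_all(self):
        self.reads += 1           # each call is a full (expensive) llog read
        return list(self.records)

def pool_add_current(cfg):
    for _ in range(3):            # sanity checks: 3 separate full reads
        cfg.read_all()
    cfg.read_all()                # kernel-side processing: 1 read
    cfg.read_all()                # verifying the result: 1 read

def pool_add_shared_sanity_read(cfg):
    snapshot = cfg.read_all()     # one read serves all three sanity checks
    for _ in range(3):
        assert snapshot is not None
    cfg.read_all()                # kernel-side processing still reads once
    cfg.read_all()                # result verification needs a fresh read

cfg = ConfigSource(["testfs-OST0000", "testfs-OST0001"])
pool_add_current(cfg)
print(cfg.reads)                  # 5 reads for a single OST addition

cfg = ConfigSource(["testfs-OST0000", "testfs-OST0001"])
pool_add_shared_sanity_read(cfg)
print(cfg.reads)                  # 3 reads
```

On a big configuration each read is proportional to the number of targets, so the per-command cost multiplies quickly when adding many OSTs one command at a time.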
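For the third issue, the generic pattern is simply to check and propagate the ioctl return code rather than drop it. The sketch below uses Python's fcntl.ioctl and a deliberately closed file descriptor to trigger a failure; the real fix belongs in the C utilities, and FIONREAD is used here only as a convenient request that fails cleanly:

```python
import errno
import fcntl
import os
import termios

def checked_ioctl(fd, request, arg=0):
    """Pattern sketch (not the actual lctl code): propagate the ioctl
    failure as a negative errno instead of silently ignoring it."""
    try:
        return fcntl.ioctl(fd, request, arg)
    except OSError as e:
        return -e.errno

# Force a failure with a closed file descriptor: a caller that ignored the
# return code would report success here even though the kernel refused.
r, w = os.pipe()
os.close(r)
os.close(w)
rc = checked_ioctl(r, termios.FIONREAD)
assert rc == -errno.EBADF
```

Ignoring the return code means a failed kernel-side pool operation looks like a success to the user, which makes the timing-dependent failures above even harder to diagnose.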