[LU-17308] makes "lctl pool_*" more reliable for big configurations Created: 21/Nov/23  Updated: 04/Feb/24  Resolved: 04/Feb/24

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Etienne Aujames Assignee: Etienne Aujames
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-17250 Add new MDT to existing filesystem mi... Open
is related to LU-8970 Kernel warning on client mount if poo... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Recently, at CEA, we hit an issue when re-creating pools after a --writeconf on a large configuration (many targets and pools).

Errors were returned when OSTs were added to a pool too quickly (using separate commands). The workaround is to add a delay between commands.
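The workaround described above (one "lctl pool_add" per OST, with a pause between commands) could be scripted roughly as follows. This is only a sketch: the file system name, pool name, OST names and the 5-second delay are hypothetical values, not ones taken from the CEA configuration, and the exit code of lctl may not be reliable on releases that lack the LU-13686 fix.

```python
import subprocess
import time
from typing import Callable, Iterable, List


def build_pool_add_commands(fsname: str, pool: str,
                            osts: Iterable[str]) -> List[List[str]]:
    """Build one `lctl pool_add <fsname>.<pool> <ost>` command per OST."""
    return [["lctl", "pool_add", f"{fsname}.{pool}", ost] for ost in osts]


def run_with_delay(commands: List[List[str]],
                   delay_s: float = 5.0,  # assumed value, tune per site
                   run: Callable = subprocess.run,
                   sleep: Callable[[float], None] = time.sleep) -> List[int]:
    """Run the commands in order, sleeping between consecutive commands.

    Returns the exit code of each command. `run` and `sleep` are
    injectable so the logic can be exercised without a Lustre node.
    """
    codes = []
    for index, cmd in enumerate(commands):
        if index > 0:
            sleep(delay_s)  # give clients time to re-read their configuration
        codes.append(run(cmd).returncode)
    return codes


if __name__ == "__main__":
    # Hypothetical names, to be run on the MGS node.
    cmds = build_pool_add_commands(
        "testfs", "pool1", ["testfs-OST0000", "testfs-OST0001"])
    print(run_with_delay(cmds))
```

Keeping the command construction separate from the execution makes it easy to print the commands first and review them before touching the MGS configuration.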

This was hit on a standalone MGS with a mounted client (I am not 100% sure) running Lustre 2.12.9.
Since 2.12, several patches have landed that could help:

  • LU-17182 utils: pool_add send OSTs in one batch
  • LU-15706 llog: deal with "SKIP" pool llog records correctly
  • LU-14516 mgc: configurable wait-to-reprocess time
  • LU-13686 utils: pool_add/remove error code fix

However, I found some issues while trying to understand the "lctl pool_*" commands:

  1. With a client (MDT and MGT share the same node), the sanity check before touching the MGS configuration is done in userspace by checking the lov client pool parameters. But nothing guarantees that those parameters are in sync with the MGS. Only the MGS configuration should be trusted, otherwise this could lead to inconsistencies (e.g. adding an OST to a non-existent pool). I think this kind of behavior is more likely to be hit when executing several commands in a row (clients have to cancel their config lock and re-read their configuration for each command).
  2. On a separate MGS (without a client mounted), the MGS configuration is checked in userspace, but with a lot of overhead. For example, to add an OST, the MGS client configuration (fsname-client) is read 5 times (sanity check x3 + kernel x1 + check result x1). So when the configuration is big, this takes time. This use case is also not documented.
  3. "lctl pool_add/pool_remove" do not check the ioctl return code (kernel).
  4. check_pool_cmd_result() does not re-compute the client wait delay with the mgc_requeue_timeout_min parameter.
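To give a feel for the overhead described in item 2, the number of reads of the fsname-client configuration can be estimated with simple arithmetic. The figure of 5 reads per "pool_add" comes from the description above; the pool sizes in the example are hypothetical, and the assumption that a batched command (as introduced by LU-17182) costs the same 5 reads as a single-OST command is mine, not something stated in this ticket.

```python
from typing import Dict

# From the description: sanity check x3 + kernel x1 + check result x1.
READS_PER_POOL_ADD = 5


def config_reads(num_commands: int,
                 reads_per_command: int = READS_PER_POOL_ADD) -> int:
    """Total reads of the client configuration for a number of commands."""
    return num_commands * reads_per_command


def reads_for_pools(osts_per_pool: Dict[str, int], batched: bool) -> int:
    """Estimate config reads needed to populate every pool.

    batched=False: one pool_add command per OST.
    batched=True:  one pool_add command per non-empty pool (assumes a
                   batched command still costs READS_PER_POOL_ADD reads).
    """
    if batched:
        commands = sum(1 for count in osts_per_pool.values() if count > 0)
    else:
        commands = sum(osts_per_pool.values())
    return config_reads(commands)


if __name__ == "__main__":
    # Hypothetical layout: 20 pools of 50 OSTs each.
    layout = {f"pool{i}": 50 for i in range(20)}
    print("one command per OST :", reads_for_pools(layout, batched=False))
    print("one command per pool:", reads_for_pools(layout, batched=True))
```

Since each read has to parse the whole configuration log, the cost of every individual read also grows with the size of the configuration, so the totals understate the difference on large systems.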


 Comments   
Comment by Gerrit Updater [ 21/Nov/23 ]

"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53202
Subject: LU-17308 mgs: move pool_cmd check to the kernel
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 641a8a68655f57205f12b5f1731efd3b7f0825f0

Comment by Gerrit Updater [ 04/Feb/24 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/53202/
Subject: LU-17308 mgs: move pool_cmd check to the kernel
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ce824977a212d243e15cf07e52a91984841f9b17

Comment by Peter Jones [ 04/Feb/24 ]

Merged for 2.16

Generated at Sat Feb 10 03:34:22 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.