[LU-9063] hsm: race on the coordinator's state Created: 30/Jan/17 Updated: 16/Dec/23 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | CEA | Assignee: | Hongchao Zhang |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | HSM, patch | ||
| Attachments: |
|
| Severity: | 3 |
| Description |
|
Nothing prevents the coordinator from being restarted after it has been shut down. This is a problem at cleanup time, for example when unmounting an MDT.

The race is a bit difficult to trigger. The simplest way to do it is to add a delay in mdt_hsm_cdt_stop() right after the coordinator's state is set to CDT_STOPPED. After applying the patch (cf. attachment), concurrently run:

while [ $(lctl get_param -n mdt.lustre-MDT0000.hsm_control) != "stopped" ]; do
    sleep 1
done
lctl set_param mdt.lustre-MDT0000.hsm_control enabled

and

lustre/tests/llmountcleanup.sh

This should trigger the following: |
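For convenience, the two concurrent reproduction steps from the description can be combined into one script. This is only a sketch: it assumes the delay patch is already applied to mdt_hsm_cdt_stop(), that lctl is in the PATH, and that the script is run from the top of a Lustre source tree. The names HSM_CONTROL, LUSTRE, reenable_when_stopped and reproduce are hypothetical helpers introduced here, not part of Lustre.

```shell
# Parameter name taken from the description; override for another MDT.
HSM_CONTROL=${HSM_CONTROL:-mdt.lustre-MDT0000.hsm_control}
# Assumption: relative path to the lustre/ directory of a source tree.
LUSTRE=${LUSTRE:-lustre}

# Poll until the coordinator reports "stopped", then ask it to start again.
# Like the loop in the description, this spins forever if the parameter
# never reads "stopped" (e.g. once the MDT is gone).
reenable_when_stopped() {
    while [ "$(lctl get_param -n "$HSM_CONTROL")" != "stopped" ]; do
        sleep 1
    done
    lctl set_param "$HSM_CONTROL" enabled
}

# Run the poller in the background while the cleanup script unmounts.
reproduce() {
    reenable_when_stopped &
    waiter=$!
    sh "$LUSTRE/tests/llmountcleanup.sh"
    # Stop the poller in case it never observed the "stopped" state.
    kill "$waiter" 2>/dev/null
    wait "$waiter" 2>/dev/null
}
```

Calling `reproduce` after sourcing the script starts both steps. Whether the race actually fires still depends on the length of the delay added by the patch.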
| Comments |
| Comment by Peter Jones [ 30/Jan/17 ] |
|
Hongchao, could you please advise on this one? Thanks, Peter |
| Comment by Gerrit Updater [ 31/Jan/17 ] |
|
Vinayak (vinayakswami.hariharmath@seagate.com) uploaded a new patch: https://review.whamcloud.com/25170 |
| Comment by Gerrit Updater [ 06/Feb/17 ] |
|
Vinayak (vinayakswami.hariharmath@seagate.com) uploaded a new patch: https://review.whamcloud.com/25269 |
| Comment by Quentin Bouget [ 05/Apr/17 ] |
|
I think this patch would be a fix: https://review.whamcloud.com/#/c/22667 |
| Comment by Cory Spitz [ 16/Dec/23 ] |
|
Seems that this is an old issue that can be resolved. Quentin B. was probably right. |
| Comment by Cory Spitz [ 16/Dec/23 ] |
|
And I'm guessing that https://review.whamcloud.com/c/fs/lustre-release/+/25170 can be abandoned. |