[LU-9063] hsm: race on the coordinator's state - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Unresolved
Priority: Minor
Fix Version/s: None
Affects Version/s: None
Labels:
- HSM
- patch

Severity:
3
Rank (Obsolete):
9223372036854775807

Description

Nothing prevents a coordinator from being restarted after it has been shut down. This is a problem at cleanup time, for example when unmounting an MDT.
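To make the window concrete, here is a minimal user-space sketch in C. It is not the Lustre code and not a proposed fix: the `coordinator` struct, the `CDT_FINAL` state, and the function names are all hypothetical, and only `CDT_STOPPED` comes from the assertion quoted in this report. It assumes one possible approach, in which cleanup atomically moves the state out of `CDT_STOPPED` so that a late "enable" request can no longer succeed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical states; only CDT_STOPPED is named in the report. */
enum cdt_state {
    CDT_STOPPED,
    CDT_RUNNING,
    CDT_FINAL       /* hypothetical terminal state entered at cleanup */
};

/* Hypothetical stand-in for the coordinator object. */
struct coordinator {
    _Atomic int state;
};

/* Start only if the state is exactly CDT_STOPPED. Returns true on success. */
bool cdt_start(struct coordinator *cdt)
{
    int expected = CDT_STOPPED;

    return atomic_compare_exchange_strong(&cdt->state, &expected, CDT_RUNNING);
}

/* Plain stop: the coordinator may legitimately be started again later. */
void cdt_stop(struct coordinator *cdt)
{
    atomic_store(&cdt->state, CDT_STOPPED);
}

/*
 * Cleanup: claim the coordinator by moving CDT_STOPPED -> CDT_FINAL in one
 * atomic step. Returns false if it is not stopped (e.g. it was restarted
 * between the stop and the cleanup), so the caller can stop it and retry
 * instead of asserting.
 */
bool cdt_fini(struct coordinator *cdt)
{
    int expected = CDT_STOPPED;

    return atomic_compare_exchange_strong(&cdt->state, &expected, CDT_FINAL);
}
```

In this model, a `cdt_start()` that lands between `cdt_stop()` and `cdt_fini()` makes `cdt_fini()` return false rather than trip an assertion, and any `cdt_start()` issued after a successful `cdt_fini()` is refused.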

The race is somewhat difficult to trigger. The simplest way to do so is to add a delay in mdt_hsm_cdt_stop() right after the coordinator's state is set to CDT_STOPPED.

After applying the attached patch, run the following two commands concurrently:

# Wait until the coordinator reports "stopped", then immediately re-enable it.
while [ "$(lctl get_param -n mdt.lustre-MDT0000.hsm_control)" != "stopped" ]; do
    sleep 1
done
lctl set_param mdt.lustre-MDT0000.hsm_control=enabled

and

lustre/tests/llmountcleanup.sh

This should trigger the following:
kernel:LustreError: 20570:0:(mdt_coordinator.c:391:hsm_cdt_procfs_fini()) ASSERTION( cdt->cdt_state == CDT_STOPPED ) failed:
kernel:LustreError: 20570:0:(mdt_coordinator.c:391:hsm_cdt_procfs_fini()) LBUG

Attachments

0001-LU-0000-hsm-add-a-delay-to-easily-trigger-a-race-at-.patch
1 kB
30/Jan/17 2:19 PM

People

Assignee: Hongchao Zhang

Reporter: CEA

Votes: 0

Watchers: 8

Dates

Created: 30/Jan/17 2:31 PM

Updated: 16/Dec/23 10:34 PM