Details
Type: Bug
Resolution: Fixed
Priority: Minor
Description
If multiple targets are being mounted on the same server and one of them gets stuck accessing the MGS (or is otherwise delayed during setup), a stack trace like the following may be dumped:
INFO: task mount.lustre:93138 blocked for more than 90 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:mount.lustre state:D stack:0 pid:93138 ppid:93135 flags:0x00004082
Call Trace:
 __schedule+0x2d1/0x870
 schedule+0x55/0xf0
 schedule_preempt_disabled+0xa/0x10
 __mutex_lock.isra.11+0x349/0x420
 mgc_fs_setup.isra.12+0x65/0x7a0 [mgc]
 mgc_set_info_async+0x99f/0xb30 [mgc]
 server_start_targets+0x452/0x2c30 [obdclass]
 server_fill_super+0x94e/0x10a0 [obdclass]
 lustre_fill_super+0x388/0x3d0 [lustre]
 mount_nodev+0x49/0xa0
 legacy_get_tree+0x27/0x50
 vfs_get_tree+0x25/0xc0
 do_mount+0x2e9/0x950
 ksys_mount+0xbe/0xe0
This is a fairly common occurrence in different situations and should be improved in a few ways:
- the mutex_lock(&cli->cl_mgc_mutex) in mgc_fs_setup() should be interruptible, so that a stuck mount can be killed without rebooting the server (a sketch follows this list)
- the cl_mgc_mutex is held for the full duration of the target's llog transfer from the MGS, blocking all other target mounts. This is ostensibly because the MGC is "attached" to a local target's CONFIG directory and cannot process writes for different targets at the same time. It would be much better if this code were restructured so that targets could fetch and store their llogs from the MGS concurrently. Fault Tolerant MGS (LU-17819) would not help here, since the serialization is on the local node and there would still be only a single MGC service for each "client".
- we could consider fetching the llog with bulk transfers instead of llog operations, but that might make the code messy
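For the first item, a minimal sketch of what the interruptible lock could look like, assuming the lock is taken near the top of mgc_fs_setup() as the stack trace suggests. The function below is a simplified, hypothetical stand-in (not copied from the Lustre tree); struct client_obd and cl_mgc_mutex refer to the existing Lustre definitions:

#include <linux/mutex.h>
#include <linux/errno.h>
/* struct client_obd with its cl_mgc_mutex member comes from the Lustre obd headers */

/* hypothetical, simplified stand-in for the locked part of mgc_fs_setup() */
static int mgc_fs_setup_locked(struct client_obd *cli)
{
	int rc;

	/* mutex_lock() leaves the task in uninterruptible D state (the
	 * hung-task stack above); taking the lock interruptibly lets a
	 * stuck mount.lustre be killed with a signal instead of a reboot
	 */
	rc = mutex_lock_interruptible(&cli->cl_mgc_mutex);
	if (rc)		/* -EINTR: a signal arrived while waiting */
		return rc;

	/* ... existing local CONFIGS setup would run here ... */

	mutex_unlock(&cli->cl_mgc_mutex);
	return 0;
}

mutex_lock_killable() is the variant to use if only fatal signals should break the wait, which matches the "kill a stuck mount without rebooting" case even more closely than mutex_lock_interruptible().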