Details
- Type: Bug
- Resolution: Unresolved
- Priority: Minor
Description
If a single client mounts multiple filesystems whose servers are configured with different timeouts (e.g. local vs. WAN mounts), the client will use the timeout value of the most recently mounted filesystem. Since the obd_timeout value is global across all connections, if the most recent mount has a significantly higher timeout than the other filesystems, the client will not ping the other servers often enough, resulting in client evictions during periods of inactivity.
The timeout parameter is currently implemented as a single value shared by all mounted filesystems, but it should be possible to make this work better when filesystems with different timeouts are mounted. One possibility is to use the minimum timeout among the mounted filesystem configurations to derive the client ping interval. Since http://review.whamcloud.com/2881 the timeout is the maximum value across all filesystems, which was done to fix a problem with timeouts on the MGS. For clients it makes more sense to use the minimum timeout across all mounted filesystems.
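The following is a minimal user-space sketch of the policy difference, not actual Lustre code: the mount_timeouts values and the pick_timeout_max()/pick_timeout_min() helpers are invented for illustration only.

/*
 * Illustrative sketch only: model of choosing the single client timeout
 * when several filesystems with different "timeout" settings are mounted.
 */
#include <stdio.h>

/* timeouts (in seconds) taken from the config log of each mounted fs */
static const int mount_timeouts[] = { 100, 20, 300 };  /* e.g. WAN, local, WAN */

/* Current behaviour (since http://review.whamcloud.com/2881): keep the maximum. */
static int pick_timeout_max(const int *vals, int n)
{
        int t = vals[0];
        for (int i = 1; i < n; i++)
                if (vals[i] > t)
                        t = vals[i];
        return t;
}

/* Proposed client behaviour: keep the minimum, so pings stay frequent enough
 * for the filesystem with the shortest server-side eviction window. */
static int pick_timeout_min(const int *vals, int n)
{
        int t = vals[0];
        for (int i = 1; i < n; i++)
                if (vals[i] < t)
                        t = vals[i];
        return t;
}

int main(void)
{
        int n = sizeof(mount_timeouts) / sizeof(mount_timeouts[0]);

        /* With the maximum, the global timeout becomes 300 and the ping
         * interval derived from it can exceed the 20 s filesystem's window. */
        printf("max policy: timeout = %d\n", pick_timeout_max(mount_timeouts, n));

        /* With the minimum, every server is pinged often enough. */
        printf("min policy: timeout = %d\n", pick_timeout_min(mount_timeouts, n));
        return 0;
}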
Of course, it would be even better if the timeout value in the config were stored on a per-mountpoint or per-import basis, so that it is possible to have different timeouts for local and remote filesystems, or large and small filesystems, or interactive and batch filesystems, etc. I don't know how easily that could be implemented, but it would be a better long-term solution than a single timeout for the whole client.
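A second hypothetical sketch, again not actual Lustre code: if each import carried its own timeout from its filesystem's configuration, the ping interval could be derived per import rather than from the single global obd_timeout. The struct model_import, import_ping_interval(), and the assumed timeout/4 interval are all invented for illustration.

/*
 * Hypothetical per-import timeout model: each import keeps the timeout
 * from its own filesystem's config log and pings on its own schedule.
 */
#include <stdio.h>

struct model_import {
        const char *imp_fsname;   /* filesystem this import belongs to */
        int         imp_timeout;  /* timeout from that fs's config log, seconds */
};

/* Derive a ping interval from the import's own timeout (assumed to be
 * timeout / 4 here, instead of being computed from a global value). */
static int import_ping_interval(const struct model_import *imp)
{
        return imp->imp_timeout / 4;
}

int main(void)
{
        struct model_import imports[] = {
                { "local-fs",  20 },   /* short timeout, local mount */
                { "wan-fs",   300 },   /* long timeout, WAN mount */
        };

        for (unsigned i = 0; i < sizeof(imports) / sizeof(imports[0]); i++)
                printf("%s: ping every %d s\n", imports[i].imp_fsname,
                       import_ping_interval(&imports[i]));
        return 0;
}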
Attachments
Issue Links
- is related to
  - LU-16749 apply a filter to the configuration log (Open)
  - LU-16002 Ping evictor delayed client eviction for 3 ping interval more than defined (Resolved)
  - LU-8750 Wrong obd_timeout on the client when we have 2 or more lustre fs (Resolved)
  - LU-15246 Add per device adaptive timeout parameters (Resolved)