Aurelien, note that lru_max_age should really only be a fallback upper limit for the dynamic LRU pool management. Unfortunately, the dynamic LRU code has not been working well for a long time (see LU-7266), and often users disable it by setting lru_max_age and lru_size=N.
However, that is sub-optimal: some clients may end up with too many locks while others have too few, and setting the limit too high causes memory pressure on the servers and/or clients.
What is really needed here is some investigation into the LDLM pool "Lock Volume" calculations to see why this is not working. The basic theory is that sum(age of locks) is a "volume" that the server distributes among clients. The client can manage locks within that volume as it sees fit (many short-lived locks, or a few long-lived locks), and if the client's lock volume grows to exceed its assigned limit (due to aging of old locks and/or acquiring many new locks), then it should cancel the oldest unused locks to reduce the volume again. The client is really in the best position to judge which of its locks are most important, but as a workaround for memory pressure issues, LU-6529 was implemented to give the server the ability to cancel locks more aggressively to avoid OOM.
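As a rough sketch of the lock-volume idea (hypothetical helper names and data layout; the real LDLM code tracks ldlm_lock structures on an LRU list in C), the client-side behavior would be something like:

```python
def lock_volume(lock_ages):
    """The 'volume' is the sum of lock ages that the server distributes
    among clients; each client must stay within its assigned share."""
    return sum(lock_ages)

def shrink_to_limit(locks, limit):
    """Cancel the oldest unused locks until the client's volume fits its
    assigned limit.

    locks: list of dicts with 'age' (seconds) and 'in_use' keys -- a toy
    representation for illustration only.
    """
    volume = lock_volume(l["age"] for l in locks)
    # Oldest first, matching strict-LRU cancellation order; locks still
    # in use are never cancelled.
    for lock in sorted(locks, key=lambda l: l["age"], reverse=True):
        if volume <= limit:
            break
        if lock["in_use"]:
            continue
        volume -= lock["age"]
        lock["cancelled"] = True
    return [l for l in locks if not l.get("cancelled")]
```

Note that the volume shrinks both by cancelling locks and (implicitly) as the limit is re-negotiated with the server, which this sketch does not model.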
It may be that LDLM_POOL_MAX_AGE is just set much too high and/or the DLM server is allowing too much memory to be put toward locks (e.g. not considering multiple namespaces, or just assigning too large a fraction of RAM to LDLM vs. filesystem cache, etc.), so the clients are not cancelling locks aggressively enough. There may also be issues with the hooks into the kernel slab cache shrinkers not working properly (on the server this should reduce the lock volume to force clients to cancel locks, and on the client it should cancel locks directly).
The other area that could benefit is replacing the strict LRU that manages the locks on the client. For clients doing things like filesystem scanning, strict LRU is not a very good algorithm, since it flushes out "valuable" locks too quickly (e.g. parent directory locks) and doesn't drop "boring" locks (e.g. the use-once locks for the individual files). Using a better caching algorithm (e.g. LFRU, 2Q/SLRU, ARC) would go a long way toward improving lock cache usage on the client. ARC is probably the best choice, since it would be possible to keep the FIDs in the "ghost" lists without actually caching the lock/pages, so if some frequently-used lock has to be cancelled due to contention, it doesn't immediately lose the "value" that had been built up for that lock.
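To illustrate just the ghost-list idea (this is a toy sketch, not full ARC, which keeps two cache lists plus two ghost lists and adapts the split between them; all names here are hypothetical):

```python
from collections import OrderedDict

class GhostLRU:
    """Toy ghost-list cache for DLM locks.

    Live locks (with pages attached) sit in 'cache'; FIDs of recently
    cancelled locks sit in 'ghost' with no attached lock/page state.
    A re-acquire that hits the ghost list is treated as a frequency
    signal, so the lock keeps its built-up "value" even though it was
    cancelled in between.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()   # fid -> hot flag (True = reused)
        self.ghost = OrderedDict()   # fid -> None (metadata only)

    def acquire(self, fid):
        if fid in self.cache:
            self.cache[fid] = True           # repeat use: mark hot
            self.cache.move_to_end(fid)
            return "hit"
        hot = fid in self.ghost              # ghost hit: remembered value
        self.ghost.pop(fid, None)
        if len(self.cache) >= self.capacity:
            self._evict()
        self.cache[fid] = hot                # re-enter as hot, not cold
        return "ghost-hit" if hot else "miss"

    def _evict(self):
        # Prefer to cancel a cold (use-once) lock; keep hot ones cached.
        victim = next((f for f, hot in self.cache.items() if not hot),
                      next(iter(self.cache)))
        self.cache.pop(victim)
        self.ghost[victim] = None            # remember the FID, drop the lock
        while len(self.ghost) > self.capacity:
            self.ghost.popitem(last=False)
```

In a scanning workload, the use-once file locks get evicted while the parent directory lock survives, and a lock that bounces out under contention is restored as "hot" the next time it is acquired.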
I'm going to close this as a duplicate of LU-17428, which has a patch close to landing that reduces the default lru_max_age to 600s.