To fix this problem properly, it may be enough to check here if there is any DLM lock on the inode, and drop the inode if not?
Something like the following in ll_drop_inode():
if (!md_lock_match(ll_i2mdexp(inode), ..., ll_i2fid(inode), ...))
        return 1;
The main question is which lock mode/bits/flags should be used for the match (see the sketch below). The inode should not be dropped if its inode/layout lock is still being used for IO or protects dirty data (that is already impossible during an active syscall, but could matter between syscalls), but it should be dropped once there are no more DLM locks (MDC or OSC) referencing the inode, since any future access will need an MDS RPC to revalidate it anyway.
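For illustration only, here is a minimal sketch of what that check could look like, modelled on the existing ll_have_md_lock() pattern; the bit set (LOOKUP/UPDATE/LAYOUT) and the CR|CW|PR|PW mode mask are only assumptions to make the sketch concrete, and choosing the right ones is exactly the open question:

static int ll_drop_inode(struct inode *inode)
{
	/* which bits to require is the open question; these are a guess */
	__u64 bits[] = { MDS_INODELOCK_LOOKUP, MDS_INODELOCK_UPDATE,
			 MDS_INODELOCK_LAYOUT };
	struct lustre_handle lockh;
	int i;

	for (i = 0; i < ARRAY_SIZE(bits); i++) {
		union ldlm_policy_data policy = {
			.l_inodebits = { .bits = bits[i] },
		};

		/* LDLM_FL_TEST_LOCK: only test for a match, do not take
		 * a reference on the matched lock */
		if (md_lock_match(ll_i2mdexp(inode),
				  LDLM_FL_BLOCK_GRANTED | LDLM_FL_TEST_LOCK,
				  ll_i2fid(inode), LDLM_IBITS, &policy,
				  LCK_CR | LCK_CW | LCK_PR | LCK_PW, &lockh))
			return 0; /* some MDC lock still covers the inode */
	}

	return 1; /* no cached MDC lock references the inode, drop it */
}

This only covers MDC locks; whether OSC extent locks need a similar check (or the refcount below) is part of the same question.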
A possibly more efficient option would be to add a refcount to ll_inode_info counting the DLM locks attached to the inode, so the check becomes a simple counter read instead of a lock match. The refcount could be placed right after lli_inode_magic, since there is a 4-byte hole there; a rough sketch follows.
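In that sketch, lli_dlm_lock_count is a made-up field name, and the get/put helpers stand in for whatever the MDC/OSC lock attach and detach paths would actually call:

struct ll_inode_info {
	__u32		lli_inode_magic;
	/* number of DLM locks (MDC or OSC) currently attached to the
	 * inode; fills the 4-byte hole after lli_inode_magic */
	atomic_t	lli_dlm_lock_count;
	/* ... rest of the struct unchanged ... */
};

/* called wherever a DLM lock is attached to / detached from the inode */
static inline void ll_inode_dlm_lock_get(struct inode *inode)
{
	atomic_inc(&ll_i2info(inode)->lli_dlm_lock_count);
}

static inline void ll_inode_dlm_lock_put(struct inode *inode)
{
	atomic_dec(&ll_i2info(inode)->lli_dlm_lock_count);
}

/* ll_drop_inode() then reduces to a plain counter check */
static int ll_drop_inode(struct inode *inode)
{
	return atomic_read(&ll_i2info(inode)->lli_dlm_lock_count) == 0;
}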
The DLM locks shouldn't (I think?) __iget() the VFS inode for each lock, to avoid a circular dependency where inodes cannot be dropped from cache while they have any DLM locks, since that could pin a lot of inodes. The counter-argument would be that the DLM locks themselves are eventually dropped from cache (LRU, slab shrinker), so the extra inode references would be finite, but it still adds some interdependency.
Good news: with the dropping of RHEL7 we now have the superblock shrinker on all platforms. To use it we need to move to the fs_context API, and then we can implement a shrinker. Getting a debugfs interface working will take jumping through some hoops, but it can be done.
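For reference, the generic superblock shrinker exposes two optional super_operations hooks that a Lustre shrinker could plug into once that conversion is done; this is only a shape sketch, and ll_count_unused_locks()/ll_cancel_unused_locks() are hypothetical helpers standing in for whatever actually counts and cancels idle client locks:

static long ll_nr_cached_objects(struct super_block *sb,
				 struct shrink_control *sc)
{
	/* report how many reclaimable objects (e.g. unused DLM locks)
	 * this mount is holding */
	return ll_count_unused_locks(sb);		/* hypothetical */
}

static long ll_free_cached_objects(struct super_block *sb,
				   struct shrink_control *sc)
{
	/* cancel up to sc->nr_to_scan idle locks so the corresponding
	 * inodes can then be released via ll_drop_inode() */
	return ll_cancel_unused_locks(sb, sc->nr_to_scan);	/* hypothetical */
}

static const struct super_operations ll_super_operations = {
	/* ... existing methods ... */
	.drop_inode		= ll_drop_inode,
	.nr_cached_objects	= ll_nr_cached_objects,
	.free_cached_objects	= ll_free_cached_objects,
};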
I have some other items to work on, but I do want to find some cycles for this.