Details
Bug
Resolution: Fixed
Major
Lustre 2.4.0
x86_64 client
1
3
8255
Description
We've observed that when Robinhood is used to scan the Sequoia filesystem, Lustre 2.3.65 (and earlier) clients thrash in vmalloc(). The issue is caused by Robinhood repeatedly calling IOC_MDC_GETFILESTRIPE to get the striping information for each file it scans.
Normally this wouldn't be an issue, but Sequoia's filesystem has a large number of OSTs (768), and we always allocate space in the reply buffer for the maximum number of OSTs. As a result, the reply buffers are too large for kmalloc(), so OBD_ALLOC_LARGE() ends up vmalloc()'ing every one of them.
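A rough back-of-the-envelope sketch of why 768 OSTs tips every reply onto the vmalloc() path. It assumes the layout is sized like a lov_mds_md_v1 (a 32-byte header plus one 24-byte lov_ost_data_v1 entry per OST) and that OBD_ALLOC_LARGE() switches to vmalloc() above OBD_ALLOC_BIG, taken here as 4 pages of 4 KiB. These sizes and the threshold are assumptions modelled on the Lustre headers of that era, and `lmm_reply_size()`, `uses_vmalloc()` and `report()` are hypothetical helpers, not Lustre functions. The exact header size barely matters: the 768-entry array alone is already 18432 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* ASSUMPTION: sizes modelled on lov_mds_md_v1 / lov_ost_data_v1,
 * i.e. a 32-byte layout header plus one 24-byte entry per OST. */
#define LMM_HEADER_BYTES    32u
#define LMM_OST_ENTRY_BYTES 24u

/* ASSUMPTION: OBD_ALLOC_LARGE() uses vmalloc() once the size exceeds
 * OBD_ALLOC_BIG, taken here as 4 pages of 4 KiB (x86_64). */
#define PAGE_BYTES      4096u
#define ALLOC_BIG_BYTES (4u * PAGE_BYTES)

/* Hypothetical helper: bytes needed for a reply sized for `osts` entries. */
static size_t lmm_reply_size(unsigned int osts)
{
    assert(osts > 0);
    return LMM_HEADER_BYTES + (size_t)osts * LMM_OST_ENTRY_BYTES;
}

/* Hypothetical helper mirroring the size test in OBD_ALLOC_LARGE():
 * returns 1 for the vmalloc() path and 0 for the kmalloc() path. */
static int uses_vmalloc(size_t size)
{
    return size > ALLOC_BIG_BYTES;
}

/* Hypothetical helper: print which allocator a reply of this size would hit. */
static void report(const char *label, unsigned int osts)
{
    size_t size = lmm_reply_size(osts);
    printf("%-24s %4u OSTs -> %6zu bytes -> %s\n", label, osts, size,
           uses_vmalloc(size) ? "vmalloc()" : "kmalloc()");
}
```

Under these assumptions, `report("Sequoia max-sized reply", 768)` gives 32 + 768 × 24 = 18464 bytes, past the 16384-byte cut-over, so every IOC_MDC_GETFILESTRIPE reply takes the vmalloc() path no matter how few stripes the file actually has. The cut-over is crossed at 682 entries (16400 bytes), while 681 entries (16376 bytes) would still fit in a kmalloc().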
It's worth noting that we see this behavior even though we're running with the fix for the recent kernel vmalloc() regression. Going forward, we should keep this in mind and never use vmalloc() if we can possibly avoid it. It's not just slow; it's downright dangerous to use in any concurrent context.
I'll push a fix for this issue shortly for review.