Details
- Bug
- Resolution: Fixed
- Blocker
- Lustre 2.5.0
- 3
- HSM
- 12689
Description
Issuing too many HSM requests at once (120 or more, it seems) leads to LNet errors. The corresponding requests, as well as subsequent ones, aren't delivered to the copytool.
LNetError: 7307:0:(lib-ptl.c:190:lnet_try_match_md()) Matching packet from 12345-0@lo, match 1460297122524484 length 6776 too big: 7600 left, 6144 allowed
Can be easily reproduced with lfs hsm_archive * against a hundred-ish files.
I'm not sure I'd call this patch a fix - it's really just a workaround. A fix would have allowed arbitrarily sized requests, or would have at least fixed the lfs command to submit a large request in multiple batches instead of failing.
In particular, since there is no way for a user (or a tool like lfs) to determine what the max request size is, it's not possible to know that a request is too large until it is too late. This makes it difficult to build usable tools on top of this without resorting to arbitrarily small batch sizes or hard-coded magic values.
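For illustration, here is a rough sketch of the kind of client-side batching described above, written as a Python wrapper around lfs hsm_archive. The names batches, archive_in_batches and ASSUMED_BATCH_SIZE are hypothetical, and the batch size of 50 is exactly the sort of hard-coded magic value complained about above: it is a guess, not a limit obtained from Lustre.

```python
import subprocess
from typing import Callable, Iterator, List, Sequence

# Assumption: an arbitrary guess, not a limit queried from Lustre.
ASSUMED_BATCH_SIZE = 50


def batches(paths: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `paths`, each holding at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(paths), size):
        yield list(paths[start:start + size])


def archive_in_batches(
    paths: Sequence[str],
    size: int = ASSUMED_BATCH_SIZE,
    run: Callable[..., object] = subprocess.run,
) -> int:
    """Run `lfs hsm_archive` once per batch and return the number of batches sent."""
    sent = 0
    for chunk in batches(paths, size):
        # check=True makes subprocess.run raise if lfs exits with an error.
        run(["lfs", "hsm_archive", *chunk], check=True)
        sent += 1
    return sent
```

Calling archive_in_batches(sorted(os.listdir("."))) from a directory on a Lustre mount would issue one lfs hsm_archive invocation per 50 files. This only sidesteps the problem from the outside; it does not tell the caller what the real limit is.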