Details
- Bug
- Resolution: Fixed
- Blocker
- Lustre 2.5.0
- 3
- HSM
- 12689
Description
Issuing too many HSM requests at once (120 or more, it seems) leads to LNet errors. The affected requests, as well as any subsequent ones, are not delivered to the copytool.
LNetError: 7307:0:(lib-ptl.c:190:lnet_try_match_md()) Matching packet from 12345-0@lo, match 1460297122524484 length 6776 too big: 7600 left, 6144 allowed
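To make the numbers in that message easier to read, here is a minimal Python sketch that pulls the three sizes out of the log line. The field meanings are inferred only from the wording of the message (incoming length, space left, maximum allowed), and the helper name `parse_too_big` is hypothetical, not part of any Lustre tool.

```python
import re

# The log line exactly as reported above.
LINE = (
    "LNetError: 7307:0:(lib-ptl.c:190:lnet_try_match_md()) Matching packet "
    "from 12345-0@lo, match 1460297122524484 length 6776 too big: "
    "7600 left, 6144 allowed"
)

# Assumption: the three numbers are the message length, the space left,
# and the maximum allowed size, as the message wording suggests.
PATTERN = re.compile(
    r"length (?P<length>\d+) too big: "
    r"(?P<left>\d+) left, (?P<allowed>\d+) allowed"
)


def parse_too_big(line):
    """Return the sizes from a 'too big' LNet message, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


info = parse_too_big(LINE)
# How far the message exceeds the allowed size.
overshoot = info["length"] - info["allowed"]
print(info, overshoot)
```

Read this way, the request message is larger than the allowed size, which fits the observation that the problem only appears once enough files are included in a single request.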
Can be easily reproduced by running lfs hsm_archive * against a hundred or so files.
I tested John's fix on master and, as expected, it limits the number of files I am allowed to request for archive in one call. If I try to archive, say, 200 files, I get an "Argument list too long" (E2BIG) error. A bulk archive request with a smaller number of files now succeeds and does not hang the client.
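Since a single request is now rejected with E2BIG once it names too many files, one way to archive a large set is to split it into several smaller lfs hsm_archive calls. The following is a rough Python sketch of that idea, not part of the fix itself. The batch size of 100 is an assumption chosen for illustration (the ticket does not state the exact limit), and the helper names are hypothetical.

```python
import subprocess

# Assumption: 100 files per request stays under the limit; the ticket
# does not give the exact threshold, so tune this for your system.
MAX_FILES_PER_REQUEST = 100


def batches(paths, size):
    """Yield consecutive slices of `paths` with at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(paths), size):
        yield paths[start:start + size]


def archive_commands(paths, size=MAX_FILES_PER_REQUEST):
    """Build one `lfs hsm_archive` command line per batch of files."""
    return [["lfs", "hsm_archive", *batch] for batch in batches(paths, size)]


def archive_all(paths, size=MAX_FILES_PER_REQUEST, dry_run=True):
    """Print each batched command and, unless dry_run, execute it."""
    for cmd in archive_commands(paths, size):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if lfs exits non-zero.
            subprocess.run(cmd, check=True)
```

For example, archive_all(sorted(os.listdir(".")), dry_run=True) (after importing os) would only print the batched commands, which is a safe way to check the split before sending any request.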
Patch for master at http://review.whamcloud.com/#/c/9393/
Patch for b2_5 at http://review.whamcloud.com/#/c/9422/