Details
Bug
Resolution: Fixed
Blocker
Lustre 2.5.0
3
HSM
12689
Description
Issuing too many HSM requests at once (120 or more, it seems) leads to LNet errors. The corresponding requests, as well as subsequent ones, are never delivered to the copytool.
LNetError: 7307:0:(lib-ptl.c:190:lnet_try_match_md()) Matching packet from 12345-0@lo, match 1460297122524484 length 6776 too big: 7600 left, 6144 allowed
This is easily reproduced by running lfs hsm_archive * against a hundred or so files.
Jinshan, we enforce a limit on the client side in ll_dir_ioctl(), which requires hur_len(hur) < MDS_MAXREQSIZE (5K). Since an hsm_user_item is 32 bytes, this means hur_count < 160. The MDT then tries to send the same number of 72-byte hsm_action_items to the copytool, so with 120 items the KUC buffer is more than 72 * 120 = 8640 bytes, well over the 6144 bytes allowed in the error above. A cheap fix looks like it would be to use MDS_MAXREQSIZE / 4 in ll_dir_ioctl().