[LU-4639] HSM requests not delivered Created: 17/Feb/14 Updated: 21/Mar/14 Resolved: 04/Mar/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.0 |
| Fix Version/s: | Lustre 2.6.0, Lustre 2.5.1 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Henri Doreau (Inactive) | Assignee: | James Nunez (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | MB | ||
| Severity: | 3 |
| Project: | HSM |
| Rank (Obsolete): | 12689 |
| Description |
|
Issuing too many HSM requests (120 or more, it seems) leads to LNet errors. The corresponding requests, as well as subsequent ones, are not delivered to the copytool:
LNetError: 7307:0:(lib-ptl.c:190:lnet_try_match_md()) Matching packet from 12345-0@lo, match 1460297122524484 length 6776 too big: 7600 left, 6144 allowed
Can easily be reproduced with lfs hsm_archive * against a hundred-ish files. |
| Comments |
| Comment by Peter Jones [ 17/Feb/14 ] |
|
James, could you please set up this reproducer? Thanks, Peter |
| Comment by Amir Shehata (Inactive) [ 17/Feb/14 ] |
|
As far as I understand, this error message comes from lnet_try_match_md(). It is emitted when a message is received and LNet tries to match it against a memory descriptor (MD) that was set up to receive such messages. The MD has a fixed size, and flags on the MD define what should happen when an incoming message is larger than the MD: the message can be truncated or dropped. This particular MD appears to be set up to drop oversized messages. I believe either HSM needs to be aware of this limitation and make sure to send smaller messages, or the size of the MD needs to be increased. |
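For illustration, a minimal sketch of the drop-versus-truncate decision described above. This is not the verbatim lib-ptl.c code: the flag below stands in for LNet's LNET_MD_TRUNCATE MD option, and the field and function names are simplified assumptions.

```c
#include <stdio.h>

#define MD_TRUNCATE (1u << 0)   /* placeholder for LNet's LNET_MD_TRUNCATE option */

/* Returns the number of bytes that would be delivered, or -1 for "drop". */
static int match_md(unsigned int md_space_left, unsigned int md_options,
                    unsigned int msg_length)
{
        if (msg_length <= md_space_left)
                return msg_length;          /* fits: deliver as-is */
        if (md_options & MD_TRUNCATE)
                return md_space_left;       /* deliver a truncated message */
        /* Neither fits nor may be truncated: this is the
         * "length ... too big: ... left, ... allowed" case, and the
         * message is dropped with no notification back to the sender. */
        return -1;
}

int main(void)
{
        /* Numbers from the error in the description: 6776-byte request,
         * 6144 bytes allowed by the receiving MD, no truncate flag set. */
        printf("result: %d\n", match_md(6144, 0, 6776));   /* prints -1 (drop) */
        return 0;
}
```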
| Comment by Aurelien Degremont (Inactive) [ 17/Feb/14 ] |
|
This limitation was known, but the resulting stale state was not. It is a little strange that an upper layer has to know the lower layers' constraints. |
| Comment by Amir Shehata (Inactive) [ 19/Feb/14 ] |
|
I wasn't involved in the HSM design. But if the limitation was known, what is the expected behavior when this scenario is hit? From the HSM feature perspective, is the message dropped silently at the ptl layer, and that's what's causing the problem? |
| Comment by Aurelien Degremont (Inactive) [ 19/Feb/14 ] |
|
Actually, it was not known during the HSM design. |
| Comment by Oleg Drokin [ 19/Feb/14 ] |
|
The packet drops hang everything because the client keeps retrying indefinitely, as the server gives no indication that it just dropped the request. |
| Comment by Jinshan Xiong (Inactive) [ 19/Feb/14 ] |
|
It looks like we've already limited the maximum size of HSM requests to MDS_MAXREQSIZE which is 5K bytes. Why does it add up to 6776 bytes at the end? |
| Comment by James Nunez (Inactive) [ 20/Feb/14 ] |
|
Yes, I'm able to recreate this problem trying to archive ~111 files at once. There are several of the following errors on the agent:
LNetError: 12269:0:(lib-ptl.c:190:lnet_try_match_md()) Matching packet from 12345-192.168.2.109@o2ib, match 1460420866039584 length 8288 too big: 7896 left, 6144 allowed
On the MDS, I see the following in dmesg:
Lustre: scratch-MDD0000: changelog on
LustreError: 21923:0:(mdt_coordinator.c:950:mdt_hsm_cdt_start()) scratch-MDT0000: Coordinator already started
LustreError: 21923:0:(obd_config.c:1350:class_process_proc_param()) mdt.: error writing proc entry 'hsm_control': rc = -114
Lustre: 1963:0:(client.c:1907:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1392918571/real 1392918571] req@ffff8806bd2bd400 x1460420865979988/t0(0) o8->scratch-OST0000-osc-MDT0000@192.168.2.111@o2ib:28/4 lens 400/544 e 0 to 1 dl 1392918596 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 1963:0:(client.c:1907:ptlrpc_expire_one_request()) Skipped 41 previous similar messages
LNet: 1959:0:(o2iblnd_cb.c:381:kiblnd_handle_rx()) PUT_NACK from 192.168.2.110@o2ib
Lustre: 21913:0:(client.c:1907:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1392931034/real 1392931034] req@ffff8806bd181800 x1460420866039584/t0(0) o107->scratch-MDT0000@192.168.2.110@o2ib:15/16 lens 8288/224 e 0 to 1 dl 1392931041 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Lustre: 21913:0:(client.c:1907:ptlrpc_expire_one_request()) Skipped 5 previous similar messages
LNet: 1958:0:(o2iblnd_cb.c:381:kiblnd_handle_rx()) PUT_NACK from 192.168.2.110@o2ib |
| Comment by James Nunez (Inactive) [ 20/Feb/14 ] |
|
I tried to see if any of the files I tried to archive were actually archived, or if I could archive a new file. For all files from the bulk archive attempt above, I see:
# lfs hsm_state u1*
u1: (0x00000001) exists, archive_id:1
u10: (0x00000001) exists, archive_id:1
u100: (0x00000001) exists, archive_id:1
...
I can't archive any new files, and there is no error message in the logs or console for the new archive request. Maybe the request is queued up, but can't be executed because of the previous bulk request:
# lfs hsm_state u2
u2: (0x00000000)
# lfs hsm_archive --archive 1 u2
# lfs hsm_state u2
u2: (0x00000000)
I rebooted the client node and tried the above again, but nothing changed. It looks like, as Oleg said, the client is stuck. I still see the following error message printed periodically in the agent logs:
LNetError: 12270:0:(lib-ptl.c:190:lnet_try_match_md()) Matching packet from 12345-192.168.2.109@o2ib, match 1460420866039584 length 8288 too big: 7896 left, 6144 allowed
LNetError: 12270:0:(lib-ptl.c:190:lnet_try_match_md()) Skipped 85 previous similar messages |
| Comment by John Hammond [ 25/Feb/14 ] |
|
Jinshan, we limit this on the client side in ll_dir_ioctl(). There we require that hur_len(hur) < MDS_MAXREQSIZE (5K). Since hsm_user_item is 32 bytes, this means hur_count < 160. Then the MDT tries to send the same number of 72-byte hsm_action_items to the copytool. So with 120 items we have a KUC buffer that's more than 72 * 120 = 8640 bytes. It looks like a cheap fix would be to use MDS_MAXREQSIZE / 4 in ll_dir_ioctl(). |
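To make the arithmetic above concrete, here is a small stand-alone sketch. The sizes are the ones quoted in this comment (32-byte hsm_user_item on the client, 72-byte hsm_action_item on the MDT, MDS_MAXREQSIZE = 5K), hard-coded here only for illustration.

```c
#include <stdio.h>

/* Sizes quoted in the comment above (hard-coded assumptions for illustration). */
#define MDS_MAXREQSIZE       (5 * 1024)  /* 5120-byte client-side request cap   */
#define HSM_USER_ITEM_SIZE   32          /* per-file item in the client request */
#define HSM_ACTION_ITEM_SIZE 72          /* per-file item the MDT sends over KUC */

int main(void)
{
        /* Client-side cap: hur_len(hur) < MDS_MAXREQSIZE allows ~160 items. */
        int max_items_client = MDS_MAXREQSIZE / HSM_USER_ITEM_SIZE;

        /* But the MDT forwards the same count as 72-byte hsm_action_items,
         * so the KUC payload grows 72/32 = 2.25x beyond what the client checked. */
        int kuc_bytes_120 = 120 * HSM_ACTION_ITEM_SIZE;

        /* Proposed cheap fix: cap the client request at MDS_MAXREQSIZE / 4. */
        int max_items_fixed = (MDS_MAXREQSIZE / 4) / HSM_USER_ITEM_SIZE;

        printf("client allows up to %d items\n", max_items_client);      /* 160  */
        printf("120 items -> %d-byte KUC payload\n", kuc_bytes_120);      /* 8640 */
        printf("MDS_MAXREQSIZE/4 -> %d items, %d-byte KUC payload\n",
               max_items_fixed, max_items_fixed * HSM_ACTION_ITEM_SIZE); /* 40, 2880 */
        return 0;
}
```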
| Comment by Jinshan Xiong (Inactive) [ 25/Feb/14 ] |
|
Good point John. |
| Comment by James Nunez (Inactive) [ 25/Feb/14 ] |
|
I tried the same thing on the latest build of b2_5, build #33, and got different error messages. The results seem the same; I can't archive any files from this client and the bulk archive command never completes, but I can write from this client to the file system.
# ls u1* | wc
111 111 543
# lfs hsm_archive u1*
# lfs hsm_state u1*
u1: (0x00000009) exists archived, archive_id:1
u10: (0x00000009) exists archived, archive_id:1
u100: (0x00000009) exists archived, archive_id:1
...
There are no errors on the client, agent, or MDS. So, with 111 files, archiving all the files worked. Then I tried 120 files:
# ls a* | wc
125 125 517
# lfs hsm_archive a*
# lfs hsm_state a*
a1: (0x00000000)
a10: (0x00000000)
a100: (0x00000000)
...
So, it looks like the archive of these 120 files did not work. The difference between the b2_5 and the master results, posted previously in this ticket, is the errors in the logs. After the failed bulk archive attempt, there are no errors on the client or the agent. The errors on the MDS differ from master in that they complain about not finding an agent for archive 3:
Lustre: 13480:0:(mdd_device.c:1351:mdd_changelog_user_purge()) Could not determine changelog records to purge; rc=0
Lustre: 13480:0:(mdd_device.c:1351:mdd_changelog_user_purge()) Skipped 14 previous similar messages
LustreError: 21605:0:(mdt_hsm_cdt_agent.c:338:mdt_hsm_agent_send()) scratch-MDT0000: Cannot find agent for archive 3: rc = -11
LustreError: 21605:0:(mdt_hsm_cdt_agent.c:338:mdt_hsm_agent_send()) Skipped 2 previous similar messages
LustreError: 21605:0:(mdt_hsm_cdt_agent.c:338:mdt_hsm_agent_send()) scratch-MDT0000: Cannot find agent for archive 3: rc = -11
I have a single archive with index 1. On the agent:
# ps -ef | grep hsm | grep -v grep
root 11891 1 0 13:29 ? 00:00:00 lhsmtool_posix --daemon --hsm_root /lustre/archive --archive 1 /lustre/scratch |
| Comment by James Nunez (Inactive) [ 27/Feb/14 ] |
|
I tested John's fix on master and, as expected, it reduced the number of files I was allowed to request for archive: if I try to archive, say, 200 files, I get an "Argument list too long" (E2BIG) error. The bulk archive request with the reduced number of files now succeeds and does not hang the client. Patch for master at http://review.whamcloud.com/#/c/9393/ |
| Comment by Aurelien Degremont (Inactive) [ 28/Feb/14 ] |
|
"/ 4" seems a little bit magic to me. It is possible to add a comment to both patches to explain why we are doing this? If my maths are correct, divide it by 3 should be enough to fit in MDS_MAXREQSIZE. This should give us 33% increase of file in batched hsm requests. Ideally, we should add a test for that also |
| Comment by Robert Read (Inactive) [ 21/Mar/14 ] |
|
I'm not sure I'd call this patch a fix - it's really just a workaround. A fix would have allowed arbitrarily sized requests, or would at least have changed the lfs command to submit a large request in multiple batches instead of failing. In particular, since there is no way for a user (or a tool like lfs) to determine the maximum request size, it is not possible to know that a request is too large until it is too late. This makes it difficult to build usable tools on top of this without resorting to arbitrarily small batch sizes or hard-coded magic values. |
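As a rough illustration of the batching approach Robert mentions, here is a hedged sketch using liblustreapi. The function archive_in_batches and the batch size of 40 are hypothetical choices (an arbitrarily conservative value, since the real limit is not exposed to userspace); the llapi_* calls are used as I understand their public signatures, and error handling is minimal.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <lustre/lustreapi.h>

/* Sketch only: archive "count" files in conservative fixed-size batches,
 * since the true per-request limit is not visible to userspace. */
#define BATCH 40   /* arbitrary small batch size, not an advertised limit */

static int archive_in_batches(const char *mnt, char **paths, int count,
                              int archive_id)
{
        for (int off = 0; off < count; off += BATCH) {
                int n = (count - off < BATCH) ? count - off : BATCH;
                struct hsm_user_request *hur;
                int rc = 0;

                hur = llapi_hsm_user_request_alloc(n, 0);
                if (hur == NULL)
                        return -ENOMEM;

                hur->hur_request.hr_action = HUA_ARCHIVE;
                hur->hur_request.hr_archive_id = archive_id;
                hur->hur_request.hr_flags = 0;
                hur->hur_request.hr_itemcount = n;
                hur->hur_request.hr_data_len = 0;

                for (int i = 0; i < n; i++) {
                        /* Resolve each path to a FID for the request item. */
                        rc = llapi_path2fid(paths[off + i],
                                            &hur->hur_user_item[i].hui_fid);
                        if (rc)
                                break;
                        hur->hur_user_item[i].hui_extent.offset = 0;
                        hur->hur_user_item[i].hui_extent.length = -1ULL;
                }

                if (rc == 0)
                        rc = llapi_hsm_request(mnt, hur);
                free(hur);
                if (rc)
                        return rc;   /* stop at the first failed batch */
        }
        return 0;
}
```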