[LU-3876] flow control of HSM requests - Whamcloud Community JIRA

Details

Type: Technical task
Resolution: Fixed
Priority: Blocker
Fix Version/s: Lustre 2.5.0
Affects Version/s: None
Labels:
- HSM

Rank (Obsolete):
10057

Description

In a stress test I ran today, I created 40K files and archived them with 2 clients. The requests were queued on the MDT successfully, but this caused other problems.

The first problem is the lprocfs implementation of agent_actions. The symptom is:

[root@mds01 ~]# lctl get_param mdt.*.hsm.agent_actions
error: get_param: read('/proc/fs/lustre/mdt/hsm-MDT0000/hsm/agent_actions') failed: Cannot allocate memory

I haven't looked into it yet, but I suspect the root cause is that the llog has grown too long for the read to handle.

I think the more severe problem is flow control. Requests should not sit in the queue for so long; at the very least, we should have a parameter that caps the maximum length of the queue.
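To make the idea concrete, here is a minimal sketch of a length-capped request queue. It is only a toy model of the flow control being asked for, not the coordinator code: the class name `BoundedRequestQueue`, the parameter `max_queue_len`, and the request strings are all hypothetical and do not correspond to any existing Lustre tunable or API.

```python
from collections import deque


class BoundedRequestQueue:
    """Toy model of a request queue with a configurable maximum length."""

    def __init__(self, max_queue_len):
        if max_queue_len <= 0:
            raise ValueError("max_queue_len must be positive")
        self.max_queue_len = max_queue_len  # hypothetical tunable, not a real Lustre parameter
        self._pending = deque()

    def submit(self, request):
        """Queue a request, or refuse it when the queue is already full."""
        if len(self._pending) >= self.max_queue_len:
            return False  # back-pressure: the caller has to retry later
        self._pending.append(request)
        return True

    def next_request(self):
        """Return the oldest pending request, or None if the queue is empty."""
        return self._pending.popleft() if self._pending else None

    def __len__(self):
        return len(self._pending)


# Submit five requests against a cap of three: the last two are refused.
queue = BoundedRequestQueue(max_queue_len=3)
accepted = [queue.submit(f"archive file{i}") for i in range(5)]
print(accepted)  # [True, True, True, False, False]
```

Refusing new requests at the cap pushes the waiting back to the submitter instead of letting the queue grow without bound, which is the behavior the proposed parameter would control.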

Another problem I saw in the test is the following error:

LustreError: 27319:0:(mdt_coordinator.c:1418:mdt_hsm_update_request_state()) hsm-MDT0000: Cannot find running request for cookie 0x5226bb27 on fid=[0x200000400:0xee5:0x0]
LustreError: 27319:0:(mdt_coordinator.c:1418:mdt_hsm_update_request_state()) Skipped 74 previous similar messages

There were a huge number of these messages. I will dig into it tomorrow.

People

Assignee: Jinshan Xiong (Inactive)

Reporter: Jinshan Xiong (Inactive)

Votes: 0

Watchers: 6

Dates

Created: 04/Sep/13 6:04 AM

Updated: 24/Sep/13 8:46 PM

Resolved: 24/Sep/13 8:46 PM