HSM _not only_ small fixes and to do list goes here
(LU-3647)
|
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.5.0 |
| Type: | Technical task | Priority: | Blocker |
| Reporter: | Jinshan Xiong (Inactive) | Assignee: | Jinshan Xiong (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | HSM | ||
| Rank (Obsolete): | 10057 |
| Description |
|
In a stress test I ran today, I created 40K files and archived them with 2 clients. The requests were queued on the MDT successfully, but this caused other problems. The first problem is the lprocfs implementation of agent_actions. The symptom is:
[root@mds01 ~]# lctl get_param mdt.*.hsm.agent_actions
error: get_param: read('/proc/fs/lustre/mdt/hsm-MDT0000/hsm/agent_actions') failed: Cannot allocate memory
I have not looked into it yet, but I think the root cause is that the llog is too long, so the read ran into a problem for some reason. I think the more severe problem is flow control. It is not good to keep requests in the queue for so long; at the very least we should have a parameter to limit the maximum length of the queue.
Another problem I saw in the test is:
LustreError: 27319:0:(mdt_coordinator.c:1418:mdt_hsm_update_request_state()) hsm-MDT0000: Cannot find running request for cookie 0x5226bb27 on fid=[0x200000400:0xee5:0x0]
LustreError: 27319:0:(mdt_coordinator.c:1418:mdt_hsm_update_request_state()) Skipped 74 previous similar messages
There were a huge number of these warnings. I will dig into it tomorrow. |
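To illustrate the queue-length limit suggested above, here is a minimal sketch in C of a bounded request queue. It is not Lustre code and does not reflect how the coordinator stores requests: `struct hsm_queue`, `hsm_queue_init`, `hsm_queue_push`, `hsm_queue_pop`, `max_requests`, and `HSM_QUEUE_CAPACITY` are all hypothetical names chosen for the example, and returning `-EAGAIN` on a full queue is an assumed policy.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bounded queue of request cookies; NOT Lustre code. */
#define HSM_QUEUE_CAPACITY 1024	/* hard upper bound on the ring storage */

struct hsm_queue {
	uint64_t cookies[HSM_QUEUE_CAPACITY];
	size_t   head;		/* index of the oldest queued cookie */
	size_t   count;		/* number of cookies currently queued */
	size_t   max_requests;	/* tunable limit, <= HSM_QUEUE_CAPACITY */
};

/* Initialize an empty queue, clamping the tunable to the ring size. */
void hsm_queue_init(struct hsm_queue *q, size_t max_requests)
{
	assert(q != NULL);
	if (max_requests > HSM_QUEUE_CAPACITY)
		max_requests = HSM_QUEUE_CAPACITY;
	q->head = 0;
	q->count = 0;
	q->max_requests = max_requests;
}

/* Returns 0 on success, -EAGAIN when the queue is at its limit. */
int hsm_queue_push(struct hsm_queue *q, uint64_t cookie)
{
	if (q->count >= q->max_requests)
		return -EAGAIN;
	q->cookies[(q->head + q->count) % HSM_QUEUE_CAPACITY] = cookie;
	q->count++;
	return 0;
}

/* Returns 0 and stores the oldest cookie, or -ENOENT when empty. */
int hsm_queue_pop(struct hsm_queue *q, uint64_t *cookie)
{
	if (q->count == 0)
		return -ENOENT;
	*cookie = q->cookies[q->head];
	q->head = (q->head + 1) % HSM_QUEUE_CAPACITY;
	q->count--;
	return 0;
}
```

With a limit like this, a client submitting 40K archive requests would see an early, retryable error once the limit is reached instead of an ever-growing backlog. Whether to reject, block, or throttle the submitter is a separate design choice that the sketch does not settle.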
| Comments |
| Comment by Jinshan Xiong (Inactive) [ 10/Sep/13 ] |
|
Patch is at http://review.whamcloud.com/7589 and only fixes the ENOMEM problem. More work will be needed to add flow control. |
| Comment by John Hammond [ 10/Sep/13 ] |
|
From the autotest logs I have also seen this file return -EIO causing sanity-hsm test 40 to pass when it should have failed. Does anyone have any idea why it might do so? |
| Comment by Jinshan Xiong (Inactive) [ 18/Sep/13 ] |
|
In 2.5, we will only fix the problem of dumping a huge number of agent_actions entries. Real flow control will be addressed in 2.6 due to limited resources. |
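One common way to avoid an allocation failure when dumping a very long list is to emit it in bounded chunks rather than formatting everything into a single buffer. The sketch below shows that general idea in plain C. It is not taken from the patch and does not use the kernel's interfaces: `struct action_rec`, `dump_actions_chunk`, and the output format are hypothetical, and it assumes the buffer is large enough to hold at least one formatted record.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record type; the real llog record layout is not shown here. */
struct action_rec {
	uint64_t cookie;
	int      status;
};

/*
 * Format records starting at index *pos into buf (size bytes), stopping
 * before a record that would not fit. Advances *pos past the records
 * written and returns the number of bytes written, excluding the
 * trailing NUL. Assumes buf can hold at least one formatted record;
 * otherwise no progress is made and 0 is returned.
 */
size_t dump_actions_chunk(const struct action_rec *recs, size_t nrecs,
			  size_t *pos, char *buf, size_t size)
{
	size_t used = 0;

	assert(recs != NULL && pos != NULL && buf != NULL && size > 0);
	buf[0] = '\0';
	while (*pos < nrecs) {
		int n = snprintf(buf + used, size - used,
				 "cookie=%#llx status=%d\n",
				 (unsigned long long)recs[*pos].cookie,
				 recs[*pos].status);
		if (n < 0 || (size_t)n >= size - used) {
			buf[used] = '\0';	/* discard the partial record */
			break;
		}
		used += (size_t)n;
		(*pos)++;
	}
	return used;
}
```

A caller would invoke `dump_actions_chunk` repeatedly with the same small buffer until it returns 0, so memory use stays constant regardless of how many requests are queued.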
| Comment by Jodi Levi (Inactive) [ 24/Sep/13 ] |
|
Patch landed to Master. Follow on work for 2.6 is being tracked in |