[LU-8464] Lustre I/O hung waiting for page Created: 02/Aug/16 Updated: 28/Oct/22 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Andriy Skulysh | Assignee: | Andriy Skulysh |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | patch |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
> PID: 63193 TASK: ffff880902f7e040 CPU: 35 COMMAND: "python"
> PID: 65447 TASK: ffff88032d84e040 CPU: 15 COMMAND: "slurmstepd" |
| Comments |
| Comment by Andriy Skulysh [ 02/Aug/16 ] |
|
> PID: 14502 TASK: ffff881fedf78040 CPU: 13 COMMAND: "ptlrpcd_11" |
| Comment by Andriy Skulysh [ 02/Aug/16 ] |
|
Thread PID 65447 tries to migrate the 2nd page from the extent, but these 2 pages are going to fit in one RPC. So PID 14502 can't complete the I/O because the 1st page was locked by PID 65447. |
| Comment by Gerrit Updater [ 03/Aug/16 ] |
|
Andriy Skulysh (andriy.skulysh@seagate.com) uploaded a new patch: http://review.whamcloud.com/21652 |
| Comment by Jinshan Xiong (Inactive) [ 05/Aug/16 ] |
|
Does migrate_pages() lock one page and then wait for another page to complete writeback? |
| Comment by Jinshan Xiong (Inactive) [ 17/Aug/16 ] |
|
I spent some time on this issue and found something new (which you didn't mention on the ticket). I think this issue is due to the implementation of the memory cgroup. As you can see from the code in __unmap_and_move():
lock_page(page);
}
/* charge against new page */
mem_cgroup_prepare_migration(page, newpage, &mem);
It locks a page and charges the mem cgroup, which in turn tries to free a page from the cgroup. In the process of freeing the page, it waits for the page's writeback to complete. This causes a deadlock. Let me put things together.

Ptlrpc thread:
    lock page A;
    set writeback to page A;
    unlock page A;
    lock page B <- blocked

and migrating thread:
    /* try to migrate page B */
    lock page B;
    /* since there is no free slot of this process' memory control group */
    try to free page A;
    wait for A's writeback to complete; <- blocked
    free page A;
    wait for B's writeback to complete;

It's a really bad choice for migrate_pages() to lock a page and wait for writeback on another one to complete. This problem is hard to fix in Lustre but much easier to fix in the kernel; in fact, it turns out that linux-4.x kernels no longer have this problem. I will take a further look to see in which kernel this problem was fixed. |
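
The circular wait described above can be sketched as a small wait-for graph. This is only a toy userspace model in Python, not kernel or Lustre code: the thread labels ("ptlrpcd", "migrator"), the resource labels, and the helper names `build_waits_for` and `find_cycle` are all hypothetical. It also assumes, as the comments above suggest, that page A's writeback can only be completed by the ptlrpcd thread once it has locked page B.

```python
# Toy model of the wait-for relationships in this ticket (not kernel code).
# All names below are illustrative labels, not real kernel or Lustre symbols.

def build_waits_for(blocked_on, owner):
    """Map each blocked thread to the thread that must act to unblock it."""
    return {thread: owner[resource] for thread, resource in blocked_on.items()}


def find_cycle(waits_for):
    """Return the threads forming a wait-for cycle, or None if there is none.

    Assumes each thread waits on at most one other thread.
    """
    for start in waits_for:
        path = [start]
        seen = {start}
        node = waits_for.get(start)
        while node is not None:
            if node == start:
                return path          # walked back to the start: deadlock
            if node in seen:
                break                # a cycle exists, but not through `start`
            path.append(node)
            seen.add(node)
            node = waits_for.get(node)
    return None


# Which thread must act before each resource becomes available (assumption:
# ptlrpcd is the only thread that can finish page A's writeback).
owner = {
    "lock(page B)": "migrator",
    "writeback(page A)": "ptlrpcd",
}

# What each thread is blocked on in the scenario from this ticket.
blocked_on = {
    "ptlrpcd": "lock(page B)",
    "migrator": "writeback(page A)",
}

waits_for = build_waits_for(blocked_on, owner)
print(find_cycle(waits_for))

# If the migrating thread never waits on page A's writeback while holding
# page B (modelling a kernel without the charge-time reclaim), no cycle forms.
print(find_cycle(build_waits_for({"ptlrpcd": "lock(page B)"}, owner)))
```

In this model the first call reports a cycle between the two threads, and the second reports none once the migrating thread no longer waits on writeback while holding the page lock.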
| Comment by Ann Koehler (Inactive) [ 18/Aug/16 ] |
|
Thanks Jinshan. I'll pass this bug on to our kernel engineers. If you can identify the kernel where it's fixed, I'm sure that would be a big help. |
| Comment by Jinshan Xiong (Inactive) [ 18/Aug/16 ] |
|
It seems the fix has been present since the 3.18 kernels. |
| Comment by Ann Koehler (Inactive) [ 18/Aug/16 ] |
|
The upstream commit that removed the mem_cgroup_prepare_migration() |
| Comment by Oleg Drokin [ 09/Jan/17 ] |
|
I opened a Red Hat Bugzilla ticket about this to backport the patch into some upcoming RHEL 7.x kernel. (You probably cannot see it, since by default all such tickets are private): |
| Comment by Andreas Dilger [ 28/Oct/22 ] |
|
Oleg, was this patch ever landed in newer el7 releases? I'm wondering if this should be closed as "Won't Fix" since the patch is not really needed anymore, AFAICS. |