[LU-3298] infinite loop in page fault on mmapped file. Created: 08/May/13 Updated: 09/Oct/21 Resolved: 09/Oct/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Oleg Drokin | Assignee: | WC Triage |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 8165 |
| Description |
|
The story on this one is not exactly clear yet.

== sanityn test 18: mmap sanity check =================================== 17:19:48 (1367961588)
mmap test1: basic mmap operation (PASS, 0.040102s)
mmap test2: MAP_PRIVATE not write back (PASS, 0.011313s)
mmap test3: concurrent mmap ops on two nodes (SKIPPED, 0s)
mmap test4: c1 write to f1 from mmapped f2, c2 write to f1 from mmapped f1 (PASS, 2.1596s)
mmap test5: read/write file to/from the buffer which mmapped to just this file (PASS, 0.060313s)

the bug hit in test 6

11499 pts/0 R+ 1288:17 /home/green/git/lustre-release/lustre/tests/../tests/mmap_sanity -d /mnt/lustre -m /mnt/lustre2 -e 3

Investigation of logs uncovered this:

00000080:00008000:7.0:1367964803.444630:0:11499:0:(vvp_io.c:727:vvp_io_fault_start()) llite: fault and truncate race happened!
00000080:00000001:7.0:1367964803.444630:0:11499:0:(vvp_io.c:731:vvp_io_fault_start()) Process leaving via out (rc=1 : 1 : 0x1)

The check for that is to see if page->mapping is NULL indicating truncate. After digging some more into it I found that after this return we return all the way to userspace, then page fault repeats and we find this exact page when doing old_page = vm_normal_page(vma, address, orig_pte); in do_wp_page in kernel.

This is basically all info I have for this problem now, we'll see if it repeats anywhere. |
| Comments |
| Comment by Andreas Dilger [ 09/May/13 ] |
|
This seems related to the same stack that I posted in skype? |
| Comment by Jinshan Xiong (Inactive) [ 09/May/13 ] |
|
We did some investigation on this issue: one page was truncated but not removed from the page table. As a result, the fault path fell into an infinite loop: each fault found the page through the page table, but vvp_io_fault_start() returned early because page->mapping was NULL. I'll work out a debug patch. |
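
The loop described in the comment above can be illustrated with a small toy model. This is only a sketch in Python, not kernel or Lustre code: `Page`, `fault_once`, `run_faults`, and the `unmap_on_race` switch are hypothetical names introduced here for illustration. It assumes the fault path behaves as the ticket reports (a page with a NULL mapping makes the handler bail out and the fault is retried), and `unmap_on_race` is not a fix taken from this ticket; it only shows that the retry terminates once the stale page table entry is gone.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Page:
    # None models page->mapping == NULL, i.e. the page was truncated.
    mapping: Optional[str]


def fault_once(page_table: Dict[int, Page], addr: int) -> str:
    """One pass of a toy fault handler: returns 'retry' or 'done'."""
    page = page_table.get(addr)
    if page is None:
        # No stale entry: a fresh page is installed and the fault completes.
        page_table[addr] = Page(mapping="file")
        return "done"
    if page.mapping is None:
        # Models the early exit logged as "fault and truncate race happened!".
        return "retry"
    return "done"


def run_faults(page_table: Dict[int, Page], addr: int, max_faults: int,
               unmap_on_race: bool = False) -> Optional[int]:
    """Retry the fault up to max_faults times.

    Returns the attempt number that succeeded, or None if every attempt
    hit the truncated page (the bounded stand-in for the infinite loop).
    """
    for attempt in range(1, max_faults + 1):
        if fault_once(page_table, addr) == "done":
            return attempt
        if unmap_on_race:
            # Hypothetical remedy: drop the stale page table entry.
            page_table.pop(addr, None)
    return None


if __name__ == "__main__":
    stale = {0x1000: Page(mapping=None)}
    print(run_faults(stale, 0x1000, max_faults=1000))
    print(run_faults({0x1000: Page(mapping=None)}, 0x1000,
                     max_faults=1000, unmap_on_race=True))
```

With the stale entry left in place every attempt returns "retry", which matches the spinning process seen in the `ps` output; once the entry is dropped, the next fault installs a fresh page and completes.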