[LU-11403] lov_io_init_empty() Page fault on a file without stripes Created: 19/Sep/18 Updated: 08/Jun/19 Resolved: 04/May/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Upstream |
| Fix Version/s: | Lustre 2.13.0, Lustre 2.12.3 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Alex Zhuravlev | Assignee: | Patrick Farrell (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
REFORMAT=yes RACER_ENABLE_DOM=false MOUNT_2=yes SLOW=yes sh racer.sh
LustreError: 20867:0:(lov_io.c:1481:lov_io_init_empty()) Page fault on a file without stripes: [0x200000402:0x116f5:0x0]
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffa0cc04e8>] ll_fault+0xb8/0x610 [lustre]
PGD 0
Oops: 0002 [#1] SMP
Modules linked in: zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) lustre(O) ofd(O) osp(O) lod(O) ost(O) mdt(O) mdd(O) mgs(O) osd_ldiskfs(O) ldiskfs(O) lquota(O) lfsck(O) obdecho(O) mgc(O) lov(O) mdc(O) osc(O) lmv(O) fid(O) fld(O) ptlrpc(O) obdclass(O) ksocklnd(O) lnet(O) libcfs(O)
CPU: 0 PID: 20867 Comm: systemd-coredum Tainted: P O ------------ 3.10.0 #5
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
task: ffff8800aebaa0c0 ti: ffff88006485c000 task.ti: ffff88006485c000
RIP: 0010:[<ffffffffa0cc04e8>] [<ffffffffa0cc04e8>] ll_fault+0xb8/0x610 [lustre]
RSP: 0018:ffff88006485fc70 EFLAGS: 00010246
RAX: 0000000000000001 RBX: 0000000000000100 RCX: 00000000c0000100
RDX: ffffffff81ab94c0 RSI: ffff8800aebaa0c0 RDI: ffff88011fc15b00
RBP: ffff88006485fcd8 R08: ffff88006485c000 R09: 00000000000004c1
R10: 000000009837f050 R11: ffff88011fc15b70 R12: 0000000000000000
R13: ffff8801157a4b60 R14: ffff88006485fce8 R15: ffff8800aebaa0c0
FS: 00007f4c1846c940(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000000689c0000 CR4: 00000000000007b0
Call Trace:
[<ffffffff811649f9>] __do_fault.isra.12+0x79/0xd0
[<ffffffff8116d563>] ? __vma_link_file+0x43/0x80
[<ffffffff81164a88>] do_read_fault.isra.14+0x38/0x170
[<ffffffff811703f7>] ? mmap_region+0x207/0x740
[<ffffffff8116a7ae>] handle_mm_fault+0x77e/0x11a0
[<ffffffff815b1078>] __do_page_fault+0x1b8/0x4b0
[<ffffffff815b13e3>] trace_do_page_fault+0x43/0xd0
[<ffffffff815b08ed>] do_async_page_fault+0x5d/0xa0
|
| Comments |
| Comment by Patrick Farrell (Inactive) [ 13/Feb/19 ] |
|
Not sure why we haven't been hitting this forever, but I'm pretty sure this is the bug.

When there's no stripe, we return EFAULT (lov_io_init_empty), which is translated to VM_FAULT_NOPAGE (to_fault_error):

	case -EFAULT:
		result = VM_FAULT_NOPAGE;
		break;

Then in ll_fault:

	if (!(result & (VM_FAULT_RETRY | VM_FAULT_ERROR | VM_FAULT_LOCKED))) {
		struct page *vmpage = vmf->page;

		/* check if this page has been truncated */
		lock_page(vmpage);

But the page is not set in this case:

static int __do_fault(struct vm_area_struct *vma, unsigned long address,
		pgoff_t pgoff, unsigned int flags,
		struct page *cow_page, struct page **page,
		void **entry, pmd_t *pmd, pte_t orig_pte)
{
	struct vm_fault vmf;
	int ret;

	vmf.virtual_address = (void __user *)(address & PAGE_MASK);
	vmf.pgoff = pgoff;
	vmf.flags = flags;
	vmf.page = NULL; <----------------------
	vmf.gfp_mask = __get_fault_gfp_mask(vma);
	vmf.cow_page = cow_page;
	vmf.orig_pte = orig_pte;
	vmf.pmd = pmd;
	vmf.vma = vma;
	ret = vma->vm_ops->fault(vma, &vmf); <------------- ll_fault

Various errors can cause the page to not get set. The fix seems clear: check for the page before using it. |
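The failure mode described above can be modelled in userspace. What follows is only a minimal sketch, not the patch that landed: the flag values are stand-ins modelled on 3.10-era VM_FAULT_* bits, and mock_fault0(), guarded_fault() and the cut-down struct page / struct vm_fault are hypothetical names invented for illustration. The mock lock_page() asserts on NULL to stand in for the kernel oops.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in flag values (assumption: modelled on 3.10-era VM_FAULT_* bits). */
#define VM_FAULT_OOM     0x0001
#define VM_FAULT_SIGBUS  0x0002
#define VM_FAULT_NOPAGE  0x0100
#define VM_FAULT_LOCKED  0x0200
#define VM_FAULT_RETRY   0x0400
#define VM_FAULT_ERROR   (VM_FAULT_OOM | VM_FAULT_SIGBUS)

/* Cut-down mock types; the real kernel structures carry far more state. */
struct page { int locked; };
struct vm_fault { struct page *page; };

/* Mock of the errno translation quoted in the comment (relevant cases only). */
static int to_fault_error(int rc)
{
	switch (rc) {
	case -EFAULT:
		return VM_FAULT_NOPAGE;
	case -ENOMEM:
		return VM_FAULT_OOM;
	default:
		return VM_FAULT_SIGBUS;
	}
}

/* Mock lock_page(): the assert stands in for the NULL dereference oops. */
static void lock_page(struct page *pg)
{
	assert(pg != NULL);
	pg->locked = 1;
}

/* Hypothetical lower-level fault: hands back an unlocked page on success,
 * otherwise leaves vmf->page NULL, as __do_fault() initialised it. */
static int mock_fault0(struct vm_fault *vmf, int rc, struct page *pg)
{
	vmf->page = NULL;
	if (rc != 0)
		return to_fault_error(rc);
	vmf->page = pg;
	return 0;
}

/* Guarded post-processing: touch the page only if one was returned. */
static int guarded_fault(struct vm_fault *vmf, int rc, struct page *pg)
{
	int result = mock_fault0(vmf, rc, pg);

	if (vmf->page != NULL &&
	    !(result & (VM_FAULT_RETRY | VM_FAULT_ERROR | VM_FAULT_LOCKED))) {
		lock_page(vmf->page);
		result |= VM_FAULT_LOCKED;
	}
	return result;
}
```

Calling guarded_fault() with rc = -EFAULT returns the NOPAGE code without ever reaching lock_page(); dropping the vmf->page != NULL test makes the same call trip the assert, which mirrors the crash in the description. Whether NOPAGE is the right code to return for this case is a separate question that the sketch does not address.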
| Comment by Gerrit Updater [ 13/Feb/19 ] |
|
Patrick Farrell (pfarrell@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34247 |
| Comment by Alexander Zarochentsev [ 11/Mar/19 ] |
|
Attached a simpler reproducer for the ll_fault() crash: m.c

With https://review.whamcloud.com/34247 applied, the reproducer hangs instead of crashing, with these messages repeating in the kernel log:

[ 87.575164] LustreError: 5061:0:(lov_io.c:1481:lov_io_init_empty()) Page fault on a file without stripes: [0x200000401:0x1:0x0]
[ 87.575546] LustreError: 5061:0:(lov_io.c:1481:lov_io_init_empty()) Skipped 251880 previous similar messages
[ 89.577930] LustreError: 5061:0:(lov_io.c:1481:lov_io_init_empty()) Page fault on a file without stripes: [0x200000401:0x1:0x0]
[ 89.578283] LustreError: 5061:0:(lov_io.c:1481:lov_io_init_empty()) Skipped 493482 previous similar messages
[ 93.579391] LustreError: 5061:0:(lov_io.c:1481:lov_io_init_empty()) Page fault on a file without stripes: [0x200000401:0x1:0x0]
[ 93.579750] LustreError: 5061:0:(lov_io.c:1481:lov_io_init_empty()) Skipped 946319 previous similar messages
[ 101.581054] LustreError: 5061:0:(lov_io.c:1481:lov_io_init_empty()) Page fault on a file without stripes: [0x200000401:0x1:0x0]
[ 101.581550] LustreError: 5061:0:(lov_io.c:1481:lov_io_init_empty()) Skipped 1945088 previous similar messages
[ 117.585786] LustreError: 5061:0:(lov_io.c:1481:lov_io_init_empty()) Page fault on a file without stripes: [0x200000401:0x1:0x0]
[ 117.586855] LustreError: 5061:0:(lov_io.c:1481:lov_io_init_empty()) Skipped 4004012 previous similar messages

However, the hang is interruptible from the terminal. |
| Comment by Patrick Farrell (Inactive) [ 11/Mar/19 ] |
|
Oh, cool! Thanks very much to Panda for the reproducer and Zam for the report. I will take a look at the hang. (Agreed it's better than panicking, but we should be able to do better.) |
| Comment by Gerrit Updater [ 13/Apr/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34247/ |
| Comment by Peter Jones [ 13/Apr/19 ] |
|
Landed for 2.13 |
| Comment by Andreas Dilger [ 17/Apr/19 ] |
|
The sanity.sh test_61b added from this patch is failing regularly. My thinking is that it makes sense to instantiate the layout at the time that mmap() is called for the full range of the mapping, rather than waiting for the page fault. |
| Comment by Gerrit Updater [ 17/Apr/19 ] |
|
Patrick Farrell (pfarrell@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34698 |
| Comment by Gerrit Updater [ 04/May/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34698/ |
| Comment by Peter Jones [ 04/May/19 ] |
|
Take two |
| Comment by Gerrit Updater [ 21/May/19 ] |
|
Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34914 |
| Comment by Gerrit Updater [ 21/May/19 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34935 |
| Comment by Gerrit Updater [ 08/Jun/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34935/ |
| Comment by Gerrit Updater [ 08/Jun/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34914/ |