[LU-8299] (vvp_io.c:922:vvp_io_fault_start()) ASSERTION( cl_index(obj, offset) == fio->ft_index ) Created: 17/Jun/16  Updated: 20/Feb/23  Resolved: 27/Feb/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.13.0

Type: Bug Priority: Minor
Reporter: Patrick Farrell (Inactive) Assignee: Alexander Zarochentsev
Resolution: Fixed Votes: 0
Labels: None

Attachments: File mmap_test.c    
Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Full stack of the panic:

<0>LustreError: 15833:0:(vvp_io.c:1233:vvp_io_fault_start()) ASSERTION( cl_index(obj, offset) == fio->ft_index ) failed:
<0>LustreError: 15833:0:(vvp_io.c:1233:vvp_io_fault_start()) LBUG
<4>Pid: 15833, comm: a.out
<4>
<4>Call Trace:
<4> [<ffffffffa121c895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4> [<ffffffffa121ce97>] lbug_with_loc+0x47/0xb0 [libcfs]
<4> [<ffffffffa192b6fb>] vvp_io_fault_start+0x9ab/0xc70 [lustre]
<4> [<ffffffffa122b121>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
<4> [<ffffffffa1225778>] ? libcfs_log_return+0x28/0x40 [libcfs]
<4> [<ffffffffa135560a>] cl_io_start+0x6a/0x140 [obdclass]
<4> [<ffffffffa1359574>] cl_io_loop+0xb4/0x1b0 [obdclass]
<4> [<ffffffffa1909cee>] ll_fault+0x2de/0x510 [lustre]
<4> [<ffffffff8114a234>] __do_fault+0x54/0x530
<4> [<ffffffff8114a807>] handle_pte_fault+0xf7/0xb00
<4> [<ffffffff8117b2a9>] ? __mem_cgroup_try_charge+0x19/0x5d0
<4> [<ffffffffa121c27b>] ? cfs_set_ptldebug_header+0x2b/0xc0 [libcfs]
<4> [<ffffffff8111f93e>] ? find_get_page+0x1e/0xa0
<4> [<ffffffff8152a0b9>] ? _cond_resched+0x9/0x40
<4> [<ffffffff81120e23>] ? filemap_fault+0xd3/0x500
<4> [<ffffffff8114b43a>] handle_mm_fault+0x22a/0x300
<4> [<ffffffff8104a8d8>] __do_page_fault+0x138/0x480
<4> [<ffffffff812307e4>] ? inode_has_perm+0x54/0xa0
<4> [<ffffffff8122ea61>] ? avc_has_perm+0x71/0x90
<4> [<ffffffff81334bbd>] ? tty_wakeup+0x3d/0x80
<4> [<ffffffff8152f23e>] do_page_fault+0x3e/0xa0
<4> [<ffffffff8152c5f5>] page_fault+0x25/0x30
<4> [<ffffffff8128e6ef>] ? copy_user_generic_unrolled+0x2f/0xb0
<4> [<ffffffff8133494f>] ? tty_write+0x1ff/0x2a0
<4> [<ffffffff81337850>] ? n_tty_write+0x0/0x460
<4> [<ffffffff81189278>] vfs_write+0xb8/0x1a0
<4> [<ffffffff81189c41>] sys_write+0x51/0x90
<4> [<ffffffff810e202e>] ? __audit_syscall_exit+0x25e/0x290
<4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

This is triggered by a trivial mmap test code which accidentally passes in an argument >2^63. I'll attach code for the test.

Running it like this:

truncate -s 8589934592 file
./mmap_test file 8589934500 8

Will cause that panic. The test is incorrect, but it still shouldn't panic nodes.

In vvp_io_fault_start, we calculate the index with cl_index. This treats its argument (offset) as signed... But offset (which is the same as ft_index here, which is that argument from userspace) is greater than 2^63, so it's negative, so this doesn't work.

/**
 * Converts a page index into a byte offset within object \a obj.
 */
pgoff_t cl_index(const struct cl_object *obj, loff_t offset)
{
        return offset >> PAGE_CACHE_SHIFT;
}
EXPORT_SYMBOL(cl_index);

When offset is negative, bit shift division is invalid. Oops.

This is corrected by making the function in question (cl_index) accept its argument as unsigned. This gets the bits twiddled correctly, but it's nasty, because offset in vvp_io_fault_start is still loff_t... and loff_t is still signed.

I don't have a larger solution, because loff_t being signed is something we get from the kernel. It seems wrong, but I'm not sure what to do about it.

I'll post my little patch and the powers that be can ponder the larger situation.



 Comments   
Comment by Gerrit Updater [ 17/Jun/16 ]

Patrick Farrell (paf@cray.com) uploaded a new patch: http://review.whamcloud.com/20861
Subject: LU-8299 clio: correct arithmetic in cl_index
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b72725d2325072b9beb2a9e5a48ed8843dacf9b5

Comment by Patrick Farrell (Inactive) [ 17/Jun/16 ]

Meant to include this code from vvp_io_fault_start in my original posting:

        /* offset of the last byte on the page */
        offset = cl_offset(obj, fio->ft_index + 1) - 1;
        LASSERT(cl_index(obj, offset) == fio->ft_index);

Note that the specific error here is that for our value, cl_index doesn't actually do the expected "divide by page size", so cl_index(obj, offset) has the same value as "offset".

Comment by Andreas Dilger [ 17/Jun/16 ]

My understanding is that the kernel allows negative file offsets for certain special cases (e.g. accessing kernel memory via /dev/kmem on some CPU architectures), but this should be blocked when accessing filesystems. Negative offsets are definitely blocked in llseek_execute(), but I'm not sure if we are missing some check at a high-level routine for mmap() that should do the same.

Comment by Cory Spitz [ 18/Jan/19 ]

bobijam and pfarrell what should be done with https://review.whamcloud.com/#/c/20861 ? Patrick contributed that a while back and it was last touched six months ago. It has some negative feedback. Who should address that and finish the work up?

Comment by Gerrit Updater [ 13/Feb/19 ]

Alexander Zarochentsev (c17826@cray.com) uploaded a new patch: https://review.whamcloud.com/34242
Subject: LU-8299 llite: ll_fault should fail for insane file offsets
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f49e3d20b7175f7686aff0a6e690468a5a7c99dd

Comment by Alexander Zarochentsev [ 13/Feb/19 ]

My understanding is that the kernel allows negative file offsets for certain special cases (e.g. accessing kernel memory via /dev/kmem on some CPU architectures), but this should be blocked when accessing filesystems.

yes, absolute file offset should not be negative. filemap_fault() has the following code :

int filemap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
        int error;
        struct file *file = vma->vm_file;
        struct address_space *mapping = file->f_mapping;
        struct file_ra_state *ra = &file->f_ra;
        struct inode *inode = mapping->host;
        pgoff_t offset = vmf->pgoff;
        struct page *page;
        pgoff_t size;
        int ret = 0;

        size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
        if (offset >= size)
                return VM_FAULT_SIGBUS;

Lustre tries to handle such a large offset usual way and has all those problems with conversion page offset to file offset and back. I submitted a small patch to return an error early in ll_fault() for obviously insane page indexes https://review.whamcloud.com/34242 .

Comment by Alexander Zarochentsev [ 13/Feb/19 ]

running the reproducer with the fix from https://review.whamcloud.com/34242 :

[root@devvm1 tests]# ~zam/work/tasks/LUS-1392/mmap_test /mnt/lustre/bar 8589934500 8
 command line: /mnt/lustre/bar 8589934500 8
 file size 8589934592, offset -92, length 8589934684
write: Bad address
[root@devvm1 tests]# 
 
Comment by Patrick Farrell (Inactive) [ 13/Feb/19 ]

spitzcor I think the proper approach is to take Zam's patch and abandon the other one - It's a bit of a pain to get right, and with Zam's patch, it's irrelevant.  (I see you suggested that in Gerrit.)

Comment by James A Simmons [ 13/Feb/19 ]

Since we have a reproducer could we add that to the sanity test

Comment by Alexander Zarochentsev [ 13/Feb/19 ]

> Since we have a reproducer could we add that to the sanity test
yes. just uploaded new revision with mmap_sanity/tst9. below is how it fails w/o the fix
(machine hanged, I had to reboot it):

[root@devvm1 tests]# ./mmap_sanity -d /mnt/lustre
mmap test1: basic mmap operation (PASS, 0.022238s)
mmap test2: MAP_PRIVATE not write back (PASS, 0.006258s)
mmap test3: concurrent mmap ops on two nodes (SKIPPED, 0s)
mmap test4: c1 write to f1 from mmapped f2, c2 write to f1 from mmapped f1 (SKIPPED, 0s)
fwrite failed for '/sys/fs/lustre/ldlm/namespaces/lustre-OST0000-osc-MDT0000/lru_size': Bad file descriptor
mmap test5: read/write file to/from the buffer which mmapped to just this file (PASS, 0.008157s)
mmap test6: check mmap write/read content on two nodes (SKIPPED, 0s)
mmap test7: file i/o with an unmapped buffer (PASS, 0.007631s)
mmap test8: SIGBUS for beyond file size (PASS, 0.001838s)
packet_write_wait: Connection to 192.168.56.101 port 22: Broken pipe
C02T94HYGTFM:lustre-wc-rel c17826$ 
Comment by James A Simmons [ 13/Feb/19 ]

awesome!!!

Comment by Gerrit Updater [ 27/Feb/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34242/
Subject: LU-8299 llite: ll_fault should fail for insane file offsets
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ada3b33b52cdc6420e3b186c0aba4dd0b0337d8c

Comment by Peter Jones [ 27/Feb/19 ]

Landed for 2.13

Generated at Sat Feb 10 02:16:20 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.