Lustre / LU-8299

(vvp_io.c:922:vvp_io_fault_start()) ASSERTION( cl_index(obj, offset) == fio->ft_index )

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.13.0

    Description

      Full stack of the panic:

      <0>LustreError: 15833:0:(vvp_io.c:1233:vvp_io_fault_start()) ASSERTION( cl_index(obj, offset) == fio->ft_index ) failed:
      <0>LustreError: 15833:0:(vvp_io.c:1233:vvp_io_fault_start()) LBUG
      <4>Pid: 15833, comm: a.out
      <4>
      <4>Call Trace:
      <4> [<ffffffffa121c895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      <4> [<ffffffffa121ce97>] lbug_with_loc+0x47/0xb0 [libcfs]
      <4> [<ffffffffa192b6fb>] vvp_io_fault_start+0x9ab/0xc70 [lustre]
      <4> [<ffffffffa122b121>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      <4> [<ffffffffa1225778>] ? libcfs_log_return+0x28/0x40 [libcfs]
      <4> [<ffffffffa135560a>] cl_io_start+0x6a/0x140 [obdclass]
      <4> [<ffffffffa1359574>] cl_io_loop+0xb4/0x1b0 [obdclass]
      <4> [<ffffffffa1909cee>] ll_fault+0x2de/0x510 [lustre]
      <4> [<ffffffff8114a234>] __do_fault+0x54/0x530
      <4> [<ffffffff8114a807>] handle_pte_fault+0xf7/0xb00
      <4> [<ffffffff8117b2a9>] ? __mem_cgroup_try_charge+0x19/0x5d0
      <4> [<ffffffffa121c27b>] ? cfs_set_ptldebug_header+0x2b/0xc0 [libcfs]
      <4> [<ffffffff8111f93e>] ? find_get_page+0x1e/0xa0
      <4> [<ffffffff8152a0b9>] ? _cond_resched+0x9/0x40
      <4> [<ffffffff81120e23>] ? filemap_fault+0xd3/0x500
      <4> [<ffffffff8114b43a>] handle_mm_fault+0x22a/0x300
      <4> [<ffffffff8104a8d8>] __do_page_fault+0x138/0x480
      <4> [<ffffffff812307e4>] ? inode_has_perm+0x54/0xa0
      <4> [<ffffffff8122ea61>] ? avc_has_perm+0x71/0x90
      <4> [<ffffffff81334bbd>] ? tty_wakeup+0x3d/0x80
      <4> [<ffffffff8152f23e>] do_page_fault+0x3e/0xa0
      <4> [<ffffffff8152c5f5>] page_fault+0x25/0x30
      <4> [<ffffffff8128e6ef>] ? copy_user_generic_unrolled+0x2f/0xb0
      <4> [<ffffffff8133494f>] ? tty_write+0x1ff/0x2a0
      <4> [<ffffffff81337850>] ? n_tty_write+0x0/0x460
      <4> [<ffffffff81189278>] vfs_write+0xb8/0x1a0
      <4> [<ffffffff81189c41>] sys_write+0x51/0x90
      <4> [<ffffffff810e202e>] ? __audit_syscall_exit+0x25e/0x290
      <4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      

      This is triggered by a trivial mmap test code which accidentally passes in an argument >2^63. I'll attach code for the test.

      Running it like this:

      truncate -s 8589934592 file
      ./mmap_test file 8589934500 8
      

      Will cause that panic. The test is incorrect, but it still shouldn't panic nodes.
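      The attached test code is not reproduced here, but a minimal mmap test of this shape would look roughly like the following (a hypothetical sketch, not the actual attachment; do_map_write is an invented helper name):

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `length` bytes of `path` at byte `offset` and write to the
 * mapping.  Returns 0 on success, -1 on error. */
static int do_map_write(const char *path, off_t offset, size_t length)
{
	int fd = open(path, O_RDWR | O_CREAT, 0644);
	if (fd < 0) {
		perror("open");
		return -1;
	}
	/* make sure the file covers the region we are about to touch */
	if (ftruncate(fd, offset + (off_t)length) < 0) {
		perror("ftruncate");
		close(fd);
		return -1;
	}
	/* mmap requires a page-aligned offset; map the containing page */
	long pagesz = sysconf(_SC_PAGESIZE);
	off_t map_off = offset & ~((off_t)pagesz - 1);
	size_t map_len = length + (size_t)(offset - map_off);
	char *p = mmap(NULL, map_len, PROT_READ | PROT_WRITE, MAP_SHARED,
		       fd, map_off);
	if (p == MAP_FAILED) {
		perror("mmap");
		close(fd);
		return -1;
	}
	memset(p + (offset - map_off), 'x', length);  /* the faulting access */
	munmap(p, map_len);
	close(fd);
	return 0;
}
```

      A driver main() would parse the file name, offset, and length from argv and pass them straight through, which is exactly where an out-of-range offset can slip in unvalidated.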

      In vvp_io_fault_start(), we calculate the page index with cl_index(), which treats its offset argument as signed. But the offset (the same value as ft_index here, i.e. the argument from userspace) is greater than 2^63, so as a signed loff_t it is negative, and the calculation doesn't work.

      /**
       * Converts a byte offset within object \a obj into a page index.
       */
      pgoff_t cl_index(const struct cl_object *obj, loff_t offset)
      {
              return offset >> PAGE_CACHE_SHIFT;
      }
      EXPORT_SYMBOL(cl_index);
      

      When offset is negative, the right shift is not a valid division, so the computed index is wrong. Oops.

      This is corrected by making the function in question (cl_index) accept its argument as unsigned. This gets the bits twiddled correctly, but it's nasty, because offset in vvp_io_fault_start is still loff_t... and loff_t is still signed.
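      To make the bit behavior concrete, here is a small userspace sketch of the two variants (my own illustration, not Lustre code; it assumes 4 KiB pages, i.e. a shift of 12, and note that right-shifting a negative signed value is implementation-defined in C, though gcc and clang both implement it as an arithmetic shift):

```c
#include <stdint.h>

#define PAGE_SHIFT 12  /* assume 4 KiB pages */

/* mirrors cl_index() with a signed offset, as in the original code */
static int64_t index_signed(int64_t offset)
{
	return offset >> PAGE_SHIFT;
}

/* the proposed fix: shift the same bits as an unsigned quantity */
static uint64_t index_unsigned(uint64_t offset)
{
	return offset >> PAGE_SHIFT;
}
```

      For an offset of 2^63 + 100 (0x8000000000000064), index_signed() yields a negative "index" (on compilers with arithmetic right shift), while index_unsigned() yields the well-defined page index 2251799813685248 (2^51), which compares correctly against ft_index.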

      I don't have a larger solution, because loff_t being signed is something we get from the kernel. It seems wrong, but I'm not sure what to do about it.

      I'll post my little patch and the powers that be can ponder the larger situation.

      Attachments

        Activity

          [LU-8299] (vvp_io.c:922:vvp_io_fault_start()) ASSERTION( cl_index(obj, offset) == fio->ft_index )
          pjones Peter Jones added a comment -

          Landed for 2.13


          gerrit Gerrit Updater added a comment -

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34242/
          Subject: LU-8299 llite: ll_fault should fail for insane file offsets
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: ada3b33b52cdc6420e3b186c0aba4dd0b0337d8c


          simmonsja James A Simmons added a comment -

          awesome!!!

          zam Alexander Zarochentsev added a comment -

          > Since we have a reproducer could we add that to the sanity test
          Yes, just uploaded a new revision with mmap_sanity/tst9. Below is how it fails without the fix
          (the machine hung, I had to reboot it):

          [root@devvm1 tests]# ./mmap_sanity -d /mnt/lustre
          mmap test1: basic mmap operation (PASS, 0.022238s)
          mmap test2: MAP_PRIVATE not write back (PASS, 0.006258s)
          mmap test3: concurrent mmap ops on two nodes (SKIPPED, 0s)
          mmap test4: c1 write to f1 from mmapped f2, c2 write to f1 from mmapped f1 (SKIPPED, 0s)
          fwrite failed for '/sys/fs/lustre/ldlm/namespaces/lustre-OST0000-osc-MDT0000/lru_size': Bad file descriptor
          mmap test5: read/write file to/from the buffer which mmapped to just this file (PASS, 0.008157s)
          mmap test6: check mmap write/read content on two nodes (SKIPPED, 0s)
          mmap test7: file i/o with an unmapped buffer (PASS, 0.007631s)
          mmap test8: SIGBUS for beyond file size (PASS, 0.001838s)
          packet_write_wait: Connection to 192.168.56.101 port 22: Broken pipe
          C02T94HYGTFM:lustre-wc-rel c17826$ 
          

          simmonsja James A Simmons added a comment -

          Since we have a reproducer could we add that to the sanity test?

          pfarrell Patrick Farrell (Inactive) added a comment - edited

          spitzcor I think the proper approach is to take Zam's patch and abandon the other one - it's a bit of a pain to get right, and with Zam's patch, it's irrelevant. (I see you suggested that in Gerrit.)
          zam Alexander Zarochentsev added a comment - edited

          running the reproducer with the fix from https://review.whamcloud.com/34242 :

          [root@devvm1 tests]# ~zam/work/tasks/LUS-1392/mmap_test /mnt/lustre/bar 8589934500 8
           command line: /mnt/lustre/bar 8589934500 8
           file size 8589934592, offset -92, length 8589934684
          write: Bad address
          [root@devvm1 tests]# 
           

          zam Alexander Zarochentsev added a comment -

          > My understanding is that the kernel allows negative file offsets for certain special cases (e.g. accessing kernel memory via /dev/kmem on some CPU architectures), but this should be blocked when accessing filesystems.

          Yes, an absolute file offset should not be negative. filemap_fault() has the following code:

          int filemap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
          {
                  int error;
                  struct file *file = vma->vm_file;
                  struct address_space *mapping = file->f_mapping;
                  struct file_ra_state *ra = &file->f_ra;
                  struct inode *inode = mapping->host;
                  pgoff_t offset = vmf->pgoff;
                  struct page *page;
                  pgoff_t size;
                  int ret = 0;
          
                  size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
                  if (offset >= size)
                          return VM_FAULT_SIGBUS;
          
          

          Lustre tries to handle such a large offset the usual way and runs into all those problems converting the page offset to a file offset and back. I submitted a small patch to return an error early in ll_fault() for obviously insane page indexes: https://review.whamcloud.com/34242
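          The bound it enforces is presumably the same one filemap_fault() applies above; here is a standalone sketch of that check (my own illustration, not the actual patch; fault_index_sane() is an invented name, and 4 KiB pages are assumed):

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12  /* assume 4 KiB pages */

/* Reject fault page indexes at or beyond the file's last page --
 * the same bound filemap_fault() checks before doing any work. */
static bool fault_index_sane(uint64_t pgoff, uint64_t i_size)
{
	uint64_t last = (i_size + (1ULL << PAGE_SHIFT) - 1) >> PAGE_SHIFT;
	return pgoff < last;
}
```

          For the 8 GiB file from the reproducer, any index at or past 8589934592 >> 12 = 2097152 is rejected up front, including the huge wrapped-around index that previously made it all the way to vvp_io_fault_start().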


          gerrit Gerrit Updater added a comment -

          Alexander Zarochentsev (c17826@cray.com) uploaded a new patch: https://review.whamcloud.com/34242
          Subject: LU-8299 llite: ll_fault should fail for insane file offsets
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: f49e3d20b7175f7686aff0a6e690468a5a7c99dd

          spitzcor Cory Spitz added a comment -

          bobijam and pfarrell what should be done with https://review.whamcloud.com/#/c/20861 ? Patrick contributed that a while back and it was last touched six months ago. It has some negative feedback. Who should address that and finish the work up?


          People

            zam Alexander Zarochentsev
            paf Patrick Farrell (Inactive)
            Votes: 0
            Watchers: 8
