
osd_io.c:320:osd_do_bio()) ASSERTION( iobuf->dr_rw == 0 ) failed: page_idx 4, block_idx 4, i 0

Details

    • Bug
    • Resolution: Duplicate
    • Major
    • None
    • Lustre 2.5.0
    • Using current master 2.4.53-22-g295968f on CentOS 6.4 2.6.32-358.11.1.el6.lustre.x86_64.
    • 3
    • 9631

    Description

      I don't have a simple reproducer, but running trinity on a Lustre client mount triggers this easily. I even turned off the weird and dangerous non-filesystem-related stuff and I still see it.

      LustreError: 3395:0:(osd_io.c:320:osd_do_bio()) ASSERTION( iobuf->dr_rw == 0 ) failed: page_idx 4, block_idx 4, i 0
      LustreError: 3395:0:(osd_io.c:320:osd_do_bio()) LBUG
      Pid: 3395, comm: ll_ost_io01_001
      
      Call Trace:
       [<ffffffffa04ec895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
       [<ffffffffa04ece97>] lbug_with_loc+0x47/0xb0 [libcfs]
       [<ffffffffa0c7b228>] osd_do_bio+0x7f8/0x800 [osd_ldiskfs]
       [<ffffffffa0bf70bb>] ? __ldiskfs_handle_dirty_metadata+0x7b/0x100 [ldiskfs]
       [<ffffffffa0c2c348>] ? __ldiskfs_journal_stop+0x68/0xa0 [ldiskfs]
       [<ffffffffa0c7dbb8>] osd_write_commit+0x328/0x610 [osd_ldiskfs]
       [<ffffffffa0e7ac84>] ofd_commitrw_write+0x684/0x11b0 [ofd]
       [<ffffffffa0e7d9ed>] ofd_commitrw+0x5cd/0xbb0 [ofd]
       [<ffffffffa06397e5>] ? lprocfs_counter_add+0x125/0x182 [lvfs]
       [<ffffffffa0dbe1e8>] obd_commitrw+0x128/0x3d0 [ost]
       [<ffffffffa0dc82d1>] ost_brw_write+0xea1/0x15d0 [ost]
       [<ffffffff81282b36>] ? vsnprintf+0x336/0x5e0
       [<ffffffffa07e2310>] ? target_bulk_timeout+0x0/0xc0 [ptlrpc]
       [<ffffffffa0dce75e>] ost_handle+0x3a8e/0x4030 [ost]
       [<ffffffffa04f8d64>] ? libcfs_id2str+0x74/0xb0 [libcfs]
       [<ffffffffa0832598>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
       [<ffffffffa04ed54e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
       [<ffffffffa04fea6f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
       [<ffffffffa08299a9>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
       [<ffffffff81055ab3>] ? __wake_up+0x53/0x70
       [<ffffffffa083391d>] ptlrpc_main+0xabd/0x1700 [ptlrpc]
       [<ffffffffa0832e60>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
       [<ffffffff81096936>] kthread+0x96/0xa0
       [<ffffffff8100c0ca>] child_rip+0xa/0x20
       [<ffffffff810968a0>] ? kthread+0x0/0xa0
       [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      

          Activity

            [LU-3732] osd_io.c:320:osd_do_bio()) ASSERTION( iobuf->dr_rw == 0 ) failed: page_idx 4, block_idx 4, i 0
            hdoreau Henri Doreau (Inactive) added a comment - edited

            I stumbled upon this crash as well. Offset 0x7ffffffff000 does trigger it, but, as in your case, 0x800000000000 works fine. It seems that ldiskfs_ext_new_extent_cb isn't even called when the crash occurs, which leaves iobuf->dr_blocks containing only zeroes. I have traced it extensively but am unsure how best to fix it.

            jhammond John Hammond added a comment -

            OK, but there may be more than one supported range. Using an offset of 0x7ffffffff000 or 0x800000000000 is fine. However, 0x7ffffffff001 triggers the same assertion.


            bzzz Alex Zhuravlev added a comment -

            check my math please:

            (gdb) p (0x7fffffffffffULL / 4096) >> 32
            $5 = 7

            while with ldiskfs:

            /*
             * Maximum number of logical blocks in a file; ldiskfs_extent's ee_block is
             * __le32.
             */
            #define EXT_MAX_BLOCKS 0xffffffff

            I guess someone (ldiskfs or fsfilt) should be checking the offset is in supported range.
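The arithmetic in the comment above can be written as a small range check. This is only a minimal sketch of the kind of test being suggested, not code from Lustre or ldiskfs: `offset_to_lblk`, `write_in_extent_range`, and `BLOCK_SHIFT` are hypothetical names, the 4096-byte block size is taken from the gdb expression, and treating `EXT_MAX_BLOCKS` as a count (so valid block indices are strictly below it) is one reading of the quoted comment. Where such a check would belong (ldiskfs, fsfilt, or elsewhere) is left open by the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value from the ldiskfs snippet quoted above (ULL suffix added here). */
#define EXT_MAX_BLOCKS 0xffffffffULL

/* Assumption: 4096-byte blocks, matching the "/ 4096" in the gdb session. */
#define BLOCK_SHIFT 12

/* Hypothetical helper: logical block index containing byte offset `off`. */
static uint64_t offset_to_lblk(uint64_t off)
{
	return off >> BLOCK_SHIFT;
}

/*
 * Hypothetical helper: true if every block touched by a write of `len`
 * bytes at `off` has an index below EXT_MAX_BLOCKS, i.e. an index that
 * fits in a 32-bit ee_block field.
 */
static bool write_in_extent_range(uint64_t off, uint64_t len)
{
	if (len == 0)
		return true;
	/* Reject writes whose last byte offset would wrap around 64 bits. */
	if (off > UINT64_MAX - (len - 1))
		return false;
	/* Only the last block needs checking; earlier ones are smaller. */
	return offset_to_lblk(off + len - 1) < EXT_MAX_BLOCKS;
}
```

With these assumptions, `offset_to_lblk(0x7fffffffffffULL)` is `0x7ffffffff`, which needs more than 32 bits, so `write_in_extent_range(0x7fffffffffffULL, 4096)` returns false.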
            jhammond John Hammond added a comment -

            Seems like an off-by-one-ish kind of error. Here is a simplified reproducer:

            buf = malloc(4096);
            fd = open("/mnt/lustre/Gena", O_WRONLY|O_CREAT, 0644); /* O_CREAT requires a mode argument */
            pwrite(fd, buf, 4096, 0x7fffffffffff);
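For anyone who wants to compile the reproducer, the three lines above could be wrapped roughly as follows. This is a sketch, not the reporter's actual test program: `write_page_at` is a hypothetical helper name, the file mode 0644 and the zero-filled buffer are assumptions, and error handling has been added. Against an affected server the write is expected to hit the assertion reported in this ticket, so it should only be pointed at a disposable test system.

```c
#define _FILE_OFFSET_BITS 64	/* make off_t 64-bit on 32-bit builds too */

#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write one zero-filled 4096-byte buffer to `path`
 * at byte offset `off`, creating the file if needed.
 * Returns the pwrite() result, or -1 with errno set on failure.
 */
static ssize_t write_page_at(const char *path, off_t off)
{
	char *buf = calloc(1, 4096);
	ssize_t rc;
	int fd, saved_errno;

	if (buf == NULL)
		return -1;

	fd = open(path, O_WRONLY | O_CREAT, 0644);
	if (fd < 0) {
		saved_errno = errno;
		free(buf);
		errno = saved_errno;
		return -1;
	}

	rc = pwrite(fd, buf, 4096, off);
	saved_errno = errno;	/* close() and free() may overwrite errno */
	close(fd);
	free(buf);
	errno = saved_errno;
	return rc;
}
```

Calling `write_page_at("/mnt/lustre/Gena", 0x7fffffffffff)` corresponds to the reproducer in the comment; the other offsets discussed in this thread can be passed the same way.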
            

            People

              wc-triage WC Triage
              jhammond John Hammond
              Votes: 0
              Watchers: 3