[LU-3732] osd_io.c:320:osd_do_bio()) ASSERTION( iobuf->dr_rw == 0 ) failed: page_idx 4, block_idx 4, i 0 - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Duplicate
Priority: Major
Fix Version/s: None
Affects Version/s: Lustre 2.5.0
Labels:
- osd-ldiskfs
- trinity
Environment:
Using current master 2.4.53-22-g295968f on CentOS 6.4 2.6.32-358.11.1.el6.lustre.x86_64.

Severity:
3
Rank (Obsolete):
9631

Description

I don't have a simple reproducer but running trinity on a Lustre client mount will trigger this easily. I even turned off the weird and dangerous non-filesystem related stuff and I still see it.

LustreError: 3395:0:(osd_io.c:320:osd_do_bio()) ASSERTION( iobuf->dr_rw == 0 ) failed: page_idx 4, block_idx 4, i 0
LustreError: 3395:0:(osd_io.c:320:osd_do_bio()) LBUG
Pid: 3395, comm: ll_ost_io01_001

Call Trace:
 [<ffffffffa04ec895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa04ece97>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa0c7b228>] osd_do_bio+0x7f8/0x800 [osd_ldiskfs]
 [<ffffffffa0bf70bb>] ? __ldiskfs_handle_dirty_metadata+0x7b/0x100 [ldiskfs]
 [<ffffffffa0c2c348>] ? __ldiskfs_journal_stop+0x68/0xa0 [ldiskfs]
 [<ffffffffa0c7dbb8>] osd_write_commit+0x328/0x610 [osd_ldiskfs]
 [<ffffffffa0e7ac84>] ofd_commitrw_write+0x684/0x11b0 [ofd]
 [<ffffffffa0e7d9ed>] ofd_commitrw+0x5cd/0xbb0 [ofd]
 [<ffffffffa06397e5>] ? lprocfs_counter_add+0x125/0x182 [lvfs]
 [<ffffffffa0dbe1e8>] obd_commitrw+0x128/0x3d0 [ost]
 [<ffffffffa0dc82d1>] ost_brw_write+0xea1/0x15d0 [ost]
 [<ffffffff81282b36>] ? vsnprintf+0x336/0x5e0
 [<ffffffffa07e2310>] ? target_bulk_timeout+0x0/0xc0 [ptlrpc]
 [<ffffffffa0dce75e>] ost_handle+0x3a8e/0x4030 [ost]
 [<ffffffffa04f8d64>] ? libcfs_id2str+0x74/0xb0 [libcfs]
 [<ffffffffa0832598>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
 [<ffffffffa04ed54e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
 [<ffffffffa04fea6f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
 [<ffffffffa08299a9>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
 [<ffffffff81055ab3>] ? __wake_up+0x53/0x70
 [<ffffffffa083391d>] ptlrpc_main+0xabd/0x1700 [ptlrpc]
 [<ffffffffa0832e60>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
 [<ffffffff81096936>] kthread+0x96/0xa0
 [<ffffffff8100c0ca>] child_rip+0xa/0x20
 [<ffffffff810968a0>] ? kthread+0x0/0xa0
 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20

Issue Links

duplicates

LU-6489 osd-ldiskfs checks s_maxbytes limits incorrectly

Resolved

Activity

Henri Doreau (Inactive) added a comment - 11/Feb/14 4:03 PM - edited

I stumbled upon this crash as well. Offset 0x7ffffffff000 does trigger it but, as in your case, 0x800000000000 works fine. It seems that ldiskfs_ext_new_extent_cb isn't even called when the crash occurs, leaving iobuf->dr_blocks containing only zeroes. I have traced it extensively but am unsure how best to fix it.

John Hammond added a comment - 15/Aug/13 2:46 PM

OK, but there may be more than one supported range. Using an offset of 0x7ffffffff000 or 0x800000000000 is fine. However, 0x7ffffffff001 triggers the same assertion.

Alex Zhuravlev added a comment - 15/Aug/13 4:27 AM

check my math please:

(gdb) p (0x7fffffffffffULL / 4096) >> 32
$5 = 7

while with ldiskfs:

/*
 * Maximum number of logical blocks in a file; ldiskfs_extent's ee_block is
 * __le32.
 */
#define EXT_MAX_BLOCKS 0xffffffff

I guess someone (ldiskfs or fsfilt) should be checking that the offset is in the supported range.

John Hammond added a comment - 14/Aug/13 7:38 PM

Seems like an off-by-one-ish kind of error. Here is a simplified reproducer:

buf = malloc(4096);
fd = open("/mnt/lustre/Gena", O_WRONLY|O_CREAT, 0644);
pwrite(fd, buf, 4096, 0x7fffffffffff);
