[LU-3732] osd_io.c:320:osd_do_bio()) ASSERTION( iobuf->dr_rw == 0 ) failed: page_idx 4, block_idx 4, i 0 Created: 09/Aug/13 Updated: 11/May/15 Resolved: 11/May/15 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | John Hammond | Assignee: | WC Triage |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | osd-ldiskfs, trinity | ||
| Environment: |
Using current master 2.4.53-22-g295968f on CentOS 6.4 2.6.32-358.11.1.el6.lustre.x86_64. |
||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9631 | ||||||||
| Description |
|
I don't have a simple reproducer but running trinity on a Lustre client mount will trigger this easily. I even turned off the weird and dangerous non-filesystem related stuff and I still see it.
LustreError: 3395:0:(osd_io.c:320:osd_do_bio()) ASSERTION( iobuf->dr_rw == 0 ) failed: page_idx 4, block_idx 4, i 0
LustreError: 3395:0:(osd_io.c:320:osd_do_bio()) LBUG
Pid: 3395, comm: ll_ost_io01_001
Call Trace:
[<ffffffffa04ec895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa04ece97>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa0c7b228>] osd_do_bio+0x7f8/0x800 [osd_ldiskfs]
[<ffffffffa0bf70bb>] ? __ldiskfs_handle_dirty_metadata+0x7b/0x100 [ldiskfs]
[<ffffffffa0c2c348>] ? __ldiskfs_journal_stop+0x68/0xa0 [ldiskfs]
[<ffffffffa0c7dbb8>] osd_write_commit+0x328/0x610 [osd_ldiskfs]
[<ffffffffa0e7ac84>] ofd_commitrw_write+0x684/0x11b0 [ofd]
[<ffffffffa0e7d9ed>] ofd_commitrw+0x5cd/0xbb0 [ofd]
[<ffffffffa06397e5>] ? lprocfs_counter_add+0x125/0x182 [lvfs]
[<ffffffffa0dbe1e8>] obd_commitrw+0x128/0x3d0 [ost]
[<ffffffffa0dc82d1>] ost_brw_write+0xea1/0x15d0 [ost]
[<ffffffff81282b36>] ? vsnprintf+0x336/0x5e0
[<ffffffffa07e2310>] ? target_bulk_timeout+0x0/0xc0 [ptlrpc]
[<ffffffffa0dce75e>] ost_handle+0x3a8e/0x4030 [ost]
[<ffffffffa04f8d64>] ? libcfs_id2str+0x74/0xb0 [libcfs]
[<ffffffffa0832598>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
[<ffffffffa04ed54e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
[<ffffffffa04fea6f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
[<ffffffffa08299a9>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
[<ffffffff81055ab3>] ? __wake_up+0x53/0x70
[<ffffffffa083391d>] ptlrpc_main+0xabd/0x1700 [ptlrpc]
[<ffffffffa0832e60>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffff81096936>] kthread+0x96/0xa0
[<ffffffff8100c0ca>] child_rip+0xa/0x20
[<ffffffff810968a0>] ? kthread+0x0/0xa0
[<ffffffff8100c0c0>] ? child_rip+0x0/0x20 |
| Comments |
| Comment by John Hammond [ 14/Aug/13 ] |
|
Seems like an off-by-one-ish kind of error. Here is a simplified reproducer:
buf = malloc(4096);
/* O_CREAT requires a mode argument */
fd = open("/mnt/lustre/Gena", O_WRONLY|O_CREAT, 0644);
pwrite(fd, buf, 4096, 0x7fffffffffff);
|
| Comment by Alex Zhuravlev [ 15/Aug/13 ] |
|
Check my math, please:
(gdb) p (0x7fffffffffffULL / 4096) >> 32
while with ldiskfs:
/*
I guess someone (ldiskfs or fsfilt) should be checking that the offset is in the supported range. |
| Comment by John Hammond [ 15/Aug/13 ] |
|
OK but there may be more than one supported range. |
| Comment by Henri Doreau (Inactive) [ 11/Feb/14 ] |
|
I stumbled upon this crash as well. Offset 0x7ffffffff000 does trigger it but, as you found, 0x800000000000 works fine. It seems that ldiskfs_ext_new_extent_cb isn't even called when the crash occurs, which leaves iobuf->dr_blocks containing only zeroes. I have traced it extensively but am unsure how best to fix it. |