[LU-8273] lov_sub_get()) ASSERTION( stripe < lio->lis_stripe_count ) failed Created: 14/Jun/16 Updated: 24/Jul/16 Resolved: 24/Jul/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Oleg Drokin | Assignee: | Zhenyu Xu |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
With landing of
[183262.685161] Lustre: DEBUG MARKER: centos6-9.localnet: == sanity test 405: Various layout swap lock tests =================================================== 07:46:13 (1465904773)
[183275.828024] LustreError: 8225:0:(lov_io.c:238:lov_sub_get()) ASSERTION( stripe < lio->lis_stripe_count ) failed:
[183275.829228] LustreError: 8225:0:(lov_io.c:238:lov_sub_get()) LBUG
[183275.829934] Pid: 8225, comm: swap_lock_test
[183275.830665] Call Trace:
[183275.831909] [<ffffffffa01a97b3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
[183275.832577] [<ffffffffa01a9d55>] lbug_with_loc+0x45/0xc0 [libcfs]
[183275.833165] [<ffffffffa08b1e75>] lov_sub_get+0x4e5/0x650 [lov]
[183275.833738] [<ffffffffa08b492d>] lov_sublock_env_get.isra.4+0xbd/0x100 [lov]
[183275.835312] [<ffffffffa08b5392>] lov_lock_sub_init+0x2c2/0x9f0 [lov]
[183275.835905] [<ffffffffa08b5af7>] lov_lock_init_raid0+0x37/0xf0 [lov]
[183275.836493] [<ffffffffa08c172f>] lov_lock_init+0x1f/0x60 [lov]
[183275.837086] [<ffffffffa0349a6f>] cl_lock_init+0x8f/0x190 [obdclass]
[183275.837711] [<ffffffffa034bcd8>] ? cl_io_init0.isra.15+0x88/0x160 [obdclass]
[183275.838778] [<ffffffffa0349bb5>] cl_lock_request+0x45/0x1f0 [obdclass]
[183275.839389] [<ffffffffa0f29f79>] cl_get_grouplock+0x189/0x310 [lustre]
[183275.839977] [<ffffffffa0ee0a69>] ll_get_grouplock+0x179/0x530 [lustre]
[183275.840599] [<ffffffffa0eefb8d>] ll_file_ioctl+0x372d/0x38f0 [lustre]
[183275.841183] [<ffffffff81202775>] do_vfs_ioctl+0x305/0x520
[183275.841748] [<ffffffff810b0c71>] ? finish_task_switch+0x81/0x180
[183275.842316] [<ffffffff810b0c34>] ? finish_task_switch+0x44/0x180
[183275.842888] [<ffffffff81202a31>] SyS_ioctl+0xa1/0xc0
[183275.843525] [<ffffffff81711809>] system_call_fastpath+0x16/0x1b
[183275.844102]
[183275.845572] Kernel panic - not syncing: LBUG
Crashdump and modules are in /exports/crash/192.168.10.219-2016-06-14-07:46:34 |
| Comments |
| Comment by Peter Jones [ 14/Jun/16 ] |
|
Bobijam, this seems like a rare issue to hit, but are you able to see how to address it? Peter |
| Comment by Zhenyu Xu [ 15/Jun/16 ] |
|
Hi Oleg, on which node is /exports/crash/ located? |
| Comment by Oleg Drokin [ 15/Jun/16 ] |
|
It's my private node. |
| Comment by Zhenyu Xu [ 15/Jun/16 ] |
|
Somehow the io is an lov_empty_io, while the lov_object is a raid0 object.
crash> struct lov_io ffff880021f77e68
struct lov_io {
lis_cl = {
cis_io = 0xffff8800aae81eb8,
cis_obj = 0xffff880070e71e58,
cis_iop = 0xffffffffa08d07a0 <lov_empty_io_ops>,
cis_linkage = {
next = 0xffff8800aae81ed0,
prev = 0xffff8800438a6f20
}
},
lis_object = 0xffff880070e71e58,
lis_io_endpos = 0,
lis_pos = 0,
lis_endpos = 0,
lis_mem_frozen = 0,
lis_stripe_count = 0,
lis_active_subios = 0,
lis_single_subio_index = 0,
lis_single_subio = {
ci_type = CIT_READ,
ci_state = CIS_ZERO,
ci_obj = 0x0,
ci_parent = 0x0,
ci_layers = {
next = 0x0,
prev = 0x0
},
ci_lockset = {
cls_todo = {
next = 0x0,
prev = 0x0
},
cls_done = {
next = 0x0,
prev = 0x0
}
},
ci_lockreq = CILR_MANDATORY,
u = {
ci_rd = {
rd = {
crw_pos = 0,
crw_count = 0,
crw_nonblock = 0
}
},
ci_wr = {
wr = {
crw_pos = 0,
crw_count = 0,
crw_nonblock = 0
},
wr_append = 0,
wr_sync = 0
},
ci_rw = {
crw_pos = 0,
crw_count = 0,
crw_nonblock = 0
},
ci_setattr = {
sa_attr = {
lvb_size = 0,
lvb_mtime = 0,
lvb_atime = 0,
lvb_ctime = 0,
lvb_blocks = 0,
lvb_mtime_ns = 0,
lvb_atime_ns = 0,
lvb_ctime_ns = 0,
lvb_padding = 0
},
sa_attr_flags = 0,
sa_valid = 0,
sa_stripe_index = 0,
sa_parent_fid = 0x0
},
ci_data_version = {
dv_data_version = 0,
dv_flags = 0
},
ci_fault = {
ft_index = 0,
ft_nob = 0,
ft_writable = 0,
ft_executable = 0,
ft_mkwrite = 0,
ft_page = 0x0
},
ci_fsync = {
fi_start = 0,
fi_end = 0,
fi_fid = 0x0,
fi_mode = CL_FSYNC_NONE,
fi_nr_written = 0
},
ci_ladvise = {
li_start = 0,
li_end = 0,
li_fid = 0x0,
li_advice = LU_LADVISE_INVALID,
li_flags = 0
}
},
ci_queue = {
...
},
ci_nob = 0,
ci_result = 0,
ci_continue = 0,
ci_no_srvlock = 0,
ci_need_restart = 0,
ci_ignore_layout = 0,
ci_verify_layout = 0,
ci_restore_needed = 0,
ci_noatime = 0,
ci_owned_nr = 0
},
lis_nr_subios = 0,
lis_subs = 0x0,
lis_active = {
next = 0x0,
prev = 0x0
}
}
crash> struct lov_object 0xffff880070e71e58
struct lov_object {
lo_cl = {
co_lu = {
lo_header = 0xffff88001e6e3f08,
lo_dev = 0xffff880015a28f00,
lo_ops = 0xffffffffa08d1320 <lov_lu_obj_ops>,
lo_linkage = {
next = 0xffff88001e6e3f48,
prev = 0xffff88001e6e3fb8
},
lo_dev_ref = {<No data fields>}
},
co_ops = 0xffffffffa08d1360 <lov_ops>,
co_slice_off = 144
},
...
lo_type = LLT_RAID0,
lo_layout_invalid = false,
lo_active_ios = {
counter = 1
},
...
lo_lsm = 0xffff880089d9b1c0,
...
}
|
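Note on the dump above: the io reports lis_stripe_count = 0 and cis_iop = lov_empty_io_ops, while the object reports lo_type = LLT_RAID0. With a stripe count of zero, no stripe index can satisfy the asserted condition stripe < lio->lis_stripe_count. The following is a minimal C sketch of that bounds check only. The toy_* names are hypothetical stand-ins invented for illustration; they are not the real Lustre structures or the real lov_sub_get() code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the one lov_io field the assertion reads. */
struct toy_lov_io {
	int lis_stripe_count;	/* stripes this IO was initialized with */
};

/* Non-aborting form of the check: stripe < lio->lis_stripe_count. */
static bool toy_stripe_in_range(const struct toy_lov_io *lio, int stripe)
{
	return stripe >= 0 && stripe < lio->lis_stripe_count;
}

/*
 * Toy analogue of a per-stripe lookup guarded by an assertion.
 * It aborts if the stripe index is out of range, and otherwise
 * returns the index unchanged. The real function returns a sub-io;
 * that part is not modeled here.
 */
static int toy_lov_sub_get(const struct toy_lov_io *lio, int stripe)
{
	assert(toy_stripe_in_range(lio, stripe));
	return stripe;
}
```

For an io shaped like the one in the dump (lis_stripe_count = 0), toy_stripe_in_range() is false even for stripe 0, so any per-stripe lookup requested on behalf of a striped object would trip the assertion.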
| Comment by Zhenyu Xu [ 15/Jun/16 ] |
|
Hi Jinshan, cl_get_grouplock() issues an IO that ignores the layout (io->ci_ignore_layout = 1), and in IO initialization |
| Comment by Jinshan Xiong (Inactive) [ 15/Jun/16 ] |
|
I can't think of a reason why the group lock requires ci_ignore_layout, but this could be due to a deadlock. Can you please check the git history to see if there is a related commit and, if not, just try clearing ci_ignore_layout and see how it goes? Actually this is a reproduction of |
| Comment by Oleg Drokin [ 16/Jun/16 ] |
|
Ok I landed |
| Comment by Peter Jones [ 24/Jul/16 ] |
|
As this is rare and suspected to be fixed I will mark it as a duplicate of |