[LU-17448] LBUG in racer with layout change Created: 19/Jan/24 Updated: 28/Jan/24 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.16.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Andreas Dilger | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||
| Severity: | 3 | ||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||
| Description |
|
Hit a crash with a new racer rename test in v4 of patch https://review.whamcloud.com/47643 "LU-15913 tests: add rename stress test via racer" that was still using complex file layouts (I've removed most complex layouts in v5 to focus on actual rename testing for LU-17426). It looks like an ldiskfs credits issue with the large file layout (note xattr_set: 2/103/0), as I don't see any crashes with ZFS. Lustre: lod_parse_striping()) lustre-MDT0000-mdtlov: EXTENSION flags=40 set on component[2]=1 of non-SEL file [0x200000402:0x2402:0x0] with magic=0xbd60bd0 [few seconds pass, but included to show SEL is definitely in use] Lustre: osd_trans_exec_check()) lustre-MDT0000: opcode 7: used 14, used now 2, reserved 12 Lustre: osd_trans_dump_creds()) create: 0/0/0, destroy: 0/0/0 Lustre: osd_trans_dump_creds()) attr_set: 1/1/0, xattr_set: 2/103/0 Lustre: osd_trans_dump_creds()) write: 3/12/14, punch: 0/0/0, quota 1/3/2 Lustre: osd_trans_dump_creds()) insert: 0/0/0, delete: 0/0/0 Lustre: osd_trans_dump_creds()) ref_add: 0/0/0, ref_del: 0/0/0 LustreError: osd_trans_exec_check()) LBUG Pid: 14124, comm: mdt00_019 3.10.0-7.9-debug #1 SMP Sat Mar 26 23:28:42 EDT 2022 Call Trace: libcfs_call_trace+0x90/0xf0 [libcfs] lbug_with_loc+0x4c/0xa0 [libcfs] osd_write+0x929/0xcc0 [osd_ldiskfs] dt_record_write+0x33/0x130 [obdclass] llog_osd_write_rec+0xfdb/0x1c80 [obdclass] llog_write_rec+0x290/0x590 [obdclass] llog_cat_add_rec+0x1d9/0xa50 [obdclass] llog_add+0x17f/0x1f0 [obdclass] osp_sync_add+0x1f9/0x760 [osp] osp_attr_set+0x3cd/0x680 [osp] lod_sub_attr_set+0x1d7/0x500 [lod] lod_obj_stripe_attr_set_cb+0x40/0x100 [lod] lod_obj_for_each_stripe+0x12d/0x310 [lod] lod_attr_set+0x51a/0xb60 [lod] lod_layout_change+0x367/0x3f0 [lod] dt_layout_change+0x20/0xc0 [mdd] mdd_layout_change+0x1712/0x1db0 [mdd] mdt_layout_change+0x2bf/0x450 [mdt] mdt_intent_layout+0x910/0xeb0 [mdt] mdt_intent_opc+0x1c8/0xc50 [mdt] mdt_intent_policy+0xfa/0x460 [mdt] ldlm_lock_enqueue+0x3b1/0xbb0 [ptlrpc] ldlm_handle_enqueue+0x359/0x17c0 [ptlrpc] tgt_enqueue+0x68/0x240 [ptlrpc] tgt_request_handle+0x74e/0x19d0 [ptlrpc] ptlrpc_server_handle_request+0x26e/0xcf0 [ptlrpc] ptlrpc_main+0xc37/0x16d0 [ptlrpc] |
| Comments |
| Comment by Andreas Dilger [ 22/Jan/24 ] |
|
Hit this again on racer test_1, so it isn't just related to the new rename test_2 I added. It also has to do with a layout change: |
| Comment by Alex Zhuravlev [ 28/Jan/24 ] |
|
also, sometimes I see very strange "deadlock": schedule,rwsem_down_write_slowpath,do_rmdir PIDs(2): "rm":41084 "rm":41766 schedule,schedule_preempt_disabled,__mutex_lock,lock_rename,do_renameat2,__x64_sys_rename PIDs(7): "mrename":41114 "mrename":41184 "mrename":41248 "mrename":41291 "mrename":41406 "mrename":41519 "mrename":41556 schedule,schedule_preempt_disabled,rwsem_down_read_slowpath,down_read,lookup_slow,walk_component,path_lookupat,filename_lookup,do_readlinkat,__x64_sys_readlink PIDs(1): "lfs":41134 schedule,rwsem_down_write_slowpath,vfs_rename,do_renameat2,__x64_sys_rename PIDs(1): "mrename":41171 schedule,rwsem_down_write_slowpath,lock_rename,do_renameat2,__x64_sys_rename PIDs(4): "mrename":41177 "mrename":41192 "mrename":41198 "mrename":41202 schedule,schedule_preempt_disabled,rwsem_down_read_slowpath,down_read,path_openat,do_filp_open,do_sys_openat2,do_sys_open PIDs(2): "lfs":41186 "lfs":41213 schedule,schedule_preempt_disabled,rwsem_down_read_slowpath,down_read,lookup_slow,walk_component,path_lookupat,filename_lookup,vfs_statx,__se_sys_newlstat PIDs(2): "lfs":41895 "lfs":47609 schedule,schedule_preempt_disabled,rwsem_down_read_slowpath,down_read,lookup_slow,walk_component,path_lookupat,filename_lookup,vfs_statx,__se_sys_newfstatat PIDs(1): "rm":42331 schedule,schedule_preempt_disabled,rwsem_down_read_slowpath,down_read,lookup_slow,walk_component,path_lookupat,filename_lookup,vfs_statx,__se_sys_statx PIDs(1): "lfs":43870 schedule,rwsem_down_write_slowpath,filename_create,do_mkdirat PIDs(3): "mkdir":44262 "mkdir":44273 "mkdir":47557 |
| Comment by Andreas Dilger [ 28/Jan/24 ] |
|
Yes, I've seen this several times in my racer rename test: |