[LU-14781] sanity-flr test 70: mirror create and split race crash Created: 23/Jun/21 Updated: 14/Jan/22 Resolved: 30/Nov/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Zhenyu Xu | Assignee: | Zhenyu Xu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
This issue was created by maloo for wangshilong <wshilong@ddn.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/515ea0f6-6502-4f1c-96ad-2df998d95993

[ 6188.499816] Lustre: DEBUG MARKER: == sanity-flr test 70: mirror create and split race ================================================== 03:31:01 (1624246261) |
| Comments |
| Comment by Gerrit Updater [ 23/Jun/21 ] |
|
Bobi Jam (bobijam@hotmail.com) uploaded a new patch: https://review.whamcloud.com/44055 |
| Comment by Alex Zhuravlev [ 31/Aug/21 ] |
|
Seen on master: https://testing.whamcloud.com/test_sessions/c74ca3d7-6927-41c3-abca-65bf5108eb71 |
| Comment by Zhenyu Xu [ 03/Sep/21 ] |
|
The stack trace of the Oops shows that osp_object_free encountered a NULL pointer:

Call Trace:
[ 6532.245084] [] lu_object_free.isra.30+0xf2/0x170 [obdclass]
[ 6532.246905] [] lu_object_find_at+0x496/0x930 [obdclass]
[ 6532.251132] [] lod_initialize_objects+0x3e4/0xba0 [lod]
[ 6532.252240] [] lod_parse_striping+0x693/0xc20 [lod]
[ 6532.253293] [] lod_striping_load+0x2b2/0x660 [lod]
[ 6532.254347] [] lod_declare_destroy+0x12b/0x600 [lod]
[ 6532.258781] [] mdd_declare_finish_unlink+0x91/0x210 [mdd]
[ 6532.259911] [] mdd_unlink+0x48f/0xab0 [mdd]
[ 6532.260957] [] mdt_reint_unlink+0xc32/0x1550 [mdt]
[ 6532.263377] [] mdt_reint_rec+0x83/0x210 [mdt]
[ 6532.264354] [] mdt_reint_internal+0x6e1/0xb00 [mdt]
[ 6532.265410] [] mdt_reint+0x67/0x140 [mdt]
[ 6532.266424] [] tgt_request_handle+0xaee/0x15f0 [ptlrpc]
[ 6532.268678] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[ 6532.270797] [] ptlrpc_main+0xb34/0x1470 [ptlrpc]
[ 6532.273053] [] kthread+0xd1/0xe0
[ 6532.274830] [] ret_from_fork_nospec_begin+0x21/0x21
[ 6532.276848] Code: 8d 7f 50 4d 8d af 20 01 00 00 49 be ff ff ff ff ff 1f 00 00 e8 75 e1 79 ff 48 89 df e8 6d 90 79 ff 49 8b 9f 20 01 00 00 4c 39 eb 8b 23 48 89 df 0f 85 ce 00 00 00 e9 9b 01 00 00 66 90 4c 01
[ 6532.281936] RIP [] osp_object_free+0x4d/0x490 [osp]
[ 6532.283028] RSP
[ 6532.283602] CR2: 0000000000000000

And:

(gdb) l *(osp_object_free+0x4d)
0x7b6d is in osp_object_free (/root/work/lustre/lustre/osp/osp_object.c:2296).
2289         struct lu_object_header *h = o->lo_header;
2290         struct osp_xattr_entry *oxe;
2291         struct osp_xattr_entry *tmp;
2292         int count;
2293
2294         dt_object_fini(&obj->opo_obj);
2295         lu_object_header_fini(h);
2296         list_for_each_entry_safe(oxe, tmp, &obj->opo_xattr_list, oxe_list) {
2297                 list_del(&oxe->oxe_list);
2298                 count = atomic_read(&oxe->oxe_ref);
2299                 LASSERTF(count == 1,
2300                          "Still has %d users on the xattr entry %.*s\n",

So the object's lu_object_header is NULL?
Which leads me to osp_object_alloc():

111 static struct lu_object *osp_object_alloc(const struct lu_env *env,
112                                           const struct lu_object_header *hdr,
113                                           struct lu_device *d)
114 {
115         struct lu_object_header *h = NULL;
116         struct osp_object *o;
117         struct lu_object *l;
118
119         OBD_SLAB_ALLOC_PTR_GFP(o, osp_object_kmem, GFP_NOFS);
120         if (o != NULL) {
121                 l = &o->opo_obj.do_lu;
122
123                 /* If hdr is NULL, it means the object is not built
124                  * from the top dev(MDT/OST), usually it happens when
125                  * building striped object, like data object on MDT or
126                  * striped object for directory */
127                 if (hdr == NULL) {
128                         h = &o->opo_header;
129                         lu_object_header_init(h);
130                         dt_object_init(&o->opo_obj, h, d);
131                         lu_object_add_top(h, l);
132                 } else {
133                         dt_object_init(&o->opo_obj, h, d);
134                 }
135
136                 l->lo_ops = &osp_lu_obj_ops;
137
138                 return l;
139         } else {
140                 return NULL;
141         }
142 }

It seems to me that line 133 is faulty, it should use hdr instead of h. |
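To make the suspicion about line 133 concrete, here is a minimal user-space sketch in C. All mock_* types and functions are hypothetical stand-ins, not the real Lustre definitions, and the use_fix flag exists only for illustration. It contrasts the quoted else branch (which passes h while h is still NULL) with passing hdr as suggested in the comment. The patch that actually landed may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for the Lustre structures (assumptions, not real code). */
struct mock_header {
	int initialized;
};

struct mock_lu_object {
	struct mock_header *lo_header;	/* header this object is attached to */
};

struct mock_osp_object {
	struct mock_header opo_header;	/* embedded header, used when hdr == NULL */
	struct mock_lu_object opo_lu;
};

/* Plays the role of dt_object_init(): records which header the object uses. */
static void mock_object_init(struct mock_lu_object *l, struct mock_header *h)
{
	l->lo_header = h;
}

/*
 * use_fix == 0 reproduces the quoted else branch, which passes h (still NULL).
 * use_fix != 0 passes hdr instead, as the comment proposes.
 */
static struct mock_osp_object *mock_object_alloc(struct mock_header *hdr,
						 int use_fix)
{
	struct mock_header *h = NULL;
	struct mock_osp_object *o = calloc(1, sizeof(*o));

	if (o == NULL)
		return NULL;

	if (hdr == NULL) {
		/* No top-level header supplied: use the embedded one. */
		h = &o->opo_header;
		h->initialized = 1;
		mock_object_init(&o->opo_lu, h);
	} else {
		mock_object_init(&o->opo_lu, use_fix ? hdr : h);
	}
	return o;
}

/* Returns 1 if a later free path would find a non-NULL header, else 0. */
static int mock_header_is_valid(const struct mock_osp_object *o)
{
	return o->opo_lu.lo_header != NULL;
}
```

In this sketch, mock_object_alloc(&top, 0) leaves lo_header NULL, which is the state a later free path would trip over, while mock_object_alloc(&top, 1) records the caller's header.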
| Comment by Gerrit Updater [ 22/Sep/21 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44055/ |
| Comment by Peter Jones [ 22/Sep/21 ] |
|
Landed for 2.15 |
| Comment by Zhenyu Xu [ 02/Nov/21 ] |
|
Another hit (https://testing.whamcloud.com/test_sets/8e467190-cbc4-43f6-8b42-89c274f86735) reveals that osp_object_free could still dereference a NULL pointer |
| Comment by Gerrit Updater [ 03/Nov/21 ] |
|
"Bobi Jam <bobijam@hotmail.com>" uploaded a new patch: https://review.whamcloud.com/45442 |
| Comment by Gerrit Updater [ 30/Nov/21 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45442/ |
| Comment by Peter Jones [ 30/Nov/21 ] |
|
Landed for 2.15 |