Details
- Bug
- Resolution: Duplicate
- Minor
- None
- Lustre 2.5.3
- None
- MDT Server crash
- 3
- 9223372036854775807
Description
NULL pointer dereference while unmounting the MDT server if the orph_cleanup_sc thread has not finished.
crash> bt
PID: 31563  TASK: ffff880879176ab0  CPU: 1  COMMAND: "orph_cleanup_sc"
 #0 [ffff880547d9b810] machine_kexec at ffffffff8103b71b
 #1 [ffff880547d9b870] crash_kexec at ffffffff810c9942
 #2 [ffff880547d9b940] oops_end at ffffffff8152f070
 #3 [ffff880547d9b970] no_context at ffffffff8104c80b
 #4 [ffff880547d9b9c0] __bad_area_nosemaphore at ffffffff8104ca95
 #5 [ffff880547d9ba10] bad_area_nosemaphore at ffffffff8104cb63
 #6 [ffff880547d9ba20] __do_page_fault at ffffffff8104d25c
 #7 [ffff880547d9bb40] do_page_fault at ffffffff81530fbe
 #8 [ffff880547d9bb70] page_fault at ffffffff8152e375
    [exception RIP: fld_server_lookup+97]
    RIP: ffffffffa0a50b31  RSP: ffff880547d9bc20  RFLAGS: 00010286
    RAX: ffff8810515df4c0  RBX: 00000002122fc000  RCX: ffff8810426c5078
    RDX: ffff880e6896b400  RSI: ffffffffa0a56b00  RDI: ffff881040aff840
    RBP: ffff880547d9bc70   R8: 0000000015fbb1dc   R9: 0000000000000000
    R10: 092c8d41cd51a9c5  R11: 0000000000000041  R12: 0000000000000000
    R13: ffff8810515df4c0  R14: ffff8810426c5078  R15: ffff8810426c4000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff880547d9bc78] osd_fld_lookup at ffffffffa0d53b8a [osd_ldiskfs]
#10 [ffff880547d9bca8] osd_remote_fid at ffffffffa0d54320 [osd_ldiskfs]
#11 [ffff880547d9bcf8] osd_it_ea_rec at ffffffffa0d6539e [osd_ldiskfs]
#12 [ffff880547d9be38] lod_it_rec at ffffffffa0eec331 [lod]
#13 [ffff880547d9be48] __mdd_orphan_cleanup at ffffffffa0f55050 [mdd]
#14 [ffff880547d9bee8] kthread at ffffffff8109e71e
#15 [ffff880547d9bf48] kernel_thread at ffffffff8100c20a
crash>
The crash occurs because ss_server_fld is NULL:
crash> p *(*((struct osd_device *)0xffff8801f827a000).od_dt_dev.dd_lu_dev.ld_site).ld_seq_site
$9 = {
  ss_lu = 0xffff8801f827a150,
  ss_node_id = 0,
  ss_server_fld = 0x0,    <---------- HERE
  ss_client_fld = 0x0,
  ss_server_seq = 0x0,
  ss_control_seq = 0x0,
  ss_control_exp = 0x0,
  ss_client_seq = 0x0
}
/* lustre/osd-ldiskfs/osd_handler.c */
int osd_fld_lookup(const struct lu_env *env, struct osd_device *osd,
                   obd_seq seq, struct lu_seq_range *range)
{
        ....

        LASSERT(ss != NULL);
        fld_range_set_any(range);
        /* under some conditions ss->ss_server_fld can be NULL here */
        rc = fld_server_lookup(env, ss->ss_server_fld, seq, range);
        if (rc != 0) {
                CERROR("%s: cannot find FLD range for "LPX64": rc = %d\n",
                       osd_name(osd), seq, rc);
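One defensive option can be sketched as follows: check ss_server_fld before the lookup and return an error instead of dereferencing NULL. This is only an illustration using mock types; the struct layouts, every mock_ name, and the choice of -ENODEV are assumptions and not the actual Lustre code or the fix that was applied.

```c
#include <errno.h>
#include <stddef.h>

/* Mock stand-ins for the Lustre types; the real definitions differ. */
struct mock_server_fld {
        unsigned long long start;
        unsigned long long end;
};

struct mock_seq_site {
        struct mock_server_fld *ss_server_fld;
};

/* Hypothetical lookup: succeeds when seq lies in [start, end). */
int mock_fld_server_lookup(const struct mock_server_fld *fld,
                           unsigned long long seq)
{
        return (seq >= fld->start && seq < fld->end) ? 0 : -ENOENT;
}

/*
 * Guarded wrapper: refuses the lookup when the server FLD is absent
 * (for example, already torn down by umount) instead of passing a
 * NULL pointer down to the lookup.
 */
int mock_osd_fld_lookup(const struct mock_seq_site *ss,
                        unsigned long long seq)
{
        if (ss == NULL || ss->ss_server_fld == NULL)
                return -ENODEV; /* assumed error code, for illustration */

        return mock_fld_server_lookup(ss->ss_server_fld, seq);
}
```

A check like this only avoids the NULL dereference. If the teardown can run concurrently with the lookup, the pointer could still be cleared between the check and the use, so the caller would also need to be stopped or synchronized.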
Is it possible to stop the orph_cleanup_sc thread at the beginning of the MDT umount to prevent this issue?
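The ordering being asked about can be sketched in user space with POSIX threads: ask the worker to stop, wait for it to exit, and only then free the state it dereferences. Everything here (mock_site, mock_mount, mock_umount, mock_orphan_cleanup) is hypothetical and stands in for the kernel thread and the FLD structures; it shows the ordering only, not how Lustre implements it.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct mock_site {
        int        *server_fld; /* shared state the worker dereferences */
        atomic_bool stop;       /* set by umount to ask the worker to exit */
        atomic_int  lookups;    /* lookups performed by the worker */
        pthread_t   worker;
};

/* Worker loop: keeps using server_fld until it is asked to stop. */
void *mock_orphan_cleanup(void *arg)
{
        struct mock_site *site = arg;

        while (!atomic_load(&site->stop)) {
                if (*site->server_fld == 42) /* safe: freed only after join */
                        atomic_fetch_add(&site->lookups, 1);
        }
        return NULL;
}

/* Allocates the shared state and starts the worker. Returns 0 on success. */
int mock_mount(struct mock_site *site)
{
        site->server_fld = malloc(sizeof(*site->server_fld));
        if (site->server_fld == NULL)
                return -1;
        *site->server_fld = 42;
        atomic_init(&site->stop, false);
        atomic_init(&site->lookups, 0);

        if (pthread_create(&site->worker, NULL, mock_orphan_cleanup, site) != 0) {
                free(site->server_fld);
                site->server_fld = NULL;
                return -1;
        }
        return 0;
}

/* Stops and joins the worker first, then tears down the shared state. */
void mock_umount(struct mock_site *site)
{
        atomic_store(&site->stop, true);
        pthread_join(site->worker, NULL);

        free(site->server_fld);
        site->server_fld = NULL;
}
```

Because mock_umount joins the worker before freeing server_fld, the worker can never observe a NULL or freed pointer, which is the property the question is after.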
Issue Links
- is related to: LU-5249 conf-sanity test_32a: NULL pointer in fld_local_lookup (Resolved)