[LU-6973] NULL pointer access during MDT server umount if orph_cleanup_sc has not finished Created: 10/Aug/15 Updated: 28/Aug/15 Resolved: 10/Aug/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Antoine Percher | Assignee: | Bruno Faccini (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Environment: |
MDT Server crash |
||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
A NULL pointer dereference occurs while unmounting the MDT server if the orph_cleanup_sc thread has not finished.
crash> bt
PID: 31563 TASK: ffff880879176ab0 CPU: 1 COMMAND: "orph_cleanup_sc"
#0 [ffff880547d9b810] machine_kexec at ffffffff8103b71b
#1 [ffff880547d9b870] crash_kexec at ffffffff810c9942
#2 [ffff880547d9b940] oops_end at ffffffff8152f070
#3 [ffff880547d9b970] no_context at ffffffff8104c80b
#4 [ffff880547d9b9c0] __bad_area_nosemaphore at ffffffff8104ca95
#5 [ffff880547d9ba10] bad_area_nosemaphore at ffffffff8104cb63
#6 [ffff880547d9ba20] __do_page_fault at ffffffff8104d25c
#7 [ffff880547d9bb40] do_page_fault at ffffffff81530fbe
#8 [ffff880547d9bb70] page_fault at ffffffff8152e375
[exception RIP: fld_server_lookup+97]
RIP: ffffffffa0a50b31 RSP: ffff880547d9bc20 RFLAGS: 00010286
RAX: ffff8810515df4c0 RBX: 00000002122fc000 RCX: ffff8810426c5078
RDX: ffff880e6896b400 RSI: ffffffffa0a56b00 RDI: ffff881040aff840
RBP: ffff880547d9bc70 R8: 0000000015fbb1dc R9: 0000000000000000
R10: 092c8d41cd51a9c5 R11: 0000000000000041 R12: 0000000000000000
R13: ffff8810515df4c0 R14: ffff8810426c5078 R15: ffff8810426c4000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#9 [ffff880547d9bc78] osd_fld_lookup at ffffffffa0d53b8a [osd_ldiskfs]
#10 [ffff880547d9bca8] osd_remote_fid at ffffffffa0d54320 [osd_ldiskfs]
#11 [ffff880547d9bcf8] osd_it_ea_rec at ffffffffa0d6539e [osd_ldiskfs]
#12 [ffff880547d9be38] lod_it_rec at ffffffffa0eec331 [lod]
#13 [ffff880547d9be48] __mdd_orphan_cleanup at ffffffffa0f55050 [mdd]
#14 [ffff880547d9bee8] kthread at ffffffff8109e71e
#15 [ffff880547d9bf48] kernel_thread at ffffffff8100c20a
crash>
This crash occurs because ss_server_fld is NULL:
crash> p *(*((struct osd_device *)0xffff8801f827a000).od_dt_dev.dd_lu_dev.ld_site).ld_seq_site
$9 = {
ss_lu = 0xffff8801f827a150,
ss_node_id = 0,
ss_server_fld = 0x0, <---------- HERE
ss_client_fld = 0x0,
ss_server_seq = 0x0,
ss_control_seq = 0x0,
ss_control_exp = 0x0,
ss_client_seq = 0x0
}
2228 int osd_fld_lookup(const struct lu_env *env, struct osd_device *osd,
2229 obd_seq seq, struct lu_seq_range *range)
2230 {
....
2248
2249 LASSERT(ss != NULL);
2250 fld_range_set_any(range);
2251 rc = fld_server_lookup(env, ss->ss_server_fld, seq, range); <--- under some conditions ss->ss_server_fld can be NULL
2252 if (rc != 0) {
2253 CERROR("%s: cannot find FLD range for "LPX64": rc = %d\n",
2254 osd_name(osd), seq, rc);
(excerpt from lustre/osd-ldiskfs/osd_handler.c)
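To illustrate the failure mode, here is a minimal user-space sketch of a defensive check before the lookup. It is not the fix applied upstream (this ticket was resolved as a duplicate and the actual change is not shown here). All `mock_*` names and types are hypothetical stand-ins for the Lustre structures, and returning `-ENODEV` is an assumption chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the Lustre types; NOT the real definitions. */
struct mock_server_fld { uint64_t start, end; uint32_t index; };
struct mock_seq_site   { struct mock_server_fld *ss_server_fld; };
struct mock_range      { uint64_t start, end; uint32_t index; };

/* Mock lookup: dereferences fld unconditionally, as at the crash site. */
static int mock_fld_server_lookup(const struct mock_server_fld *fld,
                                  uint64_t seq, struct mock_range *range)
{
    if (seq < fld->start || seq >= fld->end)
        return -ENOENT;
    range->start = fld->start;
    range->end = fld->end;
    range->index = fld->index;
    return 0;
}

/* Guarded caller: refuse the lookup once the server FLD is gone. */
int mock_osd_fld_lookup(const struct mock_seq_site *ss, uint64_t seq,
                        struct mock_range *range)
{
    assert(ss != NULL);              /* mirrors LASSERT(ss != NULL) */
    if (ss->ss_server_fld == NULL)
        return -ENODEV;              /* assumed error code, for illustration */
    return mock_fld_server_lookup(ss->ss_server_fld, seq, range);
}
```

A check like this only narrows the window: if ss_server_fld can be cleared concurrently with the lookup, the pointer may still become NULL between the test and the dereference, so some ordering between teardown and the cleanup thread would still be needed.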
Is it possible to stop the orph_cleanup_sc thread at the beginning of the MDT umount process to prevent this issue? |
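The ordering asked about above can be sketched in user space with POSIX threads: the teardown path first asks the worker to stop, waits for it to exit, and only then clears the pointer the worker reads. This is a hypothetical model, not Lustre code. The `mock_*` names, the stop flag, and the use of pthreads in place of kernel threads are all assumptions made for illustration.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the site; server_fld stands in for ss_server_fld. */
struct mock_site { int *server_fld; };

struct mock_cleanup {
    pthread_t         thread;
    atomic_bool       stop;      /* set by umount to request exit */
    atomic_bool       saw_null;  /* set if the worker ever observed NULL */
    struct mock_site *site;
};

/* Worker: stands in for the orphan cleanup loop that performs FLD lookups. */
static void *mock_orphan_cleanup(void *arg)
{
    struct mock_cleanup *c = arg;

    while (!atomic_load(&c->stop)) {
        if (c->site->server_fld == NULL)
            atomic_store(&c->saw_null, true);
        sched_yield();
    }
    return NULL;
}

int mock_cleanup_start(struct mock_cleanup *c, struct mock_site *site)
{
    c->site = site;
    atomic_init(&c->stop, false);
    atomic_init(&c->saw_null, false);
    return pthread_create(&c->thread, NULL, mock_orphan_cleanup, c);
}

/* Teardown: stop and join the worker BEFORE clearing the pointer. */
void mock_umount(struct mock_cleanup *c)
{
    atomic_store(&c->stop, true);
    pthread_join(c->thread, NULL);
    c->site->server_fld = NULL;   /* safe: no reader is left */
}
```

Because `pthread_join` returns only after the worker has exited, the worker in this model can never observe the cleared pointer.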
| Comments |
| Comment by Bruno Faccini (Inactive) [ 10/Aug/15 ] |
|
Hello Antoine, |
| Comment by Antoine Percher [ 10/Aug/15 ] |
|
Hello Bruno, |
| Comment by Peter Jones [ 10/Aug/15 ] |
|
Antoine The Peter |