Details
- Bug
- Resolution: Fixed
- Major
- Lustre 2.1.1
- None
- Bull lustre distribution 213 including patch from ORNL-22 and LU-1144
- 2
- 4510
Description
An MDS crashed with a general protection fault in osc_create.
This is probably caused by a race: the struct obdo *oa argument of osc_create is taken from a lov_request that has already been poisoned (filled with 0x5a).
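As a reading aid for the dumps in this report, here is a minimal C sketch of how the poison pattern can be recognized. It assumes the fill byte is 0x5a, as seen in RSI and R13; the helper names are hypothetical and are not part of Lustre. The crash utility prints integer fields in decimal, so the same fill appears as 6510615555426900570 in 64-bit fields and as 1515870810 in 32-bit fields. Memory in this state has usually been released already, which would be consistent with the suspected race, but the report does not confirm that.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed fill byte: the value seen in RSI/R13 and throughout the dump. */
#define POISON_BYTE 0x5aU

/* Return true if every byte of the object at p equals the poison byte. */
static bool is_poisoned(const void *p, size_t len)
{
    const unsigned char *b = p;

    if (len == 0)
        return false;
    for (size_t i = 0; i < len; i++)
        if (b[i] != POISON_BYTE)
            return false;
    return true;
}

/* Value of an nbytes-wide integer field (1..8) once it has been poisoned. */
static uint64_t poison_value(size_t nbytes)
{
    uint64_t v = 0;

    for (size_t i = 0; i < nbytes && i < sizeof(v); i++)
        v = (v << 8) | POISON_BYTE;
    return v;
}
```

Calling poison_value(8) and poison_value(4) reproduces the two decimal constants that recur in the structure dump, while a field such as rq_idx = 39 fails the is_poisoned check.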
crash> bt
PID: 7769 TASK: ffff881808256790 CPU: 0 COMMAND: "mdt_00"
#0 [ffff8817a456af50] machine_kexec at ffffffff81027a4b
#1 [ffff8817a456afb0] crash_kexec at ffffffff810a2db2
#2 [ffff8817a456b080] oops_end at ffffffff81481730
#3 [ffff8817a456b0b0] die at ffffffff810071cb
#4 [ffff8817a456b0e0] do_general_protection at ffffffff814812c2
#5 [ffff8817a456b110] general_protection at ffffffff81480a95
[exception RIP: osc_create+101]
RIP: ffffffffa08d69b5 RSP: ffff8817a456b1c0 RFLAGS: 00010282
RAX: ffffffffa08d6950 RBX: ffff881792313178 RCX: ffff8817f70e8b00
RDX: ffff880b622b2d80 RSI: 5a5a5a5a5a5a5a5a RDI: ffff8817921c8000
RBP: ffff8817a456b290 R8: ffff8817f70e8b00 R9: 00000000ffffffff
R10: ffff881792b92000 R11: 00000000ffffff95 R12: ffff8817923124b8
R13: 5a5a5a5a5a5a5a5a R14: ffff8817f70e8b00 R15: ffff880b622b2d80
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#6 [ffff8817a456b298] lov_check_and_create_object at ffffffffa0929482 [lov]
#7 [ffff8817a456b308] qos_remedy_create at ffffffffa0929c55 [lov]
#8 [ffff8817a456b398] lov_fini_create_set at ffffffffa092660e [lov]
#9 [ffff8817a456b468] lov_create at ffffffffa090f2ed [lov]
#10 [ffff8817a456b5a8] mdd_lov_create at ffffffffa0a897cd [mdd]
#11 [ffff8817a456b688] mdd_create_data at ffffffffa0a93f6e [mdd]
#12 [ffff8817a456b728] cml_create_data at ffffffffa0b41066 [cmm]
#13 [ffff8817a456b7a8] mdt_finish_open at ffffffffa0af6335 [mdt]
#14 [ffff8817a456b878] mdt_reint_open at ffffffffa0af81d2 [mdt]
#15 [ffff8817a456b998] mdt_reint_rec at ffffffffa0adfadf [mdt]
#16 [ffff8817a456b9e8] mdt_reint_internal at ffffffffa0ad7f74 [mdt]
#17 [ffff8817a456ba78] mdt_intent_reint at ffffffffa0ad85f5 [mdt]
#18 [ffff8817a456baf8] mdt_intent_policy at ffffffffa0ad0550 [mdt]
#19 [ffff8817a456bb68] ldlm_lock_enqueue at ffffffffa06eab8a [ptlrpc]
#20 [ffff8817a456bc08] ldlm_handle_enqueue0 at ffffffffa0711777 [ptlrpc]
#21 [ffff8817a456bca8] mdt_enqueue at ffffffffa0ad00ca [mdt]
#22 [ffff8817a456bcd8] mdt_handle_common at ffffffffa0aca865 [mdt]
#23 [ffff8817a456bd58] mdt_regular_handle at ffffffffa0acb875 [mdt]
#24 [ffff8817a456bd68] ptlrpc_main at ffffffffa07409e9 [ptlrpc]
#25 [ffff8817a456bf48] kernel_thread at ffffffff810041aa
Disassembly around the faulting instruction:
0xffffffffa08d69ac <osc_create+92>: test %r15,%r15
0xffffffffa08d69af <osc_create+95>: je 0xffffffffa08d7705
0xffffffffa08d69b5 <osc_create+101>: mov 0x0(%r13),%rax <=== faulting load, R13: 5a5a5a5a5a5a5a5a
0xffffffffa08d69b9 <osc_create+105>: test $0x1000000,%eax
Earlier in the function, r13 is loaded from rsi:
0xffffffffa08d6974 <osc_create+36>: mov 0xe0(%rdi),%r12
0xffffffffa08d697b <osc_create+43>: mov %rsi,%r13 <==== r13 comes from rsi, the second parameter of osc_create
0xffffffffa08d697e <osc_create+46>: mov %rdx,%r15
int osc_create(struct obd_export *exp, struct obdo *oa,
^^^^^^^^^^^^^^^ == 0x5a5a5a5a5a5a5a5a
struct lov_stripe_md **ea, struct obd_trans_info *oti)
The obdo *oa argument is passed by lov_check_and_create_object from its lov_request req (the oi_oa field below). That request is entirely poisoned except for the OST index, rq_idx:
crash> lov_request ffff880b622b2d40
struct lov_request {
rq_oi = {
oi_policy = {
l_extent = {
start = 6510615555426900570,
end = 6510615555426900570,
gid = 6510615555426900570
},
l_flock = {
start = 6510615555426900570,
end = 6510615555426900570,
owner = 6510615555426900570,
blocking_owner = 6510615555426900570,
blocking_export = 0x5a5a5a5a5a5a5a5a,
pid = 1515870810
},
l_inodebits = {
bits = 6510615555426900570
}
},
oi_flags = 1515870810,
oi_lockh = 0x5a5a5a5a5a5a5a5a,
oi_md = 0x5a5a5a5a5a5a5a5a,
oi_oa = 0x5a5a5a5a5a5a5a5a,
oi_osfs = 0x5a5a5a5a5a5a5a5a,
oi_cb_up = 0x5a5a5a5a5a5a5a5a,
oi_capa = 0x5a5a5a5a5a5a5a5a
},
rq_rqset = 0x5a5a5a5a5a5a5a5a,
rq_link = {
next = 0x5a5a5a5a5a5a5a5a,
prev = 0x5a5a5a5a5a5a5a5a
},
rq_idx = 39, <== scratch-OST0027
rq_stripe = 1515870810,
rq_complete = 1515870810,
rq_rc = 1515870810,
rq_buflen = 1515870810,
rq_oabufs = 1515870810,
rq_pgaidx = 1515870810
}
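The report does not say how the lov_request address was located. One reading consistent with the register dump is that RDX, the third argument of osc_create (ea), points at the oi_md field inside the request. The sketch below checks that arithmetic. It uses a hypothetical mirror of the leading rq_oi fields, with types inferred only from how crash prints them (decimal 64-bit, decimal 32-bit, hex pointer). The real Lustre definitions were not consulted, so the layout is an assumption, and it holds only on an LP64 target such as x86_64.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of oi_policy, with field widths inferred from the dump. */
union policy_mirror {
    struct { uint64_t start, end, gid; } l_extent;
    struct {
        uint64_t start, end, owner, blocking_owner;
        void *blocking_export;
        uint32_t pid;
    } l_flock;
    struct { uint64_t bits; } l_inodebits;
};

/* Hypothetical mirror of the first fields of rq_oi (rq_oi starts the request). */
struct oi_mirror {
    union policy_mirror oi_policy;
    uint32_t oi_flags;
    void *oi_lockh;
    void *oi_md;
    void *oi_oa;
};

/* Addresses copied from the report: the request dumped above and RDX. */
#define REQ_ADDR  0xffff880b622b2d40ULL
#define RDX_VALUE 0xffff880b622b2d80ULL

/* Distance from the start of the request to the address held in RDX. */
static uint64_t rdx_offset_in_request(void)
{
    return RDX_VALUE - REQ_ADDR;
}

/* Offset of oi_md in the mirrored layout (equals its offset in the request). */
static size_t oi_md_offset(void)
{
    return offsetof(struct oi_mirror, oi_md);
}
```

Under these assumptions both functions return 0x40, which would mean ea is &req->rq_oi.oi_md and that the dumped request is the one handed to osc_create.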