Details
-
Bug
-
Resolution: Fixed
-
Major
-
Lustre 2.1.1
-
None
-
Bull lustre distribution 213 including patch from ORNL-22 and LU-1144
-
2
-
4510
Description
An MDS crashed with a general protection fault in osc_create.
The likely cause is a race: the struct obdo *oa argument of osc_create is taken from a lov_request whose memory has already been poisoned (filled with the 0x5a pattern).
crash> bt
PID: 7769   TASK: ffff881808256790   CPU: 0   COMMAND: "mdt_00"
 #0 [ffff8817a456af50] machine_kexec at ffffffff81027a4b
 #1 [ffff8817a456afb0] crash_kexec at ffffffff810a2db2
 #2 [ffff8817a456b080] oops_end at ffffffff81481730
 #3 [ffff8817a456b0b0] die at ffffffff810071cb
 #4 [ffff8817a456b0e0] do_general_protection at ffffffff814812c2
 #5 [ffff8817a456b110] general_protection at ffffffff81480a95
    [exception RIP: osc_create+101]
    RIP: ffffffffa08d69b5  RSP: ffff8817a456b1c0  RFLAGS: 00010282
    RAX: ffffffffa08d6950  RBX: ffff881792313178  RCX: ffff8817f70e8b00
    RDX: ffff880b622b2d80  RSI: 5a5a5a5a5a5a5a5a  RDI: ffff8817921c8000
    RBP: ffff8817a456b290   R8: ffff8817f70e8b00   R9: 00000000ffffffff
    R10: ffff881792b92000  R11: 00000000ffffff95  R12: ffff8817923124b8
    R13: 5a5a5a5a5a5a5a5a  R14: ffff8817f70e8b00  R15: ffff880b622b2d80
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #6 [ffff8817a456b298] lov_check_and_create_object at ffffffffa0929482 [lov]
 #7 [ffff8817a456b308] qos_remedy_create at ffffffffa0929c55 [lov]
 #8 [ffff8817a456b398] lov_fini_create_set at ffffffffa092660e [lov]
 #9 [ffff8817a456b468] lov_create at ffffffffa090f2ed [lov]
#10 [ffff8817a456b5a8] mdd_lov_create at ffffffffa0a897cd [mdd]
#11 [ffff8817a456b688] mdd_create_data at ffffffffa0a93f6e [mdd]
#12 [ffff8817a456b728] cml_create_data at ffffffffa0b41066 [cmm]
#13 [ffff8817a456b7a8] mdt_finish_open at ffffffffa0af6335 [mdt]
#14 [ffff8817a456b878] mdt_reint_open at ffffffffa0af81d2 [mdt]
#15 [ffff8817a456b998] mdt_reint_rec at ffffffffa0adfadf [mdt]
#16 [ffff8817a456b9e8] mdt_reint_internal at ffffffffa0ad7f74 [mdt]
#17 [ffff8817a456ba78] mdt_intent_reint at ffffffffa0ad85f5 [mdt]
#18 [ffff8817a456baf8] mdt_intent_policy at ffffffffa0ad0550 [mdt]
#19 [ffff8817a456bb68] ldlm_lock_enqueue at ffffffffa06eab8a [ptlrpc]
#20 [ffff8817a456bc08] ldlm_handle_enqueue0 at ffffffffa0711777 [ptlrpc]
#21 [ffff8817a456bca8] mdt_enqueue at ffffffffa0ad00ca [mdt]
#22 [ffff8817a456bcd8] mdt_handle_common at ffffffffa0aca865 [mdt]
#23 [ffff8817a456bd58] mdt_regular_handle at ffffffffa0acb875 [mdt]
#24 [ffff8817a456bd68] ptlrpc_main at ffffffffa07409e9 [ptlrpc]
#25 [ffff8817a456bf48] kernel_thread at ffffffff810041aa

0xffffffffa08d69ac <osc_create+92>:   test   %r15,%r15
0xffffffffa08d69af <osc_create+95>:   je     0xffffffffa08d7705
0xffffffffa08d69b5 <osc_create+101>:  mov    0x0(%r13),%rax      <=== R13: 5a5a5a5a5a5a5a5a
0xffffffffa08d69b9 <osc_create+105>:  test   $0x1000000,%eax

0xffffffffa08d6974 <osc_create+36>:   mov    0xe0(%rdi),%r12
0xffffffffa08d697b <osc_create+43>:   mov    %rsi,%r13           <==== r13 value comes from rsi, which is the second parameter of osc_create
0xffffffffa08d697e <osc_create+46>:   mov    %rdx,%r15

int osc_create(struct obd_export *exp, struct obdo *oa,
                                       ^^^^^^^^^^^^^^^ == 0x5a5a5a5a5a5a5a5a
               struct lov_stripe_md **ea, struct obd_trans_info *oti)
The obdo *oa argument is taken from the lov_request req in lov_check_and_create_object. Every field of that request is poisoned except the OST index (rq_idx):
crash> lov_request ffff880b622b2d40
struct lov_request {
  rq_oi = {
    oi_policy = {
      l_extent = {
        start = 6510615555426900570,
        end = 6510615555426900570,
        gid = 6510615555426900570
      },
      l_flock = {
        start = 6510615555426900570,
        end = 6510615555426900570,
        owner = 6510615555426900570,
        blocking_owner = 6510615555426900570,
        blocking_export = 0x5a5a5a5a5a5a5a5a,
        pid = 1515870810
      },
      l_inodebits = {
        bits = 6510615555426900570
      }
    },
    oi_flags = 1515870810,
    oi_lockh = 0x5a5a5a5a5a5a5a5a,
    oi_md = 0x5a5a5a5a5a5a5a5a,
    oi_oa = 0x5a5a5a5a5a5a5a5a,
    oi_osfs = 0x5a5a5a5a5a5a5a5a,
    oi_cb_up = 0x5a5a5a5a5a5a5a5a,
    oi_capa = 0x5a5a5a5a5a5a5a5a
  },
  rq_rqset = 0x5a5a5a5a5a5a5a5a,
  rq_link = {
    next = 0x5a5a5a5a5a5a5a5a,
    prev = 0x5a5a5a5a5a5a5a5a
  },
  rq_idx = 39,                    <== scratch-OST0027
  rq_stripe = 1515870810,
  rq_complete = 1515870810,
  rq_rc = 1515870810,
  rq_buflen = 1515870810,
  rq_oabufs = 1515870810,
  rq_pgaidx = 1515870810
}