Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.3.0, Lustre 2.1.3
    • Lustre 2.1.1
    • None
    • Bull lustre distribution 213 including patch from ORNL-22 and LU-1144
    • 2
    • 4510

    Description

      An MDS crashed with a general protection fault in osc_create.
      The likely cause is a race: the struct obdo *oa argument of osc_create is taken from a lov_request that has already been poisoned.

      crash> bt
      PID: 7769   TASK: ffff881808256790  CPU: 0   COMMAND: "mdt_00"
       #0 [ffff8817a456af50] machine_kexec at ffffffff81027a4b
       #1 [ffff8817a456afb0] crash_kexec at ffffffff810a2db2
       #2 [ffff8817a456b080] oops_end at ffffffff81481730
       #3 [ffff8817a456b0b0] die at ffffffff810071cb
       #4 [ffff8817a456b0e0] do_general_protection at ffffffff814812c2
       #5 [ffff8817a456b110] general_protection at ffffffff81480a95
          [exception RIP: osc_create+101]
          RIP: ffffffffa08d69b5  RSP: ffff8817a456b1c0  RFLAGS: 00010282
          RAX: ffffffffa08d6950  RBX: ffff881792313178  RCX: ffff8817f70e8b00
          RDX: ffff880b622b2d80  RSI: 5a5a5a5a5a5a5a5a  RDI: ffff8817921c8000
          RBP: ffff8817a456b290   R8: ffff8817f70e8b00   R9: 00000000ffffffff
          R10: ffff881792b92000  R11: 00000000ffffff95  R12: ffff8817923124b8
          R13: 5a5a5a5a5a5a5a5a  R14: ffff8817f70e8b00  R15: ffff880b622b2d80
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
       #6 [ffff8817a456b298] lov_check_and_create_object at ffffffffa0929482 [lov]
       #7 [ffff8817a456b308] qos_remedy_create at ffffffffa0929c55 [lov]
       #8 [ffff8817a456b398] lov_fini_create_set at ffffffffa092660e [lov]
       #9 [ffff8817a456b468] lov_create at ffffffffa090f2ed [lov]
      #10 [ffff8817a456b5a8] mdd_lov_create at ffffffffa0a897cd [mdd]
      #11 [ffff8817a456b688] mdd_create_data at ffffffffa0a93f6e [mdd]
      #12 [ffff8817a456b728] cml_create_data at ffffffffa0b41066 [cmm]
      #13 [ffff8817a456b7a8] mdt_finish_open at ffffffffa0af6335 [mdt]
      #14 [ffff8817a456b878] mdt_reint_open at ffffffffa0af81d2 [mdt]
      #15 [ffff8817a456b998] mdt_reint_rec at ffffffffa0adfadf [mdt]
      #16 [ffff8817a456b9e8] mdt_reint_internal at ffffffffa0ad7f74 [mdt]
      #17 [ffff8817a456ba78] mdt_intent_reint at ffffffffa0ad85f5 [mdt]
      #18 [ffff8817a456baf8] mdt_intent_policy at ffffffffa0ad0550 [mdt]
      #19 [ffff8817a456bb68] ldlm_lock_enqueue at ffffffffa06eab8a [ptlrpc]
      #20 [ffff8817a456bc08] ldlm_handle_enqueue0 at ffffffffa0711777 [ptlrpc]
      #21 [ffff8817a456bca8] mdt_enqueue at ffffffffa0ad00ca [mdt]
      #22 [ffff8817a456bcd8] mdt_handle_common at ffffffffa0aca865 [mdt]
      #23 [ffff8817a456bd58] mdt_regular_handle at ffffffffa0acb875 [mdt]
      #24 [ffff8817a456bd68] ptlrpc_main at ffffffffa07409e9 [ptlrpc]
      #25 [ffff8817a456bf48] kernel_thread at ffffffff810041aa
      
      0xffffffffa08d69ac <osc_create+92>:     test   %r15,%r15
      0xffffffffa08d69af <osc_create+95>:     je     0xffffffffa08d7705
      0xffffffffa08d69b5 <osc_create+101>:    mov    0x0(%r13),%rax           <=== R13: 5a5a5a5a5a5a5a5a
      0xffffffffa08d69b9 <osc_create+105>:    test   $0x1000000,%eax
      
      
      0xffffffffa08d6974 <osc_create+36>:     mov    0xe0(%rdi),%r12
      0xffffffffa08d697b <osc_create+43>:     mov    %rsi,%r13  <==== r13 value comes from rsi which is second parameter of osc_create
      0xffffffffa08d697e <osc_create+46>:     mov    %rdx,%r15
      
      
      int osc_create(struct obd_export *exp, struct obdo *oa,
                                             ^^^^^^^^^^^^^^^ == 0x5a5a5a5a5a5a5a5a
                     struct lov_stripe_md **ea, struct obd_trans_info *oti)
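
The register-to-parameter reasoning above relies on the System V AMD64 calling convention used by x86-64 Linux, where the first six integer or pointer arguments travel in rdi, rsi, rdx, rcx, r8 and r9, in that order. A minimal sketch of that lookup follows; the helper name `sysv_arg_register` is hypothetical and exists only for illustration.

```c
#include <stddef.h>

/* Integer/pointer argument registers, in System V AMD64 order. */
static const char *const sysv_int_arg_regs[] = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"
};

/*
 * Return the register that carries the zero-based integer argument
 * `index`, or NULL when the argument would be passed on the stack.
 * (Hypothetical helper, not part of Lustre or the crash utility.)
 */
const char *sysv_arg_register(size_t index)
{
    size_t n = sizeof(sysv_int_arg_regs) / sizeof(sysv_int_arg_regs[0]);

    return index < n ? sysv_int_arg_regs[index] : NULL;
}
```

Applied to osc_create(exp, oa, ea, oti), this gives exp in rdi, oa in rsi, ea in rdx and oti in rcx, which matches the disassembly copying rsi into r13 and rdx into r15.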
      
      

      The obdo *oa argument is passed down from lov_check_and_create_object, which reads it from the lov_request req. That request is entirely poisoned except for the obdidx (the rq_idx field):

      crash> lov_request ffff880b622b2d40
      struct lov_request {
        rq_oi = {
          oi_policy = {
            l_extent = {
              start = 6510615555426900570, 
              end = 6510615555426900570, 
              gid = 6510615555426900570
            }, 
            l_flock = {
              start = 6510615555426900570, 
              end = 6510615555426900570, 
              owner = 6510615555426900570, 
              blocking_owner = 6510615555426900570, 
              blocking_export = 0x5a5a5a5a5a5a5a5a, 
              pid = 1515870810
            }, 
            l_inodebits = {
              bits = 6510615555426900570
            }
          }, 
          oi_flags = 1515870810, 
          oi_lockh = 0x5a5a5a5a5a5a5a5a, 
          oi_md = 0x5a5a5a5a5a5a5a5a, 
          oi_oa = 0x5a5a5a5a5a5a5a5a, 
          oi_osfs = 0x5a5a5a5a5a5a5a5a, 
          oi_cb_up = 0x5a5a5a5a5a5a5a5a, 
          oi_capa = 0x5a5a5a5a5a5a5a5a
        }, 
        rq_rqset = 0x5a5a5a5a5a5a5a5a, 
        rq_link = {
          next = 0x5a5a5a5a5a5a5a5a, 
          prev = 0x5a5a5a5a5a5a5a5a
        }, 
        rq_idx = 39,                 <== scratch-OST0027
        rq_stripe = 1515870810, 
        rq_complete = 1515870810, 
        rq_rc = 1515870810, 
        rq_buflen = 1515870810, 
        rq_oabufs = 1515870810, 
        rq_pgaidx = 1515870810
      }
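
The decimal values in the dump are the same 0x5a byte pattern printed as integers: 6510615555426900570 is 0x5a5a5a5a5a5a5a5a and 1515870810 is 0x5a5a5a5a. That pattern is consistent with memory that was poisoned when it was freed. A small sketch for checking values copied from such a dump follows; `is_poisoned` and `POISON_BYTE` are hypothetical names, and the byte value is taken from the dump rather than from the Lustre headers.

```c
#include <stdbool.h>
#include <stdint.h>

#define POISON_BYTE 0x5aU /* byte pattern seen throughout the dump */

/*
 * Return true when each of the low `width` bytes of `value` equals
 * POISON_BYTE. `width` is the field size in bytes (1 to 8).
 */
bool is_poisoned(uint64_t value, unsigned width)
{
    if (width == 0 || width > 8)
        return false;

    for (unsigned i = 0; i < width; i++) {
        if (((value >> (8 * i)) & 0xffU) != POISON_BYTE)
            return false;
    }
    return true;
}
```

With this check, every 64-bit and 32-bit field printed above tests as poisoned, while rq_idx = 39 (0x27) does not.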
      
      
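
The description does not say how the lov_request address ffff880b622b2d40 was found. One plausible reading, assuming lov_check_and_create_object passes &req->rq_oi.oi_md as the ea argument, is that it was derived from RDX (ffff880b622b2d80) by subtracting the offset of oi_md. The sketch below mirrors only the leading fields shown in the dump. The struct names are hypothetical, the field types are inferred from the printed values rather than copied from the Lustre headers, and an LP64 layout is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirrors of the dumped fields (LP64 layout assumed). */
struct extent_mirror { uint64_t start, end, gid; };

struct flock_mirror {
    uint64_t start, end, owner, blocking_owner;
    void *blocking_export;
    uint32_t pid;
};

struct inodebits_mirror { uint64_t bits; };

union policy_mirror {
    struct extent_mirror l_extent;
    struct flock_mirror l_flock;
    struct inodebits_mirror l_inodebits;
};

/* rq_oi is the first member of the dumped lov_request, so offsets
 * inside this struct are also offsets from the request address. */
struct obd_info_mirror {
    union policy_mirror oi_policy;
    uint32_t oi_flags;
    void *oi_lockh;
    void *oi_md;
    /* remaining fields omitted */
};

/* Recover the request address from the address of its oi_md slot. */
uint64_t request_from_md_slot(uint64_t md_slot_addr)
{
    return md_slot_addr - offsetof(struct obd_info_mirror, oi_md);
}
```

Under these assumptions oi_md sits at offset 0x40, so ffff880b622b2d80 maps back to ffff880b622b2d40, the address passed to the crash command above.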

          Activity

            [LU-1626] GPF in osc_create
            bobijam Zhenyu Xu added a comment -

            patch landed for 2.1.3 and 2.3.0

            bobijam Zhenyu Xu added a comment -

            b2_1 patch tracking at http://review.whamcloud.com/3402

            bobijam Zhenyu Xu added a comment -

            patch tracking at http://review.whamcloud.com/3401

            lov: fix lov request set finish check race

            When several lov_request callbacks run and one of them completes
            the last lov_request in the set, the lov_finished_set() check
            returns true for all of them. The follow-up action is supposed to
            be called only once per set, so in this case that assumption is
            broken and the lov request set's refcount ends up wrong.

            This patch also fixes another glitch: in qos_remedy_create(), when an
            OST pool is in use, the ost_idx value is not initialized correctly.
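
The race described in the commit message can be sketched as follows. This is a minimal illustration using C11 atomics, not the code from the patch: the `request_set` structure, its fields, and all function names are hypothetical. It contrasts a check that every late caller passes with one that only the first caller passes.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a request set; not the Lustre structure. */
struct request_set {
    int count;                 /* total requests in the set */
    atomic_int completed;      /* requests whose callback has run */
    atomic_int finish_claimed; /* 0 until one caller claims the finish step */
};

void request_set_init(struct request_set *set, int count)
{
    set->count = count;
    atomic_init(&set->completed, 0);
    atomic_init(&set->finish_claimed, 0);
}

/* Record that one request in the set has completed. */
void request_complete(struct request_set *set)
{
    atomic_fetch_add(&set->completed, 1);
}

/* Racy check: every caller arriving after the last completion sees true,
 * so the finish action (and its refcount drop) can run more than once. */
bool set_finished_racy(struct request_set *set)
{
    return atomic_load(&set->completed) == set->count;
}

/* Guarded check: only the first caller to observe completion gets true. */
bool set_finished_once(struct request_set *set)
{
    if (atomic_load(&set->completed) != set->count)
        return false;

    /* fetch_add returns the previous value; exactly one caller sees 0. */
    return atomic_fetch_add(&set->finish_claimed, 1) == 0;
}
```

The design point is that "is the set complete?" and "am I the one who finishes it?" are separate questions. The second needs an atomic claim so that the finish action runs exactly once however many callbacks observe completion.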

            pjones Peter Jones added a comment -

            Bobijam

            Could you please comment on this one?

            Thanks

            Peter


            People

              bobijam Zhenyu Xu
              spiechurski Sebastien Piechurski