Lustre / LU-1626

GPF in osc_create

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.3.0, Lustre 2.1.3
    • Affects Version/s: Lustre 2.1.1
    • Labels: None
    • Environment: Bull Lustre distribution 213, including patches from ORNL-22 and LU-1144
    • Severity: 2
    • Rank: 4510

    Description

      An MDS crashed with a general protection fault in osc_create.
      This is probably caused by a race: the struct obdo *oa argument of osc_create is taken from a lov_request that is already poisoned (filled with the 0x5a pattern) by the time it is used.

      crash> bt
      PID: 7769   TASK: ffff881808256790  CPU: 0   COMMAND: "mdt_00"
       #0 [ffff8817a456af50] machine_kexec at ffffffff81027a4b
       #1 [ffff8817a456afb0] crash_kexec at ffffffff810a2db2
       #2 [ffff8817a456b080] oops_end at ffffffff81481730
       #3 [ffff8817a456b0b0] die at ffffffff810071cb
       #4 [ffff8817a456b0e0] do_general_protection at ffffffff814812c2
       #5 [ffff8817a456b110] general_protection at ffffffff81480a95
          [exception RIP: osc_create+101]
          RIP: ffffffffa08d69b5  RSP: ffff8817a456b1c0  RFLAGS: 00010282
          RAX: ffffffffa08d6950  RBX: ffff881792313178  RCX: ffff8817f70e8b00
          RDX: ffff880b622b2d80  RSI: 5a5a5a5a5a5a5a5a  RDI: ffff8817921c8000
          RBP: ffff8817a456b290   R8: ffff8817f70e8b00   R9: 00000000ffffffff
          R10: ffff881792b92000  R11: 00000000ffffff95  R12: ffff8817923124b8
          R13: 5a5a5a5a5a5a5a5a  R14: ffff8817f70e8b00  R15: ffff880b622b2d80
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
       #6 [ffff8817a456b298] lov_check_and_create_object at ffffffffa0929482 [lov]
       #7 [ffff8817a456b308] qos_remedy_create at ffffffffa0929c55 [lov]
       #8 [ffff8817a456b398] lov_fini_create_set at ffffffffa092660e [lov]
       #9 [ffff8817a456b468] lov_create at ffffffffa090f2ed [lov]
      #10 [ffff8817a456b5a8] mdd_lov_create at ffffffffa0a897cd [mdd]
      #11 [ffff8817a456b688] mdd_create_data at ffffffffa0a93f6e [mdd]
      #12 [ffff8817a456b728] cml_create_data at ffffffffa0b41066 [cmm]
      #13 [ffff8817a456b7a8] mdt_finish_open at ffffffffa0af6335 [mdt]
      #14 [ffff8817a456b878] mdt_reint_open at ffffffffa0af81d2 [mdt]
      #15 [ffff8817a456b998] mdt_reint_rec at ffffffffa0adfadf [mdt]
      #16 [ffff8817a456b9e8] mdt_reint_internal at ffffffffa0ad7f74 [mdt]
      #17 [ffff8817a456ba78] mdt_intent_reint at ffffffffa0ad85f5 [mdt]
      #18 [ffff8817a456baf8] mdt_intent_policy at ffffffffa0ad0550 [mdt]
      #19 [ffff8817a456bb68] ldlm_lock_enqueue at ffffffffa06eab8a [ptlrpc]
      #20 [ffff8817a456bc08] ldlm_handle_enqueue0 at ffffffffa0711777 [ptlrpc]
      #21 [ffff8817a456bca8] mdt_enqueue at ffffffffa0ad00ca [mdt]
      #22 [ffff8817a456bcd8] mdt_handle_common at ffffffffa0aca865 [mdt]
      #23 [ffff8817a456bd58] mdt_regular_handle at ffffffffa0acb875 [mdt]
      #24 [ffff8817a456bd68] ptlrpc_main at ffffffffa07409e9 [ptlrpc]
      #25 [ffff8817a456bf48] kernel_thread at ffffffff810041aa
      
      0xffffffffa08d69ac <osc_create+92>:     test   %r15,%r15
      0xffffffffa08d69af <osc_create+95>:     je     0xffffffffa08d7705
      0xffffffffa08d69b5 <osc_create+101>:    mov    0x0(%r13),%rax           <=== R13: 5a5a5a5a5a5a5a5a
      0xffffffffa08d69b9 <osc_create+105>:    test   $0x1000000,%eax
      
      
      0xffffffffa08d6974 <osc_create+36>:     mov    0xe0(%rdi),%r12
      0xffffffffa08d697b <osc_create+43>:     mov    %rsi,%r13  <==== r13 value comes from rsi which is second parameter of osc_create
      0xffffffffa08d697e <osc_create+46>:     mov    %rdx,%r15
      
      
      int osc_create(struct obd_export *exp, struct obdo *oa,
                                             ^^^^^^^^^^^^^^^ == 0x5a5a5a5a5a5a5a5a
                     struct lov_stripe_md **ea, struct obd_trans_info *oti)
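
The fault is reported as a general protection fault rather than a page fault because 0x5a5a5a5a5a5a5a5a is not a canonical x86-64 address, so the load at osc_create+101 faults before any page-table lookup. A minimal C sketch of that check follows. The helper name is_canonical and the 48-bit virtual address width (VA_BITS) are assumptions made for illustration; neither comes from the Lustre sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48 implemented virtual-address bits (4-level paging). */
#define VA_BITS 48

/*
 * An x86-64 address is canonical when bits 63..(VA_BITS - 1) are all
 * equal, i.e. the upper bits are a sign extension of bit VA_BITS - 1.
 * Hypothetical helper, for illustration only.
 */
static bool is_canonical(uint64_t addr)
{
    /* Keep bits 63..47: 17 bits when VA_BITS is 48. */
    uint64_t top = addr >> (VA_BITS - 1);
    uint64_t all_ones = (UINT64_C(1) << (64 - VA_BITS + 1)) - 1;

    return top == 0 || top == all_ones;
}
```

With these assumptions, the poisoned value in RSI/R13 is rejected, while the kernel pointers in RDI and RDX from the register dump are accepted.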
      
      

      The obdo *oa pointer is taken from the lov_request req in lov_check_and_create_object. That request is entirely poisoned except for the OST index (rq_idx):

      crash> lov_request ffff880b622b2d40
      struct lov_request {
        rq_oi = {
          oi_policy = {
            l_extent = {
              start = 6510615555426900570, 
              end = 6510615555426900570, 
              gid = 6510615555426900570
            }, 
            l_flock = {
              start = 6510615555426900570, 
              end = 6510615555426900570, 
              owner = 6510615555426900570, 
              blocking_owner = 6510615555426900570, 
              blocking_export = 0x5a5a5a5a5a5a5a5a, 
              pid = 1515870810
            }, 
            l_inodebits = {
              bits = 6510615555426900570
            }
          }, 
          oi_flags = 1515870810, 
          oi_lockh = 0x5a5a5a5a5a5a5a5a, 
          oi_md = 0x5a5a5a5a5a5a5a5a, 
          oi_oa = 0x5a5a5a5a5a5a5a5a, 
          oi_osfs = 0x5a5a5a5a5a5a5a5a, 
          oi_cb_up = 0x5a5a5a5a5a5a5a5a, 
          oi_capa = 0x5a5a5a5a5a5a5a5a
        }, 
        rq_rqset = 0x5a5a5a5a5a5a5a5a, 
        rq_link = {
          next = 0x5a5a5a5a5a5a5a5a, 
          prev = 0x5a5a5a5a5a5a5a5a
        }, 
        rq_idx = 39,                 <== scratch-OST0027
        rq_stripe = 1515870810, 
        rq_complete = 1515870810, 
        rq_rc = 1515870810, 
        rq_buflen = 1515870810, 
        rq_oabufs = 1515870810, 
        rq_pgaidx = 1515870810
      }
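
crash prints the integer fields in decimal, which hides the pattern: 6510615555426900570 is 0x5a5a5a5a5a5a5a5a and 1515870810 is 0x5a5a5a5a. A small C sketch that makes the comparison explicit is given here. The helper is_poisoned and the macro POISON_BYTE are hypothetical names used for illustration; they are not Lustre or crash APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fill byte observed throughout the lov_request dump. */
#define POISON_BYTE 0x5a

/*
 * Return true when every byte of the object equals POISON_BYTE.
 * Hypothetical helper, for illustration only.
 */
static bool is_poisoned(const void *obj, size_t len)
{
    const unsigned char *p = obj;

    assert(obj != NULL);
    for (size_t i = 0; i < len; i++) {
        if (p[i] != POISON_BYTE)
            return false;
    }
    return len > 0;
}
```

Applied to the values above, the 64-bit and 32-bit fields are fully poisoned, while rq_idx = 39 is not, which matches the observation that only the OST index survived.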
      
      

            People

              Assignee: Zhenyu Xu (bobijam)
              Reporter: Sebastien Piechurski (spiechurski)