Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-4079

ll_revalidate_it() may access request after releasing intent

    XMLWordPrintable

Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • None
    • Lustre 2.5.0
    • 3
    • 10951

    Description

      An error path in ll_revalidate_it() will access the request and the intent after calling ll_intent_release(). This will happen if ll_revalidate_it_finish() returns an error other than -ESTALE or -ENOENT:

      revalidate_finish:
              rc = ll_revalidate_it_finish(req, it, de);
              if (rc != 0) {
                      if (rc != -ESTALE && rc != -ENOENT)
                              ll_intent_release(it);
                      GOTO(out, rc = 0);
              }
      
      	if ((it->it_op & IT_OPEN) && de->d_inode &&
                  !S_ISREG(de->d_inode->i_mode) &&
                  !S_ISDIR(de->d_inode->i_mode)) {
                      ll_release_openhandle(de, it);
              }
              rc = 1;
      
      out:
          	/* We do not free request as it may be reused during following lookup
               * (see comment in mdc/mdc_locks.c::mdc_intent_lock()), request will
               * be freed in ll_lookup_it or in ll_intent_release. But if
               * request was not completed, we need to free it. (bug 5154, 9903)
              */
              if (req != NULL && !it_disposition(it, DISP_ENQ_COMPLETE))
                      ptlrpc_req_finished(req);
      

      Since ll_intent_release() clears it_disposition this logic is invalid. To trigger this inject a memory allocation failure in ll_prep_inode() or do as I did and simulate an unrecognized lov magic.

      diff --git a/lustre/lov/lov_pack.c b/lustre/lov/lov_pack.c
      index bc1e489..ca5f078 100644
      --- a/lustre/lov/lov_pack.c
      +++ b/lustre/lov/lov_pack.c
      @@ -282,6 +282,9 @@ static int lov_verify_lmm(void *lmm, int lmm_bytes, __u16 *stripe_count)
       {
               int rc;
      
      +       if (OBD_FAIL_CHECK(0x2001))
      +               RETURN(-EINVAL);
      +
               if (lsm_op_find(le32_to_cpu(*(__u32 *)lmm)) == NULL) {
                       char *buffer;
                       int sz;
      
      llmount.sh
      # cd ~/lustre-release
      # sh ./lustre/tests/racer.sh
      ...
      # lctl set_param fail_loc=0x2001
      
      == racer test 1: racer on clients: t DURATION=3600 == 09:47:11 (1381330031)
      racers pids: 7474 7475
      
      Message from syslogd@t at Oct  9 09:47:18 ...
       kernel:LustreError: 7517:0:(client.c:2231:__ptlrpc_free_req()) ASSERTION( !request->rq_replay ) failed: req ffff880178aedc00
      
      Message from syslogd@t at Oct  9 09:47:18 ...
       kernel:LustreError: 7517:0:(client.c:2231:__ptlrpc_free_req()) LBUG
      
      PID: 7517   TASK: ffff8801b8701500  CPU: 0   COMMAND: "dir_create.sh"
       #0 [ffff8801b12fda40] machine_kexec at ffffffff81035d6b
       #1 [ffff8801b12fdaa0] crash_kexec at ffffffff810c0e22
       #2 [ffff8801b12fdb70] panic at ffffffff8150f01f
       #3 [ffff8801b12fdbf0] lbug_with_loc at ffffffffa02c6eeb [libcfs]
       #4 [ffff8801b12fdc10] __ptlrpc_req_finished at ffffffffa059ff93 [ptlrpc]
       #5 [ffff8801b12fdc30] ptlrpc_req_finished at ffffffffa05a0630 [ptlrpc]
       #6 [ffff8801b12fdc40] ll_revalidate_it at ffffffffa0cbd533 [lustre]
       #7 [ffff8801b12fdd30] ll_revalidate_nd at ffffffffa0cbe613 [lustre]
       #8 [ffff8801b12fdd60] __lookup_hash at ffffffff8118fa45
       #9 [ffff8801b12fddb0] lookup_hash at ffffffff8119016a
      #10 [ffff8801b12fddd0] do_filp_open at ffffffff8119446f
      #11 [ffff8801b12fdf20] do_sys_open at ffffffff8117f849
      #12 [ffff8801b12fdf70] sys_open at ffffffff8117f960
      #13 [ffff8801b12fdf80] system_call_fastpath at ffffffff8100b072
          RIP: 0000003d046db400  RSP: 00007fff7425ed40  RFLAGS: 00010202
          RAX: 0000000000000002  RBX: ffffffff8100b072  RCX: 0000000000000000
          RDX: 00000000000001b6  RSI: 0000000000000241  RDI: 000000000257f650
          RBP: 00007fff7425ede0   R8: 0000000000000020   R9: 000000000000002a
          R10: 0000000000000076  R11: 0000000000000246  R12: ffffffff8117f960
          R13: ffff8801b12fdf78  R14: 0000000000000000  R15: 0000000000000001
          ORIG_RAX: 0000000000000002  CS: 0033  SS: 002b
      

      On entry to ll_intent_release() then intent had type OPEN | CREATE and disposition
      DISP_IT_EXECD | DISP_LOOKUP_EXECD | DISP_LOOKUP_POS | DISP_OPEN_OPEN | DISP_ENQ_COMPLETE | DISP_ENQ_OPEN_REF. The request had refcount 3.

      ll_revalidate_it() and the other functions around md_intent_lock() all suffer from this confusing and dangerous practice of having/putting the request in it_data and keeping it on stack/returning it through the **reqp parameter. We have two pointers to request and it's not clear which one owns request. The above bug is a good example of this. As an aside, mdc_enqueue has the same issue with the lock handle. We store it in the intent and we return it through a pointer parameter. It's not clear whether they're always the same or which one owns the lock.

      Attachments

        Activity

          People

            dmiter Dmitry Eremin (Inactive)
            jhammond John Hammond
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: