Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.4.0
    • Affects Version/s: Lustre 2.3.0, Lustre 2.4.0, Lustre 1.8.8, Lustre 1.8.x (1.8.0 - 1.8.5)
    • Environment: FEFS based on Lustre-1.8.5
      MDSx1, OSSx1(OSTx3), Clientx1
    • 3
    • 5407

    Description

      I believe I've found two possible race conditions between evict and umount on the server side.

      ・About The Cases
      1) Touching an Already-Finalized Lustre Hash
      After class_manual_cleanup() on mds or obdfilter, obd->obd_uuid_hash has been finalized, so I believe the hash should no longer be touched by anyone. However, the evict process can still touch the hash right after class_cleanup(), because nothing excludes the two paths from each other in this case. That is why this case ends up in an OS panic (see the interleaving sketch after the stack trace below).

      PID: 3361   TASK: ffff810621f21080  CPU: 13  COMMAND: "lctl"
       #0 [ffff8105e1817a30] crash_kexec at ffffffff800aeb6b
       #1 [ffff8105e1817af0] __die at ffffffff80066157
       #2 [ffff8105e1817b30] die at ffffffff8006cce5
       #3 [ffff8105e1817b60] do_general_protection at ffffffff8006659f
       #4 [ffff8105e1817ba0] error_exit at ffffffff8005ede9
          [exception RIP: lustre_hash_lookup+249]
          RIP: ffffffff887400f9  RSP: ffff8105e1817c58  RFLAGS: 00010206
          RAX: ffff810115f337c0  RBX: 00000000ffff8106  RCX: 0000000000000001
          RDX: 00000000ffff8106  RSI: ffff8105e1817ce8  RDI: ffff810628834140
          RBP: ffff810628834140   R8: 0000000000000032   R9: 0000000000000020
          R10: 000000000000001b  R11: 0000000000000000  R12: 0000000000000000
          R13: ffff8105e1817ce8  R14: ffff8102de86e078  R15: 00007fff6a0adc10
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
       #5 [ffff8105e1817cb0] obd_export_evict_by_uuid at ffffffff8874426f
       #6 [ffff8105e1817d40] lprocfs_wr_evict_client at ffffffff88869bfd
       #7 [ffff8105e1817e00] lprocfs_mds_wr_evict_client at ffffffff889a6a39
       #8 [ffff8105e1817ee0] lprocfs_fops_write at ffffffff8875349b
       #9 [ffff8105e1817f10] vfs_write at ffffffff80016a49
      #10 [ffff8105e1817f40] sys_write at ffffffff80017316
      #11 [ffff8105e1817f80] system_call at ffffffff8005e116
          RIP: 000000394bac6070  RSP: 00007fff6a0ad3c0  RFLAGS: 00010287
          RAX: 0000000000000001  RBX: ffffffff8005e116  RCX: 00000000fbad2a84
          RDX: 0000000000000024  RSI: 00007fff6a0b0bc7  RDI: 0000000000000003
          RBP: 0000000000000001   R8: fefefefefefefeff   R9: 632d303038362d63
          R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000000000
          R13: 00007fff6a0b0bb4  R14: 0000000000000003  R15: 0000000000000000
          ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
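
      To make the race clearer, here is a rough sketch of the interleaving,
      reconstructed from the description and the stack trace above (an
      illustration only, not patch code):

      /*
       * Thread A: umount                      Thread B: lctl via procfs
       * ------------------------------------  ------------------------------------
       * class_manual_cleanup()
       *   class_process_config(LCFG_CLEANUP)
       *     class_cleanup()
       *       (obd->obd_uuid_hash is
       *        finalized here)
       *                                        lprocfs_wr_evict_client()
       *                                          obd_export_evict_by_uuid()
       *                                            lustre_hash_lookup(
       *                                                obd->obd_uuid_hash, ...)
       *                                            -> hash already freed: panic
       */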
      

      2) Deadlock on _lprocfs_lock
      lprocfs_wr_evict_client() calls class_incref() before LPROCFS_EXIT() and class_decref() after LPROCFS_ENTRY(). So in the case where class_decref() ends up calling lprocfs_remove() via osc_cleanup() while _lprocfs_lock is already held, the thread deadlocks (see the sketch after the stack trace below).

      PID: 4495   TASK: ffff810638e25820  CPU: 5   COMMAND: "lctl"
       #0 [ffff8106184df728] schedule at ffffffff80063f96
       #1 [ffff8106184df800] __down_write_nested at ffffffff80065613
       #2 [ffff8106184df840] lprocfs_remove at ffffffff88751fc1
       #3 [ffff8106184df8a0] lprocfs_obd_cleanup at ffffffff887524a8
       #4 [ffff8106184df8b0] osc_cleanup at ffffffff88b25e70
       #5 [ffff8106184df900] class_decref at ffffffff8875c46c
       #6 [ffff8106184df950] obd_zombie_impexp_cull at ffffffff88743ad2
       #7 [ffff8106184df970] class_detach at ffffffff8875d9dd
       #8 [ffff8106184df9b0] class_process_config at ffffffff8876183e
       #9 [ffff8106184dfa30] class_manual_cleanup at ffffffff88763747
      #10 [ffff8106184dfb20] lov_putref at ffffffff88aceb60
      #11 [ffff8106184dfbe0] lov_disconnect at ffffffff88ad1f2b
      #12 [ffff8106184dfc40] mds_lov_clean at ffffffff88975fd4
      #13 [ffff8106184dfca0] mds_precleanup at ffffffff889851a9
      #14 [ffff8106184dfcf0] class_decref at ffffffff8875c07e
      #15 [ffff8106184dfd40] lprocfs_wr_evict_client at ffffffff88869c14
      #16 [ffff8106184dfe00] lprocfs_mds_wr_evict_client at ffffffff889a6a39
      #17 [ffff8106184dfee0] lprocfs_fops_write at ffffffff887534cb
      #18 [ffff8106184dff10] vfs_write at ffffffff80016a49
      #19 [ffff8106184dff40] sys_write at ffffffff80017316
      #20 [ffff8106184dff80] system_call at ffffffff8005e116
          RIP: 000000394bac6070  RSP: 00007fff3dfad010  RFLAGS: 00010287
          RAX: 0000000000000001  RBX: ffffffff8005e116  RCX: 00000000fbad2a84
          RDX: 0000000000000024  RSI: 00007fff3dfb1bc3  RDI: 0000000000000003
          RBP: 0000000000000001   R8: fefefefefefefeff   R9: 0000000000000000
          R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000000000
          R13: 00007fff3dfb1bb0  R14: 0000000000000003  R15: 0000000000000000
          ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
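
      The problematic ordering inside lprocfs_wr_evict_client() is roughly the
      following; this is a simplified sketch based on the description above,
      not the exact source:

      static int lprocfs_wr_evict_client(struct file *file, const char *buffer,
                                         unsigned long count, void *data)
      {
              struct obd_device *obd = data;

              class_incref(obd);      /* reference taken before dropping the lock */
              LPROCFS_EXIT();         /* drop _lprocfs_lock                       */

              /* ... parse the uuid/nid from buffer and evict the export ... */

              LPROCFS_ENTRY();        /* re-take _lprocfs_lock                    */
              class_decref(obd);      /* runs while _lprocfs_lock is held; if it
                                       * drops the last reference it can recurse
                                       * into device cleanup (osc_cleanup() ->
                                       * lprocfs_remove()), which tries to take
                                       * _lprocfs_lock again: deadlock           */
              return count;
      }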
      

      Both cases happened with FEFS based on Lustre-1.8.5, but after reading the source code I came to think that both of them could also happen with Lustre-1.8.8 and Lustre-2.3.x. I'm not entirely sure, though, so I'd be happy if someone could confirm that.

      ・About The Patches' Features
      1) When someone is touching evict_nid on procfs, the mount thread waits for them to finish via a new function, server_wait_for_evict_client().
      2) After server_wait_for_evict_client() returns, a new flag, obd_evict_client_frozen, is set; this prohibits "evict_client" from evicting export objects.
      3) An if-statement is added to the evict functions that checks obd_stopping and obd_evict_client_frozen, so that they do not touch a lustre hash which has already been finalized by class_manual_cleanup() (see the sketch below).
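
      For example, the check described in 3) might look roughly like this in the
      evict path (a sketch of the idea only, assuming obd_evict_client_frozen is
      a flag on obd_device; the attached patches are authoritative):

      int obd_export_evict_by_uuid(struct obd_device *obd, const char *uuid)
      {
              struct obd_export *doomed_exp;
              struct obd_uuid doomed_uuid;
              int exports_evicted = 0;

              spin_lock(&obd->obd_dev_lock);
              if (obd->obd_stopping || obd->obd_evict_client_frozen) {
                      /* cleanup has started, or the mount thread has frozen
                       * eviction, so obd_uuid_hash may already be finalized;
                       * refuse to touch it */
                      spin_unlock(&obd->obd_dev_lock);
                      return exports_evicted;
              }
              spin_unlock(&obd->obd_dev_lock);

              obd_str2uuid(&doomed_uuid, uuid);
              doomed_exp = lustre_hash_lookup(obd->obd_uuid_hash, &doomed_uuid);
              /* ... evict doomed_exp as before ... */
              return exports_evicted;
      }

      Together with 1) and 2), where the mount thread waits in
      server_wait_for_evict_client() and then sets obd_evict_client_frozen before
      cleanup proceeds, this keeps evict_client away from the finalized hash.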

      I'd be glad if someone could review my patches, and I hope the patches for both cases help you fix the problem.

      Thank you.

      Attachments

        Activity

          [LU-2258] races between evict and umount
          pjones Peter Jones made changes -
          Resolution New: Fixed [ 1 ]
          Status Original: Open [ 1 ] New: Resolved [ 5 ]
          pjones Peter Jones added a comment -

          Patch was landed for 2.4

          pjones Peter Jones made changes -
          Labels Original: patch New: mn8 patch
          jlevi Jodi Levi (Inactive) made changes -
          Fix Version/s New: Lustre 2.4.0 [ 10154 ]
          nozaki Hiroya Nozaki (Inactive) added a comment - - edited

          Regarding the issue we've been discussing in the patch-set-1 comments, I've taken the following approach to fix it.

          class_decref()
          void class_decref(struct obd_device *obd)
          {
                  int err;
                  int refs;
          
                  mutex_down(&obd->obd_dev_cleanup_sem); /* new mutex added to obd_device */
                  spin_lock(&obd->obd_dev_lock);
                  atomic_dec(&obd->obd_refcount);
                  refs = atomic_read(&obd->obd_refcount);
                  spin_unlock(&obd->obd_dev_lock);
          
                  CDEBUG(D_CONFIG, "Decref %s (%p) now %d\n", obd->obd_name, obd, refs);
          
                  if ((refs == 1) && obd->obd_stopping &&
                      !obd->obd_precleaned_up) {      /* new stage flag added to obd_device, checked here */
                          obd->obd_precleaned_up = 1; /* set so this branch is not taken again */
          
                          ...
          
                          mutex_up(&obd->obd_dev_cleanup_sem);
                          return;
                  }
                  mutex_up(&obd->obd_dev_cleanup_sem);
          
                  if (refs == 0) {
          
                          ...
          
                  }
          }
          

          I believe class_decref() isn't called very often and that this exclusive section doesn't take much time, especially in 2.x.
          So I don't think this approach should be a problem, although I would have preferred not to add new members to obd_device.

          pjones Peter Jones made changes -
          Assignee Original: WC Triage [ wc-triage ] New: Lai Siyao [ laisiyao ]
          adilger Andreas Dilger made changes -
          Priority Original: Minor [ 4 ] New: Major [ 3 ]
          pjones Peter Jones made changes -
          Labels New: patch
          nozaki Hiroya Nozaki (Inactive) added a comment - patch for the master branch http://review.whamcloud.com/4444
          nozaki Hiroya Nozaki (Inactive) added a comment - - edited

          Here is the patch for b1_8:
          http://review.whamcloud.com/#change,4442

          I'll prepare the patch for the master branch soon.
          Please wait a little while.


          People

            Assignee: Lai Siyao (laisiyao)
            Reporter: Hiroya Nozaki (nozaki, Inactive)
            Votes: 0
            Watchers: 2
