  1. Lustre
  2. LU-1306

LBUG at (ldlm_lock.c:213:ldlm_lock_add_to_lru_nolock()) ASSERTION(lock->l_resource->lr_type != LDLM_FLOCK) failed


Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.3.0, Lustre 1.8.9
    • Lustre 2.4.0
    • None
    • 3
    • 4630

    Description

      The following bug occurred:
      > c0-0c0s5n1 LustreError: 20262:0:(ldlm_lock.c:213:ldlm_lock_add_to_lru_nolock()) ASSERTION(lock->l_resource->lr_type != LDLM_FLOCK) failed
      > c0-0c0s5n1 LustreError: 20262:0:(ldlm_lock.c:213:ldlm_lock_add_to_lru_nolock()) LBUG
      > c0-0c0s5n1 Pid: 20262, comm: fcntl17
      > c0-0c0s5n1 Call Trace:
      > c0-0c0s5n1 [<ffffffff81007a89>] try_stack_unwind+0x149/0x190
      > c0-0c0s5n1 [<ffffffff81006420>] dump_trace+0x90/0x300
      > c0-0c0s5n1 [<ffffffffa0132992>] libcfs_debug_dumpstack+0x52/0x80 [libcfs]
      > c0-0c0s5n1 [<ffffffffa0132f01>] lbug_with_loc+0x71/0xe0 [libcfs]
      > c0-0c0s5n1 [<ffffffffa013c461>] libcfs_assertion_failed+0x61/0x70 [libcfs]
      > c0-0c0s5n1 [<ffffffffa0261348>] ldlm_lock_add_to_lru_nolock+0xd8/0xe0 [ptlrpc]
      > c0-0c0s5n1 [<ffffffffa02619d9>] ldlm_lock_add_to_lru+0x49/0x100 [ptlrpc]
      > c0-0c0s5n1 [<ffffffffa0266d28>] ldlm_lock_decref_internal+0x2e8/0x860 [ptlrpc]
      > c0-0c0s5n1 [<ffffffffa027d288>] failed_lock_cleanup+0x58/0x100 [ptlrpc]
      > c0-0c0s5n1 [<ffffffffa027d4e6>] ldlm_cli_enqueue_fini+0x1b6/0xbb0 [ptlrpc]
      > c0-0c0s5n1 [<ffffffffa0282541>] ldlm_cli_enqueue+0x1a1/0x760 [ptlrpc]
      > c0-0c0s5n1 [<ffffffffa04a876b>] ll_file_flock+0x47b/0x690 [lustre]
      > c0-0c0s5n1 [<ffffffff81122dee>] vfs_lock_file+0x1e/0x40
      > c0-0c0s5n1 [<ffffffff81123027>] fcntl_setlk+0x167/0x320
      > c0-0c0s5n1 [<ffffffff810f6661>] sys_fcntl+0x321/0x540
      > c0-0c0s5n1 [<ffffffff81002eab>] system_call_fastpath+0x16/0x1b
      > c0-0c0s5n1 [<00002aaaadd7f702>] 0x2aaaadd7f702

      It looks like the problem is the following race:

      ldlm_cb thread calls ldlm_run_cp_ast_work():
      lock_res_and_lock(lock);
      list_del_init(&lock->l_cp_ast);
      LASSERT(lock->l_flags & LDLM_FL_CP_REQD);
      /* save l_completion_ast since it can be changed by
       * mds_intent_policy(), see bug 14225 */
      completion_callback = lock->l_completion_ast;
      lock->l_flags &= ~LDLM_FL_CP_REQD;
      unlock_res_and_lock(lock);

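      Meanwhile, the thread that was originally waiting for the lock receives a signal:
      the signal callback ldlm_flock_interrupted_wait() does
      lock->l_flags |= LDLM_FL_CBPENDING;
      without locking.
      l_wait_event() then exits with an error (signal occurred), and failed_lock_cleanup() fails on the assertion because LDLM_FL_CBPENDING was cleared by ldlm_run_cp_ast_work(): its read-modify-write of l_flags, which clears LDLM_FL_CP_REQD, can store back a value read before the unlocked update and so drop the LDLM_FL_CBPENDING bit. With the bit gone, ldlm_lock_decref_internal() goes on to ldlm_lock_add_to_lru(), as the call trace shows, and the LRU path asserts that the lock is not an LDLM_FLOCK lock.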
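
The lost update described above can be illustrated with a small single-threaded replay of one possible interleaving. This is only a sketch under stated assumptions: the flag values, the helper names (replay_lost_update, replay_serialized, check_replays) and the use of a plain integer in place of lock->l_flags are hypothetical and are not taken from the Lustre source. It shows why an unlocked `|=` can be overwritten by a concurrent `&=`, and why the bit survives when both updates are serialized under the same lock.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values for illustration; NOT the real Lustre constants. */
#define FL_CBPENDING 0x1u
#define FL_CP_REQD   0x2u

/*
 * Replay one interleaving of the two updates on a shared flags word:
 *   A (completion AST path): flags &= ~FL_CP_REQD   (load, modify, store)
 *   B (signal callback):     flags |= FL_CBPENDING  (runs between A's load
 *                                                    and A's store, unlocked)
 * Returns the final value of the flags word.
 */
uint32_t replay_lost_update(uint32_t flags)
{
    uint32_t a_tmp = flags;   /* A: load the current flags            */
    flags |= FL_CBPENDING;    /* B: set the bit without any locking   */
    a_tmp &= ~FL_CP_REQD;     /* A: modify its now stale private copy */
    flags = a_tmp;            /* A: store, overwriting B's update     */
    return flags;
}

/*
 * The same two updates when each one runs as an indivisible critical
 * section, as they would if both paths took the same lock.
 */
uint32_t replay_serialized(uint32_t flags)
{
    flags &= ~FL_CP_REQD;     /* A: complete read-modify-write */
    flags |= FL_CBPENDING;    /* B: complete read-modify-write */
    return flags;
}

/* Sanity checks for both replays, starting with only FL_CP_REQD set. */
void check_replays(void)
{
    /* Racy interleaving: the CBPENDING bit is lost. */
    assert((replay_lost_update(FL_CP_REQD) & FL_CBPENDING) == 0);
    /* Serialized updates: the CBPENDING bit survives. */
    assert((replay_serialized(FL_CP_REQD) & FL_CBPENDING) != 0);
}
```

Calling check_replays() from a test program exercises both cases. In the real code the interleaving depends on timing between the ldlm_cb thread and the signalled waiter, so the failure would be intermittent rather than deterministic as in this replay.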

          People

            wc-triage WC Triage
            askulysh Andriy Skulysh