Lustre / LU-3669

(mdt_xattr.c:178:mdt_getxattr()) ASSERTION( lu_object_assert_exists(&info->mti_object->mot_obj) ) failed:

Details


    Description

      Using today's master 2.4.52-72-g643e0ae with the extra racer operations from http://review.whamcloud.com/#/c/5936/ I see LDLM hangs and LBUGs in
      the getxattr handler.

      # MOUNT_2=y llmount.sh
      # sh ./lustre/tests/racer.sh
      
      LNet: Service thread pid 16149 was inactive for 40.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      Pid: 16149, comm: mdt01_004
      
      # cat /proc/16149/stack
      [<ffffffffa04ed641>] cfs_waitq_timedwait+0x11/0x20 [libcfs]
      [<ffffffffa07f912d>] ldlm_completion_ast+0x4ed/0x960 [ptlrpc]
      [<ffffffffa07f8870>] ldlm_cli_enqueue_local+0x1f0/0x5c0 [ptlrpc]
      [<ffffffffa0cd3c0b>] mdt_object_lock0+0x33b/0xaf0 [mdt]
      [<ffffffffa0cd4484>] mdt_object_lock+0x14/0x20 [mdt]
      [<ffffffffa0cd4520>] mdt_intent_getxattr+0x90/0x160 [mdt]
      [<ffffffffa0cd111e>] mdt_intent_policy+0x3ae/0x770 [mdt]
      [<ffffffffa07d9471>] ldlm_lock_enqueue+0x361/0x8c0 [ptlrpc]
      [<ffffffffa080219f>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
      [<ffffffffa0cd15e6>] mdt_enqueue+0x46/0xe0 [mdt]
      [<ffffffffa0cd7db7>] mdt_handle_common+0x647/0x16d0 [mdt]
      [<ffffffffa0d12d25>] mds_regular_handle+0x15/0x20 [mdt]
      [<ffffffffa0831fd8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      [<ffffffffa083335d>] ptlrpc_main+0xabd/0x1700 [ptlrpc]
      [<ffffffff81096936>] kthread+0x96/0xa0
      [<ffffffff8100c0ca>] child_rip+0xa/0x20
      [<ffffffffffffffff>] 0xffffffffffffffff
      
      LustreError: 11483:0:(mdt_xattr.c:178:mdt_getxattr()) ASSERTION( lu_object_assert_exists(&info->mti_object->mot_obj) ) failed: 
      LustreError: 11483:0:(mdt_xattr.c:178:mdt_getxattr()) LBUG
      Pid: 11483, comm: mdt00_005
      
      crash> bt
      PID: 11483  TASK: ffff8801f2651540  CPU: 0   COMMAND: "mdt00_005"
       #0 [ffff8801feb478f8] machine_kexec at ffffffff81035c9b
       #1 [ffff8801feb47958] crash_kexec at ffffffff810c0d22
       #2 [ffff8801feb47a28] panic at ffffffff8150eeff
       #3 [ffff8801feb47aa8] lbug_with_loc at ffffffffa0c08eeb [libcfs]
       #4 [ffff8801feb47ac8] mdt_getxattr at ffffffffa06ad35f [mdt]
       #5 [ffff8801feb47b68] mdt_intent_getxattr at ffffffffa068a52e [mdt]
       #6 [ffff8801feb47bb8] mdt_intent_policy at ffffffffa068711e [mdt]
       #7 [ffff8801feb47bf8] ldlm_lock_enqueue at ffffffffa0e6e471 [ptlrpc]
       #8 [ffff8801feb47c58] ldlm_handle_enqueue0 at ffffffffa0e9719f [ptlrpc]
       #9 [ffff8801feb47cc8] mdt_enqueue at ffffffffa06875e6 [mdt]
      #10 [ffff8801feb47ce8] mdt_handle_common at ffffffffa068ddf7 [mdt]
      #11 [ffff8801feb47d38] mds_regular_handle at ffffffffa06c8ca5 [mdt]
      #12 [ffff8801feb47d48] ptlrpc_server_handle_request at ffffffffa0ec6fd8 [ptlrpc]
      #13 [ffff8801feb47e48] ptlrpc_main at ffffffffa0ec835d [ptlrpc]
      #14 [ffff8801feb47ee8] kthread at ffffffff81096936
      #15 [ffff8801feb47f48] kernel_thread at ffffffff8100c0ca
      

      We can't keep adding IBITs without defining some ordering or hierarchy for when we use them.

      00010000:00010000:6.0:1375198422.885704:0:5685:0:(ldlm_resource.c:1448:ldlm_resource_dump()) --- Resource: [0x200000401:0x8:0x0].0 (ffff8801eceee480) refcount = 8
      00010000:00010000:6.0:1375198422.885707:0:5685:0:(ldlm_resource.c:1451:ldlm_resource_dump()) Granted locks (in reverse order):
      00010000:00010000:6.0:1375198422.885708:0:5685:0:(ldlm_resource.c:1454:ldlm_resource_dump()) ### ### ns: mdt-lustre-MDT0000_UUID lock: ffff8801e9cc9d00/0x132cecfff5acfc2f lrc: 3/0,0 mode: PW/PW res: [0x200000401:0x8:0x0].0 bits 0x20 rrc: 8 type: IBT flags: 0x60200000000020 nid: 0@lo remote: 0x132cecfff5acfc05 expref: 30 pid: 3274 timeout: 4294866401 lvb_type: 0
      00010000:00010000:6.0:1375198422.885715:0:5685:0:(ldlm_resource.c:1454:ldlm_resource_dump()) ### ### ns: mdt-lustre-MDT0000_UUID lock: ffff8801eeae4d40/0x132cecfff5acf6e1 lrc: 2/0,0 mode: CR/CR res: [0x200000401:0x8:0x0].0 bits 0x9 rrc: 8 type: IBT flags: 0x40200000000000 nid: 0@lo remote: 0x132cecfff5acf6c5 expref: 30 pid: 5252 timeout: 0 lvb_type: 0
      00010000:00010000:6.0:1375198422.885721:0:5685:0:(ldlm_resource.c:1454:ldlm_resource_dump()) ### ### ns: mdt-lustre-MDT0000_UUID lock: ffff8801ec5246c0/0x132cecfff5acebf8 lrc: 2/0,0 mode: CR/CR res: [0x200000401:0x8:0x0].0 bits 0x9 rrc: 8 type: IBT flags: 0x40200000000000 nid: 0@lo remote: 0x132cecfff5acebdc expref: 39 pid: 3274 timeout: 0 lvb_type: 0
      00010000:00010000:6.0:1375198422.885727:0:5685:0:(ldlm_resource.c:1454:ldlm_resource_dump()) ### ### ns: mdt-lustre-MDT0000_UUID lock: ffff8801eb589280/0x132cecfff5ace354 lrc: 2/0,0 mode: CR/CR res: [0x200000401:0x8:0x0].0 bits 0x8 rrc: 8 type: IBT flags: 0x40000000000000 nid: 0@lo remote: 0x132cecfff5ace34d expref: 30 pid: 3810 timeout: 0 lvb_type: 3
      00010000:00010000:6.0:1375198422.885734:0:5685:0:(ldlm_resource.c:1469:ldlm_resource_dump()) Waiting locks:
      00010000:00010000:6.0:1375198422.885736:0:5685:0:(ldlm_resource.c:1471:ldlm_resource_dump()) ### ### ns: mdt-lustre-MDT0000_UUID lock: ffff8801eea356c0/0x132cecfff5ad0e42 lrc: 3/1,0 mode: --/PR res: [0x200000401:0x8:0x0].0 bits 0x20 rrc: 8 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 5306 timeout: 0 lvb_type: 0
      00010000:00010000:6.0:1375198422.885742:0:5685:0:(ldlm_resource.c:1471:ldlm_resource_dump()) ### ### ns: mdt-lustre-MDT0000_UUID lock: ffff8801e95e8d00/0x132cecfff5ad0f1b lrc: 3/0,1 mode: --/EX res: [0x200000401:0x8:0x0].0 bits 0x2 rrc: 8 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 3274 timeout: 0 lvb_type: 0
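For reference, the "bits" masks in the dump above can be decoded against the MDS inode lock bits: the granted PW lock holds 0x20 (XATTR), the CR locks hold 0x9 (LOOKUP|LAYOUT) and 0x8 (LAYOUT), and the waiters want 0x20 (XATTR) and 0x2 (UPDATE). A minimal userspace sketch of the decoding, assuming the bit assignments below match lustre_idl.h of this vintage (verify against your tree):

```c
#include <string.h>

/* Assumed MDS inode lock bit values (check lustre_idl.h in your tree). */
#define MDS_INODELOCK_LOOKUP 0x01
#define MDS_INODELOCK_UPDATE 0x02
#define MDS_INODELOCK_OPEN   0x04
#define MDS_INODELOCK_LAYOUT 0x08
#define MDS_INODELOCK_PERM   0x10
#define MDS_INODELOCK_XATTR  0x20

/* Render an ldlm_resource_dump() "bits" mask as symbolic names. */
static void ibits_to_str(unsigned int mask, char *buf, size_t len)
{
	static const struct { unsigned int bit; const char *name; } tbl[] = {
		{ MDS_INODELOCK_LOOKUP, "LOOKUP" },
		{ MDS_INODELOCK_UPDATE, "UPDATE" },
		{ MDS_INODELOCK_OPEN,   "OPEN"   },
		{ MDS_INODELOCK_LAYOUT, "LAYOUT" },
		{ MDS_INODELOCK_PERM,   "PERM"   },
		{ MDS_INODELOCK_XATTR,  "XATTR"  },
	};
	size_t i;

	buf[0] = '\0';
	for (i = 0; i < sizeof(tbl) / sizeof(tbl[0]); i++) {
		if (!(mask & tbl[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, tbl[i].name, len - strlen(buf) - 1);
	}
}
```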
      

          Activity


            Both patches landed to master.

            jhammond John Hammond added a comment

            With change 7208 landing to master, can this ticket be closed or reduced from blocker?

            jlevi Jodi Levi (Inactive) added a comment
            adilger Andreas Dilger added a comment - http://review.whamcloud.com/7208

            Panda, I think your proposal to separate ACL xattrs from xattr cache is a reasonable one.

            The other thing that would reduce lock contention would be to only grant the PR lock if the lock is under contention, as is done in mdt_getattr_name_lock() for the UPDATE lock. This allows clients to cache readdir data and negative dentries for static directories, but does not cause lock ping-pong when the directory is being modified:

                                    /* If the file has not been changed for some time, we
                                     * return not only a LOOKUP lock, but also an UPDATE
                                     * lock and this might save us RPC on later STAT. For
                                     * directories, it also let negative dentry starts
                                     * working for this dir. */
                                    if (ma->ma_valid & MA_INODE &&
                                        ma->ma_attr.la_valid & LA_CTIME &&
                                        info->mti_mdt->mdt_namespace->ns_ctime_age_limit +
                                            ma->ma_attr.la_ctime < cfs_time_current_sec())
                                            child_bits |= MDS_INODELOCK_UPDATE;
            

            Doing something similar for the xattr cache would allow e.g. Samba servers to have a writethrough cache, normal clients to have a read cache for most files, and no cache for files that are being modified frequently (which is no worse than we have today).

            NB - if you change code near here, please s/dentry starts/dentry cachestart/ in the above comment.
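A conditional grant for the xattr case might follow the same shape as the quoted heuristic. The sketch below is purely illustrative: `xattr_bit_grantable()` is an invented name, and only the "ctime plus `ns_ctime_age_limit` already in the past" test is taken from the quoted code:

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical predicate mirroring the quoted UPDATE-lock heuristic for
 * an XATTR bit: grant the cache-enabling bit only when the inode has
 * not been changed for at least `age_limit` seconds. */
static bool xattr_bit_grantable(time_t la_ctime, time_t age_limit, time_t now)
{
	return age_limit + la_ctime < now;
}
```

With such a gate, frequently modified files would simply never hand out the cache-enabling bit, so there is no lock ping-pong to revoke.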

            adilger Andreas Dilger added a comment

            We've discussed the issue with our team, and we think the following might be a solution:

            1. UPDATE locking should probably be removed from the MDT setxattr path, because it does not seem to be protecting any piece of data [needs a bit more investigation]
            2. POSIX ACL ACCESS data should be filtered out from the xattr cache.
            3. getxattr and setxattr on POSIX ACLs should use the old (non-xattrcache) code path.
            4. listxattr should use the xattr cache and also additionally list XATTR_NAME_ACL_ACCESS ("system.posix_acl_access") if it finds that lli_posix_acl is initialized.

            This splits xattr cache and ACL cache and so splits XATTR and LOOKUP locking.
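Steps 2 and 4 above could be modeled in userspace roughly as follows. This is only a sketch of the idea, not the client code: `build_xattr_list()` and the `have_posix_acl` flag are invented names (the flag stands in for a non-NULL `lli_posix_acl`), and the buffer layout follows the NUL-separated format of listxattr(2):

```c
#include <string.h>

#define XATTR_NAME_ACL_ACCESS "system.posix_acl_access"

/* Hypothetical model of the split: the xattr cache holds every name
 * except the ACL one; listxattr replays the cache and appends
 * XATTR_NAME_ACL_ACCESS when an ACL is cached separately. */
static size_t build_xattr_list(const char *const *cached, size_t ncached,
			       int have_posix_acl, char *buf, size_t buflen)
{
	size_t off = 0, i, n;

	for (i = 0; i < ncached; i++) {
		n = strlen(cached[i]) + 1;
		if (off + n > buflen)
			return (size_t)-1;
		memcpy(buf + off, cached[i], n); /* NUL-separated names */
		off += n;
	}
	if (have_posix_acl) {
		n = strlen(XATTR_NAME_ACL_ACCESS) + 1;
		if (off + n > buflen)
			return (size_t)-1;
		memcpy(buf + off, XATTR_NAME_ACL_ACCESS, n);
		off += n;
	}
	return off;
}
```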

            panda Andrew Perepechko added a comment

            The hang reproduces within seconds of starting the two loops.

            OK, I see.

            I pulled your patch for the crash and it seems OK.

            Great! Thanks a lot for the feedback.

            Eventually the setattr gets a PW XATTR lock, the unlink enqueues a blocked EX FULL lock, and the setattr enqueues a blocked EX UPDATE lock. Sorry I didn't mention it earlier, I only thought of it this morning.

            Hm, right. It looks like nested LDLM locking is often a headache.

            panda Andrew Perepechko added a comment
            jhammond John Hammond added a comment -

            Hi Andrew, I was just updating the ticket when I saw your comment.

            > It reproduces within seconds of starting the two loops.

            Sorry, I misread your question. The hang reproduces within seconds of starting the two loops. For the crash, use http://review.whamcloud.com/#/c/5936/ and run racer. Four times out of five it will hang; the other time it will crash. I pulled your patch for the crash and it seems OK.

            > John, is there a chance that 7148 will be landed or should locking be redesigned anyway?

            Not sure. I'm working to get reviewers on it.

            I'm still concerned that 7148 + 7191 are not enough. Consider the following case:

            # MOUNT_2=y llmount.sh
            # cd /mnt/lustre
            # while true; do touch f0; rm f0; done &
            # cd /mnt/lustre2
            # while true; do setfattr -n user.ATTR -v LULZ f0; done &
            

            Eventually the setattr gets a PW XATTR lock, the unlink enqueues a blocked EX FULL lock, and the setattr enqueues a blocked EX UPDATE lock. Sorry I didn't mention it earlier, I only thought of it this morning.
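The blocking chain follows from LDLM lock-mode compatibility: EX is compatible only with NL, so both EX enqueues must wait behind the granted PW lock. A sketch after the LCK_COMPAT_* relationships in lustre_dlm.h; the mode bit values here are assumptions, so verify against your tree:

```c
/* Assumed LDLM mode bit values and compatibility sets, modeled on the
 * LCK_COMPAT_* definitions in lustre_dlm.h. */
enum ldlm_mode {
	LCK_EX = 0x01, LCK_PW = 0x02, LCK_PR = 0x04,
	LCK_CW = 0x08, LCK_CR = 0x10, LCK_NL = 0x20,
};

#define LCK_COMPAT_EX  LCK_NL
#define LCK_COMPAT_PW  (LCK_COMPAT_EX | LCK_CR)
#define LCK_COMPAT_PR  (LCK_COMPAT_PW | LCK_PR)
#define LCK_COMPAT_CW  (LCK_COMPAT_PW | LCK_CW)
#define LCK_COMPAT_CR  (LCK_COMPAT_CW | LCK_PR | LCK_PW)
#define LCK_COMPAT_NL  (LCK_COMPAT_CR | LCK_EX)

/* Non-zero if a lock of mode `req` can be granted alongside `granted`. */
static int lockmode_compat(enum ldlm_mode granted, enum ldlm_mode req)
{
	static const int compat[] = {
		[LCK_EX] = LCK_COMPAT_EX, [LCK_PW] = LCK_COMPAT_PW,
		[LCK_PR] = LCK_COMPAT_PR, [LCK_CW] = LCK_COMPAT_CW,
		[LCK_CR] = LCK_COMPAT_CR, [LCK_NL] = LCK_COMPAT_NL,
	};
	return compat[req] & granted;
}
```

For IBITS locks a conflict additionally requires overlapping inode bits, which is why the EX FULL request (all bits) conflicts with the granted PW XATTR lock.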


            Using http://review.whamcloud.com/#/c/7148/ and http://review.whamcloud.com/#/c/7191/ seems to make racer.sh pass (no traces of eviction or crashes for a few runs).

            John, is there a chance that 7148 will be landed or should locking be redesigned anyway?

            panda Andrew Perepechko added a comment
            panda Andrew Perepechko added a comment - http://review.whamcloud.com/#/c/7191/1 for the crash part

            Never mind, I was able to reproduce it on my hardware.

            panda Andrew Perepechko added a comment

            People

              jhammond John Hammond
              Votes: 0
              Watchers: 7