Details

    • Type: Technical task
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.6.0, Lustre 2.5.1
    • Affects Version/s: Lustre 2.5.0
    • 9136

    Description

      Running the HSM stack as of July 15, 2013, I see a hang when a release is issued while a restore is still running. To reproduce it, I run the following:

      #!/bin/bash
      
      export MOUNT_2=n
      export MDSCOUNT=1
      export PTLDEBUG="super inode ioctl warning dlmtrace error emerg ha rpctrace vfstrace config console"
      export DEBUG_SIZE=512
      
      hsm_root=/tmp/hsm_root
      
      rm -rf $hsm_root
      mkdir $hsm_root
      
      llmount.sh
      
      # Enable the HSM coordinator on MDT0000.
      lctl conf_param lustre-MDT0000.mdt.hsm_control=enabled
      # lctl conf_param lustre-MDT0001.mdt.hsm_control=enabled
      sleep 10
      # Start the POSIX copytool; the bandwidth limit keeps copies slow enough
      # that archive/restore operations stay in flight long enough to race.
      lhsmtool_posix --verbose --hsm_root=$hsm_root --bandwidth 1 lustre
      
      lctl dk > ~/hsm-0-mount.dk
      
      set -x
      cd /mnt/lustre
      lfs setstripe -c2 f0
      dd if=/dev/urandom of=f0 bs=1M count=100
      lctl dk > ~/hsm-1-dd.dk
      
      lfs hsm_archive f0
      sleep 10
      echo > /proc/fs/lustre/ldlm/dump_namespaces
      lctl dk > ~/hsm-2-archive.dk
      
      lfs hsm_release f0
      echo > /proc/fs/lustre/ldlm/dump_namespaces
      lctl dk > ~/hsm-3-release.dk
      
      lfs hsm_restore f0
      echo > /proc/fs/lustre/ldlm/dump_namespaces
      lctl dk > ~/hsm-4-restore.dk
      
      lfs hsm_release f0
      

      with the last command (the second hsm_release) never returning. The MDS_CLOSE handler looks like:

      10070
      [<ffffffffa0f9866e>] cfs_waitq_wait+0xe/0x10 [libcfs]
      [<ffffffffa124826a>] ldlm_completion_ast+0x57a/0x960 [ptlrpc]
      [<ffffffffa1247920>] ldlm_cli_enqueue_local+0x1f0/0x5c0 [ptlrpc]
      [<ffffffffa08cee3b>] mdt_object_lock0+0x33b/0xaf0 [mdt]
      [<ffffffffa08cf6b4>] mdt_object_lock+0x14/0x20 [mdt]
      [<ffffffffa08f9551>] mdt_mfd_close+0x351/0xde0 [mdt]
      [<ffffffffa08fb372>] mdt_close+0x662/0xa60 [mdt]
      [<ffffffffa08d2c07>] mdt_handle_common+0x647/0x16d0 [mdt]
      [<ffffffffa090c9e5>] mds_readpage_handle+0x15/0x20 [mdt]
      [<ffffffffa12813d8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      [<ffffffffa128275d>] ptlrpc_main+0xabd/0x1700 [ptlrpc]
      [<ffffffff81096936>] kthread+0x96/0xa0
      [<ffffffff8100c0ca>] child_rip+0xa/0x20
      [<ffffffffffffffff>] 0xffffffffffffffff
      

      while the MDS_HSM_PROGRESS handler looks like:

      10065
      [<ffffffffa0f9866e>] cfs_waitq_wait+0xe/0x10 [libcfs]
      [<ffffffffa124826a>] ldlm_completion_ast+0x57a/0x960 [ptlrpc]
      [<ffffffffa1247920>] ldlm_cli_enqueue_local+0x1f0/0x5c0 [ptlrpc]
      [<ffffffffa08cee3b>] mdt_object_lock0+0x33b/0xaf0 [mdt]
      [<ffffffffa08cf6b4>] mdt_object_lock+0x14/0x20 [mdt]
      [<ffffffffa08cf721>] mdt_object_find_lock+0x61/0x170 [mdt]
      [<ffffffffa091dc22>] hsm_get_md_attr+0x62/0x270 [mdt]
      [<ffffffffa0923253>] mdt_hsm_update_request_state+0x4d3/0x1c20 [mdt]
      [<ffffffffa091ae6e>] mdt_hsm_coordinator_update+0x3e/0xe0 [mdt]
      [<ffffffffa090931b>] mdt_hsm_progress+0x21b/0x330 [mdt]
      [<ffffffffa08d2c07>] mdt_handle_common+0x647/0x16d0 [mdt]
      [<ffffffffa090ca05>] mds_regular_handle+0x15/0x20 [mdt]
      [<ffffffffa12813d8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      [<ffffffffa128275d>] ptlrpc_main+0xabd/0x1700 [ptlrpc]
      [<ffffffff81096936>] kthread+0x96/0xa0
      [<ffffffff8100c0ca>] child_rip+0xa/0x20
      [<ffffffffffffffff>] 0xffffffffffffffff
      

      The close handler is waiting for an EX LAYOUT lock on f0, while the
      progress handler is waiting for a PW UPDATE lock on f0. dump_namespaces does not show the UPDATE lock as granted.
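
      For anyone reproducing this, the state above can be captured with the commands below; this is just a sketch using the interfaces already shown in this ticket plus the standard /proc/<pid>/stack file, and the PIDs are the ones from the traces above (they will differ between runs).

      #!/bin/bash
      # Sketch: capture lock and thread state on the MDS node while the hang is in progress.

      # Dump all LDLM namespaces (granted and waiting locks) into the Lustre debug log.
      echo > /proc/fs/lustre/ldlm/dump_namespaces

      # Flush the debug log to a file for inspection.
      lctl dk > ~/hsm-hang.dk

      # Dump the stacks of the stuck service threads (PIDs taken from the traces above).
      for pid in 10070 10065; do
          echo "== $pid =="
          cat /proc/$pid/stack
      done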

      For reference I'm using the following changes:

      # LU-2919 hsm: Implementation of exclusive open
      # http://review.whamcloud.com/#/c/6730
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/30/6730/13 && git cherry-pick FETCH_HEAD
       
      # LU-1333 hsm: Add hsm_release feature.
      # http://review.whamcloud.com/#/c/6526
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/26/6526/9 && git cherry-pick FETCH_HEAD
       
      # LU-3339 mdt: HSM on disk actions record
      # http://review.whamcloud.com/#/c/6529
      # MERGED
       
      # LU-3340 mdt: HSM memory requests management
      # http://review.whamcloud.com/#/c/6530
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/30/6530/8 && git cherry-pick FETCH_HEAD
       
      # LU-3341 mdt: HSM coordinator client interface
      # http://review.whamcloud.com/#/c/6532
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/32/6532/13 && git cherry-pick FETCH_HEAD
      # Needs rebase in sanity-hsm.sh
       
      # LU-3342 mdt: HSM coordinator agent interface
      # http://review.whamcloud.com/#/c/6534
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/34/6534/8 && git cherry-pick FETCH_HEAD
       
      # LU-3343 mdt: HSM coordinator main thread
      # http://review.whamcloud.com/#/c/6912
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/12/6912/3 && git cherry-pick FETCH_HEAD
      # lustre/mdt/mdt_internal.h
       
      # LU-3561 tests: HSM sanity test suite
      # http://review.whamcloud.com/#/c/6913/
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/13/6913/4 && git cherry-pick FETCH_HEAD
      # lustre/tests/sanity-hsm.sh
       
      # LU-3432 llite: Access to released file trigs a restore
      # http://review.whamcloud.com/#/c/6537
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/37/6537/11 && git cherry-pick FETCH_HEAD
       
      # LU-3363 api: HSM import uses new released pattern
      # http://review.whamcloud.com/#/c/6536
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/36/6536/8 && git cherry-pick FETCH_HEAD
       
      # LU-2062 utils: HSM Posix CopyTool
      # http://review.whamcloud.com/#/c/4737
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/37/4737/18 && git cherry-pick FETCH_HEAD
      

          Activity

            [LU-3601] HSM release causes running restore to hang, hangs itself

            vitaly_fertman Vitaly Fertman added a comment -

            Andreas, it was hit during testing.

            process1.lock1: open|lookup, granted
            process2.lock1: layout | XXX, granted
            process3.lock1: lookup | XXX, waiting process1.lock1
            process1.lock2: layout, waiting process2.lock1
            process2.lock1: cancelled, reprocessing does not reach process1.lock2

            process1 is open by fid
            process3 is getattr

            In other words, since the 2 locks are not taken atomically, you must guarantee that nobody can enqueue a conflicting lock against the 1st one in between. Otherwise you need one of the following:

            • full reprocess
            • reordering on waiting list
            • make these 2 enqueue atomic
            • take 1 common lock with all the ibits

            By the way, why was the last option not done originally?

            As it can deadlock even without HSM, I would consider it a blocker.


            adilger Andreas Dilger added a comment -

            Andriy wrote in LU-4152:

            LU-1876 adds mdt_object_open_lock(), which acquires the lock in 2 steps for layout locks.
            A deadlock is possible since this is not atomic and ibits locks are reprocessed only until the first blocking lock is found.

            Such a situation was hit with mdt_reint_open() & mdt_intent_getattr():

            mdt_reint_open()->mdt_open_by_fid_lock() takes the first part of the lock (ibits=5),
            mdt_intent_getattr() tries to obtain a lock (ibits=17),
            and then mdt_open_by_fid_lock() tries to obtain the second part but fails due to a conflict with another layout lock, lock2. During the cancellation of lock2 only the getattr lock is reprocessed.
            http://review.whamcloud.com/#/c/7148/1 can help, but it is better to fix mdt_open_by_fid_lock().

            Andriy, was this problem actually hit during testing, or was this problem found by code inspection?

            jhammond John Hammond added a comment -

            This issue was fixed for 2.5.0 and can be closed now.

            jhammond John Hammond added a comment -

            Landed after being improved per comments on gerrit.


            jlevi Jodi Levi (Inactive) added a comment -

            Should change 7148 be landed or abandoned?


            jcl jacques-charles lafoucriere added a comment -

            sanity-hsm #33 hits the same bug, but it was not designed to test concurrent access to a file during the restore phase. We also do not currently test rename/rm during restore.


            adegremont Aurelien Degremont (Inactive) added a comment -

            We already have such a test: the sanity-hsm #33 deadlock was hitting this bug, and John's patch was fixing it. I will confirm on Monday that the latest coordinator, without John's patch, no longer triggers this deadlock, but I'm confident.


            jcl jacques-charles lafoucriere added a comment -

            We will add sanity-hsm tests for the 2 simple use cases. It will be safer for future changes.

            jhammond John Hammond added a comment -

            Since the removal of UPDATE lock use from the coordinator, I can no longer reproduce these issues.

            jhammond John Hammond added a comment - edited

            A similar hang can be triggered by trying to read a file while a restore is still running. To see this, add --bandwidth=1 to the copytool options and do:

            # cd /mnt/lustre
            # dd if=/dev/urandom of=f0 bs=1M count=10
            # lfs hsm_archive f0
            # # Wait for archive to complete.
            # sleep 15
            # lfs hsm_release f0
            # lfs hsm_restore f0
            # cat f0 > /dev/null
            

            This is addressed by http://review.whamcloud.com/#/c/7148/.

            However, even with the latest version (patch set 9) of http://review.whamcloud.com/#/c/6912/ we have an easily exploited race between restore and rename which is not addressed by the change in 7148. A rename onto the file during a restore will hang:

            cd /mnt/lustre
            dd if=/dev/urandom of=f0 bs=1M count=10
            lfs hsm_archive f0
            # Wait for archive to complete.
            sleep 15
            lfs hsm_state f0
            lfs hsm_release f0
            lfs hsm_restore f0; touch f1; sys_rename f1 f0
            

            Since this rename takes MDS_INODELOCK_FULL on f0, I doubt that the choice of using LAYOUT, UPDATE, or other in hsm_get_md_attr() matters very much. But I could be wrong.
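
            As a side note, a minimal way to observe the in-flight restore from a second shell while the rename hangs is sketched below; it assumes the lfs hsm_state/hsm_action commands from the HSM patch stack above are available, and the path is the one used in the reproducer.

            # Run from another shell while the rename is blocked.
            lfs hsm_action /mnt/lustre/f0   # expected to show the RESTORE action still running
            lfs hsm_state /mnt/lustre/f0    # shows the file's HSM flags (released, archived, ...)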

            jhammond John Hammond added a comment -

            Please see http://review.whamcloud.com/7148 for the LDLM patch we discussed.


            People

              Assignee:
              jay Jinshan Xiong (Inactive)
              Reporter:
              jhammond John Hammond
              Votes: 0
              Watchers: 17

              Dates

                Created:
                Updated:
                Resolved: