Details

    • Type: Technical task
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.6.0, Lustre 2.5.1
    • Affects Version/s: Lustre 2.5.0
    • 9136

    Description

      Running the HSM stack as of July 15, 2013, I see a hang when a release is issued while a restore is still running. To reproduce it, I run the following:

      #!/bin/bash
      
      export MOUNT_2=n
      export MDSCOUNT=1
      export PTLDEBUG="super inode ioctl warning dlmtrace error emerg ha rpctrace vfstrace config console"
      export DEBUG_SIZE=512
      
      hsm_root=/tmp/hsm_root
      
      rm -rf $hsm_root
      mkdir $hsm_root
      
      llmount.sh
      
      lctl conf_param lustre-MDT0000.mdt.hsm_control=enabled
      # lctl conf_param lustre-MDT0001.mdt.hsm_control=enabled
      sleep 10
      # start the copytool in the background; --bandwidth 1 keeps transfers
      # slow so the restore is still in flight when the release is issued
      lhsmtool_posix --verbose --hsm_root=$hsm_root --bandwidth 1 lustre &
      
      lctl dk > ~/hsm-0-mount.dk
      
      set -x
      cd /mnt/lustre
      lfs setstripe -c2 f0
      dd if=/dev/urandom of=f0 bs=1M count=100
      lctl dk > ~/hsm-1-dd.dk
      
      lfs hsm_archive f0
      sleep 10
      echo > /proc/fs/lustre/ldlm/dump_namespaces
      lctl dk > ~/hsm-2-archive.dk
      
      lfs hsm_release f0
      echo > /proc/fs/lustre/ldlm/dump_namespaces
      lctl dk > ~/hsm-3-release.dk
      
      lfs hsm_restore f0
      echo > /proc/fs/lustre/ldlm/dump_namespaces
      lctl dk > ~/hsm-4-restore.dk
      
      lfs hsm_release f0
      

      The last command never returns. The MDS_CLOSE handler looks like:

      10070
      [<ffffffffa0f9866e>] cfs_waitq_wait+0xe/0x10 [libcfs]
      [<ffffffffa124826a>] ldlm_completion_ast+0x57a/0x960 [ptlrpc]
      [<ffffffffa1247920>] ldlm_cli_enqueue_local+0x1f0/0x5c0 [ptlrpc]
      [<ffffffffa08cee3b>] mdt_object_lock0+0x33b/0xaf0 [mdt]
      [<ffffffffa08cf6b4>] mdt_object_lock+0x14/0x20 [mdt]
      [<ffffffffa08f9551>] mdt_mfd_close+0x351/0xde0 [mdt]
      [<ffffffffa08fb372>] mdt_close+0x662/0xa60 [mdt]
      [<ffffffffa08d2c07>] mdt_handle_common+0x647/0x16d0 [mdt]
      [<ffffffffa090c9e5>] mds_readpage_handle+0x15/0x20 [mdt]
      [<ffffffffa12813d8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      [<ffffffffa128275d>] ptlrpc_main+0xabd/0x1700 [ptlrpc]
      [<ffffffff81096936>] kthread+0x96/0xa0
      [<ffffffff8100c0ca>] child_rip+0xa/0x20
      [<ffffffffffffffff>] 0xffffffffffffffff
      

      while the MDS_HSM_PROGRESS handler looks like:

      10065
      [<ffffffffa0f9866e>] cfs_waitq_wait+0xe/0x10 [libcfs]
      [<ffffffffa124826a>] ldlm_completion_ast+0x57a/0x960 [ptlrpc]
      [<ffffffffa1247920>] ldlm_cli_enqueue_local+0x1f0/0x5c0 [ptlrpc]
      [<ffffffffa08cee3b>] mdt_object_lock0+0x33b/0xaf0 [mdt]
      [<ffffffffa08cf6b4>] mdt_object_lock+0x14/0x20 [mdt]
      [<ffffffffa08cf721>] mdt_object_find_lock+0x61/0x170 [mdt]
      [<ffffffffa091dc22>] hsm_get_md_attr+0x62/0x270 [mdt]
      [<ffffffffa0923253>] mdt_hsm_update_request_state+0x4d3/0x1c20 [mdt]
      [<ffffffffa091ae6e>] mdt_hsm_coordinator_update+0x3e/0xe0 [mdt]
      [<ffffffffa090931b>] mdt_hsm_progress+0x21b/0x330 [mdt]
      [<ffffffffa08d2c07>] mdt_handle_common+0x647/0x16d0 [mdt]
      [<ffffffffa090ca05>] mds_regular_handle+0x15/0x20 [mdt]
      [<ffffffffa12813d8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      [<ffffffffa128275d>] ptlrpc_main+0xabd/0x1700 [ptlrpc]
      [<ffffffff81096936>] kthread+0x96/0xa0
      [<ffffffff8100c0ca>] child_rip+0xa/0x20
      [<ffffffffffffffff>] 0xffffffffffffffff
      

      The close handler is waiting for an EX layout lock on f0, while the
      progress handler is waiting for a PW update lock on f0. dump_namespaces
      does not show the UPDATE lock as granted to anyone.
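
      To make the lock compatibility concrete, here is a toy model of ibits
      conflicts (not the ldlm code; the bit values mirror MDS_INODELOCK_* and
      the mode table is simplified, so treat both as assumptions). Under this
      model the two waiting requests touch disjoint inode bits, so they cannot
      be blocking each other directly, which is consistent with
      dump_namespaces showing no granted UPDATE lock:

      /* toy model of ibits lock conflicts -- NOT the ldlm code */
      #include <stdbool.h>
      #include <stdio.h>

      /* bit values mirror MDS_INODELOCK_* for readability */
      #define INODELOCK_LOOKUP 0x01
      #define INODELOCK_UPDATE 0x02
      #define INODELOCK_OPEN   0x04
      #define INODELOCK_LAYOUT 0x08

      enum mode { EX, PW, PR, CR };

      /* simplified mode compatibility: EX shares with nothing,
       * PW only shares with CR, PR shares with PR/CR, CR with all but EX */
      static bool modes_compatible(enum mode a, enum mode b)
      {
              static const bool compat[4][4] = {
                      /*          EX     PW     PR     CR   */
                      /* EX */ { false, false, false, false },
                      /* PW */ { false, false, false, true  },
                      /* PR */ { false, false, true,  true  },
                      /* CR */ { false, true,  true,  true  },
              };
              return compat[a][b];
      }

      /* two ibits locks conflict only if their bits intersect AND
       * their modes are incompatible */
      static bool conflicts(enum mode ma, int ba, enum mode mb, int bb)
      {
              return (ba & bb) && !modes_compatible(ma, mb);
      }

      int main(void)
      {
              /* close handler wants EX on LAYOUT, progress handler wants PW
               * on UPDATE: disjoint bits, so no direct conflict between them */
              printf("EX|LAYOUT vs PW|UPDATE: %s\n",
                     conflicts(EX, INODELOCK_LAYOUT, PW, INODELOCK_UPDATE) ?
                     "conflict" : "no conflict");
              return 0;
      }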

      For reference, I'm using the following changes:

      # LU-2919 hsm: Implementation of exclusive open
      # http://review.whamcloud.com/#/c/6730
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/30/6730/13 && git cherry-pick FETCH_HEAD
       
      # LU-1333 hsm: Add hsm_release feature.
      # http://review.whamcloud.com/#/c/6526
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/26/6526/9 && git cherry-pick FETCH_HEAD
       
      # LU-3339 mdt: HSM on disk actions record
      # http://review.whamcloud.com/#/c/6529
      # MERGED
       
      # LU-3340 mdt: HSM memory requests management
      # http://review.whamcloud.com/#/c/6530
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/30/6530/8 && git cherry-pick FETCH_HEAD
       
      # LU-3341 mdt: HSM coordinator client interface
      # http://review.whamcloud.com/#/c/6532
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/32/6532/13 && git cherry-pick FETCH_HEAD
      # Needs rebase in sanity-hsm.sh
       
      # LU-3342 mdt: HSM coordinator agent interface
      # http://review.whamcloud.com/#/c/6534
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/34/6534/8 && git cherry-pick FETCH_HEAD
       
      # LU-3343 mdt: HSM coordinator main thread
      # http://review.whamcloud.com/#/c/6912
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/12/6912/3 && git cherry-pick FETCH_HEAD
      # lustre/mdt/mdt_internal.h
       
      # LU-3561 tests: HSM sanity test suite
      # http://review.whamcloud.com/#/c/6913/
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/13/6913/4 && git cherry-pick FETCH_HEAD
      # lustre/tests/sanity-hsm.sh
       
      # LU-3432 llite: Access to released file trigs a restore
      # http://review.whamcloud.com/#/c/6537
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/37/6537/11 && git cherry-pick FETCH_HEAD
       
      # LU-3363 api: HSM import uses new released pattern
      # http://review.whamcloud.com/#/c/6536
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/36/6536/8 && git cherry-pick FETCH_HEAD
       
      # LU-2062 utils: HSM Posix CopyTool
      # http://review.whamcloud.com/#/c/4737
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/37/4737/18 && git cherry-pick FETCH_HEAD
      

    Activity

            [LU-3601] HSM release causes running restore to hang, hangs itself

            adilger Andreas Dilger added a comment - Patch http://review.whamcloud.com/8084 was landed under this bug, but is not reported here.

            jay Jinshan Xiong (Inactive) added a comment - This is fixed in LU-4152.

            paf Patrick Farrell (Inactive) added a comment - Moving conversation about patches to LU-4152; latest is there.

            paf Patrick Farrell (Inactive) added a comment -

            Oleg - We hit it while testing NFS-exported Lustre during a large-ish test run, with tests drawn primarily from the Linux Test Project. The problem is that we don't always hit it with the same test.

            The test engineer who's been handling it thinks a way to hit it is concurrent runs of fsx-linux with different command-line options, run against an NFS export of Lustre. He's going to try to pin that down this afternoon; I'll update if he's able to be more specific.
            green Oleg Drokin added a comment - Patrick: what's your exact reproducer to hit this? We are so far unable to hit it ourselves.

            jhammond John Hammond added a comment - Links to Oleg's patches (which all reference this issue) may be found in the comments on LU-4152.

            jay Jinshan Xiong (Inactive) added a comment - Just an update - Oleg is creating a patch for this issue.

            paf Patrick Farrell (Inactive) added a comment -

            Jinshan - This was originally a Cray bug (thank you Andriy and Vitaly for bringing this up), which I've been tracking.

            I think eliminating the case where two locks are taken non-atomically is key long term. If you're planning to do that, then that sounds good. If you're planning to do it only in certain cases, are you completely sure we don't have another possible live lock?

            I'd back Vitaly's suggestion that it be a blocker. We're able to trigger it during testing of NFS export, presumably because of the open_by_fid operations caused by NFS export.

            jay Jinshan Xiong (Inactive) added a comment -

            Indeed, this is a live-lock case.

            To clarify, process1 must be writing an empty file without a layout, so the write will cause a new layout to be created.

            "btw, why was the last option not done originally?"

            The reason I did not acquire one common lock is that we would have to acquire EX mode for the layout lock, which is too strong for the lookup and open locks, since they would have to share the same DLM lock.

            Though patch 7148 can fix this problem, acquiring two locks in a row is generally bad. Therefore, I'll fix it by acquiring a single lock with EX mode for the above case; however, this lock won't be returned to the client side. As a result, the client will not cache this specific open, which is acceptable since this case is rare.

            What do you guys think?
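
            As a rough sketch of the trade-off described above (a single compound EX ibits lock closes the window between the two enqueues, but is stronger than the open/lookup part needs), here is a toy comparison; it is not Lustre code, and the mode choices (CR for the open|lookup lock, PR for a concurrent lookup) are assumptions for illustration only:

            /* toy sketch, not Lustre code: compound EX lock vs two separate locks */
            #include <stdbool.h>
            #include <stdio.h>

            #define LOOKUP 0x1
            #define OPEN   0x2
            #define LAYOUT 0x4

            enum mode { EX, PW, PR, CR };

            static bool modes_compatible(enum mode a, enum mode b)
            {
                    if (a == EX || b == EX)
                            return false;              /* EX shares with nothing  */
                    if (a == PW || b == PW)
                            return a == CR || b == CR; /* PW only shares with CR  */
                    return true;                       /* PR/CR share freely      */
            }

            static bool conflicts(enum mode ma, int ba, enum mode mb, int bb)
            {
                    return (ba & bb) && !modes_compatible(ma, mb);
            }

            int main(void)
            {
                    /* two separate enqueues: a weak CR open|lookup lock now, an EX
                     * layout lock later -- a concurrent PR lookup is still allowed,
                     * but there is a window between the two enqueues */
                    printf("CR|LOOKUP+OPEN vs PR|LOOKUP: %s\n",
                           conflicts(CR, LOOKUP | OPEN, PR, LOOKUP) ?
                           "conflict" : "no conflict");

                    /* one compound EX enqueue covering LOOKUP|OPEN|LAYOUT: no
                     * window, but the same concurrent lookup must now wait */
                    printf("EX|LOOKUP+OPEN+LAYOUT vs PR|LOOKUP: %s\n",
                           conflicts(EX, LOOKUP | OPEN | LAYOUT, PR, LOOKUP) ?
                           "conflict" : "no conflict");
                    return 0;
            }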

            vitaly_fertman Vitaly Fertman added a comment -

            Andreas, it was hit during testing.

            process1.lock1: open|lookup, granted
            process2.lock1: layout | XXX, granted
            process3.lock1: lookup | XXX, waiting on process1.lock1
            process1.lock2: layout, waiting on process2.lock1
            process2.lock1: cancelled, reprocessing does not reach process1.lock2

            process1 is open by fid
            process3 is getattr

            In other words, since the two locks are taken non-atomically, you must guarantee that nobody can enqueue a lock conflicting with the first lock in between. Otherwise you need one of the following (a toy simulation of the live lock follows this comment):

            • full reprocess
            • reordering on the waiting list
            • making these two enqueues atomic
            • taking one common lock with all the ibits

            btw, why was the last option not done originally?

            As it can deadlock without HSM, I would consider it a blocker.
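
            To make the sequence above concrete, here is a small single-resource simulation; it is not the ldlm code, and the only queue behaviour it models is the one described in the comment: reprocessing after the cancel walks the waiting list in order and stops at the first lock that still cannot be granted, so process1.lock2 is never reached even though nothing conflicts with it any more. The lock modes are illustrative assumptions; what matters is that process3's request conflicts with process1's granted lock:

            /* toy single-resource lock queue, NOT the ldlm code */
            #include <stdbool.h>
            #include <stdio.h>

            #define LOOKUP 0x1
            #define OPEN   0x2
            #define LAYOUT 0x4

            enum mode  { EX, PR };
            enum state { GRANTED, WAITING, CANCELLED };

            struct lock {
                    const char *name;
                    enum mode   mode;
                    int         bits;
                    enum state  state;
            };

            static bool conflicts(const struct lock *a, const struct lock *b)
            {
                    if (!(a->bits & b->bits))
                            return false;                  /* disjoint ibits: no conflict */
                    return a->mode == EX || b->mode == EX; /* simplified mode table       */
            }

            /* a waiter can be granted only if it is compatible with every lock
             * that is granted or queued ahead of it (FIFO fairness) */
            static bool grantable(struct lock *q[], int idx)
            {
                    for (int i = 0; i < idx; i++)
                            if (q[i]->state != CANCELLED && conflicts(q[i], q[idx]))
                                    return false;
                    return true;
            }

            /* reprocess after a cancel: grant waiters in order, stop at the
             * first one that is still blocked (the behaviour described above) */
            static void reprocess(struct lock *q[], int n)
            {
                    for (int i = 0; i < n; i++) {
                            if (q[i]->state != WAITING)
                                    continue;
                            if (!grantable(q, i)) {
                                    printf("reprocess stops at %s\n", q[i]->name);
                                    return;
                            }
                            q[i]->state = GRANTED;
                            printf("%s granted\n", q[i]->name);
                    }
            }

            int main(void)
            {
                    struct lock p1l1 = { "process1.lock1 (open|lookup)", PR, LOOKUP | OPEN, GRANTED };
                    struct lock p2l1 = { "process2.lock1 (layout)",      EX, LAYOUT,        GRANTED };
                    struct lock p3l1 = { "process3.lock1 (lookup)",      EX, LOOKUP,        WAITING };
                    struct lock p1l2 = { "process1.lock2 (layout)",      EX, LAYOUT,        WAITING };
                    struct lock *queue[] = { &p1l1, &p2l1, &p3l1, &p1l2 };

                    p2l1.state = CANCELLED;  /* process2 cancels its layout lock */
                    reprocess(queue, 4);
                    /* prints "reprocess stops at process3.lock1 (lookup)":
                     * process1.lock2 stays waiting although nothing conflicts
                     * with it, process1 never drops lock1, process3 never gets
                     * its lookup lock -- the live lock */
                    return 0;
            }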

            People

              Assignee: jay Jinshan Xiong (Inactive)
              Reporter: jhammond John Hammond
              Votes: 0
              Watchers: 17

              Dates

                Created:
                Updated:
                Resolved: