
[LU-8010] lfs hsm command hangs up after lfs hsm_cancel

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.9.0
    • Affects Version/s: Lustre 2.8.0
    • Severity: 3
    • Labels: HSM

    Description

      When an hsm_restore request is canceled, the following lfs hsm commands hang.
      To reproduce, set up HSM and do the following
      (we found 2 patterns; they seem to hang at different points):

      (1)

      # lfs hsm_state /lustre/file
      /lustre/file: (0x0000000d) released exists archived, archive_id:1
      # lfs hsm_restore /lustre/file
      # lfs hsm_cancel /lustre/file
      # lfs hsm_restore /lustre/file (quickly after lfs hsm_cancel)
      # lfs hsm_restore /lustre/file  *hang up
      

      The call trace is as follows:

      PID: 9550   TASK: ffff880c33f86040  CPU: 13  COMMAND: "lfs"
       #0 [ffff880c1ca8f568] schedule at ffffffff81539170
       #1 [ffff880c1ca8f640] schedule_timeout at ffffffff8153a042
       #2 [ffff880c1ca8f6f0] ldlm_completion_ast at ffffffffa0847fc9 [ptlrpc]
       #3 [ffff880c1ca8f7a0] ldlm_cli_enqueue_fini at ffffffffa08420e6 [ptlrpc]
       #4 [ffff880c1ca8f840] ldlm_cli_enqueue at ffffffffa08429a1 [ptlrpc]
       #5 [ffff880c1ca8f8f0] mdc_enqueue at ffffffffa0a6e8aa [mdc]
       #6 [ffff880c1ca8fa40] lmv_enqueue at ffffffffa0a25bfb [lmv]
       #7 [ffff880c1ca8fac0] ll_layout_refresh_locked at ffffffffa0b084f6 [lustre]
       #8 [ffff880c1ca8fc00] ll_layout_refresh at ffffffffa0b09159 [lustre]
       #9 [ffff880c1ca8fc50] vvp_io_init at ffffffffa0b5227f [lustre]
      #10 [ffff880c1ca8fcc0] cl_io_init0 at ffffffffa06a8e78 [obdclass]
      #11 [ffff880c1ca8fd00] cl_io_init at ffffffffa06abdf4 [obdclass]
      #12 [ffff880c1ca8fd40] cl_glimpse_size0 at ffffffffa0b4bc05 [lustre]
      #13 [ffff880c1ca8fda0] ll_getattr at ffffffffa0b07cb8 [lustre]
      #14 [ffff880c1ca8fe40] vfs_getattr at ffffffff81197c61
      #15 [ffff880c1ca8fe80] vfs_fstatat at ffffffff81197cf4
      #16 [ffff880c1ca8fee0] vfs_lstat at ffffffff81197d9e
      #17 [ffff880c1ca8fef0] sys_newlstat at ffffffff81197dc4
      #18 [ffff880c1ca8ff80] tracesys at ffffffff8100b2e8 (via system_call)
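
      For reference, reproducer (1) can be wrapped in a small script. The following is only a sketch, assuming an HSM copytool is already running against archive 1 and that the file has already been archived and released (hsm_state shows 0x0000000d); the path is a placeholder.

      #!/bin/bash
      # Sketch of reproducer (1); assumes a copytool is running and that
      # $FILE is already archived and released.
      FILE=/lustre/file

      lfs hsm_state   "$FILE"
      lfs hsm_restore "$FILE"
      lfs hsm_cancel  "$FILE"
      lfs hsm_restore "$FILE"   # issued immediately after the cancel
      lfs hsm_restore "$FILE"   # this one hangs waiting on the layout lock

      As noted above, the second restore apparently needs to be issued quickly after the cancel for the hang to trigger.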
      

      (2)

      # lfs hsm_state /lustre/file
      /lustre/file: (0x0000000d) released exists archived, archive_id:1
      # lfs hsm_restore /lustre/file
      <MDS Immediate reset>
      <MDS recover and HSM setup>
      # lfs hsm_action /lustre/file
      /lustre/file: RESTORE waiting (from 0 to EOF)
      # lfs hsm_state /lustre/file
      /lustre/file: (0x0000000d) released exists archived, archive_id:1
      # lfs hsm_cancel /lustre/file
      # lfs hsm_restore /lustre/file   *hang up
      

      The call trace is as follows:

      PID: 3731   TASK: ffff880c35087520  CPU: 13  COMMAND: "lfs"
       #0 [ffff880c327df598] schedule at ffffffff81539170
       #1 [ffff880c327df670] schedule_timeout at ffffffff8153a042
       #2 [ffff880c327df720] ptlrpc_set_wait at ffffffffa0860c41 [ptlrpc]
       #3 [ffff880c327df7e0] ptlrpc_queue_wait at ffffffffa0861301 [ptlrpc]
       #4 [ffff880c327df800] mdc_ioc_hsm_request at ffffffffa0a65138 [mdc]
       #5 [ffff880c327df830] mdc_iocontrol at ffffffffa0a66df9 [mdc]
       #6 [ffff880c327df950] obd_iocontrol at ffffffffa0a16aa5 [lmv]
       #7 [ffff880c327df9a0] lmv_iocontrol at ffffffffa0a2d9dc [lmv]
       #8 [ffff880c327dfb90] obd_iocontrol at ffffffffa0af2d15 [lustre]
       #9 [ffff880c327dfbe0] ll_dir_ioctl at ffffffffa0af9ae8 [lustre]
      #10 [ffff880c327dfe60] vfs_ioctl at ffffffff811a7972
      #11 [ffff880c327dfea0] do_vfs_ioctl at ffffffff811a7b14
      #12 [ffff880c327dff30] sys_ioctl at ffffffff811a8091
      #13 [ffff880c327dff80] tracesys at ffffffff8100b2e8 (via system_call)
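
      As an aside, the state word printed by lfs hsm_state in both reproducers, 0x0000000d, is simply the OR of the per-file HSM flag bits. Below is a minimal bash helper to decode such a value; the bit values are assumed to match enum hsm_states in lustre_user.h, so verify them against your source tree.

      # bash; decode an lfs hsm_state flag word into flag names
      decode_hsm_flags() {
          local flags=$(( $1 ))
          (( flags & 0x01 )) && printf 'exists '
          (( flags & 0x02 )) && printf 'dirty '
          (( flags & 0x04 )) && printf 'released '
          (( flags & 0x08 )) && printf 'archived '
          (( flags & 0x10 )) && printf 'norelease '
          (( flags & 0x20 )) && printf 'noarchive '
          (( flags & 0x40 )) && printf 'lost '
          echo
      }

      decode_hsm_flags 0x0000000d   # prints: exists released archived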
      


        Activity

          [LU-8010] lfs hsm command hangs up after lfs hsm_cancel
          pjones Peter Jones added a comment -

          Landed for 2.9


          gerrit Gerrit Updater added a comment -

          Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19710/
          Subject: LU-8010 mdt: fix orphan layout_lock cases for restore
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: c13ddec8e1cd3a63c16e08f28749771200b92f1b

          takamura Tatsushi Takamura added a comment -

          Sorry for the late reply. I have checked with your new patch and neither reproducer triggered a hang-up.
          Thanks

          bfaccini Bruno Faccini (Inactive) added a comment -

          Right, I double-checked and reproducer #2 is another condition, leading to the same consequence (an orphan layout lock), that must be fixed separately. In a few words, my understanding is that a restore that is in progress when an MDT failure occurs must be set back to the WAITING state so it can be replayed (if legal) upon CDT restart. This allows it to be cancelled without leaving the associated layout locked.
          I have pushed a new version/patch-set; let's see how it works.
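
          A rough way to check that behaviour after an MDT restart is sketched below; the mdt.*.hsm.actions parameter name and the MDT0000 target name are assumptions for this setup, so adjust them as needed.

          # On the MDS: after the CDT restarts, the restore should be back in WAITING
          lctl get_param -n mdt.lustre-MDT0000.hsm.actions
          # On the client:
          lfs hsm_action /lustre/file      # expect: RESTORE waiting (from 0 to EOF)
          lfs hsm_cancel /lustre/file
          lfs hsm_restore /lustre/file     # should no longer hang with the patch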

          takamura Tatsushi Takamura added a comment -

          Hello Bruno,

          I have confirmed that the command hang-up from reproducer (1) is resolved, but the restore command still hangs up using reproducer (2).
          Could you check again?

          bfaccini Bruno Faccini (Inactive) added a comment -

          Hello Tatsushi,
          My local testing, using the reproducers you detailed, with my patch did not trigger a new hang situation.
          Can you give it a try at your site?

          gerrit Gerrit Updater added a comment -

          Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: http://review.whamcloud.com/19710
          Subject: LU-8010 mdt: unlock layout when failing restore
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 656abdd8a342902e67dc77a86e30fb0e0a30c8da

          bfaccini Bruno Faccini (Inactive) added a comment -

          In both reproducer cases/scenarios, the root cause is the layout_lock not being released, leading to hung threads on both the Client and MDS sides during handling of HSM requests on the same file.

          I think I have found a hole in the HSM code that can lead to a layout_lock taken for a restore not being released. I will push a tentative patch to fix it.
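
          While the hang is in progress, the stuck layout lock can usually be seen from the LDLM state and the hung threads' stacks. The commands below are only a debugging sketch; the namespace name is taken from the MDS log excerpt in the next comment, so adapt it to your filesystem.

          # Turn up DLM tracing and capture state while the lfs command is stuck
          lctl set_param debug=+dlmtrace
          lctl get_param ldlm.namespaces.mdt-lustre-MDT0000_UUID.lock_count
          lctl dk /tmp/lustre-dk.$(date +%s)    # dump the Lustre kernel debug log
          echo t > /proc/sysrq-trigger          # dump all task stacks to dmesg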

          takamura Tatsushi Takamura added a comment -

          The syslog messages differ in the 2 patterns.

          (1)
          The following are possibly related messages in the MDS syslog.

          Apr 14 13:31:04 mds kernel: Lustre: lustre-MDT0000: Client 4e2ff398-da98-562d-99e7-cd38a8709ba2 (at 192.168.128.85@o2ib) reconnecting
          Apr 14 13:31:04 mds kernel: Lustre: lustre-MDT0000: Connection restored to ce03607c-8712-5fed-7d04-6d296439e868 (at 192.168.128.85@o2ib)
          

          The following are possibly related messages in the Client (where I ran the lfs command) syslog.

          Apr 14 13:31:04 clinet kernel: Lustre: lustre-MDT0000-mdc-ffff880c35677c00: Connection to lustre-MDT0000 (at 192.168.128.81@o2ib) was lost; in progress operations using this service will wait for recovery to complete
          Apr 14 13:31:04 clinet kernel: Lustre: Skipped 1 previous similar message
          Apr 14 13:31:04 clinet kernel: LustreError: 6021:0:(ldlm_request.c:125:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1460607964, 300s ago), entering recovery for lustre-MDT0000_UUID@192.168.128.81@o2ib ns: lustre-MDT0000-mdc-ffff880c35677c00 lock: ffff880c3471dac0/0x241d687a0e4c12e9 lrc: 4/1,0 mode: --/CR res: [0x200000bd1:0x1:0x0].0x0 bits 0x8 rrc: 2 type: IBT flags: 0x0 nid: local remote: 0x936186c7472a77bf expref: -99 pid: 6021 timeout: 0 lvb_type: 3
          Apr 14 13:31:04 clinet kernel: Lustre: lustre-MDT0000-mdc-ffff880c35677c00: Connection restored to 192.168.128.81@o2ib (at 192.168.128.81@o2ib)
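
          The "enqueued at" epoch value in the LustreError line above can be correlated with the wall-clock syslog timestamps using GNU date, for example:

          date -d @1460607964   # print the ldlm enqueue time in local time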
          

          (2)
          The following are possibly related messages in the MDS syslog.

          Apr 14 14:16:09 mds kernel: LNet: Service thread pid 3763 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
          Apr 14 14:16:09 mds kernel: Pid: 3763, comm: mdt02_004
          Apr 14 14:16:09 mds kernel:
          Apr 14 14:16:09 mds kernel: Call Trace:
          Apr 14 14:16:09 mds kernel: [<ffffffff8153a042>] schedule_timeout+0x192/0x2e0
          Apr 14 14:16:09 mds kernel: [<ffffffff81089be0>] ? process_timeout+0x0/0x10
          Apr 14 14:16:09 mds kernel: [<ffffffffa08b9e30>] ? ldlm_expired_completion_wait+0x0/0x250 [ptlrpc]
          Apr 14 14:16:09 mds kernel: [<ffffffffa08be989>] ldlm_completion_ast+0x609/0x9b0 [ptlrpc]
          Apr 14 14:16:09 mds kernel: [<ffffffff810672b0>] ? default_wake_function+0x0/0x20
          Apr 14 14:16:09 mds kernel: [<ffffffffa08bdd8e>] ldlm_cli_enqueue_local+0x21e/0x810 [ptlrpc]
          Apr 14 14:16:09 mds kernel: [<ffffffffa08be380>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
          Apr 14 14:16:09 mds kernel: [<ffffffffa1025130>] ? mdt_blocking_ast+0x0/0x2e0 [mdt]
          Apr 14 14:16:09 mds kernel: [<ffffffffa10318e4>] mdt_object_local_lock+0x3a4/0xb00 [mdt]
          Apr 14 14:16:09 mds kernel: [<ffffffffa1025130>] ? mdt_blocking_ast+0x0/0x2e0 [mdt]
          Apr 14 14:16:09 mds kernel: [<ffffffffa08be380>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
          Apr 14 14:16:09 mds kernel: [<ffffffffa06cf791>] ? lu_object_find_at+0xb1/0xe0 [obdclass]
          Apr 14 14:16:09 mds kernel: [<ffffffffa10323b3>] mdt_object_lock_internal+0x63/0x310 [mdt]
          Apr 14 14:16:09 mds kernel: [<ffffffffa1032874>] mdt_object_lock+0x14/0x20 [mdt]
          Apr 14 14:16:09 mds kernel: [<ffffffffa1032a2e>] mdt_object_find_lock+0x5e/0x170 [mdt]
          Apr 14 14:16:09 mds kernel: [<ffffffffa106a683>] mdt_hsm_add_actions+0x973/0x12b0 [mdt]
          Apr 14 14:16:09 mds kernel: [<ffffffff8129957a>] ? strlcpy+0x4a/0x60
          Apr 14 14:16:09 mds kernel: [<ffffffffa103d50f>] ? ucred_set_jobid+0x5f/0x70 [mdt]
          Apr 14 14:16:09 mds kernel: [<ffffffffa106116c>] mdt_hsm_request+0x60c/0x9b0 [mdt]
          Apr 14 14:16:09 mds kernel: [<ffffffffa094ec2c>] tgt_request_handle+0x8ec/0x1440 [ptlrpc]
          Apr 14 14:16:09 mds kernel: [<ffffffffa08fbc61>] ptlrpc_main+0xd21/0x1800 [ptlrpc]
          Apr 14 14:16:09 mds kernel: [<ffffffff815391be>] ? thread_return+0x4e/0x7d0
          Apr 14 14:16:09 mds kernel: [<ffffffffa08faf40>] ? ptlrpc_main+0x0/0x1800 [ptlrpc]
          Apr 14 14:16:09 mds kernel: [<ffffffff810a0fce>] kthread+0x9e/0xc0
          Apr 14 14:16:09 mds kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
          Apr 14 14:16:09 mds kernel: [<ffffffff810a0f30>] ? kthread+0x0/0xc0
          Apr 14 14:16:09 mds kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
          Apr 14 14:16:09 mds kernel:
          Apr 14 14:16:09 mds kernel: LustreError: dumping log to /tmp/lustre-log.1460610969.3763
          Apr 14 14:17:49 mds kernel: LustreError: 3763:0:(ldlm_request.c:106:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1460610768, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-lustre-MDT0000_UUID lock: ffff880c18f31980/0x605de39e648b48a7 lrc: 3/0,1 mode: --/EX res: [0x200000402:0x1:0x0].0x0 bits 0x8 rrc: 3 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 3763 timeout: 0 lvb_type: 0
          Apr 14 14:17:49 mds kernel: LustreError: dumping log to /tmp/lustre-log.1460611069.3763
          

          There are no related messages in the Client (where I ran the lfs command) syslog.


          bfaccini Bruno Faccini (Inactive) added a comment -

          Thanks for this report and the reproducers. I am presently working on a similar platform to reproduce.
          Did you notice any possibly related messages in the Server/MDS, Client, or Agent syslogs?

          People

            Assignee: bfaccini Bruno Faccini (Inactive)
            Reporter: takamura Tatsushi Takamura
            Votes: 0
            Watchers: 11
