Lustre / LU-3647 HSM _not only_ small fixes and to do list goes here / LU-3884

hsm_release hang at local root finding with quota enabled


Details

    • Technical task
    • Resolution: Fixed
    • Blocker
    • Lustre 2.5.0
    • None
    • 10098

    Description

      I ran a quota test today and found a problem with hsm_release. The test script is as follows:

      #!/bin/bash
      
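      setup() {
              # Mount a local test filesystem with the stock test-suite helper.
              ( cd ~/srcs/lustre/lustre/tests; sh llmount.sh )
      
              # Enable the HSM coordinator on the MDT.
              lctl set_param mdt.*.hsm_control=enabled
      
              # Start the POSIX copytool with a fresh archive directory.
              rm -rf /tmp/arc
              mkdir /tmp/arc
              ~/srcs/lustre/lustre/utils/lhsmtool_posix --daemon --hsm-root /tmp/arc /mnt/lustre
      
              # Turn on user quota enforcement for OSTs and the MDT.
              lctl conf_param lustre.quota.ost=u
              lctl conf_param lustre.quota.mdt=u
      }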
      
      LFS=~/srcs/lustre/lustre/utils/lfs
      file=/mnt/lustre/testfile
      
      setup
      
      rm -f $file
      dd if=/dev/zero of=$file bs=1M count=30
      chown tstusr:tstusr $file
      
      set -x
      
      $LFS hsm_archive $file
      while $LFS hsm_state $file |grep -qv archived; do
              sleep 1
      done
      $LFS hsm_state $file
      
      lctl set_param debug=-1
      lctl set_param debug_mb=500
      lctl dk > /dev/null
      
      count=0
      while :; do
              lctl mark "############# $count"
              count=$((count+1))
      
              $LFS hsm_release $file
              $LFS hsm_state $file
      
              $LFS hsm_restore $file
              $LFS hsm_state $file
      
              sleep 1
      done
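
      Because the loop above blocks forever once hsm_release hangs, it can help to bound each command and save the debug log when the limit is hit. The following is only a sketch under stated assumptions: run_bounded, LIMIT and DUMP are hypothetical names that are not part of the original script, and the 60 second limit and the dump path are arbitrary. If the client process is blocked uninterruptibly, timeout may not be able to end it.

```shell
#!/bin/bash
# Sketch only: bound a possibly hanging command and keep the Lustre debug log.
# run_bounded, LIMIT and DUMP are hypothetical names used for illustration.

LIMIT=${LIMIT:-60}                      # seconds before declaring a hang (assumed value)
DUMP=${DUMP:-/tmp/lu-3884-debug.log}    # where to write the debug log (assumed path)

run_bounded() {
        local rc=0
        # timeout(1) exits with status 124 when the command exceeds the limit.
        timeout "$LIMIT" "$@" || rc=$?
        if [ "$rc" -eq 124 ]; then
                echo "command timed out after ${LIMIT}s: $*" >&2
                # Dump the kernel debug buffer if lctl exists on this node.
                if command -v lctl >/dev/null 2>&1; then
                        lctl dk "$DUMP" >/dev/null || true
                fi
        fi
        return "$rc"
}
```

      In the loop, the release step would then read `run_bounded $LFS hsm_release $file || break`, so the script stops at the first hang instead of waiting indefinitely.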
      

      The output on the console before the script hung:

      + /Users/jinxiong/srcs/lustre/lustre/utils/lfs hsm_state /mnt/lustre/testfile
      + grep -qv archived
      + /Users/jinxiong/srcs/lustre/lustre/utils/lfs hsm_state /mnt/lustre/testfile
      /mnt/lustre/testfile: (0x00000009) exists archived, archive_id:1
      + lctl set_param debug=-1
      debug=-1
      + lctl set_param debug_mb=500
      debug_mb=500
      + lctl dk
      + count=0
      + :
      + lctl mark '############# 0'
      + count=1
      + /Users/jinxiong/srcs/lustre/lustre/utils/lfs hsm_release /mnt/lustre/testfile
      

      It looks like the MDT thread hung while looking up the local root object; for an unknown reason, the local root object was being deleted. This sounds impossible, but it happened:

      LNet: Service thread pid 2945 was inactive for 40.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      Pid: 2945, comm: mdt_rdpg00_001
      
      Call Trace:
       [<ffffffffa03c466e>] cfs_waitq_wait+0xe/0x10 [libcfs]
       [<ffffffffa056ffa7>] lu_object_find_at+0xb7/0x360 [obdclass]
       [<ffffffff81063410>] ? default_wake_function+0x0/0x20
       [<ffffffffa0570266>] lu_object_find+0x16/0x20 [obdclass]
       [<ffffffffa0bf5b16>] mdt_object_find+0x56/0x170 [mdt]
       [<ffffffffa0c264ef>] mdt_mfd_close+0x15ef/0x1b60 [mdt]
       [<ffffffffa03d3900>] ? libcfs_debug_vmsg2+0xba0/0xbb0 [libcfs]
       [<ffffffffa0c27e32>] mdt_close+0x682/0xac0 [mdt]
       [<ffffffffa0bffa4a>] mdt_handle_common+0x52a/0x1470 [mdt]
       [<ffffffffa0c39365>] mds_readpage_handle+0x15/0x20 [mdt]
       [<ffffffffa0709a55>] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
       [<ffffffffa03c454e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
       [<ffffffffa03d540f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
       [<ffffffffa03d3951>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
       [<ffffffff81055ad3>] ? __wake_up+0x53/0x70
       [<ffffffffa070ad9d>] ptlrpc_main+0xacd/0x1710 [ptlrpc]
       [<ffffffffa070a2d0>] ? ptlrpc_main+0x0/0x1710 [ptlrpc]
       [<ffffffff81096a36>] kthread+0x96/0xa0
       [<ffffffff8100c0ca>] child_rip+0xa/0x20
       [<ffffffff810969a0>] ? kthread+0x0/0xa0
       [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
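
      In a trace of this style, entries marked with `?` are addresses found on the stack that were not confirmed as part of the call chain, so the path of interest is easier to read with them filtered out. A minimal sketch, assuming the trace is supplied on standard input in the format shown above; reliable_frames is a hypothetical helper name:

```shell
#!/bin/bash
# Sketch only: print "function [module]" for each call-trace frame that is
# not marked with "?". reliable_frames is a hypothetical helper name.

reliable_frames() {
        # Drop the "?" entries, then strip the address and the +offset/size part.
        grep -vF '] ? ' |
        sed -n 's|^ *\[<[0-9a-f]*>] \([A-Za-z0-9_.]*\)+0x[0-9a-f]*/0x[0-9a-f]*\(.*\)$|\1\2|p'
}
```

      Feeding it the trace above leaves the chain from ptlrpc_main through mdt_close and mdt_mfd_close down to lu_object_find_at and cfs_waitq_wait.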
      

      I suspect this issue is related to quota, because everything worked correctly once I turned quota off.
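
      To compare runs with and without quota, the enforcement setting from setup() can be switched in one place. This is a sketch under stated assumptions: set_quota, FSNAME and RUN are hypothetical names, and the value `none` for disabling enforcement is assumed from the conf_param quota syntax rather than taken from this report. By default the helper only prints the commands; it runs them when RUN=1.

```shell
#!/bin/bash
# Sketch only: print (or, with RUN=1, execute) the commands that switch quota
# enforcement. set_quota, FSNAME and RUN are hypothetical names; the value
# "none" for turning enforcement off is an assumption.

FSNAME=${FSNAME:-lustre}    # filesystem name used in the test script

set_quota() {
        local mode=$1 target
        for target in ost mdt; do
                if [ "${RUN:-0}" = 1 ]; then
                        lctl conf_param "$FSNAME.quota.$target=$mode"
                else
                        echo "lctl conf_param $FSNAME.quota.$target=$mode"
                fi
        done
}
```

      For example, `set_quota u` prints the two commands used in setup(), and `set_quota none` prints the variant expected to disable enforcement.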

          People

            Assignee: Niu Yawei (Inactive)
            Reporter: Jinshan Xiong (Inactive)