Lustre / LU-5395

lfsck_start not progressing

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.7.0
    • Lustre 2.6.0
    • OpenSFS cluster with 2 MDSs with 2 MDTs each, 4 OSSs with 2 OSTs each
    • 3
    • 15018

    Description

      I was running the "Small files create performance impact by LFSCK" portion of the LFSCK Phase II test plan (LU-3423) and noticed that the speed limit flag (-s) was not working as expected.

      I ran:

      # lctl lfsck_start -M scratch-MDT0000 -A --reset --type layout  -s 1379
      Started LFSCK on the device scratch-MDT0000: scrub layout
      

      With the following results:

      # lctl get_param -n mdd.scratch-MDT0000.lfsck_layout
      
      name: lfsck_layout
      magic: 0xb173ae14
      version: 2
      status: completed
      flags:
      param: all_targets
      time_since_last_completed: 5 seconds
      time_since_latest_start: 2263 seconds
      time_since_last_checkpoint: 5 seconds
      latest_start_position: 0
      last_checkpoint_position: 241696769
      first_failure_position: 0
      success_count: 75
      repaired_dangling: 0
      repaired_unmatched_pair: 6400010
      repaired_multiple_referenced: 0
      repaired_orphan: 0
      repaired_inconsistent_owner: 0
      repaired_others: 0
      skipped: 0
      failed_phase1: 0
      failed_phase2: 0
      checked_phase1: 6912081
      checked_phase2: 0
      run_time_phase1: 2258 seconds
      run_time_phase2: 0 seconds
      average_speed_phase1: 3061 items/sec
      average_speed_phase2: 0 objs/sec
      real-time_speed_phase1: N/A
      real-time_speed_phase2: N/A
      current_position: N/A
      
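A quick way to see the mismatch is to recompute the phase 1 average from the counters in the output above and compare it with the value passed to -s. The sketch below is only an illustration: it assumes the speed limit is expressed in items per second (as the units in the output suggest), and the helper names are hypothetical, not part of any Lustre tool.

```python
def parse_lfsck_status(text):
    """Parse 'key: value' lines from 'lctl get_param' output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def leading_int(value):
    """Return the leading integer of a value such as '2258 seconds'."""
    return int(value.split()[0])


# Subset of the lfsck_layout output shown above.
status_text = """
status: completed
checked_phase1: 6912081
run_time_phase1: 2258 seconds
average_speed_phase1: 3061 items/sec
"""

SPEED_LIMIT = 1379  # value passed to 'lctl lfsck_start -s' in this report

status = parse_lfsck_status(status_text)
checked = leading_int(status["checked_phase1"])
run_time = leading_int(status["run_time_phase1"])
reported = leading_int(status["average_speed_phase1"])

# Integer average over the whole phase 1 run.
computed = checked // run_time
print(f"computed average: {computed} items/sec (reported: {reported})")
print(f"ratio to requested limit: {reported / SPEED_LIMIT:.2f}x")
```

The recomputed average agrees with the reported average_speed_phase1, and both are more than twice the requested limit of 1379.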

      After that, I started running 'lctl lfsck_start' on the MDS with different values for the speed limit (-s). After a couple of 'lctl lfsck_start' runs, LFSCK got stuck in scanning-phase1 and has stayed there for 16 or so hours. Currently, I see:

      status: scanning-phase1
      

      for both 'lctl get_param mdd.scratch-MDT0000.lfsck_layout' and 'lctl get_param mdd.scratch-MDT0000.lfsck_namespace'.

      I ran

      # lctl lfsck_stop -M scratch-MDT0000
      

      but it hasn't returned in the past 30 minutes.

      Before I ran lfsck_stop, dmesg on mds01, where the 'lctl lfsck_start' and 'lctl lfsck_stop' commands were run, showed the following:

      LNet: Service thread pid 32564 was inactive for 0.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      LNet: Service thread pid 32564 completed after 0.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
      Pid: 32564, comm: mdt02_003
      
      Call Trace:
       [<ffffffffa0576561>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
       [<ffffffffa090ffc8>] ? ptlrpc_server_normal_pending+0x38/0xc0 [ptlrpc]
       [<ffffffffa0911565>] ptlrpc_wait_event+0x2c5/0x2d0 [ptlrpc]
       [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
       [<ffffffffa091ad9f>] ptlrpc_main+0x84f/0x1980 [ptlrpc]
       [<ffffffffa091a550>] ? ptlrpc_main+0x0/0x1980 [ptlrpc]
       [<ffffffff8109abf6>] kthread+0x96/0xa0
       [<ffffffff8100c20a>] child_rip+0xa/0x20
       [<ffffffff8109ab60>] ? kthread+0x0/0xa0
       [<ffffffff8100c200>] ? child_rip+0x0/0x20
      
      LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x200000000:731978560 ost_idx:4294936591
      LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 3 previous similar messages
      LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x400000000:901663376 ost_idx:4294936589
      LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 3 previous similar messages
      LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x400000000:901663376 ost_idx:4294936589
      LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 7 previous similar messages
      LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x400000000:901663376 ost_idx:4294936589
      LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 11 previous similar messages
      LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x500000000:901663376 ost_idx:4294936589
      LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 19 previous similar messages
      LustreError: 26090:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x400000000:1515870810 ost_idx:1515870810
      LustreError: 26090:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 19 previous similar messages
      LustreError: 1262:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x0:3564759104 ost_idx:4294936583
      LustreError: 1262:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 3 previous similar messages
      INFO: task lfsck:2905 blocked for more than 120 seconds.
            Not tainted 2.6.32-431.20.3.el6_lustre.x86_64 #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      lfsck         D 000000000000000a     0  2905      2 0x00000080
       ffff880e2a51f9e0 0000000000000046 0000000000000000 ffffffffa057bd35
       0000000100000000 ffffc90036b69030 0000000000000246 0000000000000246
       ffff881031249af8 ffff880e2a51ffd8 000000000000fbc8 ffff881031249af8
      Call Trace:
       [<ffffffffa057bd35>] ? cfs_hash_bd_lookup_intent+0x65/0x130 [libcfs]
       [<ffffffffa06cf4d8>] lu_object_find_at+0xa8/0x360 [obdclass]
       [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
       [<ffffffffa06cf7cf>] lu_object_find_slice+0x1f/0x80 [obdclass]
       [<ffffffffa0f4494e>] lfsck_layout_scan_stripes+0x47e/0x1ad0 [lfsck]
       [<ffffffffa1093c24>] ? lod_xattr_get+0x154/0x640 [lod]
       [<ffffffffa0e16c68>] ? osd_object_read_unlock+0x88/0xd0 [osd_ldiskfs]
       [<ffffffffa0f5313b>] lfsck_layout_master_exec_oit+0x51b/0xc30 [lfsck]
       [<ffffffff8109b39c>] ? remove_wait_queue+0x3c/0x50
       [<ffffffffa0f2b780>] lfsck_exec_oit+0x70/0x9e0 [lfsck]
       [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
       [<ffffffffa0f363da>] lfsck_master_oit_engine+0x41a/0x18b0 [lfsck]
       [<ffffffff81528cae>] ? thread_return+0x4e/0x760
       [<ffffffffa0f37bda>] lfsck_master_engine+0x36a/0x6f0 [lfsck]
       [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
       [<ffffffffa0f37870>] ? lfsck_master_engine+0x0/0x6f0 [lfsck]
       [<ffffffff8109abf6>] kthread+0x96/0xa0
       [<ffffffff8100c20a>] child_rip+0xa/0x20
       [<ffffffff8109ab60>] ? kthread+0x0/0xa0
       [<ffffffff8100c200>] ? child_rip+0x0/0x20
      [The hung-task report for lfsck:2905 above repeats nine more times in dmesg with an identical call trace.]
      
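The ost_idx values in the "bad ost_idx" messages above are easier to judge in hexadecimal. The sketch below converts them and checks them against the number of OSTs in this cluster. It assumes valid OST indices here fall in the range 0 to 7 (8 OSTs, per the environment description), which the report does not state explicitly.

```python
NUM_OSTS = 8  # 4 OSSs with 2 OSTs each; assumes indices 0..7

# ost_idx values copied from the LustreError lines above.
ost_idx_values = [4294936591, 4294936589, 1515870810, 4294936583]


def describe(ost_idx):
    """Return the hex form of an index and whether it could name a real OST."""
    return f"{ost_idx:#010x}", ost_idx < NUM_OSTS


for value in ost_idx_values:
    hex_form, plausible = describe(value)
    print(f"{value:>10} = {hex_form}  plausible={plausible}")
```

None of the values is a plausible OST index. One of them is 0x5a5a5a5a, a repeating-byte pattern, and the others begin with 0xffff88, much like the upper half of the kernel addresses in the stack dump. That may hint at stale or uninitialized data being interpreted as an OST index, but this is an inference from the numbers, not something the log establishes.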

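When reading the hung-task call trace above, note that frames marked with "?" are addresses found on the stack that the kernel could not confirm as part of the active call chain. A minimal sketch for filtering them out follows. The regular expression and the function name are illustrative assumptions, and the trace is abridged to the frames down to lfsck_master_engine.

```python
import re

# Hung-task call trace for lfsck:2905, copied from the dmesg output above
# (abridged: the kthread/child_rip frames are omitted).
trace = """\
 [<ffffffffa057bd35>] ? cfs_hash_bd_lookup_intent+0x65/0x130 [libcfs]
 [<ffffffffa06cf4d8>] lu_object_find_at+0xa8/0x360 [obdclass]
 [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
 [<ffffffffa06cf7cf>] lu_object_find_slice+0x1f/0x80 [obdclass]
 [<ffffffffa0f4494e>] lfsck_layout_scan_stripes+0x47e/0x1ad0 [lfsck]
 [<ffffffffa1093c24>] ? lod_xattr_get+0x154/0x640 [lod]
 [<ffffffffa0e16c68>] ? osd_object_read_unlock+0x88/0xd0 [osd_ldiskfs]
 [<ffffffffa0f5313b>] lfsck_layout_master_exec_oit+0x51b/0xc30 [lfsck]
 [<ffffffff8109b39c>] ? remove_wait_queue+0x3c/0x50
 [<ffffffffa0f2b780>] lfsck_exec_oit+0x70/0x9e0 [lfsck]
 [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
 [<ffffffffa0f363da>] lfsck_master_oit_engine+0x41a/0x18b0 [lfsck]
 [<ffffffff81528cae>] ? thread_return+0x4e/0x760
 [<ffffffffa0f37bda>] lfsck_master_engine+0x36a/0x6f0 [lfsck]
"""

# Address, optional "?" marker, then symbol+offset/size.
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def reliable_frames(text):
    """Return symbol names of frames that are not marked with '?'."""
    names = []
    for line in text.splitlines():
        match = FRAME.search(line)
        if match and match.group(1) is None:
            names.append(match.group(2))
    return names


frames = reliable_frames(trace)
print(" <- ".join(frames))
```

The innermost confirmed frame is lu_object_find_at, reached from lfsck_layout_scan_stripes, so the lfsck thread appears to be blocked while looking up an object during the layout stripe scan. The trace alone does not show why that lookup never completes.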
      Attachments

        1. messages
          787 kB
        2. 07Aug_lfsck_log1.txt
          0.2 kB
        3. 07Aug_messages.txt
          889 kB
        4. 07Aug_lfsck_log2.txt.gz
          0.2 kB

            People

              Assignee: yong.fan nasf (Inactive)
              Reporter: jamesanunez James Nunez (Inactive)