Details
Type: Bug
Resolution: Fixed
Priority: Critical
Affects Version/s: Lustre 2.6.0
Environment: OpenSFS cluster with 2 MDSs with 2 MDTs each, 4 OSSs with two OSTs each
Severity: 3
Rank: 15018
Description
I was running the "Small files create performance impact by LFSCK" portion of the LFSCK Phase II test plan (LU-3423) and noticed that the speed limit flag was not working as expected.
I ran:
# lctl lfsck_start -M scratch-MDT0000 -A --reset --type layout -s 1379
Started LFSCK on the device scratch-MDT0000: scrub layout
With the following results (note that average_speed_phase1 is 3061 items/sec, more than twice the requested limit of 1379):
# lctl get_param -n mdd.scratch-MDT0000.lfsck_layout
name: lfsck_layout
magic: 0xb173ae14
version: 2
status: completed
flags:
param: all_targets
time_since_last_completed: 5 seconds
time_since_latest_start: 2263 seconds
time_since_last_checkpoint: 5 seconds
latest_start_position: 0
last_checkpoint_position: 241696769
first_failure_position: 0
success_count: 75
repaired_dangling: 0
repaired_unmatched_pair: 6400010
repaired_multiple_referenced: 0
repaired_orphan: 0
repaired_inconsistent_owner: 0
repaired_others: 0
skipped: 0
failed_phase1: 0
failed_phase2: 0
checked_phase1: 6912081
checked_phase2: 0
run_time_phase1: 2258 seconds
run_time_phase2: 0 seconds
average_speed_phase1: 3061 items/sec
average_speed_phase2: 0 objs/sec
real-time_speed_phase1: N/A
real-time_speed_phase2: N/A
current_position: N/A
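The speed-limit problem can be checked directly from this status output. The Python sketch below parses the 'key: value' lines printed by 'lctl get_param -n mdd.scratch-MDT0000.lfsck_layout', recomputes the phase-1 rate as checked_phase1 / run_time_phase1, and compares it with the value given to -s. It assumes one pair per line, as in the output above, and that -s is a limit in objects scanned per second; parse_lfsck_params, leading_int and check_speed_limit are hypothetical helpers, not part of Lustre.

```python
import re


def parse_lfsck_params(text):
    """Parse 'key: value' lines from 'lctl get_param -n mdd.<dev>.lfsck_layout'.

    Assumption: one pair per line, split at the first colon; values stay strings.
    """
    params = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            params[key.strip()] = value.strip()
    return params


def leading_int(value):
    """Return the leading integer of a value such as '2258 seconds'."""
    match = re.match(r"\d+", value)
    if match is None:
        raise ValueError(f"no leading integer in {value!r}")
    return int(match.group())


def check_speed_limit(params, limit):
    """Return (measured_rate, exceeded) for phase 1.

    The rate is recomputed from checked_phase1 and run_time_phase1 so it can be
    cross-checked against average_speed_phase1.
    Assumption: 'limit' is the -s value, in objects per second.
    """
    checked = leading_int(params["checked_phase1"])
    run_time = leading_int(params["run_time_phase1"])
    rate = checked / run_time if run_time else float("inf")
    return rate, rate > limit


if __name__ == "__main__":
    sample = (
        "status: completed\n"
        "checked_phase1: 6912081\n"
        "run_time_phase1: 2258 seconds\n"
        "average_speed_phase1: 3061 items/sec\n"
    )
    rate, exceeded = check_speed_limit(parse_lfsck_params(sample), limit=1379)
    print(f"measured {rate:.0f} objects/sec, limit 1379, exceeded={exceeded}")
```

With the values reported above (6912081 objects in 2258 seconds), the recomputed rate is about 3061 objects/sec, which agrees with average_speed_phase1 and is well above the requested limit of 1379.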
After that, I started running 'lctl lfsck_start' on the MDS with different values for the speed limit (-s). After a few 'lctl lfsck_start' runs, LFSCK has been stuck in scanning-phase1 for about 16 hours. Currently, I see:
status: scanning-phase1
for both 'lctl get_param mdd.scratch-MDT0000.lfsck_namespace' and 'lctl get_param mdd.scratch-MDT0000.lfsck_layout'.
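One way to tell a genuinely stuck scan from a slow one is to take two snapshots of the same get_param output some time apart and see whether any progress was made. The Python sketch below assumes the 'key: value' format of the lfsck_layout output shown earlier in this report, and treats an unchanged checked_phase1 count while status is still scanning-phase1 as a stall. That criterion, and the helpers parse_lfsck_params, read_lfsck and is_stalled, are assumptions for illustration, not an LFSCK-defined check.

```python
import subprocess


def parse_lfsck_params(text):
    """Parse 'key: value' lines into a dict of strings (split at the first colon)."""
    params = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            params[key.strip()] = value.strip()
    return params


def read_lfsck(device, component):
    """Read mdd.<device>.lfsck_<component> via lctl; only works on a Lustre MDS."""
    out = subprocess.run(
        ["lctl", "get_param", "-n", f"mdd.{device}.lfsck_{component}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_lfsck_params(out)


def is_stalled(before, after):
    """Hypothetical heuristic: still in phase 1, but no new objects checked."""
    still_scanning = after.get("status") == "scanning-phase1"
    no_progress = before.get("checked_phase1") == after.get("checked_phase1")
    return still_scanning and no_progress
```

On the MDS, one would call read_lfsck("scratch-MDT0000", "layout") twice, a few minutes apart, and pass both results to is_stalled; the same comparison applies to the "namespace" component.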
I ran
# lctl lfsck_stop -M scratch-MDT0000
but it hasn't returned in the past 30 minutes.
Prior to running lfsck_stop, I saw the following in dmesg on mds01, where the 'lctl lfsck_start' and 'lctl lfsck_stop' commands were run:
LNet: Service thread pid 32564 was inactive for 0.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
LNet: Service thread pid 32564 completed after 0.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Pid: 32564, comm: mdt02_003
Call Trace:
 [<ffffffffa0576561>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
 [<ffffffffa090ffc8>] ? ptlrpc_server_normal_pending+0x38/0xc0 [ptlrpc]
 [<ffffffffa0911565>] ptlrpc_wait_event+0x2c5/0x2d0 [ptlrpc]
 [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
 [<ffffffffa091ad9f>] ptlrpc_main+0x84f/0x1980 [ptlrpc]
 [<ffffffffa091a550>] ? ptlrpc_main+0x0/0x1980 [ptlrpc]
 [<ffffffff8109abf6>] kthread+0x96/0xa0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109ab60>] ? kthread+0x0/0xa0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20
LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x200000000:731978560 ost_idx:4294936591
LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 3 previous similar messages
LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x400000000:901663376 ost_idx:4294936589
LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 3 previous similar messages
LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x400000000:901663376 ost_idx:4294936589
LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 7 previous similar messages
LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x400000000:901663376 ost_idx:4294936589
LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 11 previous similar messages
LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x500000000:901663376 ost_idx:4294936589
LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 19 previous similar messages
LustreError: 26090:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x400000000:1515870810 ost_idx:1515870810
LustreError: 26090:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 19 previous similar messages
LustreError: 1262:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x0:3564759104 ost_idx:4294936583
LustreError: 1262:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 3 previous similar messages
INFO: task lfsck:2905 blocked for more than 120 seconds.
 Not tainted 2.6.32-431.20.3.el6_lustre.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
lfsck D 000000000000000a 0 2905 2 0x00000080
 ffff880e2a51f9e0 0000000000000046 0000000000000000 ffffffffa057bd35
 0000000100000000 ffffc90036b69030 0000000000000246 0000000000000246
 ffff881031249af8 ffff880e2a51ffd8 000000000000fbc8 ffff881031249af8
Call Trace:
 [<ffffffffa057bd35>] ? cfs_hash_bd_lookup_intent+0x65/0x130 [libcfs]
 [<ffffffffa06cf4d8>] lu_object_find_at+0xa8/0x360 [obdclass]
 [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
 [<ffffffffa06cf7cf>] lu_object_find_slice+0x1f/0x80 [obdclass]
 [<ffffffffa0f4494e>] lfsck_layout_scan_stripes+0x47e/0x1ad0 [lfsck]
 [<ffffffffa1093c24>] ? lod_xattr_get+0x154/0x640 [lod]
 [<ffffffffa0e16c68>] ? osd_object_read_unlock+0x88/0xd0 [osd_ldiskfs]
 [<ffffffffa0f5313b>] lfsck_layout_master_exec_oit+0x51b/0xc30 [lfsck]
 [<ffffffff8109b39c>] ? remove_wait_queue+0x3c/0x50
 [<ffffffffa0f2b780>] lfsck_exec_oit+0x70/0x9e0 [lfsck]
 [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
 [<ffffffffa0f363da>] lfsck_master_oit_engine+0x41a/0x18b0 [lfsck]
 [<ffffffff81528cae>] ? thread_return+0x4e/0x760
 [<ffffffffa0f37bda>] lfsck_master_engine+0x36a/0x6f0 [lfsck]
 [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
 [<ffffffffa0f37870>] ? lfsck_master_engine+0x0/0x6f0 [lfsck]
 [<ffffffff8109abf6>] kthread+0x96/0xa0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109ab60>] ? kthread+0x0/0xa0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20
(The same "INFO: task lfsck:2905 blocked for more than 120 seconds." message, with an identical stack trace, is logged 9 more times.)
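The ost_idx values rejected by ostid_to_fid() in this log are hard to read in decimal. The Python sketch below extracts them from "bad ost_idx" LustreError lines and prints each one in hexadecimal; the regular expression is written against the messages in this log only and is an assumption about their format, and decode_bad_ost_idx is a hypothetical helper.

```python
import re

# Assumption: matches only the "bad ost_idx" messages seen in this log, e.g.
# "... bad ost_idx, 0x200000000:731978560 ost_idx:4294936591".
BAD_OST_IDX = re.compile(r"bad ost_idx, (0x[0-9a-f]+):(\d+) ost_idx:(\d+)")


def decode_bad_ost_idx(log_text):
    """Return (seq, oid, ost_idx, ost_idx_hex) for each bad ost_idx message."""
    results = []
    for seq, oid, idx in BAD_OST_IDX.findall(log_text):
        value = int(idx)
        results.append((seq, int(oid), value, f"0x{value:08x}"))
    return results


if __name__ == "__main__":
    sample = (
        "LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) "
        "bad ost_idx, 0x200000000:731978560 ost_idx:4294936591\n"
        "LustreError: 26090:0:(lustre_idl.h:775:ostid_to_fid()) "
        "bad ost_idx, 0x400000000:1515870810 ost_idx:1515870810\n"
    )
    for seq, oid, idx, idx_hex in decode_bad_ost_idx(sample):
        print(f"{seq}:{oid} ost_idx={idx} ({idx_hex})")
```

In hexadecimal, 4294936591, 4294936589 and 4294936583 are 0xffff880f, 0xffff880d and 0xffff8807, and 1515870810 is 0x5a5a5a5a; the "Skipped N previous similar messages" lines do not match the pattern and are ignored.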
Issue Links
- is related to LU-4688 target_destroy_export() LBUG (Resolved)
I suspect the problem on the OSTs is that I had set 'fail_loc=0x1614' on them, which caused some problems.
The OSS kernel logs were collected after all the LFSCK runs, so it may not be obvious where one LFSCK run ends and the next one starts. I do see several of the following messages in the OSS logs:
I've uploaded two logs to uploads: lfsck_log_oss1.txt contains the kernel log for OST0000 and OST0001, and lfsck_log_oss2.txt contains the kernel log for OST0002 and OST0003.