Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version: Lustre 2.7.0
    • Affects Version: Lustre 2.6.0
    • Environment: OpenSFS cluster with 2 MDSs with 2 MDTs each, 4 OSSs with 2 OSTs each
    • Severity: 3
    • 15018

    Description

      I was running the "Small files create performance impact by LFSCK" portion of the LFSCK Phase II test plan (LU-3423) and noticed that the speed limit flag was not working as expected.

      I ran:

      # lctl lfsck_start -M scratch-MDT0000 -A --reset --type layout  -s 1379
      Started LFSCK on the device scratch-MDT0000: scrub layout
      
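      The speed limit of a running scan can also be adjusted on the fly via the LFSCK speed-limit tunable; a minimal sketch, assuming the tunable is exposed under mdd in this release (a value of 0 removes the limit):

      # lctl set_param mdd.scratch-MDT0000.lfsck_speed_limit=0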

      With the following results:

      # lctl get_param -n mdd.scratch-MDT0000.lfsck_layout
      
      name: lfsck_layout
      magic: 0xb173ae14
      version: 2
      status: completed
      flags:
      param: all_targets
      time_since_last_completed: 5 seconds
      time_since_latest_start: 2263 seconds
      time_since_last_checkpoint: 5 seconds
      latest_start_position: 0
      last_checkpoint_position: 241696769
      first_failure_position: 0
      success_count: 75
      repaired_dangling: 0
      repaired_unmatched_pair: 6400010
      repaired_multiple_referenced: 0
      repaired_orphan: 0
      repaired_inconsistent_owner: 0
      repaired_others: 0
      skipped: 0
      failed_phase1: 0
      failed_phase2: 0
      checked_phase1: 6912081
      checked_phase2: 0
      run_time_phase1: 2258 seconds
      run_time_phase2: 0 seconds
      average_speed_phase1: 3061 items/sec
      average_speed_phase2: 0 objs/sec
      real-time_speed_phase1: N/A
      real-time_speed_phase2: N/A
      current_position: N/A
      

      After that, I started running 'lctl lfsck_start' on the MDS with different values for the speed limit (-s). After a couple of 'lctl lfsck_start' runs, LFSCK has been stuck in scanning-phase1 for roughly 16 hours. Currently, I see:

      status: scanning-phase1
      

      for both 'lctl get_param mdd.scratch-MDT0000.lfsck_namespace' and 'lctl get_param mdd.scratch-MDT0000.lfsck_layout'.
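
      To keep an eye on a scan in this state, one can poll the same files and watch the phase-1 counters; a minimal sketch using the MDT name from above:

      # watch -n 10 "lctl get_param -n mdd.scratch-MDT0000.lfsck_layout | egrep 'status|checked_phase|real-time_speed'"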

      I ran

      # lctl lfsck_stop -M scratch-MDT0000
      

      but it hasn't returned in the past 30 minutes.
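
      While lfsck_stop is blocked like this, it may help to capture the kernel task stacks and the Lustre debug buffer for later analysis; a hedged sketch (the output path is illustrative):

      # echo t > /proc/sysrq-trigger     # dump all task stacks to the kernel log
      # lctl dk /tmp/lfsck_stop_hang.dk  # save the Lustre kernel debug buffer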

      Prior to running lfsck_stop, in dmesg on mds01, where the 'lctl lfsck_start' and 'lctl lfsck_stop' commands were run, I saw the following:

      LNet: Service thread pid 32564 was inactive for 0.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      LNet: Service thread pid 32564 completed after 0.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
      Pid: 32564, comm: mdt02_003
      
      Call Trace:
       [<ffffffffa0576561>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
       [<ffffffffa090ffc8>] ? ptlrpc_server_normal_pending+0x38/0xc0 [ptlrpc]
       [<ffffffffa0911565>] ptlrpc_wait_event+0x2c5/0x2d0 [ptlrpc]
       [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
       [<ffffffffa091ad9f>] ptlrpc_main+0x84f/0x1980 [ptlrpc]
       [<ffffffffa091a550>] ? ptlrpc_main+0x0/0x1980 [ptlrpc]
       [<ffffffff8109abf6>] kthread+0x96/0xa0
       [<ffffffff8100c20a>] child_rip+0xa/0x20
       [<ffffffff8109ab60>] ? kthread+0x0/0xa0
       [<ffffffff8100c200>] ? child_rip+0x0/0x20
      
      LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x200000000:731978560 ost_idx:4294936591
      LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 3 previous similar messages
      LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x400000000:901663376 ost_idx:4294936589
      LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 3 previous similar messages
      LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x400000000:901663376 ost_idx:4294936589
      LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 7 previous similar messages
      LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x400000000:901663376 ost_idx:4294936589
      LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 11 previous similar messages
      LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x500000000:901663376 ost_idx:4294936589
      LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 19 previous similar messages
      LustreError: 26090:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x400000000:1515870810 ost_idx:1515870810
      LustreError: 26090:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 19 previous similar messages
      LustreError: 1262:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x0:3564759104 ost_idx:4294936583
      LustreError: 1262:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 3 previous similar messages
      INFO: task lfsck:2905 blocked for more than 120 seconds.
            Not tainted 2.6.32-431.20.3.el6_lustre.x86_64 #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      lfsck         D 000000000000000a     0  2905      2 0x00000080
       ffff880e2a51f9e0 0000000000000046 0000000000000000 ffffffffa057bd35
       0000000100000000 ffffc90036b69030 0000000000000246 0000000000000246
       ffff881031249af8 ffff880e2a51ffd8 000000000000fbc8 ffff881031249af8
      Call Trace:
       [<ffffffffa057bd35>] ? cfs_hash_bd_lookup_intent+0x65/0x130 [libcfs]
       [<ffffffffa06cf4d8>] lu_object_find_at+0xa8/0x360 [obdclass]
       [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
       [<ffffffffa06cf7cf>] lu_object_find_slice+0x1f/0x80 [obdclass]
       [<ffffffffa0f4494e>] lfsck_layout_scan_stripes+0x47e/0x1ad0 [lfsck]
       [<ffffffffa1093c24>] ? lod_xattr_get+0x154/0x640 [lod]
       [<ffffffffa0e16c68>] ? osd_object_read_unlock+0x88/0xd0 [osd_ldiskfs]
       [<ffffffffa0f5313b>] lfsck_layout_master_exec_oit+0x51b/0xc30 [lfsck]
       [<ffffffff8109b39c>] ? remove_wait_queue+0x3c/0x50
       [<ffffffffa0f2b780>] lfsck_exec_oit+0x70/0x9e0 [lfsck]
       [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
       [<ffffffffa0f363da>] lfsck_master_oit_engine+0x41a/0x18b0 [lfsck]
       [<ffffffff81528cae>] ? thread_return+0x4e/0x760
       [<ffffffffa0f37bda>] lfsck_master_engine+0x36a/0x6f0 [lfsck]
       [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
       [<ffffffffa0f37870>] ? lfsck_master_engine+0x0/0x6f0 [lfsck]
       [<ffffffff8109abf6>] kthread+0x96/0xa0
       [<ffffffff8100c20a>] child_rip+0xa/0x20
       [<ffffffff8109ab60>] ? kthread+0x0/0xa0
       [<ffffffff8100c200>] ? child_rip+0x0/0x20
      [The same "INFO: task lfsck:2905 blocked for more than 120 seconds." message and call trace repeated nine more times; duplicates omitted.]
      

      Attachments

        1. 07Aug_lfsck_log1.txt
          0.2 kB
        2. 07Aug_lfsck_log2.txt.gz
          0.2 kB
        3. 07Aug_messages.txt
          889 kB
        4. messages
          787 kB

          Activity

            [LU-5395] lfsck_start not progressing

            I suspect what is wrong on the OSTs is that I set 'fail_loc=0x1614' on them, and that caused some problems.

            The OSS kernel logs were collected after all the LFSCK runs, so it may not be obvious where one LFSCK ends and the next one starts. I do see several of the following messages in the OSS logs:

            00100000:10000000:7.0:1408057678.451385:0:25634:0:(lfsck_layout.c:6274:lfsck_orphan_it_init()) scratch-OST0003-osd: init the orphan iteration: rc = -3
            

            I've uploaded two logs to uploads: lfsck_log_oss1.txt contains the kernel log for OST0000 and OST0001, and lfsck_log_oss2.txt has the kernel log for OST0002 and OST0003.

            jamesanunez James Nunez (Inactive) added a comment

            There are a lot of failure messages like this one:
            "(lfsck_layout.c:2906:lfsck_layout_scan_orphan()) scratch-MDT0000-osd: layout LFSCK assistant finished the orphan scanning for OST0003: rc = -3"

            That means there were some failures on the OST side during the LFSCK. Do you have the Lustre logs from the OSTs? Thanks!

            yong.fan nasf (Inactive) added a comment

            Thank you for the explanation about what to expect with the different fail_loc values.

            I have uploaded two logs to /scratch/ftp/uploads/LU-5395 for each of the LFSCK runs that returned with a status of "partial"; the file names are lfsck_log_40.txt and lfsck_log_41.txt. Please let me know if the logs reveal anything interesting about why LFSCK could not complete.

            jamesanunez James Nunez (Inactive) added a comment

            It is good news that the original issues have been resolved.

            As for "status: partial": it means that, while the LFSCK was running, not all objects could be verified, so the LFSCK cannot be sure whether any inconsistencies remain. For example, if the communication between the MDT and the OST breaks during the LFSCK run, the LFSCK on the MDT cannot get the related information from the OST for consistency verification, and some OST-objects have to be skipped. In such a case, the LFSCK is marked as partial.

            As for the message "lvbo_init failed for resource 0x6a083:0x0: rc = -2" during the dangling test, it is normal: we inject an error that causes the related OST-objects to be lost or never created, so anyone who tries to access those OST-objects before the LFSCK repairs them will get the "-2" failure you hit in the test.

            If we have the Lustre logs, we can analyze them for the detailed reasons.

            About the error injection: the dangling reference ("0x1610") is set on the OST at create time, and the multiple reference ("0x1614") is set on the MDS at create time. If you want to test, please apply the patches for LU-5395. Thanks!
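
            For reference, a fail_loc injection is armed and disarmed with set_param; a minimal sketch, to be run on the node named above for each value:

            lctl set_param fail_loc=0x1614   # multiple reference, set on the MDS at create time
            lctl set_param fail_loc=0x1610   # dangling reference, set on the OST at create time
            lctl set_param fail_loc=0        # disarm the injection when the test is done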

            yong.fan nasf (Inactive) added a comment

            I’ve tested a build of master plus the latest versions of the two patches mentioned above. Things look good, but there are some issues that I’d like you to comment on.

            I ran the workload explained above, ran ‘lctl lfsck_start’, and everything completed as expected. I didn't see any hangs, and I do not see any “bad ost_idx” messages in dmesg on the MDSs or the OSSs. So that is good news; it looks like these patches cleared up the issue I was seeing.

            Since this was successful, I decided to do exactly the same thing as above but change one thing: the fail_loc value on all OSTs. I set fail_loc to “0x1614”, which should create multiple references. With this setting, I could not get LFSCK to complete successfully. I see in the LFSCK output:

            status: partial
            flags: incomplete
            

            I started LFSCK twice, and both times it would not complete, stopping with “status: partial”. I have logs for both of these runs.

            I then tried setting fail_loc to “0x1610”, dangling references, but the script fails and I see the following in the OST dmesg:

            LustreError: 25382:0:(ldlm_resource.c:1150:ldlm_resource_get()) scratch-OST0000: lvbo_init failed for resource 0x6a083:0x0: rc = -2
            LustreError: 25382:0:(ldlm_resource.c:1150:ldlm_resource_get()) Skipped 46 previous similar messages
            Lustre: *** cfs_fail_loc=1610, val=0***
            LustreError: 25644:0:(ldlm_resource.c:1150:ldlm_resource_get()) scratch-OST0001: lvbo_init failed for resource 0x6a0e3:0x0: rc = -2
            LustreError: 25644:0:(ldlm_resource.c:1150:ldlm_resource_get()) Skipped 168 previous similar messages
            

            This looks like LU-5457.

            I plan to reinstall Lustre on these nodes and run the workload creating the multiple references and dangling references. Should I use this series of patches or just master? Which would provide the most information? If I can recreate this problem, I’ll open a new ticket.

            Thanks.

            jamesanunez James Nunez (Inactive) added a comment

            Thanks James! Your log is very useful. I have found at least one bug in LFSCK that can cause the trouble message "lustre_idl.h:770:ostid_to_fid()) bad ost_idx...".

            I have updated the two patches: http://review.whamcloud.com/11373 and http://review.whamcloud.com/#/c/11304/. Would you please retry them? And please gather the same logs as you did before if you still hit the trouble message. Thanks!

            yong.fan nasf (Inactive) added a comment

            I've uploaded the logs to /scratch/ftp/uploads/LU-5395. Here is what I did yesterday and the logs that were taken:

            I started removing directories containing 100,000 files each and started running test 2.2 from the LFSCK test plan, which creates files with unmatched pairs. Then I started the kernel log and LFSCK:

            lctl clear
            lctl debug_daemon start /tmp/lfsck_log_1 1024
            lctl mark LFSCK_START
            lctl lfsck_start -M scratch-MDT0000 -A --reset --type layout -s 100
            

            I stopped LFSCK because the speed limit of 100 was too slow. Since the workload of removing files and creating files with unmatched pairs was still running, I stopped the kernel log (lfsck_log_1.txt in the uploads directory) and restarted kernel logging, since I didn't want to lose information from the log:

            lctl lfsck_stop -M scratch-MDT0000 -A
            lctl mark LFSCK_END
            lctl debug_daemon stop
            lctl clear
            lctl debug_daemon start /tmp/lfsck_log_2 1024
            lctl mark LFSCK_START
            lctl lfsck_start -M scratch-MDT0000 -A --reset --type layout
            

            Again, 'lfsck_start' completed, but the workload was still running. So I stopped the kernel log (lfsck_log_2.txt in uploads) and started the kernel log and 'lfsck_start' again:

            lctl mark LFSCK_END
            lctl debug_daemon stop
            lctl clear
            lctl debug_daemon start /tmp/lfsck_log_3 1024
            lctl mark LFSCK_START
            lctl lfsck_start -M scratch-MDT0000 -A --reset --type layout
            

            I did this two more times: stop the log, start the log, and start 'lctl lfsck_start'. Each time, lfsck_start completed with no hangs. The logs are lfsck_log_3.txt, lfsck_log_4.txt, and lfsck_log_5.txt in uploads. lfsck_log_4.txt and lfsck_log_5.txt each capture one run of lfsck_start, because I used the '-s 100' flag to see if I could get lfsck_start to hang with the speed limit flag. It did not hang.
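
            Since the same stop-log/start-log cycle repeats for each run, it can be wrapped in a small helper so that every LFSCK run lands in its own debug file; a sketch under the same assumptions as the commands above (the function names are illustrative):

            # begin a numbered, logged LFSCK run; extra arguments (e.g. -s 100) pass through
            lfsck_logged_run() {
                local n=$1; shift
                lctl clear
                lctl debug_daemon start /tmp/lfsck_log_$n 1024
                lctl mark LFSCK_START
                lctl lfsck_start -M scratch-MDT0000 -A --reset --type layout "$@"
            }
            # close out the current run and flush the debug daemon
            lfsck_logged_end() {
                lctl mark LFSCK_END
                lctl debug_daemon stop
            }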

            At this point, the file create and delete workload had ended. I started one more 'lctl lfsck_start' to clean up any issues that were left behind in the file system. Since I couldn't hang lfsck_start, I thought I should try again, and I started creating 100,000 files in several directories again. While watching the status of LFSCK at /proc/fs/lustre/mdd/scratch-MDT0000/lfsck_layout, it looked like things hung. So I issued "echo t > /proc/sysrq-trigger"; the output is in 'messages' in uploads.

            Several minutes later, things started progressing again and lfsck_start finally completed. So I started the same workload of removing files and creating files with unmatched pairs, and ran lfsck_start twice, with speed limits of 250 and 300, while this was running. The logs for these runs are in uploads as lfsck_log_20.txt and lfsck_log_21.txt. In both cases, LFSCK completed.

            jamesanunez James Nunez (Inactive) added a comment

            So you still hit one LFSCK hang with the latest patches applied, right? Or was the hang temporary and the LFSCK eventually finished? If you have the kernel stack dump log ("echo t > /proc/sysrq-trigger") from that point in time, it would be very helpful. Thanks!

            yong.fan nasf (Inactive) added a comment

            Yes, I have the logs and will upload them this morning.

            jamesanunez James Nunez (Inactive) added a comment

            James, do you have the "lctl dk" logs? Thanks!
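
            (If they were not saved at the time, the in-memory debug buffer can still be dumped while it is fresh; a minimal sketch, with an illustrative output path:)

            lctl dk > /tmp/LU-5395_dk.$(hostname).txt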

            yong.fan nasf (Inactive) added a comment

            I tested a build composed of the latest master with http://review.whamcloud.com/#/c/11373/1 applied and then http://review.whamcloud.com/#/c/11304/ applied. I ran the combination of creating and deleting files twice, and 'lctl lfsck_start' completed each time.

            During the first run of file removes and creates, I did see the 'bad ost_idx' message on the first MDS. Here's dmesg from mds01 for this test run:

            Lustre: ctl-scratch-MDT0000: super-sequence allocation rc = 0 [0x0000000380000400-0x00000003c0000400):5:ost
            Lustre: Skipped 2 previous similar messages
            Lustre: ctl-scratch-MDT0000: super-sequence allocation rc = 0 [0x0000000400000400-0x0000000440000400):0:mdt
            Lustre: Skipped 1 previous similar message
            Lustre: debug daemon will attempt to start writing to /tmp/lfsck_log_1 (1048576kB max)
            Lustre: DEBUG MARKER: LFSCK_START
            Lustre: DEBUG MARKER: LFSCK_END
            Lustre: shutting down debug daemon thread...
            Lustre: debug daemon will attempt to start writing to /tmp/lfsck_log_2 (1048576kB max)
            Lustre: DEBUG MARKER: LFSCK_START
            Lustre: DEBUG MARKER: LFSCK_END
            Lustre: shutting down debug daemon thread...
            Lustre: debug daemon will attempt to start writing to /tmp/lfsck_log_3 (1048576kB max)
            Lustre: DEBUG MARKER: LFSCK_START
            LustreError: 1192:0:(lustre_idl.h:770:ostid_to_fid()) bad ost_idx, 0x0:0 ost_idx:4294936582
            Lustre: ctl-scratch-MDT0000: super-sequence allocation rc = 0 [0x0000000440000400-0x0000000480000400):1:mdt
            Lustre: DEBUG MARKER: LFSCK_END
            Lustre: shutting down debug daemon thread...
            Lustre: debug daemon will attempt to start writing to /tmp/lfsck_log_4 (1048576kB max)
            Lustre: DEBUG MARKER: LFSCK_START
            Lustre: ctl-scratch-MDT0000: super-sequence allocation rc = 0 [0x0000000480000400-0x00000004c0000400):2:mdt
            Lustre: cli-ctl-scratch-MDT0002: Allocated super-sequence [0x0000000480000400-0x00000004c0000400):2:mdt]
            Lustre: DEBUG MARKER: LFSCK_END
            Lustre: shutting down debug daemon thread...
            Lustre: debug daemon will attempt to start writing to /tmp/lfsck_log_5 (1048576kB max)
            Lustre: DEBUG MARKER: LFSCK_START
            LustreError: 1210:0:(lustre_idl.h:770:ostid_to_fid()) bad ost_idx, 0x0:0 ost_idx:3735879680
            LustreError: 1210:0:(lustre_idl.h:770:ostid_to_fid()) Skipped 1 previous similar message
            Lustre: ctl-scratch-MDT0000: super-sequence allocation rc = 0 [0x00000004c0000400-0x0000000500000400):3:mdt
            Lustre: DEBUG MARKER: LFSCK_END
            Lustre: shutting down debug daemon thread...
            LustreError: 1316:0:(class_obd.c:398:class_handle_ioctl()) OBD ioctl : No Device -12066
            LustreError: 1323:0:(lustre_idl.h:770:ostid_to_fid()) bad ost_idx, 0x0:0 ost_idx:4294936582
            LustreError: 1323:0:(lustre_idl.h:770:ostid_to_fid()) Skipped 1 previous similar message
            LustreError: 1323:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x300000000:2147484672 ost_idx:0
            LustreError: 1323:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x600000000:1026 ost_idx:0
            LustreError: 1323:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
            LustreError: 1323:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x700000000:1024 ost_idx:0
            LustreError: 1323:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
            LustreError: 1323:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x700000000:1024 ost_idx:0
            LustreError: 1323:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
            LustreError: 1323:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x200000000:2147484674 ost_idx:0
            LustreError: 1323:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
            

            At this point, all file creates and deletes had stopped. So, I started creating files again to run the test one more time. At that point, it looked like 'lfsck_start' was hung, but it looks like the server was just really busy and, eventually, 'lfsck_start' completed. Here's what dmesg had in it during this busy period:

            LustreError: 11-0: scratch-OST0005-osc-MDT0000: Communicating with 192.168.2.113@o2ib, operation ost_statfs failed with -107.
            Lustre: mdt_readpage: This server is not able to keep up with request traffic (cpu-bound).
            Lustre: 746:0:(service.c:1509:ptlrpc_at_check_timed()) earlyQ=2 reqQ=0 recA=2, svcEst=31, delay=0(jiff)
            Lustre: 746:0:(service.c:1306:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-510s), not sending early reply. Consider increasing at_early_margin (5)?  req@ffff880ecbf84800 x1476250892354708/t4302517158(0) o35->98357ec6-eddc-7db9-b827-938060f0c85f@192.168.2.115@o2ib:0/0 lens 392/696 e 1 to 0 dl 1407874953 ref 2 fl Interpret:/0/0 rc 0/0
            Lustre: 1250:0:(service.c:2118:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (31:510s); client may timeout.  req@ffff88091bcd8400 x1476250907062608/t4302517159(0) o35->e641b843-fa22-0bd9-70de-06ac9836f5d3@192.168.2.119@o2ib:0/0 lens 392/424 e 1 to 0 dl 1407874953 ref 1 fl Complete:/0/0 rc 0/0
            Lustre: scratch-OST0007-osc-MDT0002: Connection to scratch-OST0007 (at 192.168.2.114@o2ib) was lost; in progress operations using this service will wait for recovery to complete
            Lustre: Skipped 1 previous similar message
            LNet: Service thread pid 1041 completed after 540.62s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
            LustreError: 167-0: scratch-OST0004-osc-MDT0002: This client was evicted by scratch-OST0004; in progress operations using this service will fail.
            Lustre: scratch-OST0004-osc-MDT0002: Connection restored to scratch-OST0004 (at 192.168.2.113@o2ib)
            Lustre: scratch-OST0005-osc-MDT0000: Connection to scratch-OST0005 (at 192.168.2.113@o2ib) was lost; in progress operations using this service will wait for recovery to complete
            Lustre: Skipped 17 previous similar messages
            LustreError: 977:0:(osp_precreate.c:466:osp_precreate_send()) scratch-OST0005-osc-MDT0000: can't precreate: rc = -107
            LustreError: 167-0: scratch-OST0005-osc-MDT0000: This client was evicted by scratch-OST0005; in progress operations using this service will fail.
            LustreError: Skipped 18 previous similar messages
            LustreError: Skipped 73 previous similar messages
            Lustre: scratch-OST0005-osc-MDT0000: Connection restored to scratch-OST0005 (at 192.168.2.113@o2ib)
            Lustre: Skipped 18 previous similar messages
            

            On the second MDS, the dmesg looked similar:

            Lustre: debug daemon will attempt to start writing to /tmp/lfsck_log_20 (1048576kB max)
            Lustre: DEBUG MARKER: LFSCK_START
            LustreError: 1450:0:(lustre_idl.h:770:ostid_to_fid()) bad ost_idx, 0x0:0 ost_idx:4294936584
            LustreError: 1450:0:(lustre_idl.h:770:ostid_to_fid()) Skipped 1 previous similar message
            Lustre: DEBUG MARKER: LFSCK_END
            Lustre: shutting down debug daemon thread...
            Lustre: debug daemon will attempt to start writing to /tmp/lfsck_log_21 (1048576kB max)
            Lustre: DEBUG MARKER: LFSCK_START
            LustreError: 1527:0:(lustre_idl.h:770:ostid_to_fid()) bad ost_idx, 0x0:0 ost_idx:3735879680
            LustreError: 1527:0:(lustre_idl.h:770:ostid_to_fid()) Skipped 1 previous similar message
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x300000000:2147484672 ost_idx:0
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x600000000:1026 ost_idx:0
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x500000000:3221226498 ost_idx:0
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x600000000:1026 ost_idx:0
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x400000000:3221226498 ost_idx:0
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
            Lustre: DEBUG MARKER: LFSCK_END
            Lustre: shutting down debug daemon thread...
            

            I did the same workload a second time just to make sure I didn't just get lucky. 'lctl lfsck_start' completed each time I started it, and dmesg on the first MDS looks similar to the first test:

            Lustre: debug daemon will attempt to start writing to /tmp/lfsck_log_20 (1048576kB max)
            Lustre: DEBUG MARKER: LFSCK_START
            LustreError: 1450:0:(lustre_idl.h:770:ostid_to_fid()) bad ost_idx, 0x0:0 ost_idx:4294936584
            LustreError: 1450:0:(lustre_idl.h:770:ostid_to_fid()) Skipped 1 previous similar message
            Lustre: DEBUG MARKER: LFSCK_END
            Lustre: shutting down debug daemon thread...
            Lustre: debug daemon will attempt to start writing to /tmp/lfsck_log_21 (1048576kB max)
            Lustre: DEBUG MARKER: LFSCK_START
            LustreError: 1527:0:(lustre_idl.h:770:ostid_to_fid()) bad ost_idx, 0x0:0 ost_idx:3735879680
            LustreError: 1527:0:(lustre_idl.h:770:ostid_to_fid()) Skipped 1 previous similar message
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x300000000:2147484672 ost_idx:0
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x600000000:1026 ost_idx:0
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x500000000:3221226498 ost_idx:0
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x600000000:1026 ost_idx:0
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x400000000:3221226498 ost_idx:0
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
            Lustre: DEBUG MARKER: LFSCK_END
            Lustre: shutting down debug daemon thread...
            

            I have kernel logs from the first MDS for the periods between the "LFSCK_START" and "LFSCK_END" markers in the dmesg logs above. Please let me know if you would like to see these logs.

            jamesanunez James Nunez (Inactive) added a comment

            People

              Assignee: yong.fan nasf (Inactive)
              Reporter: jamesanunez James Nunez (Inactive)
              Votes: 0
              Watchers: 6
