Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.7.0
    • Affects Version/s: Lustre 2.6.0
    • Environment: OpenSFS cluster with 2 MDSs with 2 MDTs each, 4 OSSs with 2 OSTs each
    • Severity: 3
    • Rank: 15018

    Description

      I was running the 'Small files create performance impact by LFSCK' portion of the LFSCK Phase II test plan (LU-3423) and noticed that the speed limit flag (-s) was not working as expected.

      I ran:

      # lctl lfsck_start -M scratch-MDT0000 -A --reset --type layout  -s 1379
      Started LFSCK on the device scratch-MDT0000: scrub layout
      

      With the following results:

      # lctl get_param -n mdd.scratch-MDT0000.lfsck_layout
      
      name: lfsck_layout
      magic: 0xb173ae14
      version: 2
      status: completed
      flags:
      param: all_targets
      time_since_last_completed: 5 seconds
      time_since_latest_start: 2263 seconds
      time_since_last_checkpoint: 5 seconds
      latest_start_position: 0
      last_checkpoint_position: 241696769
      first_failure_position: 0
      success_count: 75
      repaired_dangling: 0
      repaired_unmatched_pair: 6400010
      repaired_multiple_referenced: 0
      repaired_orphan: 0
      repaired_inconsistent_owner: 0
      repaired_others: 0
      skipped: 0
      failed_phase1: 0
      failed_phase2: 0
      checked_phase1: 6912081
      checked_phase2: 0
      run_time_phase1: 2258 seconds
      run_time_phase2: 0 seconds
      average_speed_phase1: 3061 items/sec
      average_speed_phase2: 0 objs/sec
      real-time_speed_phase1: N/A
      real-time_speed_phase2: N/A
      current_position: N/A
      
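      The mismatch is visible directly in the numbers above: the run was started with -s 1379, yet average_speed_phase1 reports 3061 items/sec. A small Python sketch (a hypothetical helper, not part of Lustre) that parses the get_param output and compares the observed phase-1 speed against the requested limit:

      ```python
      # Hypothetical helper: parse `lctl get_param -n mdd.<dev>.lfsck_layout`
      # output and check whether the phase-1 speed stayed under the -s limit.
      def parse_lfsck_status(text):
          """Return a dict of key -> value from 'key: value' lines."""
          status = {}
          for line in text.splitlines():
              if ":" in line:
                  key, _, value = line.partition(":")
                  status[key.strip()] = value.strip()
          return status

      def speed_limit_respected(status, limit):
          """True if average_speed_phase1 is at or under the -s limit."""
          speed = int(status["average_speed_phase1"].split()[0])
          return speed <= limit

      # Values taken from the status output above.
      sample = """\
      status: completed
      run_time_phase1: 2258 seconds
      average_speed_phase1: 3061 items/sec
      """

      status = parse_lfsck_status(sample)
      print(speed_limit_respected(status, 1379))  # the run above exceeded -s 1379
      ```

      With the figures from this run, 3061 > 1379, so the check reports the limit was not respected.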

      After that, I started running 'lctl lfsck_start' on the MDS with different values for the speed limit (-s). After a couple of 'lctl lfsck_start' runs, LFSCK has been stuck in scanning-phase1 for roughly 16 hours. Currently, I see:

      status: scanning-phase1
      

      for both 'lctl get_param mdd.scratch-MDT0000.lfsck_namespace' and 'lctl get_param mdd.scratch-MDT0000.lfsck_layout'.

      I ran

      # lctl lfsck_stop -M scratch-MDT0000
      

      but it hasn't returned in the past 30 minutes.

      Prior to running lfsck_stop, in dmesg on mds01 (where the 'lctl lfsck_start' and 'lctl lfsck_stop' commands were run) I see the following:

      LNet: Service thread pid 32564 was inactive for 0.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      LNet: Service thread pid 32564 completed after 0.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
      Pid: 32564, comm: mdt02_003
      
      Call Trace:
       [<ffffffffa0576561>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
       [<ffffffffa090ffc8>] ? ptlrpc_server_normal_pending+0x38/0xc0 [ptlrpc]
       [<ffffffffa0911565>] ptlrpc_wait_event+0x2c5/0x2d0 [ptlrpc]
       [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
       [<ffffffffa091ad9f>] ptlrpc_main+0x84f/0x1980 [ptlrpc]
       [<ffffffffa091a550>] ? ptlrpc_main+0x0/0x1980 [ptlrpc]
       [<ffffffff8109abf6>] kthread+0x96/0xa0
       [<ffffffff8100c20a>] child_rip+0xa/0x20
       [<ffffffff8109ab60>] ? kthread+0x0/0xa0
       [<ffffffff8100c200>] ? child_rip+0x0/0x20
      
      LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x200000000:731978560 ost_idx:4294936591
      LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 3 previous similar messages
      LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x400000000:901663376 ost_idx:4294936589
      LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 3 previous similar messages
      LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x400000000:901663376 ost_idx:4294936589
      LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 7 previous similar messages
      LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x400000000:901663376 ost_idx:4294936589
      LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 11 previous similar messages
      LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x500000000:901663376 ost_idx:4294936589
      LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 19 previous similar messages
      LustreError: 26090:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x400000000:1515870810 ost_idx:1515870810
      LustreError: 26090:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 19 previous similar messages
      LustreError: 1262:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x0:3564759104 ost_idx:4294936583
      LustreError: 1262:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 3 previous similar messages
      INFO: task lfsck:2905 blocked for more than 120 seconds.
            Not tainted 2.6.32-431.20.3.el6_lustre.x86_64 #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      lfsck         D 000000000000000a     0  2905      2 0x00000080
       ffff880e2a51f9e0 0000000000000046 0000000000000000 ffffffffa057bd35
       0000000100000000 ffffc90036b69030 0000000000000246 0000000000000246
       ffff881031249af8 ffff880e2a51ffd8 000000000000fbc8 ffff881031249af8
      Call Trace:
       [<ffffffffa057bd35>] ? cfs_hash_bd_lookup_intent+0x65/0x130 [libcfs]
       [<ffffffffa06cf4d8>] lu_object_find_at+0xa8/0x360 [obdclass]
       [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
       [<ffffffffa06cf7cf>] lu_object_find_slice+0x1f/0x80 [obdclass]
       [<ffffffffa0f4494e>] lfsck_layout_scan_stripes+0x47e/0x1ad0 [lfsck]
       [<ffffffffa1093c24>] ? lod_xattr_get+0x154/0x640 [lod]
       [<ffffffffa0e16c68>] ? osd_object_read_unlock+0x88/0xd0 [osd_ldiskfs]
       [<ffffffffa0f5313b>] lfsck_layout_master_exec_oit+0x51b/0xc30 [lfsck]
       [<ffffffff8109b39c>] ? remove_wait_queue+0x3c/0x50
       [<ffffffffa0f2b780>] lfsck_exec_oit+0x70/0x9e0 [lfsck]
       [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
       [<ffffffffa0f363da>] lfsck_master_oit_engine+0x41a/0x18b0 [lfsck]
       [<ffffffff81528cae>] ? thread_return+0x4e/0x760
       [<ffffffffa0f37bda>] lfsck_master_engine+0x36a/0x6f0 [lfsck]
       [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
       [<ffffffffa0f37870>] ? lfsck_master_engine+0x0/0x6f0 [lfsck]
       [<ffffffff8109abf6>] kthread+0x96/0xa0
       [<ffffffff8100c20a>] child_rip+0xa/0x20
       [<ffffffff8109ab60>] ? kthread+0x0/0xa0
       [<ffffffff8100c200>] ? child_rip+0x0/0x20
       [the identical 'task lfsck:2905 blocked for more than 120 seconds' stack trace repeated nine more times]
      
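      The 'bad ost_idx' lines come from the ostid_to_fid() sanity check in lustre_idl.h. A simplified, hypothetical Python rendering of that kind of range check (the real kernel code differs in detail) — in the IDIF FID scheme the OST index is packed into 16 bits, so values above 0xffff are rejected, and indices like 4294936591 (0xffff870f) look like truncated or garbage 32-bit values:

      ```python
      # Simplified, hypothetical rendering of the ostid_to_fid() sanity
      # check that emits "bad ost_idx": the OST index must fit in 16 bits
      # to be packed into an IDIF FID sequence.
      def check_ost_idx(seq, oid, ost_idx):
          """Return True if ost_idx is small enough for IDIF FID packing."""
          if ost_idx > 0xFFFF:
              print("bad ost_idx, 0x%x:%u ost_idx:%u" % (seq, oid, ost_idx))
              return False
          return True

      # Values taken from the dmesg lines above; both fail the check.
      print(check_ost_idx(0x200000000, 731978560, 4294936591))  # returns False
      print(check_ost_idx(0x400000000, 901663376, 4294936589))  # returns False
      ```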

      Attachments

        1. 07Aug_lfsck_log1.txt
          0.2 kB
        2. 07Aug_lfsck_log2.txt.gz
          0.2 kB
        3. 07Aug_messages.txt
          889 kB
        4. messages
          787 kB

        Issue Links

          Activity

            [LU-5395] lfsck_start not progressing

            yong.fan nasf (Inactive) added a comment - James, do you have the "lctl dk" logs? Thanks!

            I tested a build of the latest master with http://review.whamcloud.com/#/c/11373/1 applied, followed by http://review.whamcloud.com/#/c/11304/. I ran the combination of creating and deleting files twice, and 'lctl lfsck_start' completed each time.

            During the first run of file removes and creates, I did see the 'bad ost_idx' message on the first MDS. Here's dmesg from mds01 for this test run:

            Lustre: ctl-scratch-MDT0000: super-sequence allocation rc = 0 [0x0000000380000400-0x00000003c0000400):5:ost
            Lustre: Skipped 2 previous similar messages
            Lustre: ctl-scratch-MDT0000: super-sequence allocation rc = 0 [0x0000000400000400-0x0000000440000400):0:mdt
            Lustre: Skipped 1 previous similar message
            Lustre: debug daemon will attempt to start writing to /tmp/lfsck_log_1 (1048576kB max)
            Lustre: DEBUG MARKER: LFSCK_START
            Lustre: DEBUG MARKER: LFSCK_END
            Lustre: shutting down debug daemon thread...
            Lustre: debug daemon will attempt to start writing to /tmp/lfsck_log_2 (1048576kB max)
            Lustre: DEBUG MARKER: LFSCK_START
            Lustre: DEBUG MARKER: LFSCK_END
            Lustre: shutting down debug daemon thread...
            Lustre: debug daemon will attempt to start writing to /tmp/lfsck_log_3 (1048576kB max)
            Lustre: DEBUG MARKER: LFSCK_START
            LustreError: 1192:0:(lustre_idl.h:770:ostid_to_fid()) bad ost_idx, 0x0:0 ost_idx:4294936582
            Lustre: ctl-scratch-MDT0000: super-sequence allocation rc = 0 [0x0000000440000400-0x0000000480000400):1:mdt
            Lustre: DEBUG MARKER: LFSCK_END
            Lustre: shutting down debug daemon thread...
            Lustre: debug daemon will attempt to start writing to /tmp/lfsck_log_4 (1048576kB max)
            Lustre: DEBUG MARKER: LFSCK_START
            Lustre: ctl-scratch-MDT0000: super-sequence allocation rc = 0 [0x0000000480000400-0x00000004c0000400):2:mdt
            Lustre: cli-ctl-scratch-MDT0002: Allocated super-sequence [0x0000000480000400-0x00000004c0000400):2:mdt]
            Lustre: DEBUG MARKER: LFSCK_END
            Lustre: shutting down debug daemon thread...
            Lustre: debug daemon will attempt to start writing to /tmp/lfsck_log_5 (1048576kB max)
            Lustre: DEBUG MARKER: LFSCK_START
            LustreError: 1210:0:(lustre_idl.h:770:ostid_to_fid()) bad ost_idx, 0x0:0 ost_idx:3735879680
            LustreError: 1210:0:(lustre_idl.h:770:ostid_to_fid()) Skipped 1 previous similar message
            Lustre: ctl-scratch-MDT0000: super-sequence allocation rc = 0 [0x00000004c0000400-0x0000000500000400):3:mdt
            Lustre: DEBUG MARKER: LFSCK_END
            Lustre: shutting down debug daemon thread...
            LustreError: 1316:0:(class_obd.c:398:class_handle_ioctl()) OBD ioctl : No Device -12066
            LustreError: 1323:0:(lustre_idl.h:770:ostid_to_fid()) bad ost_idx, 0x0:0 ost_idx:4294936582
            LustreError: 1323:0:(lustre_idl.h:770:ostid_to_fid()) Skipped 1 previous similar message
            LustreError: 1323:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x300000000:2147484672 ost_idx:0
            LustreError: 1323:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x600000000:1026 ost_idx:0
            LustreError: 1323:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
            LustreError: 1323:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x700000000:1024 ost_idx:0
            LustreError: 1323:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
            LustreError: 1323:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x700000000:1024 ost_idx:0
            LustreError: 1323:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
            LustreError: 1323:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x200000000:2147484674 ost_idx:0
            LustreError: 1323:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
            

            At this point, all file creates and deletes had stopped, so I started creating files again to run the test one more time. At that point, 'lfsck_start' appeared to be hung, but it turned out the server was just very busy and, eventually, 'lfsck_start' completed. Here's what dmesg had in it during this busy period:

            LustreError: 11-0: scratch-OST0005-osc-MDT0000: Communicating with 192.168.2.113@o2ib, operation ost_statfs failed with -107.
            Lustre: mdt_readpage: This server is not able to keep up with request traffic (cpu-bound).
            Lustre: 746:0:(service.c:1509:ptlrpc_at_check_timed()) earlyQ=2 reqQ=0 recA=2, svcEst=31, delay=0(jiff)
            Lustre: 746:0:(service.c:1306:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-510s), not sending early reply. Consider increasing at_early_margin (5)?  req@ffff880ecbf84800 x1476250892354708/t4302517158(0) o35->98357ec6-eddc-7db9-b827-938060f0c85f@192.168.2.115@o2ib:0/0 lens 392/696 e 1 to 0 dl 1407874953 ref 2 fl Interpret:/0/0 rc 0/0
            Lustre: 1250:0:(service.c:2118:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (31:510s); client may timeout.  req@ffff88091bcd8400 x1476250907062608/t4302517159(0) o35->e641b843-fa22-0bd9-70de-06ac9836f5d3@192.168.2.119@o2ib:0/0 lens 392/424 e 1 to 0 dl 1407874953 ref 1 fl Complete:/0/0 rc 0/0
            Lustre: scratch-OST0007-osc-MDT0002: Connection to scratch-OST0007 (at 192.168.2.114@o2ib) was lost; in progress operations using this service will wait for recovery to complete
            Lustre: Skipped 1 previous similar message
            LNet: Service thread pid 1041 completed after 540.62s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
            LustreError: 167-0: scratch-OST0004-osc-MDT0002: This client was evicted by scratch-OST0004; in progress operations using this service will fail.
            Lustre: scratch-OST0004-osc-MDT0002: Connection restored to scratch-OST0004 (at 192.168.2.113@o2ib)
            Lustre: scratch-OST0005-osc-MDT0000: Connection to scratch-OST0005 (at 192.168.2.113@o2ib) was lost; in progress operations using this service will wait for recovery to complete
            Lustre: Skipped 17 previous similar messages
            LustreError: 977:0:(osp_precreate.c:466:osp_precreate_send()) scratch-OST0005-osc-MDT0000: can't precreate: rc = -107
            LustreError: 167-0: scratch-OST0005-osc-MDT0000: This client was evicted by scratch-OST0005; in progress operations using this service will fail.
            LustreError: Skipped 18 previous similar messages
            LustreError: Skipped 73 previous similar messages
            Lustre: scratch-OST0005-osc-MDT0000: Connection restored to scratch-OST0005 (at 192.168.2.113@o2ib)
            Lustre: Skipped 18 previous similar messages
            

            On the second MDS, the dmesg looked similar:

            Lustre: debug daemon will attempt to start writing to /tmp/lfsck_log_20 (1048576kB max)
            Lustre: DEBUG MARKER: LFSCK_START
            LustreError: 1450:0:(lustre_idl.h:770:ostid_to_fid()) bad ost_idx, 0x0:0 ost_idx:4294936584
            LustreError: 1450:0:(lustre_idl.h:770:ostid_to_fid()) Skipped 1 previous similar message
            Lustre: DEBUG MARKER: LFSCK_END
            Lustre: shutting down debug daemon thread...
            Lustre: debug daemon will attempt to start writing to /tmp/lfsck_log_21 (1048576kB max)
            Lustre: DEBUG MARKER: LFSCK_START
            LustreError: 1527:0:(lustre_idl.h:770:ostid_to_fid()) bad ost_idx, 0x0:0 ost_idx:3735879680
            LustreError: 1527:0:(lustre_idl.h:770:ostid_to_fid()) Skipped 1 previous similar message
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x300000000:2147484672 ost_idx:0
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x600000000:1026 ost_idx:0
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x500000000:3221226498 ost_idx:0
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x600000000:1026 ost_idx:0
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x400000000:3221226498 ost_idx:0
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
            Lustre: DEBUG MARKER: LFSCK_END
            Lustre: shutting down debug daemon thread...
            
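            When counting how often these errors actually fired, the "Skipped N previous similar messages" lines matter, since the console log folds repeats into them. A small hypothetical triage helper (not a Lustre tool) that tallies ostid_to_fid errors from a dmesg capture like the ones above:

            ```python
            import re

            # Hypothetical triage helper: count ostid_to_fid errors in a
            # dmesg capture, including occurrences folded into
            # "Skipped N previous similar messages" lines.
            ERR = re.compile(r"ostid_to_fid\(\)\) bad (ost_idx|MDT0 id)")
            SKIP = re.compile(r"ostid_to_fid\(\)\) Skipped (\d+) previous")

            def count_ostid_errors(dmesg_text):
                total = 0
                for line in dmesg_text.splitlines():
                    if ERR.search(line):
                        total += 1          # an error line printed directly
                    m = SKIP.search(line)
                    if m:
                        total += int(m.group(1))  # errors folded into "Skipped N"
                return total

            # Lines taken from the dmesg output above.
            sample = """\
            LustreError: 1527:0:(lustre_idl.h:770:ostid_to_fid()) bad ost_idx, 0x0:0 ost_idx:3735879680
            LustreError: 1527:0:(lustre_idl.h:770:ostid_to_fid()) Skipped 1 previous similar message
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x300000000:2147484672 ost_idx:0
            """
            print(count_ostid_errors(sample))  # 1 printed + 1 skipped + 1 printed = 3
            ```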

            I did the same workload a second time just to make sure the first result wasn't luck. 'lctl lfsck_start' completed each time I started it, and dmesg on the first MDS looks similar to the first test:

            Lustre: debug daemon will attempt to start writing to /tmp/lfsck_log_20 (1048576kB max)
            Lustre: DEBUG MARKER: LFSCK_START
            LustreError: 1450:0:(lustre_idl.h:770:ostid_to_fid()) bad ost_idx, 0x0:0 ost_idx:4294936584
            LustreError: 1450:0:(lustre_idl.h:770:ostid_to_fid()) Skipped 1 previous similar message
            Lustre: DEBUG MARKER: LFSCK_END
            Lustre: shutting down debug daemon thread...
            Lustre: debug daemon will attempt to start writing to /tmp/lfsck_log_21 (1048576kB max)
            Lustre: DEBUG MARKER: LFSCK_START
            LustreError: 1527:0:(lustre_idl.h:770:ostid_to_fid()) bad ost_idx, 0x0:0 ost_idx:3735879680
            LustreError: 1527:0:(lustre_idl.h:770:ostid_to_fid()) Skipped 1 previous similar message
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x300000000:2147484672 ost_idx:0
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x600000000:1026 ost_idx:0
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x500000000:3221226498 ost_idx:0
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x600000000:1026 ost_idx:0
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x400000000:3221226498 ost_idx:0
            LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
            Lustre: DEBUG MARKER: LFSCK_END
            Lustre: shutting down debug daemon thread...
            

            I have kernel logs from the first MDS for the periods between where you see "LFSCK_START" and "LFSCK_END" in the dmesg logs above. Please let me know if you would like to see these logs.

            jamesanunez James Nunez (Inactive) added a comment -

            Thanks James!

            There is a potential deadlock between object destroy and the
            layout LFSCK. Consider the following scenario:

            1) The LFSCK thread obtains the parent object first; at that
            point the parent object has not yet been destroyed.

            2) An RPC service thread destroys the parent and all of its
            child objects. Because the LFSCK is still referencing the
            parent object, the parent is only marked as dying in RAM.
            Likewise, since the parent object references all of its
            children, the child objects are marked as dying in RAM as
            well.

            3) The LFSCK thread, still holding its reference on the
            parent, tries to look up one of the child objects and finds
            that the child is dying. Under the object visibility rules,
            an object with the dying flag cannot be returned to callers,
            so the LFSCK thread must wait until the dying object has
            been purged from RAM before a new object (with the same FID)
            can be allocated. Unfortunately, the LFSCK thread itself
            holds the reference that prevents the parent object from
            being purged, which in turn prevents the child object from
            being purged. The LFSCK thread therefore deadlocks.

            We introduce a non-blocking version of lu_object_find() that
            allows the LFSCK thread to fail immediately (instead of
            waiting) when it finds a dying (child) object; the LFSCK
            thread can then check whether the parent object is dying,
            avoiding the deadlock described above.
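            The scenario above can be modeled in a few lines of C. This is a hedged, self-contained sketch, not the real Lustre lu_object API: all names here (model_obj, find_would_deadlock, find_nowait) are illustrative. The blocking lookup waits for a dying object to be purged, which never happens while the caller itself pins the parent; the non-blocking variant returns -EAGAIN so the caller can back off.

```c
/* Illustrative model of the deadlock and the non-blocking fix.
 * These names are made up for this sketch, not the Lustre API. */

#define MODEL_EAGAIN (-11)   /* stand-in for -EAGAIN */

struct model_obj {
	int dying;  /* destroyed on disk but still cached in RAM */
	int refs;   /* references held by users (e.g. the LFSCK thread) */
};

/* Old (blocking) semantics: a lookup that hits a dying object waits
 * until it is purged from RAM. Purging requires refs to drop to zero,
 * so if the caller pins the parent (which pins the child), the wait
 * never ends. This predicate reports whether that wait, i.e. the
 * deadlock, would occur. */
static int find_would_deadlock(const struct model_obj *child,
                               const struct model_obj *parent)
{
	return child->dying && parent->refs > 0;
}

/* New (non-blocking) semantics: fail fast on a dying object so the
 * LFSCK thread can check the parent object and move on. */
static int find_nowait(const struct model_obj *child)
{
	return child->dying ? MODEL_EAGAIN : 0;
}

/* Reproduce steps 1)-3): LFSCK pins the parent, then an RPC service
 * thread destroys parent and child (both become dying in RAM). */
static int scenario_blocking_deadlocks(void)
{
	struct model_obj parent = { 1, 1 };  /* dying, pinned by LFSCK */
	struct model_obj child  = { 1, 0 };  /* dying */
	return find_would_deadlock(&child, &parent);
}

static int scenario_nowait_rc(void)
{
	struct model_obj child = { 1, 0 };   /* dying */
	return find_nowait(&child);
}

static int scenario_healthy_rc(void)
{
	struct model_obj child = { 0, 0 };   /* alive */
	return find_nowait(&child);
}
```

With the blocking semantics the scenario reports a deadlock; with the non-blocking variant the same lookup simply fails with -EAGAIN and the LFSCK thread keeps running.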

            Here is the patch:
            http://review.whamcloud.com/11373

            Please apply it together with another patch http://review.whamcloud.com/#/c/11304/

            Hope the hung issues can be resolved this time!

            yong.fan nasf (Inactive) added a comment -
            jamesanunez James Nunez (Inactive) added a comment - - edited

            I ran with the latest patch. I no longer see the "ostid_to_fid()) bad ost_idx..." message on the MDS.

            I ran the remove and creating files with unmatched pairs workload and then started logging and running ‘lctl lfsck_start’ on the MDS. That LFSCK completed and stopped the kernel log. I started LFSCK again with a speed limit, ‘-s 100’, and repeatedly checked to see if LFSCK completed. I dumped the kernel log since I didn’t want to miss any output (07Aug_lfsck_log1.txt). I started the kernel log again and, when I checked LFSCK’s progress, I noticed it was stuck in ‘scanning-phase1’. I dumped the kernel log again (07Aug_lfsck_log2.txt.gz, had to gzip since file too large for uploading) and issued ‘echo t > /proc/sysrq-trigger’, but the node was unresponsive. The node then crashed. I was able to execute ‘echo t > /proc/sysrq-trigger’ (07Aug_messages.txt) once the node restarted.

            The OST logs are clean.


            I am able to run 'lctl lfsck_start' more than once before it hangs. For the last results reported, I ran lfsck_start at least three times; on the run that never completed, I had used the speed limit parameter to slow LFSCK down.

            I'll get results for the new patch later today.

            jamesanunez James Nunez (Inactive) added a comment -

            Patch set 5 of http://review.whamcloud.com/#/c/11304/ has been committed. James, please retry. Thanks!

            Hopefully the "ostid_to_fid()) bad ost_idx..." message on the MDS is fixed by the new patch.
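            For context, the "bad ost_idx" and "bad MDT0 id" errors come from sanity checks made while converting an on-disk object id (sequence, object id, OST index) into a FID. Below is a simplified, hypothetical sketch of such a check; the function name ostid_sanity_check and the 16-bit / 48-bit limits are assumptions for illustration, not the real ostid_to_fid() from lustre_idl.h.

```c
#include <stdint.h>

#define BAD_OST_IDX (-1)   /* would log "bad ost_idx, ..." */
#define BAD_MDT0_ID (-2)   /* would log "bad MDT0 id, ..." */

/* Hypothetical simplification: the OST index must fit in 16 bits,
 * and for the legacy MDT0 sequence (seq == 0) the object id must
 * fit in the 48-bit IDIF namespace. */
static int ostid_sanity_check(uint64_t seq, uint64_t oid, uint32_t ost_idx)
{
	if (ost_idx >= (1U << 16))
		return BAD_OST_IDX;
	if (seq == 0 && oid >= (1ULL << 48))
		return BAD_MDT0_ID;
	return 0;
}
```

Note that the ost_idx values in the logs above (e.g. 4294936584) are far beyond any 16-bit index, which is consistent with garbage being read where the OST index was expected.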

            yong.fan nasf (Inactive) added a comment -
            yong.fan nasf (Inactive) added a comment - - edited

            Thanks James! It seems the OST trouble has been resolved, but we still need to figure out the reason for the ostid_to_fid() failures on the MDS side. Do you have the kernel debug log from the MDS? And the kernel stack log via "echo t > /proc/sysrq-trigger" on the MDS? Thanks!

            BTW, was the layout LFSCK run repeatedly or just once in your test? Did one full LFSCK cycle ever finish before the hang, or does the LFSCK hang on the very first run?


            I tried the latest version of the patch and lfsck_start still hangs, but I don't see any error messages on the OSTs.

            Dmesg from mds01 has:

            LustreError: 1911:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x0:0 ost_idx:4294936587
            LustreError: 1924:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x0:0 ost_idx:3735879680
            LustreError: 1924:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 1 previous similar message
            …
            LustreError: 1972:0:(class_obd.c:398:class_handle_ioctl()) OBD ioctl : No Device -12066
            …
            lfsck_start hung and the console logged:
            INFO: task lfsck:1922 blocked for more than 120 seconds.
                  Not tainted 2.6.32-431.20.3.el6_lustre.g044e468.x86_64 #1
            "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
            lfsck         D 0000000000000002     0  1922      2 0x00000080
             ffff880b294799d0 0000000000000046 ffff880b29479970 ffffffffa057b6f5
             0000000100000000 ffffc900212e2030 0000000000000246 0000000000000246
             ffff8810314d65f8 ffff880b29479fd8 000000000000fbc8 ffff8810314d65f8
            Call Trace:
             [<ffffffffa057b6f5>] ? cfs_hash_bd_lookup_intent+0x65/0x130 [libcfs]
             [<ffffffffa06ce434>] ? htable_lookup+0x1c4/0x1e0 [obdclass]
             [<ffffffffa06cf598>] lu_object_find_at+0xa8/0x360 [obdclass]
             [<ffffffff81528cae>] ? thread_return+0x4e/0x760
             [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
             [<ffffffffa06cf88f>] lu_object_find_slice+0x1f/0x80 [obdclass]
             [<ffffffffa0f51bd4>] lfsck_layout_scan_stripes+0x414/0x1c40 [lfsck]
             [<ffffffffa0e19c68>] ? osd_object_read_unlock+0x88/0xd0 [osd_ldiskfs]
             [<ffffffffa0f5396b>] lfsck_layout_master_exec_oit+0x56b/0xd00 [lfsck]
             [<ffffffffa0e42ac9>] ? osd_otable_it_next+0x3e9/0x700 [osd_ldiskfs]
             [<ffffffffa0f2e760>] lfsck_exec_oit+0x70/0x9e0 [lfsck]
             [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
             [<ffffffffa0f393aa>] lfsck_master_oit_engine+0x41a/0x18b0 [lfsck]
             [<ffffffff81528cae>] ? thread_return+0x4e/0x760
             [<ffffffffa0f3abaa>] lfsck_master_engine+0x36a/0x6f0 [lfsck]
             [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
             [<ffffffffa0f3a840>] ? lfsck_master_engine+0x0/0x6f0 [lfsck]
             [<ffffffff8109abf6>] kthread+0x96/0xa0
             [<ffffffff8100c20a>] child_rip+0xa/0x20
             [<ffffffff8109ab60>] ? kthread+0x0/0xa0
             [<ffffffff8100c200>] ? child_rip+0x0/0x20
            INFO: task lfsck:1922 blocked for more than 120 seconds.
                  Not tainted 2.6.32-431.20.3.el6_lustre.g044e468.x86_64 #1
            "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
            lfsck         D 0000000000000002     0  1922      2 0x00000080
             ffff880b294799d0 0000000000000046 ffff880b29479970 ffffffffa057b6f5
             0000000100000000 ffffc900212e2030 0000000000000246 0000000000000246
             ffff8810314d65f8 ffff880b29479fd8 000000000000fbc8 ffff8810314d65f8
            Call Trace:
             [<ffffffffa057b6f5>] ? cfs_hash_bd_lookup_intent+0x65/0x130 [libcfs]
             [<ffffffffa06ce434>] ? htable_lookup+0x1c4/0x1e0 [obdclass]
             [<ffffffffa06cf598>] lu_object_find_at+0xa8/0x360 [obdclass]
             [<ffffffff81528cae>] ? thread_return+0x4e/0x760
             [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
             [<ffffffffa06cf88f>] lu_object_find_slice+0x1f/0x80 [obdclass]
             [<ffffffffa0f51bd4>] lfsck_layout_scan_stripes+0x414/0x1c40 [lfsck]
             [<ffffffffa0e19c68>] ? osd_object_read_unlock+0x88/0xd0 [osd_ldiskfs]
             [<ffffffffa0f5396b>] lfsck_layout_master_exec_oit+0x56b/0xd00 [lfsck]
             [<ffffffffa0e42ac9>] ? osd_otable_it_next+0x3e9/0x700 [osd_ldiskfs]
             [<ffffffffa0f2e760>] lfsck_exec_oit+0x70/0x9e0 [lfsck]
             [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
             [<ffffffffa0f393aa>] lfsck_master_oit_engine+0x41a/0x18b0 [lfsck]
             [<ffffffff81528cae>] ? thread_return+0x4e/0x760
             [<ffffffffa0f3abaa>] lfsck_master_engine+0x36a/0x6f0 [lfsck]
             [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
             [<ffffffffa0f3a840>] ? lfsck_master_engine+0x0/0x6f0 [lfsck]
             [<ffffffff8109abf6>] kthread+0x96/0xa0
             [<ffffffff8100c20a>] child_rip+0xa/0x20
             [<ffffffff8109ab60>] ? kthread+0x0/0xa0
             [<ffffffff8100c200>] ? child_rip+0x0/0x20
            

            And on mds02:

            LustreError: 1520:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x0:0 ost_idx:3735879680
            LustreError: 1533:0:(lustre_idl.h:805:ostid_to_fid()) bad MDT0 id, 0x300000000:2147484674 ost_idx:0
            
            jamesanunez James Nunez (Inactive) added a comment -

            I have updated the patch; hopefully it repairs the issue seen on the OST: "00100000:10000000:1.0:1407249340.116855:0:27015:0:(lfsck_layout.c:1049:lfsck_layout_lastid_store()) scratch-OST0005-osd: layout LFSCK will sync the LAST_ID for <seq> 0x0 as <oid> 77121"

            yong.fan nasf (Inactive) added a comment -
            jamesanunez James Nunez (Inactive) added a comment - - edited

            I ran the patch with the workload and configuration described above. With the patch, the removes completed and the 2.2 test from the test plan completed. Yet 'lctl lfsck_start' is stuck in 'status: scanning-phase1' and the value of 'checked_phase1' is not increasing, so lfsck_start is hung.

            I don't see any errors or LBUGs on any of the OSTs. There are no "found crashed LAST_ID" messages on any of the OSTs. The only messages containing 'LAST_ID' in the OST logs are:

            00100000:10000000:1.0:1407249340.116850:0:27015:0:(lfsck_engine.c:463:lfsck_master_engine()) LFSCK exit: oit_flags = 0x60003, dir_flags = 0x8004, oit_cookie = 52572, dir_cookie = 0x0, parent = [0x0:0x0:0x0], pid = 27015, rc = 1
            00100000:10000000:1.0:1407249340.116855:0:27015:0:(lfsck_layout.c:1049:lfsck_layout_lastid_store()) scratch-OST0005-osd: layout LFSCK will sync the LAST_ID for <seq> 0x0 as <oid> 77121
            00100000:10000000:1.0:1407249340.116875:0:27015:0:(lfsck_layout.c:1049:lfsck_layout_lastid_store()) scratch-OST0005-osd: layout LFSCK will sync the LAST_ID for <seq> 0x380000400 as <oid> 33
            00100000:10000000:1.0:1407249340.116882:0:27015:0:(lfsck_layout.c:1049:lfsck_layout_lastid_store()) scratch-OST0005-osd: layout LFSCK will sync the LAST_ID for <seq> 0x380000401 as <oid> 2529
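            Conceptually, the LAST_ID sync reported in those log lines can be modeled as follows. This is a hypothetical simplification (struct lastid_entry and lastid_observe are made-up names), showing only the idea that the OST-side layout LFSCK bumps the stored LAST_ID to the highest object id it has seen for each sequence:

```c
#include <stdint.h>

struct lastid_entry {
	uint64_t seq;      /* object sequence, e.g. 0x380000400 */
	uint64_t last_id;  /* LAST_ID currently stored on disk */
	int      dirty;    /* set when LAST_ID must be synced */
};

/* Record one object seen during phase 1 scanning: if its id is past
 * the stored LAST_ID, raise LAST_ID and mark it for sync (this is
 * the "will sync the LAST_ID for <seq> ... as <oid>" case). */
static void lastid_observe(struct lastid_entry *e, uint64_t oid)
{
	if (oid > e->last_id) {
		e->last_id = oid;
		e->dirty = 1;
	}
}

/* Convenience wrapper so the behavior is easy to check: returns the
 * LAST_ID that would be synced after observing two object ids. */
static uint64_t lastid_after(uint64_t stored, uint64_t a, uint64_t b)
{
	struct lastid_entry e = { 0, stored, 0 };
	lastid_observe(&e, a);
	lastid_observe(&e, b);
	return e.last_id;
}
```

Raising LAST_ID this way lets object pre-creation resume past objects that already exist, instead of re-allocating their ids.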
            

            On mds01, dmesg contains:

            Lustre: debug daemon will attempt to start writing to /tmp/lfsck_log_1 (1048576kB max)
            Lustre: DEBUG MARKER: LFSCK_START
            LustreError: 1906:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x0:0 ost_idx:4294936577
            debug daemon buffer overflowed; discarding 10% of pages (41 of 409)
            ...
            debug daemon buffer overflowed; discarding 10% of pages (41 of 409)
            LustreError: 1906:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x0:0 ost_idx:4294936577
            Lustre: ctl-scratch-MDT0000: super-sequence allocation rc = 0 [0x0000000440000400-0x0000000480000400):1:mdt
            __ratelimit: 89 callbacks suppressed
            debug daemon buffer overflowed; discarding 10% of pages (41 of 409)
            ...
            debug daemon buffer overflowed; discarding 10% of pages (41 of 409)
            __ratelimit: 106 callbacks suppressed
            debug daemon buffer overflowed; discarding 10% of pages (41 of 409)
            ...
            debug daemon buffer overflowed; discarding 10% of pages (41 of 409)
            __ratelimit: 85 callbacks suppressed
            debug daemon buffer overflowed; discarding 10% of pages (41 of 409)
            ...
            debug daemon buffer overflowed; discarding 10% of pages (41 of 409)
            Lustre: ctl-scratch-MDT0000: super-sequence allocation rc = 0 [0x0000000480000400-0x00000004c0000400):2:mdt
            Lustre: cli-ctl-scratch-MDT0002: Allocated super-sequence [0x0000000480000400-0x00000004c0000400):2:mdt]
            __ratelimit: 39 callbacks suppressed
            debug daemon buffer overflowed; discarding 10% of pages (41 of 409)
            ...
            debug daemon buffer overflowed; discarding 10% of pages (41 of 409)
            Lustre: ctl-scratch-MDT0000: super-sequence allocation rc = 0 [0x00000004c0000400-0x0000000500000400):3:mdt
            __ratelimit: 23 callbacks suppressed
            debug daemon buffer overflowed; discarding 10% of pages (41 of 409)
            ...
            debug daemon buffer overflowed; discarding 10% of pages (41 of 409)
            Lustre: DEBUG MARKER: LFSCK_END
            Lustre: shutting down debug daemon thread...
            LustreError: 1976:0:(class_obd.c:398:class_handle_ioctl()) OBD ioctl : No Device -12066
            INFO: task lfsck:1924 blocked for more than 120 seconds.
                  Not tainted 2.6.32-431.20.3.el6_lustre.g044e468.x86_64 #1
            "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
            lfsck         D 0000000000000002     0  1924      2 0x00000080
             ffff8810150bd9d0 0000000000000046 ffff8810150bd970 ffffffffa057b6f5
             0000000100000000 ffffc9002260d030 0000000000000246 0000000000000246
             ffff881031607ab8 ffff8810150bdfd8 000000000000fbc8 ffff881031607ab8
            Call Trace:
             [<ffffffffa057b6f5>] ? cfs_hash_bd_lookup_intent+0x65/0x130 [libcfs]
             [<ffffffffa06ce434>] ? htable_lookup+0x1c4/0x1e0 [obdclass]
             [<ffffffffa06cf598>] lu_object_find_at+0xa8/0x360 [obdclass]
             [<ffffffff81528cae>] ? thread_return+0x4e/0x760
             [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
             [<ffffffffa06cf88f>] lu_object_find_slice+0x1f/0x80 [obdclass]
             [<ffffffffa0f51bd4>] lfsck_layout_scan_stripes+0x414/0x1c40 [lfsck]
             [<ffffffffa0e19c68>] ? osd_object_read_unlock+0x88/0xd0 [osd_ldiskfs]
             [<ffffffffa0f5396b>] lfsck_layout_master_exec_oit+0x56b/0xd00 [lfsck]
             [<ffffffffa0e4282a>] ? osd_otable_it_next+0x14a/0x700 [osd_ldiskfs]
             [<ffffffffa0f2e760>] lfsck_exec_oit+0x70/0x9e0 [lfsck]
             [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
             [<ffffffffa0f393aa>] lfsck_master_oit_engine+0x41a/0x18b0 [lfsck]
             [<ffffffff81528cae>] ? thread_return+0x4e/0x760
             [<ffffffffa0f3abaa>] lfsck_master_engine+0x36a/0x6f0 [lfsck]
             [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
             [<ffffffffa0f3a840>] ? lfsck_master_engine+0x0/0x6f0 [lfsck]
             [<ffffffff8109abf6>] kthread+0x96/0xa0
             [<ffffffff8100c20a>] child_rip+0xa/0x20
             [<ffffffff8109ab60>] ? kthread+0x0/0xa0
             [<ffffffff8100c200>] ? child_rip+0x0/0x20
            

            I will upload the kernel log from the MDS in a minute.

            EDIT: The kernel log is 1.1 GB in size; I can't upload a file that large. Also, the log buffer was not large enough to hold all the output. If you need more information, I can run things again and dump the log more frequently.

            The only LFSCK messages from the kernel log that were captured are:

            00100000:10000000:0.0:1407249133.994936:0:1924:0:(lfsck_layout.c:4424:lfsck_layout_master_checkpoint()) scratch-MDT0000-osd: layout LFSCK master checkpoint at the pos [203446623]: rc = 0
            00000100:00100000:3.0F:1407249134.517912:0:1929:0:(client.c:1480:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc lfsck_layout:scratch-MDT0002-mdtlov_UUID:1929:1475559475009860:0@lo:1102
            00000100:00100000:3.0:1407249134.517936:0:1929:0:(client.c:1480:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc lfsck_layout:scratch-MDT0002-mdtlov_UUID:1929:1475559475009864:192.168.2.126@o2ib:1102
            00000100:00100000:3.0:1407249134.517993:0:1929:0:(client.c:1863:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc lfsck_layout:scratch-MDT0002-mdtlov_UUID:1929:1475559475009860:0@lo:1102
            00000100:00100000:3.0:1407249134.518193:0:1929:0:(client.c:1863:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc lfsck_layout:scratch-MDT0002-mdtlov_UUID:1929:1475559475009864:192.168.2.126@o2ib:1102
            00000100:00100000:13.0:1407249164.522174:0:1929:0:(client.c:1480:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc lfsck_layout:scratch-MDT0002-mdtlov_UUID:1929:1475559475388100:0@lo:1102
            00000100:00100000:13.0:1407249164.522257:0:1929:0:(client.c:1863:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc lfsck_layout:scratch-MDT0002-mdtlov_UUID:1929:1475559475388100:0@lo:1102
            00000001:02000400:13.0F:1407249170.616979:0:1947:0:(debug.c:345:libcfs_debug_mark_buffer()) DEBUG MARKER: LFSCK_END
            

            Please let me know if more information from the log would help.

            jamesanunez James Nunez (Inactive) added a comment - edited

            James, thanks for your quick response. However, we cannot run e2fsck together with LFSCK, because:

            1) e2fsck checks/repairs the filesystem offline, while LFSCK runs online.
            2) LFSCK depends on e2fsck to guarantee local consistency. If the test scripts inject local inconsistencies to exercise e2fsck, LFSCK may fall into an unknown state.

            So would you please retry the original test as described in the LFSCK Phase II test plan? I want to verify whether the patch has fixed the original issues. Thanks!

            yong.fan nasf (Inactive) added a comment
