[LU-5395] lfsck_start not progressing Created: 23/Jul/14  Updated: 05/Sep/14  Resolved: 05/Sep/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: Lustre 2.7.0

Type: Bug Priority: Critical
Reporter: James Nunez (Inactive) Assignee: nasf (Inactive)
Resolution: Fixed Votes: 0
Labels: lfsck
Environment:

OpenSFS cluster with two MDSs with two MDTs each and four OSSs with two OSTs each


Attachments: Text File 07Aug_lfsck_log1.txt     Text File 07Aug_lfsck_log2.txt.gz     Text File 07Aug_messages.txt     HTML File messages    
Issue Links:
Related
is related to LU-4688 target_destroy_export() LBUG Resolved
Severity: 3
Rank (Obsolete): 15018

 Description   

I was running the "Small files create performance impact by LFSCK" portion of the LFSCK Phase II test plan (LU-3423) and noticed that the speed limit flag was not working as expected.

I ran:

# lctl lfsck_start -M scratch-MDT0000 -A --reset --type layout  -s 1379
Started LFSCK on the device scratch-MDT0000: scrub layout

With the following results:

# lctl get_param -n mdd.scratch-MDT0000.lfsck_layout

name: lfsck_layout
magic: 0xb173ae14
version: 2
status: completed
flags:
param: all_targets
time_since_last_completed: 5 seconds
time_since_latest_start: 2263 seconds
time_since_last_checkpoint: 5 seconds
latest_start_position: 0
last_checkpoint_position: 241696769
first_failure_position: 0
success_count: 75
repaired_dangling: 0
repaired_unmatched_pair: 6400010
repaired_multiple_referenced: 0
repaired_orphan: 0
repaired_inconsistent_owner: 0
repaired_others: 0
skipped: 0
failed_phase1: 0
failed_phase2: 0
checked_phase1: 6912081
checked_phase2: 0
run_time_phase1: 2258 seconds
run_time_phase2: 0 seconds
average_speed_phase1: 3061 items/sec
average_speed_phase2: 0 objs/sec
real-time_speed_phase1: N/A
real-time_speed_phase2: N/A
current_position: N/A

After that, I started running 'lctl lfsck_start' on the MDS with different values for the speed limit (-s). After a couple of 'lctl lfsck_start' runs, LFSCK has been stuck in scanning-phase1 for roughly 16 hours. Currently, I see:

status: scanning-phase1

for both 'lctl get_param mdd.scratch-MDT0000.lfsck_layout' and 'lctl get_param mdd.scratch-MDT0000.lfsck_namespace'.
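
For reference, a minimal shell loop along the following lines can be used to see whether phase 1 is actually advancing; the 60-second interval and the fields selected are arbitrary choices:

# poll the layout LFSCK status; interval and fields are arbitrary
while true; do
    lctl get_param -n mdd.scratch-MDT0000.lfsck_layout |
        grep -E '^(status|checked_phase1|current_position):'
    sleep 60
done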

I ran

# lctl lfsck_stop -M scratch-MDT0000

but it hasn't returned in the past 30 minutes.

Prior to running lfsck_stop, in dmesg on mds01 (where the 'lctl lfsck_start' and 'lctl lfsck_stop' commands were run), I see the following:

LNet: Service thread pid 32564 was inactive for 0.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
LNet: Service thread pid 32564 completed after 0.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Pid: 32564, comm: mdt02_003

Call Trace:
 [<ffffffffa0576561>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
 [<ffffffffa090ffc8>] ? ptlrpc_server_normal_pending+0x38/0xc0 [ptlrpc]
 [<ffffffffa0911565>] ptlrpc_wait_event+0x2c5/0x2d0 [ptlrpc]
 [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
 [<ffffffffa091ad9f>] ptlrpc_main+0x84f/0x1980 [ptlrpc]
 [<ffffffffa091a550>] ? ptlrpc_main+0x0/0x1980 [ptlrpc]
 [<ffffffff8109abf6>] kthread+0x96/0xa0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109ab60>] ? kthread+0x0/0xa0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20

LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x200000000:731978560 ost_idx:4294936591
LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 3 previous similar messages
LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x400000000:901663376 ost_idx:4294936589
LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 3 previous similar messages
LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x400000000:901663376 ost_idx:4294936589
LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 7 previous similar messages
LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x400000000:901663376 ost_idx:4294936589
LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 11 previous similar messages
LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x500000000:901663376 ost_idx:4294936589
LustreError: 20602:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 19 previous similar messages
LustreError: 26090:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x400000000:1515870810 ost_idx:1515870810
LustreError: 26090:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 19 previous similar messages
LustreError: 1262:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x0:3564759104 ost_idx:4294936583
LustreError: 1262:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 3 previous similar messages
INFO: task lfsck:2905 blocked for more than 120 seconds.
      Not tainted 2.6.32-431.20.3.el6_lustre.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
lfsck         D 000000000000000a     0  2905      2 0x00000080
 ffff880e2a51f9e0 0000000000000046 0000000000000000 ffffffffa057bd35
 0000000100000000 ffffc90036b69030 0000000000000246 0000000000000246
 ffff881031249af8 ffff880e2a51ffd8 000000000000fbc8 ffff881031249af8
Call Trace:
 [<ffffffffa057bd35>] ? cfs_hash_bd_lookup_intent+0x65/0x130 [libcfs]
 [<ffffffffa06cf4d8>] lu_object_find_at+0xa8/0x360 [obdclass]
 [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
 [<ffffffffa06cf7cf>] lu_object_find_slice+0x1f/0x80 [obdclass]
 [<ffffffffa0f4494e>] lfsck_layout_scan_stripes+0x47e/0x1ad0 [lfsck]
 [<ffffffffa1093c24>] ? lod_xattr_get+0x154/0x640 [lod]
 [<ffffffffa0e16c68>] ? osd_object_read_unlock+0x88/0xd0 [osd_ldiskfs]
 [<ffffffffa0f5313b>] lfsck_layout_master_exec_oit+0x51b/0xc30 [lfsck]
 [<ffffffff8109b39c>] ? remove_wait_queue+0x3c/0x50
 [<ffffffffa0f2b780>] lfsck_exec_oit+0x70/0x9e0 [lfsck]
 [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
 [<ffffffffa0f363da>] lfsck_master_oit_engine+0x41a/0x18b0 [lfsck]
 [<ffffffff81528cae>] ? thread_return+0x4e/0x760
 [<ffffffffa0f37bda>] lfsck_master_engine+0x36a/0x6f0 [lfsck]
 [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
 [<ffffffffa0f37870>] ? lfsck_master_engine+0x0/0x6f0 [lfsck]
 [<ffffffff8109abf6>] kthread+0x96/0xa0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109ab60>] ? kthread+0x0/0xa0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20
(The identical hung-task warning and stack trace for lfsck:2905 repeats many more times in dmesg.)


 Comments   
Comment by Jodi Levi (Inactive) [ 24/Jul/14 ]

Fan Yong,
Can you please comment on this one?
Thank you!

Comment by James Nunez (Inactive) [ 29/Jul/14 ]

I tried to reproduce this error and was able to hang my system, but I'm not seeing the same errors in dmesg on the MDS. So, this may be a different issue.

Here's what I did:
Run test 2.2 from the LFSCK test plan with two MDSs with two MDTs each and four OSSs with two OSTs each. We create two subdirectories on each MDT and write 5000 files into each subdirectory, each file with a stripe count of two. We set the fail_loc on the OSTs to create unmatched pairs.
At the same time, I have a client node deleting two directories with 100,000 files in each.
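
A rough sketch of the per-subdirectory file-creation part of that workload is below; the mount point and directory names are placeholders, and only the stripe count and file count come from the description above:

# paths are hypothetical; stripe count and file count follow the description above
mkdir -p /mnt/lscratch/dir0.sub1
lfs setstripe -c 2 /mnt/lscratch/dir0.sub1        # new files in this directory get two stripes
for i in $(seq 1 5000); do
    dd if=/dev/zero of=/mnt/lscratch/dir0.sub1/file.$i bs=4k count=1 2>/dev/null
done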

On the MDS that also hosts the MGS, I ran lfsck_start:

lctl lfsck_start -M lscratch-MDT0000 -A --reset --type layout

After a few minutes of watching the output of

# lctl get_param -n mdd.lscratch-MDT0000.lfsck_layout

I see the following

name: lfsck_layout
magic: 0xb173ae14
version: 2
status: scanning-phase1
flags:
param: all_targets
time_since_last_completed: 6962 seconds
time_since_latest_start: 145 seconds
time_since_last_checkpoint: N/A
latest_start_position: 0
last_checkpoint_position: 0
first_failure_position: 0
success_count: 9
repaired_dangling: 0
repaired_unmatched_pair: 5340
repaired_multiple_referenced: 0
repaired_orphan: 0
repaired_inconsistent_owner: 0
repaired_others: 0
skipped: 0
failed_phase1: 0
failed_phase2: 0
checked_phase1: 5880
checked_phase2: 0
run_time_phase1: 145 seconds
run_time_phase2: 0 seconds
average_speed_phase1: 40 items/sec
average_speed_phase2: N/A
real-time_speed_phase1: 40 items/sec
real-time_speed_phase2: N/A
current_position: 28838788

From this point on, nothing in the lfsck_layout output changes except the timers (run_time_phase1, etc.), and consequently average_speed_phase1 drops. The number of unmatched pairs repaired has increased slightly over the past few minutes. For example:

# lctl get_param -n mdd.lscratch-MDT0000.lfsck_layout
name: lfsck_layout
magic: 0xb173ae14
version: 2
status: scanning-phase1
flags:
param: all_targets
time_since_last_completed: 9897 seconds
time_since_latest_start: 3080 seconds
time_since_last_checkpoint: N/A
latest_start_position: 0
last_checkpoint_position: 0
first_failure_position: 0
success_count: 9
repaired_dangling: 0
repaired_unmatched_pair: 5344
repaired_multiple_referenced: 0
repaired_orphan: 0
repaired_inconsistent_owner: 0
repaired_others: 0
skipped: 0
failed_phase1: 0
failed_phase2: 0
checked_phase1: 5880
checked_phase2: 0
run_time_phase1: 3080 seconds
run_time_phase2: 0 seconds
average_speed_phase1: 1 items/sec
average_speed_phase2: N/A
real-time_speed_phase1: 1 items/sec
real-time_speed_phase2: N/A
current_position: 28838788

It looks like everything is hung: the removal of the files has not returned, the file-write script has not progressed, and LFSCK, judging by "checked_phase1", is not progressing. So, is the file system just overloaded?

It's not clear if this is the same issue as this ticket, but it looks similar, except that I don't have the same messages in dmesg. From mds01, I see the following in dmesg:

LustreError: 13330:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x0:0 ost_idx:1515870810
LustreError: 13330:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 11 previous similar messages
Lustre: ctl-lscratch-MDT0000: super-sequence allocation rc = 0 [0x0000000440000400-0x0000000480000400):1:mdt
Lustre: ctl-lscratch-MDT0000: super-sequence allocation rc = 0 [0x0000000480000400-0x00000004c0000400):2:mdt
Lustre: cli-ctl-lscratch-MDT0002: Allocated super-sequence [0x0000000480000400-0x00000004c0000400):2:mdt]
LustreError: 13459:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x0:0 ost_idx:4294936590
LustreError: 13459:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 11 previous similar messages
Lustre: lscratch-OST0006-osc-MDT0002: Connection to lscratch-OST0006 (at 192.168.2.114@o2ib) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 1 previous similar message
LustreError: 4242:0:(osp_precreate.c:466:osp_precreate_send()) lscratch-OST0006-osc-MDT0002: can't precreate: rc = -107
LustreError: 167-0: lscratch-OST0006-osc-MDT0002: This client was evicted by lscratch-OST0006; in progress operations using this service will fail.
LustreError: Skipped 1 previous similar message
Lustre: lscratch-OST0006-osc-MDT0002: Connection restored to lscratch-OST0006 (at 192.168.2.114@o2ib)
Lustre: Skipped 1 previous similar message
Lustre: 32473:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1406641574/real 0]  req@ffff880ca9cecc00 x1474918127882688/t0(0) o13->lscratch-OST0007-osc-MDT0000@192.168.2.114@o2ib:7/4 lens 224/368 e 0 to 1 dl 1406641581 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Lustre: 32473:0:(client.c:1926:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Lustre: lscratch-OST0007-osc-MDT0000: Connection to lscratch-OST0007 (at 192.168.2.114@o2ib) was lost; in progress operations using this service will wait for recovery to complete
Lustre: lscratch-OST0007-osc-MDT0002: Connection to lscratch-OST0007 (at 192.168.2.114@o2ib) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 1 previous similar message
LNetError: 32451:0:(o2iblnd_cb.c:3017:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 4 seconds
LNetError: 32451:0:(o2iblnd_cb.c:3080:kiblnd_check_conns()) Timed out RDMA with 192.168.2.114@o2ib (54): c: 0, oc: 0, rc: 8
...
Comment by nasf (Inactive) [ 29/Jul/14 ]

LFSCK calls ostid_to_fid(), which detects that the input OSTID/index is invalid, but the caller does not check the return value. That can cause the subsequent LFSCK processing to fall into some unknown logic and possibly hang. I will make a patch to fix that tomorrow.

On the other hand, I need to find out why the input OSTID/index is invalid. James, would you please tell me what failure(s) you injected into the system for the test? And would you please dump the kernel thread stacks via "echo t > /proc/sysrq-trigger" on the MDS when LFSCK hangs? Thanks!
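
As an aside, a minimal way to capture that dump to a file on the MDS is sketched here; the output path is an arbitrary choice:

# run as root on the hung MDS; the output file name is arbitrary
dmesg -c > /dev/null                          # optional: empty the ring buffer first
echo t > /proc/sysrq-trigger                  # ask the kernel to dump every task's stack
dmesg > /tmp/task_stacks.$(hostname -s).txt   # save the dump for attaching to the ticket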

Comment by James Nunez (Inactive) [ 29/Jul/14 ]

On each OST, I ran 'lctl set_param fail_loc=0x1612'.
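
For context, the same fault injection can be applied (and later cleared) on every OSS in one step; the pdsh host list below is a placeholder, since the actual OSS node names are not given here:

# host names are assumptions; 0x1612 is the fail_loc used in this test
pdsh -w oss[01-04] 'lctl set_param fail_loc=0x1612'   # inject the unmatched-pair fault on each OSS
pdsh -w oss[01-04] 'lctl set_param fail_loc=0'        # clear the fault injection afterwards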

I will attach the output from "echo t > /proc/sysrq-trigger" on mds01.

On three of the OST consoles, I see:

Message from syslogd@c12 at Jul 28 19:47:46 ...
 kernel:LustreError: 25365:0:(ldlm_lib.c:1312:target_destroy_export()) ASSERTION( atomic_read(&exp->exp_cb_count) == 0 ) failed: value: 1

Message from syslogd@c12 at Jul 28 19:47:46 ...
 kernel:LustreError: 25365:0:(ldlm_lib.c:1312:target_destroy_export()) LBUG
Write failed: Broken pipe

On the first OST, the crash vmcore-dmesg contains:

...
<6>Lustre: *** cfs_fail_loc=1612, val=0***
<6>Lustre: Skipped 40924 previous similar messages
<4>Lustre: 29337:0:(ofd_dev.c:391:ofd_lfsck_out_notify()) lscratch-OST0000: Found crashed LAST_ID, deny creating new OST-object on the device until the LAST_ID rebuilt successfully.
<4>Lustre: 29337:0:(ofd_dev.c:391:ofd_lfsck_out_notify()) Skipped 1 previous similar message
<3>LustreError: 29233:0:(ofd_dev.c:1538:ofd_statfs_hdl()) lscratch-OST0001: statfs failed: rc = -28
<3>LustreError: 29233:0:(ofd_dev.c:1538:ofd_statfs_hdl()) Skipped 1 previous similar message
<3>LustreError: 29233:0:(ofd_dev.c:1538:ofd_statfs_hdl()) lscratch-OST0000: statfs failed: rc = -28
<3>LustreError: 29233:0:(ofd_dev.c:1538:ofd_statfs_hdl()) Skipped 1 previous similar message
<0>LustreError: 25409:0:(ldlm_lib.c:1312:target_destroy_export()) ASSERTION( atomic_read(&exp->exp_cb_count) == 0 ) failed: value: 1
<0>LustreError: 25409:0:(ldlm_lib.c:1312:target_destroy_export()) LBUG
<4>Pid: 25409, comm: obd_zombid
<4>
<4>Call Trace:
<4> [<ffffffffa0512895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4> [<ffffffffa0512e97>] lbug_with_loc+0x47/0xb0 [libcfs]
<4> [<ffffffffa08656cc>] target_destroy_export+0x10c/0x150 [ptlrpc]
<4> [<ffffffffa0ecd4ed>] ofd_destroy_export+0x3d/0x1a0 [ofd]
<4> [<ffffffffa0634033>] obd_zombie_impexp_cull+0x2d3/0x5d0 [obdclass]
<4> [<ffffffffa0634395>] obd_zombie_impexp_thread+0x65/0x190 [obdclass]
<4> [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
<4> [<ffffffffa0634330>] ? obd_zombie_impexp_thread+0x0/0x190 [obdclass]
<4> [<ffffffff8109abf6>] kthread+0x96/0xa0
<0>LustreError: 27990:0:(genops.c:847:class_export_put()) ASSERTION( __v > 0 && __v < ((int)0x5a5a5a5a5a5a5a5a) ) failed: value: 0
<0>LustreError: 27990:0:(genops.c:847:class_export_put()) LBUG
<4>Pid: 27990, comm: jbd2/sda4-8
<4>
<4>Call Trace:
<4> [<ffffffffa0512895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4> [<ffffffffa0512e97>] lbug_with_loc+0x47/0xb0 [libcfs]
<4> [<ffffffffa0630ed1>] class_export_put+0x251/0x2c0 [obdclass]
<4> [<ffffffffa08f6b25>] tgt_cb_new_client+0x155/0x320 [ptlrpc]
<4> [<ffffffffa0d9d6c4>] osd_trans_commit_cb+0xb4/0x390 [osd_ldiskfs]
<4> [<ffffffffa04c26fa>] ldiskfs_journal_commit_callback+0x8a/0xc0 [ldiskfs]
<4> [<ffffffffa046f74f>] jbd2_journal_commit_transaction+0x10df/0x1500 [jbd2]
<4> [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
<4> [<ffffffff81084a1b>] ? try_to_del_timer_sync+0x7b/0xe0
<4> [<ffffffffa0474a48>] kjournald2+0xb8/0x220 [jbd2]
<4> [<ffffffff8109afa0>] ? autoremove_wake_function+0x0/0x40
<4> [<ffffffffa0474990>] ? kjournald2+0x0/0x220 [jbd2]
<4> [<ffffffff8109abf6>] kthread+0x96/0xa0
<4> [<ffffffff8100c20a>] child_rip+0xa/0x20
<4> [<ffffffff8109ab60>] ? kthread+0x0/0xa0
<4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
<4>
<0>Kernel panic - not syncing: LBUG
<4>Pid: 27990, comm: jbd2/sda4-8 Not tainted 2.6.32-431.20.3.el6_lustre.x86_64 #1
<4>Call Trace:
<4> [<ffffffff8152859c>] ? panic+0xa7/0x16f
<4> [<ffffffffa0512eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
<4> [<ffffffffa0630ed1>] ? class_export_put+0x251/0x2c0 [obdclass]
<4> [<ffffffffa08f6b25>] ? tgt_cb_new_client+0x155/0x320 [ptlrpc]
<4> [<ffffffffa0d9d6c4>] ? osd_trans_commit_cb+0xb4/0x390 [osd_ldiskfs]
<4> [<ffffffffa04c26fa>] ? ldiskfs_journal_commit_callback+0x8a/0xc0 [ldiskfs]
<4> [<ffffffffa046f74f>] ? jbd2_journal_commit_transaction+0x10df/0x1500 [jbd2]
<4> [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
<4> [<ffffffff81084a1b>] ? try_to_del_timer_sync+0x7b/0xe0
<4> [<ffffffffa0474a48>] ? kjournald2+0xb8/0x220 [jbd2]
<4> [<ffffffff8109afa0>] ? autoremove_wake_function+0x0/0x40
<4> [<ffffffffa0474990>] ? kjournald2+0x0/0x220 [jbd2]
<4> [<ffffffff8109abf6>] ? kthread+0x96/0xa0
<4> [<ffffffff8100c20a>] ? child_rip+0xa/0x20
<4> [<ffffffff8109ab60>] ? kthread+0x0/0xa0
<4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
Comment by James Nunez (Inactive) [ 29/Jul/14 ]

Output from "echo t > /proc/sysrq-trigger" on mds01 is attached.

Comment by nasf (Inactive) [ 30/Jul/14 ]

The failure looks like this:

1) The LFSCK found crashed LAST_ID files on the OSTs, which is unexpected.

2) While the LFSCK is rebuilding the LAST_ID files on the OSTs, it forbids creating new OST objects. After the rebuild, it notifies the MDS to reconnect and sync up the new last_id(s). Unfortunately, the OST hit an LBUG() that looks very much like LU-4688/LU-5377.

3) On the other hand, the LFSCK instances on the MDTs were scanning the system and need to talk to the OSTs for layout verification. But because the related OST(s) had crashed, the related RPCs were blocked, so from the user-space view the LFSCK looks hung.

So there are two issues to be resolved:
1) Why the LAST_ID files crashed on the OSTs, since we did not inject faults for that.
2) Try to resolve LU-4688/LU-5377.

Comment by nasf (Inactive) [ 01/Aug/14 ]

James, would you please try this patch: http://review.whamcloud.com/11304

During the test, please check:
1) On the MDS side, whether there are messages like "LustreError: 13330:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x0:0 ost_idx:1515870810"
2) On the OST side, whether there are messages like "Lustre: 29337:0:(ofd_dev.c:391:ofd_lfsck_out_notify()) lscratch-OST0000: Found crashed LAST_ID, deny creating new OST-object on the device until the LAST_ID rebuilt successfully."
3) On the OST side, whether there are LBUGs like "LustreError: 25409:0:(ldlm_lib.c:1312:target_destroy_export()) ASSERTION( atomic_read(&exp->exp_cb_count) == 0 ) failed: value: 1"

Please collect the "lctl debug" messages on both the MDS and OSS that contain the "lfsck" component logs.

Thanks!
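
One possible way to capture those logs on each server is sketched below; the marker strings and output path are arbitrary, and the debug mask is assumed to already include the lfsck messages:

lctl clear                                     # empty the kernel debug buffer
lctl mark "LFSCK_START"                        # marker so the interesting window is easy to find
# ... run lfsck_start and reproduce the problem ...
lctl mark "LFSCK_END"
lctl dk /tmp/lfsck_debug.$(hostname -s).txt    # dump the debug buffer to a plain-text file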

Comment by James Nunez (Inactive) [ 01/Aug/14 ]

I started testing patch http://review.whamcloud.com/11304, but bad things happened right away. I think the problem is that I ran lfsck_start with sanity test 17m (e2fsck) running at the same time. Obviously, this is a bad combination of things to run at the same time; it hung the client running sanity, hung lfsck_start (although it was still killable), and crashed at least one OST. I'm going to try to revive the system, possibly reinstalling the OS/Lustre, but I've included some details below in case this isn't user error. I have the debug logs from the MDS if you'd like to see them.

I loaded build 25553 and configured the system the same way: two MDSs with two MDTs each, four OSSs with two OSTs each, and five clients. Once everything was up and mounted, I ran sanity tests 0 - 20. While this was running, I was checking the debug commands to make sure I could collect the debug messages on the MDS, so I did the following:

lctl clear
lctl debug_daemon start /tmp/lfsck_log_1 1024
lctl mark LFSCK_START
lctl lfsck_start -M scratch-MDT0000 -A --reset --type layout

Normally, lfsck_start returns quickly, but this just hung. I checked the client running sanity and saw that it was hung in test 17m. The last output on the console running sanity was:

== sanity test 17m: run e2fsck against MDT which contains short/long symlink == 13:26:27 (1406924787)
create 512 short and long symlink files under /lustre/scratch/d17m.sanitym
erase them
Waiting for local destroys to complete
recreate the 512 symlink files with a shorter string
stop and checking mds1: e2fsck -fnvd /dev/lvm-sdc/MDT0
Stopping /lustre/scratch/mdt0 (opts:) on mds01
pdsh@c19: mds01: ssh exited with exit code 1
mds01: e2fsck 1.42.7.wc1 (12-Apr-2013)
mds01: device /dev/dm-4 mounted by lustre per /proc/fs/lustre/mgs/MGS/mntdev
mds01: [QUOTA WARNING] Usage inconsistent for ID 0:actual (4018176, 1311) != expected (36864, 4)
mds01: [QUOTA WARNING] Usage inconsistent for ID 500:actual (0, 2) != expected (0, 0)
mds01: [QUOTA WARNING] Usage inconsistent for ID 0:actual (4018176, 1311) != expected (36864, 4)
mds01: [QUOTA WARNING] Usage inconsistent for ID 500:actual (0, 2) != expected (0, 0)
pdsh@c19: mds01: ssh exited with exit code 4
Warning!  /dev/lvm-sdc/MDT0 is mounted.
Warning: skipping journal recovery because doing a read-only filesystem check.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (97217713, counted=97216741).
Fix? no

Free inodes count wrong (262143987, counted=262142678).
Fix? no

Update quota info for quota type 0? no

Update quota info for quota type 1? no


scratch-MDT0000: ********** WARNING: Filesystem still has errors **********


          13 inodes used (0.00%, out of 262144000)
           7 non-contiguous files (53.8%)
           1 non-contiguous directory (7.7%)
             # of inodes with ind/dind/tind blocks: 2/0/0
    33854287 blocks used (25.83%, out of 131072000)
           0 bad blocks
           2 large files

         137 regular files
         151 directories
           0 character device files
           0 block device files
           0 fifos
           0 links
        1025 symbolic links (513 fast symbolic links)
           0 sockets
------------
        1313 files
Starting mds1:   /dev/lvm-sdc/MDT0 /lustre/scratch/mdt0
mds01: mount.lustre: according to /etc/mtab /dev/mapper/lvm--sdc-MDT0 is already mounted on /lustre/scratch/mdt0
pdsh@c19: mds01: ssh exited with exit code 17
Start of /dev/lvm-sdc/MDT0 on mds1 failed 17

and that's where it's still sitting.

The client's dmesg for this test is:

Lustre: DEBUG MARKER: == sanity test 17m: run e2fsck against MDT which contains short/long symlink == 13:26:27 (1406924787)
LustreError: 11-0: scratch-MDT0002-mdc-ffff88082de27000: Communicating with 192.168.2.125@o2ib, operation mds_statfs failed with -107.
Lustre: scratch-MDT0002-mdc-ffff88082de27000: Connection to scratch-MDT0002 (at 192.168.2.125@o2ib) was lost; in progress operations using this service will wait for recovery to complete
LustreError: 11-0: scratch-MDT0000-mdc-ffff88082de27000: Communicating with 192.168.2.125@o2ib, operation obd_ping failed with -107.
Lustre: scratch-MDT0003-mdc-ffff88082de27000: Connection to scratch-MDT0003 (at 192.168.2.126@o2ib) was lost; in progress operations using this service will wait for recovery to complete
LustreError: Skipped 1 previous similar message
Lustre: 31289:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1406926091/real 1406926091]  req@ffff8808086d8800 x1475266939483612/t0(0) o400->scratch-OST0001-osc-ffff88082de27000@192.168.2.111@o2ib:28/4 lens 224/224 e 0 to 1 dl 1406926098 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: scratch-OST0006-osc-ffff88082de27000: Connection to scratch-OST0006 (at 192.168.2.114@o2ib) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 1 previous similar message
Lustre: 31289:0:(client.c:1926:ptlrpc_expire_one_request()) Skipped 6 previous similar messages
Lustre: 31289:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1406926091/real 1406926091]  req@ffff880807c5b400 x1475266939483608/t0(0) o400->scratch-OST0000-osc-ffff88082de27000@192.168.2.111@o2ib:28/4 lens 224/224 e 0 to 1 dl 1406926098 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: scratch-OST0000-osc-ffff88082de27000: Connection to scratch-OST0000 (at 192.168.2.111@o2ib) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 6 previous similar messages
Lustre: 31281:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1406926098/real 1406926098]  req@ffff880807c5bc00 x1475266939483672/t0(0) o8->scratch-OST0001-osc-ffff88082de27000@192.168.2.111@o2ib:28/4 lens 400/544 e 0 to 1 dl 1406926104 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 31281:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1406926124/real 1406926124]  req@ffff880807dc6c00 x1475266939483728/t0(0) o8->scratch-OST0007-osc-ffff88082de27000@192.168.2.114@o2ib:28/4 lens 400/544 e 0 to 1 dl 1406926135 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 31281:0:(client.c:1926:ptlrpc_expire_one_request()) Skipped 7 previous similar messages
Lustre: 31281:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1406926149/real 1406926149]  req@ffff880807555c00 x1475266939483780/t0(0) o8->scratch-OST0007-osc-ffff88082de27000@192.168.2.114@o2ib:28/4 lens 400/544 e 0 to 1 dl 1406926165 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 31281:0:(client.c:1926:ptlrpc_expire_one_request()) Skipped 7 previous similar messages
Lustre: 31281:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1406926174/real 0]  req@ffff880807bafc00 x1475266939483832/t0(0) o8->scratch-OST0007-osc-ffff88082de27000@192.168.2.114@o2ib:28/4 lens 400/544 e 0 to 1 dl 1406926195 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 31281:0:(client.c:1926:ptlrpc_expire_one_request()) Skipped 7 previous similar messages
LNetError: 31275:0:(o2iblnd_cb.c:3017:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 11 seconds
LNetError: 31275:0:(o2iblnd_cb.c:3080:kiblnd_check_conns()) Timed out RDMA with 192.168.2.111@o2ib (86): c: 0, oc: 0, rc: 8
LNetError: 31275:0:(o2iblnd_cb.c:3017:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 12 seconds
LNetError: 31275:0:(o2iblnd_cb.c:3080:kiblnd_check_conns()) Timed out RDMA with 192.168.2.112@o2ib (87): c: 0, oc: 0, rc: 8
Lustre: 31281:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1406926249/real 0]  req@ffff880807ecdc00 x1475266939483924/t0(0) o8->scratch-OST0007-osc-ffff88082de27000@192.168.2.114@o2ib:28/4 lens 400/544 e 0 to 1 dl 1406926275 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 31281:0:(client.c:1926:ptlrpc_expire_one_request()) Skipped 7 previous similar messages
Lustre: scratch-OST0000-osc-ffff88082de27000: Connection restored to scratch-OST0000 (at 192.168.2.111@o2ib)

The client is now hung/not responding.

On the first MDS, dmesg contains:

Lustre: DEBUG MARKER: == sanity test 17m: run e2fsck against MDT which contains short/long symlink == 13:26:27 (1406924787)
Lustre: Failing over scratch-MDT0002
Lustre: server umount scratch-MDT0002 complete
LustreError: 137-5: scratch-MDT0002_UUID: not available for connect from 192.168.2.115@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: 137-5: scratch-MDT0002_UUID: not available for connect from 192.168.2.116@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: 11-0: scratch-MDT0002-osp-MDT0000: Communicating with 0@lo, operation obd_ping failed with -107.
LustreError: Skipped 1 previous similar message
Lustre: scratch-MDT0002-osp-MDT0000: Connection to scratch-MDT0002 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
LustreError: 137-5: scratch-MDT0002_UUID: not available for connect from 192.168.2.117@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 160 previous similar messages
Lustre: debug daemon will attempt to start writing to /tmp/lfsck_log_1 (1048576kB max)

I killed lfsck_start and that worked.

The OST that crashed has the following in dmesg:

<4>Lustre: DEBUG MARKER: == sanity test 17m: run e2fsck against MDT which contains short/long symlink == 13:26:27 (1406924787)
<3>LustreError: 11-0: scratch-MDT0002-lwp-OST0003: Communicating with 192.168.2.125@o2ib, operation obd_ping failed with -107.
<4>Lustre: scratch-MDT0002-lwp-OST0002: Connection to scratch-MDT0002 (at 192.168.2.125@o2ib) was lost; in progress operations using this service will wait for recovery to complete
<3>LustreError: Skipped 1 previous similar message
<4>Lustre: Failing over scratch-OST0003
<1>BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
<1>IP: [<ffffffffa062f2f1>] class_export_get+0x11/0x90 [obdclass]
<4>PGD 0 
<4>Oops: 0002 [#1] SMP 
<4>last sysfs file: /sys/devices/system/cpu/online
<4>CPU 0 
<4>Modules linked in: osp(U) ofd(U) lfsck(U) ost(U) mgc(U) osd_ldiskfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic sha256_generic crc32c_intel libcfs(U) ldiskfs(U) jbd2 nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 iTCO_wdt iTCO_vendor_support microcode serio_raw i2c_i801 i2c_core sg mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core e1000e ptp pps_core lpc_ich mfd_core ioatdma dca i7core_edac edac_core shpchp ext3 jbd mbcache sd_mod crc_t10dif pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
<4>
<4>Pid: 29246, comm: lfsck Not tainted 2.6.32-431.20.3.el6_lustre.g6720a88.x86_64 #1 Supermicro X8DTT-H/X8DTT-H
<4>RIP: 0010:[<ffffffffa062f2f1>]  [<ffffffffa062f2f1>] class_export_get+0x11/0x90 [obdclass]
<4>RSP: 0018:ffff8807da509b90  EFLAGS: 00010286
<4>RAX: ffff8807d9485084 RBX: 0000000000000000 RCX: ffff8807da509bbe
<4>RDX: 0000000000000000 RSI: ffff8807da509bb0 RDI: 0000000000000000
<4>RBP: ffff8807da509ba0 R08: 0000000000000030 R09: 000000000000001b
<4>R10: 000000000000000f R11: 000000000000000f R12: ffff8807cfb8eab8
<4>R13: ffff8807d9219044 R14: ffff8807d9485078 R15: ffff8807cfb8eac8
<4>FS:  0000000000000000(0000) GS:ffff880045c00000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>CR2: 0000000000000040 CR3: 000000082a3a4000 CR4: 00000000000007f0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process lfsck (pid: 29246, threadinfo ffff8807da508000, task ffff8807da404080)
<4>Stack:
<4> 0000000000000000 ffff8807da509bb0 ffff8807da509c20 ffffffffa068f1fc
<4><d> 2d68637461726373 003030303054444d ffff8807da509c20 0000000000000246
<4><d> 0068637461726373 ffff8807d9f9ecc0 ffff8807d9f9ecc0 000000000894fadf
<4>Call Trace:
<4> [<ffffffffa068f1fc>] lustre_find_lwp_by_index+0xec/0x1a0 [obdclass]
<4> [<ffffffffa0e7c235>] lfsck_layout_slave_notify_master+0x2b5/0x900 [lfsck]
<4> [<ffffffffa0e7d54b>] lfsck_layout_slave_double_scan+0xccb/0x1040 [lfsck]
<4> [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
<4> [<ffffffffa0e4d170>] lfsck_double_scan+0x50/0x1d0 [lfsck]
<4> [<ffffffffa0e5b974>] lfsck_master_engine+0x134/0x6f0 [lfsck]
<4> [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
<4> [<ffffffffa0e5b840>] ? lfsck_master_engine+0x0/0x6f0 [lfsck]
<4> [<ffffffff8109abf6>] kthread+0x96/0xa0
<4> [<ffffffff8100c20a>] child_rip+0xa/0x20
<4> [<ffffffff8109ab60>] ? kthread+0x0/0xa0
<4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
<4>Code: 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00 e8 52 ff ff ff c9 c3 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 89 fb <f0> ff 47 40 f6 05 88 54 f1 ff 40 74 63 f6 05 7b 54 f1 ff 20 74
<6>Lustre: scratch-OST0003: Not available for connect from 192.168.2.125@o2ib (stopping)
<1>RIP  [<ffffffffa062f2f1>] class_export_get+0x11/0x90 [obdclass]
<4> RSP <ffff8807da509b90>
<4>CR2: 0000000000000040

Again, I have the output from 'lctl debug' for the first MDS if that would help.

Comment by nasf (Inactive) [ 01/Aug/14 ]

James, thanks for your quick response. But we cannot run e2fsck together with LFSCK, because:

1) e2fsck checks/repairs the system in offline mode, but LFSCK runs in online mode.
2) LFSCK depends on e2fsck to guarantee local consistency. If the test scripts inject some local inconsistency for testing e2fsck, then LFSCK may fall into some unknown state.

So would you please retry the original test according to the LFSCK Phase II test plan? I want to make sure whether the patch has fixed the original issues or not. Thanks!
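
To illustrate that ordering (offline e2fsck first, online LFSCK afterwards), here is a sketch using the MDT device and mount point that appear in the sanity 17m output above; the e2fsck options shown are illustrative only:

# offline local check first: the MDT must be unmounted while e2fsck runs
umount /lustre/scratch/mdt0
e2fsck -fp /dev/lvm-sdc/MDT0
mount -t lustre /dev/lvm-sdc/MDT0 /lustre/scratch/mdt0
# only then run the online, distributed check
lctl lfsck_start -M scratch-MDT0000 -A --reset --type layout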

Comment by James Nunez (Inactive) [ 05/Aug/14 ]

I ran the patch with the workload and configuration described above. With the patch, the removes completed and test 2.2 from the test plan completed. Yet 'lctl lfsck_start' is stuck in 'status: scanning-phase1' and the value of 'checked_phase1' is not increasing. So, LFSCK is hung.

I don't see any errors or LBUGs on any of the OSTs. There are no "found crashed LAST_ID" messages on any of the OSTs. The only messages containing 'LAST_ID' in the OST logs are:

00100000:10000000:1.0:1407249340.116850:0:27015:0:(lfsck_engine.c:463:lfsck_master_engine()) LFSCK exit: oit_flags = 0x60003, dir_flags = 0x8004, oit_cookie = 52572, dir_cookie = 0x0, parent = [0x0:0x0:0x0], pid = 27015, rc = 1
00100000:10000000:1.0:1407249340.116855:0:27015:0:(lfsck_layout.c:1049:lfsck_layout_lastid_store()) scratch-OST0005-osd: layout LFSCK will sync the LAST_ID for <seq> 0x0 as <oid> 77121
00100000:10000000:1.0:1407249340.116875:0:27015:0:(lfsck_layout.c:1049:lfsck_layout_lastid_store()) scratch-OST0005-osd: layout LFSCK will sync the LAST_ID for <seq> 0x380000400 as <oid> 33
00100000:10000000:1.0:1407249340.116882:0:27015:0:(lfsck_layout.c:1049:lfsck_layout_lastid_store()) scratch-OST0005-osd: layout LFSCK will sync the LAST_ID for <seq> 0x380000401 as <oid> 2529

On mds01, dmesg contains:

Lustre: debug daemon will attempt to start writing to /tmp/lfsck_log_1 (1048576kB max)
Lustre: DEBUG MARKER: LFSCK_START
LustreError: 1906:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x0:0 ost_idx:4294936577
debug daemon buffer overflowed; discarding 10% of pages (41 of 409)
...
debug daemon buffer overflowed; discarding 10% of pages (41 of 409)
LustreError: 1906:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x0:0 ost_idx:4294936577
Lustre: ctl-scratch-MDT0000: super-sequence allocation rc = 0 [0x0000000440000400-0x0000000480000400):1:mdt
__ratelimit: 89 callbacks suppressed
debug daemon buffer overflowed; discarding 10% of pages (41 of 409)
...
debug daemon buffer overflowed; discarding 10% of pages (41 of 409)
__ratelimit: 106 callbacks suppressed
debug daemon buffer overflowed; discarding 10% of pages (41 of 409)
...
debug daemon buffer overflowed; discarding 10% of pages (41 of 409)
__ratelimit: 85 callbacks suppressed
debug daemon buffer overflowed; discarding 10% of pages (41 of 409)
...
debug daemon buffer overflowed; discarding 10% of pages (41 of 409)
Lustre: ctl-scratch-MDT0000: super-sequence allocation rc = 0 [0x0000000480000400-0x00000004c0000400):2:mdt
Lustre: cli-ctl-scratch-MDT0002: Allocated super-sequence [0x0000000480000400-0x00000004c0000400):2:mdt]
__ratelimit: 39 callbacks suppressed
debug daemon buffer overflowed; discarding 10% of pages (41 of 409)
...
debug daemon buffer overflowed; discarding 10% of pages (41 of 409)
Lustre: ctl-scratch-MDT0000: super-sequence allocation rc = 0 [0x00000004c0000400-0x0000000500000400):3:mdt
__ratelimit: 23 callbacks suppressed
debug daemon buffer overflowed; discarding 10% of pages (41 of 409)
...
debug daemon buffer overflowed; discarding 10% of pages (41 of 409)
Lustre: DEBUG MARKER: LFSCK_END
Lustre: shutting down debug daemon thread...
LustreError: 1976:0:(class_obd.c:398:class_handle_ioctl()) OBD ioctl : No Device -12066
INFO: task lfsck:1924 blocked for more than 120 seconds.
      Not tainted 2.6.32-431.20.3.el6_lustre.g044e468.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
lfsck         D 0000000000000002     0  1924      2 0x00000080
 ffff8810150bd9d0 0000000000000046 ffff8810150bd970 ffffffffa057b6f5
 0000000100000000 ffffc9002260d030 0000000000000246 0000000000000246
 ffff881031607ab8 ffff8810150bdfd8 000000000000fbc8 ffff881031607ab8
Call Trace:
 [<ffffffffa057b6f5>] ? cfs_hash_bd_lookup_intent+0x65/0x130 [libcfs]
 [<ffffffffa06ce434>] ? htable_lookup+0x1c4/0x1e0 [obdclass]
 [<ffffffffa06cf598>] lu_object_find_at+0xa8/0x360 [obdclass]
 [<ffffffff81528cae>] ? thread_return+0x4e/0x760
 [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
 [<ffffffffa06cf88f>] lu_object_find_slice+0x1f/0x80 [obdclass]
 [<ffffffffa0f51bd4>] lfsck_layout_scan_stripes+0x414/0x1c40 [lfsck]
 [<ffffffffa0e19c68>] ? osd_object_read_unlock+0x88/0xd0 [osd_ldiskfs]
 [<ffffffffa0f5396b>] lfsck_layout_master_exec_oit+0x56b/0xd00 [lfsck]
 [<ffffffffa0e4282a>] ? osd_otable_it_next+0x14a/0x700 [osd_ldiskfs]
 [<ffffffffa0f2e760>] lfsck_exec_oit+0x70/0x9e0 [lfsck]
 [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
 [<ffffffffa0f393aa>] lfsck_master_oit_engine+0x41a/0x18b0 [lfsck]
 [<ffffffff81528cae>] ? thread_return+0x4e/0x760
 [<ffffffffa0f3abaa>] lfsck_master_engine+0x36a/0x6f0 [lfsck]
 [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
 [<ffffffffa0f3a840>] ? lfsck_master_engine+0x0/0x6f0 [lfsck]
 [<ffffffff8109abf6>] kthread+0x96/0xa0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109ab60>] ? kthread+0x0/0xa0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20

I will upload the kernel log from the MDS in a minute.

EDIT: The kernel log is 1.1 GB in size. I can't upload a file that large. Also, the log buffer was not large enough to hold all the output. If you need more information, I can run things again and dump the log more frequently.
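
If it helps for a rerun, both the in-kernel debug buffer and the debug_daemon file limit can be enlarged so less output is discarded; the sizes below are arbitrary examples:

lctl set_param debug_mb=1024                      # enlarge the in-kernel debug buffer (in MB)
lctl debug_daemon start /tmp/lfsck_log_1 20480    # let the daemon spool up to ~20 GB to disk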

The only LFSCK messages from the kernel log that were captured are:

00100000:10000000:0.0:1407249133.994936:0:1924:0:(lfsck_layout.c:4424:lfsck_layout_master_checkpoint()) scratch-MDT0000-osd: layout LFSCK master checkpoint at the pos [203446623]: rc = 0
00000100:00100000:3.0F:1407249134.517912:0:1929:0:(client.c:1480:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc lfsck_layout:scratch-MDT0002-mdtlov_UUID:1929:1475559475009860:0@lo:1102
00000100:00100000:3.0:1407249134.517936:0:1929:0:(client.c:1480:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc lfsck_layout:scratch-MDT0002-mdtlov_UUID:1929:1475559475009864:192.168.2.126@o2ib:1102
00000100:00100000:3.0:1407249134.517993:0:1929:0:(client.c:1863:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc lfsck_layout:scratch-MDT0002-mdtlov_UUID:1929:1475559475009860:0@lo:1102
00000100:00100000:3.0:1407249134.518193:0:1929:0:(client.c:1863:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc lfsck_layout:scratch-MDT0002-mdtlov_UUID:1929:1475559475009864:192.168.2.126@o2ib:1102
00000100:00100000:13.0:1407249164.522174:0:1929:0:(client.c:1480:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc lfsck_layout:scratch-MDT0002-mdtlov_UUID:1929:1475559475388100:0@lo:1102
00000100:00100000:13.0:1407249164.522257:0:1929:0:(client.c:1863:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc lfsck_layout:scratch-MDT0002-mdtlov_UUID:1929:1475559475388100:0@lo:1102
00000001:02000400:13.0F:1407249170.616979:0:1947:0:(debug.c:345:libcfs_debug_mark_buffer()) DEBUG MARKER: LFSCK_END

Please let me know if more information from the log would help.

Comment by nasf (Inactive) [ 06/Aug/14 ]

I have updated the patch; hopefully it repairs the issue on the OST: "00100000:10000000:1.0:1407249340.116855:0:27015:0:(lfsck_layout.c:1049:lfsck_layout_lastid_store()) scratch-OST0005-osd: layout LFSCK will sync the LAST_ID for <seq> 0x0 as <oid> 77121"

Comment by James Nunez (Inactive) [ 06/Aug/14 ]

I tried the latest version of the patch and lfsck_start still hangs, but I don't see any error messages on the OSTs.

Dmesg from mds01 has:

LustreError: 1911:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x0:0 ost_idx:4294936587
LustreError: 1924:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x0:0 ost_idx:3735879680
LustreError: 1924:0:(lustre_idl.h:775:ostid_to_fid()) Skipped 1 previous similar message
…
LustreError: 1972:0:(class_obd.c:398:class_handle_ioctl()) OBD ioctl : No Device -12066
…
lfsck_start hung, and then:
INFO: task lfsck:1922 blocked for more than 120 seconds.
      Not tainted 2.6.32-431.20.3.el6_lustre.g044e468.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
lfsck         D 0000000000000002     0  1922      2 0x00000080
 ffff880b294799d0 0000000000000046 ffff880b29479970 ffffffffa057b6f5
 0000000100000000 ffffc900212e2030 0000000000000246 0000000000000246
 ffff8810314d65f8 ffff880b29479fd8 000000000000fbc8 ffff8810314d65f8
Call Trace:
 [<ffffffffa057b6f5>] ? cfs_hash_bd_lookup_intent+0x65/0x130 [libcfs]
 [<ffffffffa06ce434>] ? htable_lookup+0x1c4/0x1e0 [obdclass]
 [<ffffffffa06cf598>] lu_object_find_at+0xa8/0x360 [obdclass]
 [<ffffffff81528cae>] ? thread_return+0x4e/0x760
 [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
 [<ffffffffa06cf88f>] lu_object_find_slice+0x1f/0x80 [obdclass]
 [<ffffffffa0f51bd4>] lfsck_layout_scan_stripes+0x414/0x1c40 [lfsck]
 [<ffffffffa0e19c68>] ? osd_object_read_unlock+0x88/0xd0 [osd_ldiskfs]
 [<ffffffffa0f5396b>] lfsck_layout_master_exec_oit+0x56b/0xd00 [lfsck]
 [<ffffffffa0e42ac9>] ? osd_otable_it_next+0x3e9/0x700 [osd_ldiskfs]
 [<ffffffffa0f2e760>] lfsck_exec_oit+0x70/0x9e0 [lfsck]
 [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
 [<ffffffffa0f393aa>] lfsck_master_oit_engine+0x41a/0x18b0 [lfsck]
 [<ffffffff81528cae>] ? thread_return+0x4e/0x760
 [<ffffffffa0f3abaa>] lfsck_master_engine+0x36a/0x6f0 [lfsck]
 [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
 [<ffffffffa0f3a840>] ? lfsck_master_engine+0x0/0x6f0 [lfsck]
 [<ffffffff8109abf6>] kthread+0x96/0xa0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109ab60>] ? kthread+0x0/0xa0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20
INFO: task lfsck:1922 blocked for more than 120 seconds.
      Not tainted 2.6.32-431.20.3.el6_lustre.g044e468.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
lfsck         D 0000000000000002     0  1922      2 0x00000080
 ffff880b294799d0 0000000000000046 ffff880b29479970 ffffffffa057b6f5
 0000000100000000 ffffc900212e2030 0000000000000246 0000000000000246
 ffff8810314d65f8 ffff880b29479fd8 000000000000fbc8 ffff8810314d65f8
Call Trace:
 [<ffffffffa057b6f5>] ? cfs_hash_bd_lookup_intent+0x65/0x130 [libcfs]
 [<ffffffffa06ce434>] ? htable_lookup+0x1c4/0x1e0 [obdclass]
 [<ffffffffa06cf598>] lu_object_find_at+0xa8/0x360 [obdclass]
 [<ffffffff81528cae>] ? thread_return+0x4e/0x760
 [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
 [<ffffffffa06cf88f>] lu_object_find_slice+0x1f/0x80 [obdclass]
 [<ffffffffa0f51bd4>] lfsck_layout_scan_stripes+0x414/0x1c40 [lfsck]
 [<ffffffffa0e19c68>] ? osd_object_read_unlock+0x88/0xd0 [osd_ldiskfs]
 [<ffffffffa0f5396b>] lfsck_layout_master_exec_oit+0x56b/0xd00 [lfsck]
 [<ffffffffa0e42ac9>] ? osd_otable_it_next+0x3e9/0x700 [osd_ldiskfs]
 [<ffffffffa0f2e760>] lfsck_exec_oit+0x70/0x9e0 [lfsck]
 [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
 [<ffffffffa0f393aa>] lfsck_master_oit_engine+0x41a/0x18b0 [lfsck]
 [<ffffffff81528cae>] ? thread_return+0x4e/0x760
 [<ffffffffa0f3abaa>] lfsck_master_engine+0x36a/0x6f0 [lfsck]
 [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
 [<ffffffffa0f3a840>] ? lfsck_master_engine+0x0/0x6f0 [lfsck]
 [<ffffffff8109abf6>] kthread+0x96/0xa0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109ab60>] ? kthread+0x0/0xa0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20

And on mds02:

LustreError: 1520:0:(lustre_idl.h:775:ostid_to_fid()) bad ost_idx, 0x0:0 ost_idx:3735879680
LustreError: 1533:0:(lustre_idl.h:805:ostid_to_fid()) bad MDT0 id, 0x300000000:2147484674 ost_idx:0
Comment by nasf (Inactive) [ 07/Aug/14 ]

Thanks James! It seems the OST trouble has been resolved, but we still need to figure out the reason for the ostid_to_fid() failure on the MDS side. Do you have the kernel debug log on the MDS? And the kernel stack log via "echo t > /proc/sys/lnet/debug" on the MDS? Thanks!

BTW, does the layout LFSCK run repeatedly or just once in your test? Did you finish one full LFSCK cycle before the hang, or does LFSCK hang on the very first run?

Comment by nasf (Inactive) [ 07/Aug/14 ]

http://review.whamcloud.com/#/c/11304/ set5 has been committed. James, please retry. Thanks!

Hope that the message on the MDS about "ostid_to_fid()) bad ost_idx..." can be fixed by the new patch.

Comment by James Nunez (Inactive) [ 07/Aug/14 ]

I am able to run 'lctl lfsck_start' more than once before it hangs. For the last results reported, I ran lfsck_start at least three times and, for the run that never completed, I used the speed limit parameter to slow LFSCK down.

I'll get results for the new patch later today.

Comment by James Nunez (Inactive) [ 07/Aug/14 ]

I ran with the latest patch. I no longer see the "ostid_to_fid()) bad ost_idx..." message on the MDS.

I ran the workload of removing files and creating files with unmatched pairs, then started logging and ran 'lctl lfsck_start' on the MDS. That LFSCK completed, and I stopped the kernel log. I started LFSCK again with a speed limit, '-s 100', and repeatedly checked to see if LFSCK had completed. I dumped the kernel log since I didn't want to miss any output (07Aug_lfsck_log1.txt). I started the kernel log again and, when I checked LFSCK's progress, I noticed it was stuck in 'scanning-phase1'. I dumped the kernel log again (07Aug_lfsck_log2.txt.gz; I had to gzip it since the file was too large to upload) and issued 'echo t > /proc/sysrq-trigger', but the node was unresponsive and then crashed. I was able to execute 'echo t > /proc/sysrq-trigger' (07Aug_messages.txt) once the node restarted.

The OST logs are clean.

Comment by nasf (Inactive) [ 08/Aug/14 ]

Thanks James!

There is a potential deadlock between object destroy and the layout LFSCK. Consider the following scenario:

1) The LFSCK thread obtains the parent object first; at that time the parent object has not been destroyed yet.

2) An RPC service thread destroys the parent and all of its child objects. Because the LFSCK is still referencing the parent object, the parent is only marked as dying in RAM. The parent object in turn references all of its child objects, so the children are marked as dying in RAM as well.

3) The LFSCK thread tries to find some child object while holding the reference on the parent, and finds that the child object is dying. According to the object visibility rules, an object with the dying flag cannot be returned to callers, so the LFSCK thread has to wait until the dying object has been purged from RAM before it can allocate a new object (with the same FID) in RAM. Unfortunately, the LFSCK thread itself is referencing the parent object, which prevents the parent from being purged, which in turn prevents the child object from being purged. So the LFSCK thread deadlocks against itself.

We introduce a non-blocking version of lu_object_find() that allows the LFSCK thread to return a failure immediately (instead of waiting) when it finds a dying (child) object; the LFSCK thread can then check whether the parent object is dying and skip it, avoiding the deadlock above.
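
For illustration only, here is a minimal sketch of the non-blocking lookup pattern described above. This is not the actual patch: the helper lfsck_object_find_nowait() and its -EAGAIN return convention are hypothetical names, and the sketch assumes the Lustre obdclass lu_object API (lu_object_is_dying(), lu_object_put()).

/* Sketch only -- hypothetical helper names, not the LU-5395 patch itself. */
static int lfsck_layout_check_child(const struct lu_env *env,
                                    struct lu_object *parent,
                                    const struct lu_fid *child_fid)
{
        struct lu_object *child;

        /* Do not sleep waiting for a dying child to be purged: it can
         * never be purged while we still hold a reference on its parent. */
        child = lfsck_object_find_nowait(env, child_fid);
        if (IS_ERR(child)) {
                if (PTR_ERR(child) == -EAGAIN &&
                    lu_object_is_dying(parent->lo_header))
                        /* The parent is being destroyed too; skip this
                         * stripe instead of deadlocking against ourselves. */
                        return 0;
                return PTR_ERR(child);
        }

        /* ... verify parent/child layout consistency here ... */

        lu_object_put(env, child);
        return 0;
}

The stack traces above show the LFSCK thread blocking under lfsck_layout_scan_stripes() -> lu_object_find_slice(), which is exactly the kind of call site such a non-blocking lookup is meant to protect.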

Here is the patch:
http://review.whamcloud.com/11373

Please apply it together with another patch http://review.whamcloud.com/#/c/11304/

Hopefully the hang issues can be resolved this time!

Comment by James Nunez (Inactive) [ 12/Aug/14 ]

I tested a build composed of the latest master with http://review.whamcloud.com/#/c/11373/1 applied and then http://review.whamcloud.com/#/c/11304/ applied. I ran the combination of creating and deleting files twice and 'lctl lfsck_start' completed each time.

During the first run of file removes and creates, I did see the 'bad ost_idx' message on the first MDS. Here's dmesg from mds01 for this test run:

Lustre: ctl-scratch-MDT0000: super-sequence allocation rc = 0 [0x0000000380000400-0x00000003c0000400):5:ost
Lustre: Skipped 2 previous similar messages
Lustre: ctl-scratch-MDT0000: super-sequence allocation rc = 0 [0x0000000400000400-0x0000000440000400):0:mdt
Lustre: Skipped 1 previous similar message
Lustre: debug daemon will attempt to start writing to /tmp/lfsck_log_1 (1048576kB max)
Lustre: DEBUG MARKER: LFSCK_START
Lustre: DEBUG MARKER: LFSCK_END
Lustre: shutting down debug daemon thread...
Lustre: debug daemon will attempt to start writing to /tmp/lfsck_log_2 (1048576kB max)
Lustre: DEBUG MARKER: LFSCK_START
Lustre: DEBUG MARKER: LFSCK_END
Lustre: shutting down debug daemon thread...
Lustre: debug daemon will attempt to start writing to /tmp/lfsck_log_3 (1048576kB max)
Lustre: DEBUG MARKER: LFSCK_START
LustreError: 1192:0:(lustre_idl.h:770:ostid_to_fid()) bad ost_idx, 0x0:0 ost_idx:4294936582
Lustre: ctl-scratch-MDT0000: super-sequence allocation rc = 0 [0x0000000440000400-0x0000000480000400):1:mdt
Lustre: DEBUG MARKER: LFSCK_END
Lustre: shutting down debug daemon thread...
Lustre: debug daemon will attempt to start writing to /tmp/lfsck_log_4 (1048576kB max)
Lustre: DEBUG MARKER: LFSCK_START
Lustre: ctl-scratch-MDT0000: super-sequence allocation rc = 0 [0x0000000480000400-0x00000004c0000400):2:mdt
Lustre: cli-ctl-scratch-MDT0002: Allocated super-sequence [0x0000000480000400-0x00000004c0000400):2:mdt]
Lustre: DEBUG MARKER: LFSCK_END
Lustre: shutting down debug daemon thread...
Lustre: debug daemon will attempt to start writing to /tmp/lfsck_log_5 (1048576kB max)
Lustre: DEBUG MARKER: LFSCK_START
LustreError: 1210:0:(lustre_idl.h:770:ostid_to_fid()) bad ost_idx, 0x0:0 ost_idx:3735879680
LustreError: 1210:0:(lustre_idl.h:770:ostid_to_fid()) Skipped 1 previous similar message
Lustre: ctl-scratch-MDT0000: super-sequence allocation rc = 0 [0x00000004c0000400-0x0000000500000400):3:mdt
Lustre: DEBUG MARKER: LFSCK_END
Lustre: shutting down debug daemon thread...
LustreError: 1316:0:(class_obd.c:398:class_handle_ioctl()) OBD ioctl : No Device -12066
LustreError: 1323:0:(lustre_idl.h:770:ostid_to_fid()) bad ost_idx, 0x0:0 ost_idx:4294936582
LustreError: 1323:0:(lustre_idl.h:770:ostid_to_fid()) Skipped 1 previous similar message
LustreError: 1323:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x300000000:2147484672 ost_idx:0
LustreError: 1323:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x600000000:1026 ost_idx:0
LustreError: 1323:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
LustreError: 1323:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x700000000:1024 ost_idx:0
LustreError: 1323:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
LustreError: 1323:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x700000000:1024 ost_idx:0
LustreError: 1323:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
LustreError: 1323:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x200000000:2147484674 ost_idx:0
LustreError: 1323:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message

At this point, all file creates and deletes had stopped. So, I started creating files again to run the test one more time. At that point, it looked like 'lfsck_start' was hung, but the server was apparently just very busy and, eventually, 'lfsck_start' completed. Here's what dmesg contained during this busy period:

LustreError: 11-0: scratch-OST0005-osc-MDT0000: Communicating with 192.168.2.113@o2ib, operation ost_statfs failed with -107.
Lustre: mdt_readpage: This server is not able to keep up with request traffic (cpu-bound).
Lustre: 746:0:(service.c:1509:ptlrpc_at_check_timed()) earlyQ=2 reqQ=0 recA=2, svcEst=31, delay=0(jiff)
Lustre: 746:0:(service.c:1306:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-510s), not sending early reply. Consider increasing at_early_margin (5)?  req@ffff880ecbf84800 x1476250892354708/t4302517158(0) o35->98357ec6-eddc-7db9-b827-938060f0c85f@192.168.2.115@o2ib:0/0 lens 392/696 e 1 to 0 dl 1407874953 ref 2 fl Interpret:/0/0 rc 0/0
Lustre: 1250:0:(service.c:2118:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (31:510s); client may timeout.  req@ffff88091bcd8400 x1476250907062608/t4302517159(0) o35->e641b843-fa22-0bd9-70de-06ac9836f5d3@192.168.2.119@o2ib:0/0 lens 392/424 e 1 to 0 dl 1407874953 ref 1 fl Complete:/0/0 rc 0/0
Lustre: scratch-OST0007-osc-MDT0002: Connection to scratch-OST0007 (at 192.168.2.114@o2ib) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 1 previous similar message
LNet: Service thread pid 1041 completed after 540.62s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
LustreError: 167-0: scratch-OST0004-osc-MDT0002: This client was evicted by scratch-OST0004; in progress operations using this service will fail.
Lustre: scratch-OST0004-osc-MDT0002: Connection restored to scratch-OST0004 (at 192.168.2.113@o2ib)
Lustre: scratch-OST0005-osc-MDT0000: Connection to scratch-OST0005 (at 192.168.2.113@o2ib) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 17 previous similar messages
LustreError: 977:0:(osp_precreate.c:466:osp_precreate_send()) scratch-OST0005-osc-MDT0000: can't precreate: rc = -107
LustreError: 167-0: scratch-OST0005-osc-MDT0000: This client was evicted by scratch-OST0005; in progress operations using this service will fail.
LustreError: Skipped 18 previous similar messages
LustreError: Skipped 73 previous similar messages
Lustre: scratch-OST0005-osc-MDT0000: Connection restored to scratch-OST0005 (at 192.168.2.113@o2ib)
Lustre: Skipped 18 previous similar messages

On the second MDS, the dmesg looked similar:

Lustre: debug daemon will attempt to start writing to /tmp/lfsck_log_20 (1048576kB max)
Lustre: DEBUG MARKER: LFSCK_START
LustreError: 1450:0:(lustre_idl.h:770:ostid_to_fid()) bad ost_idx, 0x0:0 ost_idx:4294936584
LustreError: 1450:0:(lustre_idl.h:770:ostid_to_fid()) Skipped 1 previous similar message
Lustre: DEBUG MARKER: LFSCK_END
Lustre: shutting down debug daemon thread...
Lustre: debug daemon will attempt to start writing to /tmp/lfsck_log_21 (1048576kB max)
Lustre: DEBUG MARKER: LFSCK_START
LustreError: 1527:0:(lustre_idl.h:770:ostid_to_fid()) bad ost_idx, 0x0:0 ost_idx:3735879680
LustreError: 1527:0:(lustre_idl.h:770:ostid_to_fid()) Skipped 1 previous similar message
LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x300000000:2147484672 ost_idx:0
LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x600000000:1026 ost_idx:0
LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x500000000:3221226498 ost_idx:0
LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x600000000:1026 ost_idx:0
LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x400000000:3221226498 ost_idx:0
LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
Lustre: DEBUG MARKER: LFSCK_END
Lustre: shutting down debug daemon thread...

I did the same workload a second time just to make sure I didn't just get lucky. 'lctl lfsck_start' completed each time I started it and dmesg on the first MDS looks similar to the first test:

Lustre: debug daemon will attempt to start writing to /tmp/lfsck_log_20 (1048576kB max)
Lustre: DEBUG MARKER: LFSCK_START
LustreError: 1450:0:(lustre_idl.h:770:ostid_to_fid()) bad ost_idx, 0x0:0 ost_idx:4294936584
LustreError: 1450:0:(lustre_idl.h:770:ostid_to_fid()) Skipped 1 previous similar message
Lustre: DEBUG MARKER: LFSCK_END
Lustre: shutting down debug daemon thread...
Lustre: debug daemon will attempt to start writing to /tmp/lfsck_log_21 (1048576kB max)
Lustre: DEBUG MARKER: LFSCK_START
LustreError: 1527:0:(lustre_idl.h:770:ostid_to_fid()) bad ost_idx, 0x0:0 ost_idx:3735879680
LustreError: 1527:0:(lustre_idl.h:770:ostid_to_fid()) Skipped 1 previous similar message
LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x300000000:2147484672 ost_idx:0
LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x600000000:1026 ost_idx:0
LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x500000000:3221226498 ost_idx:0
LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x600000000:1026 ost_idx:0
LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) bad MDT0 id, 0x400000000:3221226498 ost_idx:0
LustreError: 1527:0:(lustre_idl.h:800:ostid_to_fid()) Skipped 1 previous similar message
Lustre: DEBUG MARKER: LFSCK_END
Lustre: shutting down debug daemon thread...

I have kernel logs from the first MDS for the periods between where you see "LFSCK_START" and "LFSCK_END" in the dmesg logs above. Please let me know if you would like to see these logs.

Comment by nasf (Inactive) [ 13/Aug/14 ]

James, do you have the "lctl dk" logs? Thanks!

Comment by James Nunez (Inactive) [ 13/Aug/14 ]

Yes, I have the logs and will upload them this morning.

Comment by nasf (Inactive) [ 13/Aug/14 ]

So you still hit one LFSCK hang with the latest patches applied, right? Or was the hang temporary and the LFSCK eventually finished? If you have the kernel stack dump log ("echo t > /proc/sysrq-trigger") from that point in time, it would be very helpful. Thanks!

Comment by James Nunez (Inactive) [ 13/Aug/14 ]

I've uploaded the logs to /scratch/ftp/uploads/LU-5395. Here is what I did yesterday and the logs that were taken:

I started removing directories containing 100,000 files each and started running test 2.2 from the LFSCK test plan, creating files with unmatched pairs. Then I started the kernel log and LFSCK:

lctl clear
lctl debug_daemon start /tmp/lfsck_log_1 1024
lctl mark LFSCK_START
lctl lfsck_start -M scratch-MDT0000 -A --reset --type layout -s 100

I stopped LFSCK because the speed limit of 100 was too slow. Since the workload of removing files and creating files with unmatched pairs was still running, I stopped the kernel log (lfsck_log_1.txt in the uploads directory) and restarted the kernel logging since I didn't want to lose information from the log.

lctl lfsck_stop -M scratch-MDT0000 -A
lctl mark LFSCK_END
lctl debug_daemon stop
lctl clear
lctl debug_daemon start /tmp/lfsck_log_2 1024
lctl mark LFSCK_START
lctl lfsck_start -M scratch-MDT0000 -A --reset --type layout

Again, 'lfsck_start' completed, but the workload was still running. So, I stopped the kernel log (lfsck_log_2.txt in uploads) and started the kernel log and 'lfsck_start' again:

lctl mark LFSCK_END
lctl debug_daemon stop
lctl clear
lctl debug_daemon start /tmp/lfsck_log_3 1024
lctl mark LFSCK_START
lctl lfsck_start -M scratch-MDT0000 -A --reset --type layout

I did this two more times: stop the log, start the log, and start 'lctl lfsck_start'. Each time, lfsck_start completed with no hangs. The logs are lfsck_log_3.txt, lfsck_log_4.txt, and lfsck_log_5.txt in uploads. lfsck_log_4.txt and lfsck_log_5.txt capture a single run of lfsck_start, since I used the '-s 100' flag to see if I could get lfsck_start to hang with the speed limit flag. It did not hang.

At this point, the file create and delete workload ended. I started one more 'lctl lfsck_start' to clean up any issues that were left behind in the file system. Since I couldn't get lfsck_start to hang, I thought I should try again, and I started creating 100,000 files in several directories again. At this point, I was watching the status of LFSCK at /proc/fs/lustre/mdd/scratch-MDT0000/lfsck_layout and it looked like things hung. So, I issued "echo t > /proc/sysrq-trigger"; the output is in messages in uploads.

Several minutes later, things started progressing again and lfsck_start finally completed. So, I started the same workload of removing files and creating files with unmatched pairs and ran lfsck_start twice, with speed limits of 250 and 300, while this was running. The logs for these runs are in uploads as lfsck_log_20.txt and lfsck_log_21.txt. In both cases, LFSCK completed.

Comment by nasf (Inactive) [ 14/Aug/14 ]

Thanks James! Your log is very useful. I have found at least one piece of bad code in LFSCK that causes the trouble message "lustre_idl.h:770:ostid_to_fid()) bad ost_idx...".

I have updated the two patches: http://review.whamcloud.com/11373 and http://review.whamcloud.com/#/c/11304/. Would you please retry them? And please collect the same logs as you did before if you still hit the trouble message. Thanks!

Comment by James Nunez (Inactive) [ 14/Aug/14 ]

I’ve tested a build of master plus the latest versions of the two patches mentioned above. Things look good, but there are some issues that I’d like you to comment on.

I ran the workload explained above, ran 'lctl lfsck_start', and everything completed as expected. I didn't see any hangs, and I do not see any messages about "bad ost_idx" in the MDS or OSS dmesg. So, that is good news. It looks like these patches cleared up the issue I was seeing.

Since this was successful, I decided to do the exact same thing as above, but change one thing: the fail_loc value on all OSTs. I set fail_loc to "0x1614", which should inject multiple references. When I did this, I couldn't get LFSCK to complete successfully. I see in the LFSCK output:

status: partial
flags: incomplete

I started LFSCK twice and both times, it would not complete and would stop with “status: partial”. I have logs for both of these runs.

I then tried to set fail_loc to “0x1610”, dangling references, but the script fails and I see the following in the OST dmesg:

LustreError: 25382:0:(ldlm_resource.c:1150:ldlm_resource_get()) scratch-OST0000: lvbo_init failed for resource 0x6a083:0x0: rc = -2
LustreError: 25382:0:(ldlm_resource.c:1150:ldlm_resource_get()) Skipped 46 previous similar messages
Lustre: *** cfs_fail_loc=1610, val=0***
LustreError: 25644:0:(ldlm_resource.c:1150:ldlm_resource_get()) scratch-OST0001: lvbo_init failed for resource 0x6a0e3:0x0: rc = -2
LustreError: 25644:0:(ldlm_resource.c:1150:ldlm_resource_get()) Skipped 168 previous similar messages

This looks like LU-5457.

I plan to reinstall Lustre on these nodes and run the workload creating the multiple references and dangling references. Should I use this series of patches or just use master? What would provide the most information? If I can recreate this problem, I’ll open a new ticket.

Thanks.

Comment by nasf (Inactive) [ 15/Aug/14 ]

It is good news that the original issues have been resolved.

As for the "status: partial", it means that during the LFSCK running, not all the object can be verified, then the LFSCK cannot make sure whether there are some inconsistency or not. For example, during the LFSCK processing, the communication between the MDT and the OST is broken, then the LFSCK on the MDT cannot get related information from the OST for consistency verification, then some OST-ojects have to be skipped. Under such case, the LFSCK will be marked as partial.

As for the message "lvbo_init failed for resource 0x6a083:0x0: rc = -2" during the dangling test, it is normal: we inject an error that causes the related OST-objects to be lost or never created, so anyone who tries to access those OST-objects before the LFSCK repairs them will get the "-2" failure you hit in the test.

If we have the Lustre logs, we can analyze them for the detailed reasons.

About the error injection: the dangling reference fail_loc "0x1610" takes effect on the OST at create time, while the multiple reference fail_loc "0x1614" takes effect on the MDS at create time (see the example commands below). If you want to test, please apply the patches for LU-5395. Thanks!
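
For example, a minimal sketch of arming these injection points with the standard lctl fail_loc interface (the exact wiring in the test scripts may differ):

On every OSS, to create dangling references at create time:
lctl set_param fail_loc=0x1610

On the MDS, to create OST-objects with multiple references at create time:
lctl set_param fail_loc=0x1614

To clear the injection once the workload is done:
lctl set_param fail_loc=0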

Comment by James Nunez (Inactive) [ 15/Aug/14 ]

Thank you for the explanation about what to expect with the different fail_loc values.

I have uploaded to /scratch/ftp/uploads/LU-5395 one log for each of the LFSCK runs that returned a status of "partial"; the file names are lfsck_log_40.txt and lfsck_log_41.txt. Please let me know if the logs reveal anything interesting about why LFSCK could not complete.

Comment by nasf (Inactive) [ 15/Aug/14 ]

There are a lot of failure messages like this:
"(lfsck_layout.c:2906:lfsck_layout_scan_orphan()) scratch-MDT0000-osd: layout LFSCK assistant finished the orphan scanning for OST0003: rc = -3"

That means there were some failures on the OST side during the LFSCK. So, do you have the Lustre logs from the OSTs? Thanks!

Comment by James Nunez (Inactive) [ 15/Aug/14 ]

I suspect what is wrong on the OSTs is that I set 'fail_loc=0x1614' on them and that caused some problems.

The OSS kernel logs were collected after all the LFSCK runs. So, it may not be obvious where one LFSCK ends and the next one starts. I do see several of the following messages in the OSS logs:

00100000:10000000:7.0:1408057678.451385:0:25634:0:(lfsck_layout.c:6274:lfsck_orphan_it_init()) scratch-OST0003-osd: init the orphan iteration: rc = -3

I've uploaded two logs to uploads: lfsck_log_oss1.txt contains the kernel log for OST0000 and OST0001, and lfsck_log_oss2.txt has the kernel log for OST0002 and OST0003.

Comment by nasf (Inactive) [ 05/Sep/14 ]

The original LFSCK hang issues have been resolved by the patches:

http://review.whamcloud.com/#/c/11304/
http://review.whamcloud.com/#/c/11373/

So I am closing this bug. If we hit other failures in the future, please open a new ticket.
