[LU-6109] LFSCK gets "inconsistent" flag and won't stop Created: 12/Jan/15  Updated: 16/Mar/16  Resolved: 08/Feb/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: Lustre 2.7.0, Lustre 2.8.0

Type: Bug Priority: Critical
Reporter: James Nunez (Inactive) Assignee: nasf (Inactive)
Resolution: Fixed Votes: 0
Labels: HB, lfsck
Environment:

OpenSFS cluster with 2 MDSs with 1MDT each, 3 OSSs with two OSTs each and three clients running Lustre (master) build # 2812


Attachments: Text File lfsck_log_mds01.txt     Text File lfsck_log_mds01_13511.txt     Text File lfsck_log_mds02.txt     Text File lfsck_log_mds02_13511.txt     Text File lfsck_logs.tgz    
Severity: 3
Rank (Obsolete): 17016

 Description   

While running the performance test (3.3.2) from the LFSCK Phase 3 test plan, I’ve run into a problem. I installed Lustre on all nodes and created a new file system. I don’t know if this matters, but the file system is empty when I run this test. I then run test 3.3.2 from the test plan; set fail_loc on the MDSs, create a small number of objects and run LFSCK. For some reason, LFSCk has flag “inconsistent” on both MDSs.

On mds01:

# cat /proc/fs/lustre/mdd/scratch-MDT0000/lfsck_namespace 
name: lfsck_namespace
magic: 0xa0629d03
version: 2
status: scanning-phase1
flags: inconsistent
param: all_targets,create_ostobj,create_mdtobj
time_since_last_completed: 1155 seconds
time_since_latest_start: 27 seconds
time_since_last_checkpoint: N/A
latest_start_position: 77, N/A, N/A
last_checkpoint_position: N/A, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 461
checked_phase2: 0
updated_phase1: 208
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 27
dirent_repaired: 0
linkea_repaired: 208
nlinks_repaired: 0
multiple_linked_checked: 187
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 2
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 0
name_hash_repaired: 0
success_count: 1
run_time_phase1: 28 seconds
run_time_phase2: 0 seconds
average_speed_phase1: 16 items/sec
average_speed_phase2: N/A
real_time_speed_phase1: 16 items/sec
real_time_speed_phase2: N/A
current_position: 238551041, N/A, N/A

There is nothing in dmesg except confirmation that fail_loc was set on the MDTs:

# dmesg
Lustre: ctl-scratch-MDT0000: super-sequence allocation rc = 0 [0x0000000380000400-0x00000003c0000400):0:mdt
Lustre: *** cfs_fail_loc=1603, val=0***
Lustre: *** cfs_fail_loc=1603, val=0***
Lustre: ctl-scratch-MDT0000: super-sequence allocation rc = 0 [0x00000003c0000400-0x0000000400000400):1:mdt
Lustre: *** cfs_fail_loc=1603, val=0***
Lustre: Skipped 101 previous similar messages
Lustre: *** cfs_fail_loc=1603, val=0***
Lustre: Skipped 3 previous similar messages
Lustre: *** cfs_fail_loc=1603, val=0***
Lustre: Skipped 100 previous similar messages
Lustre: *** cfs_fail_loc=1603, val=0***
Lustre: *** cfs_fail_loc=1603, val=0***
Lustre: Skipped 101 previous similar messages
Lustre: *** cfs_fail_loc=1603, val=0***
Lustre: Skipped 3 previous similar messages

If I wait a few minutes, the following is in dmesg:

INFO: task lfsck_namespace:469 blocked for more than 120 seconds.
      Not tainted 2.6.32-431.29.2.el6_lustre.gefdacb7.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
lfsck_namespa D 0000000000000006     0   469      2 0x00000080
 ffff880478e87ba0 0000000000000046 0000000000000000 ffff880a9849b7e0
 ffff880a9849b7e0 ffff880d7d69f000 ffff880478e87ba0 ffffffffa06d57e9
 ffff880478e66638 ffff880478e87fd8 000000000000fbc8 ffff880478e66638
Call Trace:
 [<ffffffffa06d57e9>] ? lu_object_find_try+0x99/0x2b0 [obdclass]
 [<ffffffffa06d5a3d>] lu_object_find_at+0x3d/0xe0 [obdclass]
 [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
 [<ffffffffa06d5b1f>] lu_object_find_slice+0x1f/0x80 [obdclass]
 [<ffffffffa0f33619>] lfsck_namespace_assistant_handler_p1+0x2d9/0x1f70 [lfsck]
 [<ffffffffa057d472>] ? cfs_hash_bd_from_key+0x42/0xd0 [libcfs]
 [<ffffffffa0f23347>] lfsck_assistant_engine+0x497/0x1e00 [lfsck]
 [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
 [<ffffffffa0f22eb0>] ? lfsck_assistant_engine+0x0/0x1e00 [lfsck]
 [<ffffffff8109abf6>] kthread+0x96/0xa0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109ab60>] ? kthread+0x0/0xa0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20
INFO: task lfsck_namespace:469 blocked for more than 120 seconds.
      Not tainted 2.6.32-431.29.2.el6_lustre.gefdacb7.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
lfsck_namespa D 0000000000000006     0   469      2 0x00000080
 ffff880478e87ba0 0000000000000046 0000000000000000 ffff880a9849b7e0
 ffff880a9849b7e0 ffff880d7d69f000 ffff880478e87ba0 ffffffffa06d57e9
 ffff880478e66638 ffff880478e87fd8 000000000000fbc8 ffff880478e66638
Call Trace:
 [<ffffffffa06d57e9>] ? lu_object_find_try+0x99/0x2b0 [obdclass]
 [<ffffffffa06d5a3d>] lu_object_find_at+0x3d/0xe0 [obdclass]
 [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
 [<ffffffffa06d5b1f>] lu_object_find_slice+0x1f/0x80 [obdclass]
 [<ffffffffa0f33619>] lfsck_namespace_assistant_handler_p1+0x2d9/0x1f70 [lfsck]
 [<ffffffffa057d472>] ? cfs_hash_bd_from_key+0x42/0xd0 [libcfs]
 [<ffffffffa0f23347>] lfsck_assistant_engine+0x497/0x1e00 [lfsck]
 [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
 [<ffffffffa0f22eb0>] ? lfsck_assistant_engine+0x0/0x1e00 [lfsck]
 [<ffffffff8109abf6>] kthread+0x96/0xa0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109ab60>] ? kthread+0x0/0xa0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20

On mds02, the following is the state of LFSCK on mds02:

# cat /proc/fs/lustre/mdd/scratch-MDT0001/lfsck_namespace 
name: lfsck_namespace
magic: 0xa0629d03
version: 2
status: scanning-phase2
flags: scanned-once,inconsistent
param: all_targets,create_ostobj,create_mdtobj
time_since_last_completed: 2533 seconds
time_since_latest_start: 1405 seconds
time_since_last_checkpoint: 1405 seconds
latest_start_position: 77, N/A, N/A
last_checkpoint_position: 260571137, [0x3c0000401:0x4:0x0], 0x5d2c710b01284458
first_failure_position: N/A, N/A, N/A
checked_phase1: 16
checked_phase2: 0
updated_phase1: 0
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 6
dirent_repaired: 0
linkea_repaired: 4
nlinks_repaired: 0
multiple_linked_checked: 0
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 2
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 0
name_hash_repaired: 0
success_count: 1
run_time_phase1: 0 seconds
run_time_phase2: 1405 seconds
average_speed_phase1: 16 items/sec
average_speed_phase2: 0 objs/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: 0 objs/sec
current_position: [0x0:0x0:0x0]

I cannot stop nor start LFSCK again. So, I’m not sure what state it is in.

Trying to start LFSCK again fails:

# lctl lfsck_start -A -M scratch-MDT0000 -c -C --type namespace 
Fail to start LFSCK: Operation already in progress

Stopping LFSCK works on mds02:

# cat /proc/fs/lustre/mdd/scratch-MDT0001/lfsck_namespace 
name: lfsck_namespace
magic: 0xa0629d03
version: 2
status: stopped
flags: scanned-once,inconsistent
param: all_targets,create_ostobj,create_mdtobj
time_since_last_completed: 2890 seconds
time_since_latest_start: 1762 seconds
time_since_last_checkpoint: 10 seconds
latest_start_position: 77, N/A, N/A
last_checkpoint_position: 260571137, [0x3c0000401:0x4:0x0], 0x5d2c710b01284458
first_failure_position: N/A, N/A, N/A
checked_phase1: 16
checked_phase2: 0
updated_phase1: 0
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 6
dirent_repaired: 0
linkea_repaired: 4
nlinks_repaired: 0
multiple_linked_checked: 0
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 2
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 0
name_hash_repaired: 0
success_count: 1
run_time_phase1: 0 seconds
run_time_phase2: 1753 seconds
average_speed_phase1: 16 items/sec
average_speed_phase2: 0 objs/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A

but does not stop LFSCK on mds01

# cat /proc/fs/lustre/mdd/scratch-MDT0000/lfsck_namespace 
name: lfsck_namespace
magic: 0xa0629d03
version: 2
status: scanning-phase1
flags: inconsistent
param: all_targets,create_ostobj,create_mdtobj
time_since_last_completed: 2953 seconds
time_since_latest_start: 1825 seconds
time_since_last_checkpoint: N/A
latest_start_position: 77, N/A, N/A
last_checkpoint_position: N/A, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 461
checked_phase2: 0
updated_phase1: 208
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 27
dirent_repaired: 0
linkea_repaired: 208
nlinks_repaired: 0
multiple_linked_checked: 187
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 2
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 0
name_hash_repaired: 0
success_count: 1
run_time_phase1: 1826 seconds
run_time_phase2: 0 seconds
average_speed_phase1: 0 items/sec
average_speed_phase2: N/A
real_time_speed_phase1: 0 items/sec
real_time_speed_phase2: N/A
current_position: 238551041, N/A, N/A


 Comments   
Comment by James Nunez (Inactive) [ 12/Jan/15 ]

Unfortunately, this issue is very reproducible.

Comment by Jodi Levi (Inactive) [ 12/Jan/15 ]

Fan Yong,
Can you please have a look at this one and comment?
Thank you!

Comment by James Nunez (Inactive) [ 12/Jan/15 ]

The Performance test 3.3.2 was run on the previous three lustre-master tags, but this problem was not discovered earlier due to LU-5914. LU-5914 also made 3.3.2 fail.

Comment by Gerrit Updater [ 22/Jan/15 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/13493
Subject: LU-6109 lfsck: NOT purge object by OI scrub
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b6d3fd89a129e7aed971ed2e0ead614b8c7bb970

Comment by nasf (Inactive) [ 22/Jan/15 ]

James, would you please to verify this patch in your environment? Thanks!

Comment by James Nunez (Inactive) [ 23/Jan/15 ]

The proposed patch, http://review.whamcloud.com/#/c/13493/4 , does not fix the problem reported in this ticket. I ran test 3.3.2 against

# lctl get_param version
version=
lustre: 2.6.92
kernel: patchless_client
build:  jenkins-arch=x86_64,build_type=server,distro=el6,ib_stack=inkernel-29887-g2fb04bd-PRISTINE-2.6.32-431.29.2.el6_lustre.gffd1fc2.x86_64

The primary MDS, MDS01, is stuck in:

status: scanning-phase1
flags: inconsistent

On MDS02, it’s showing as LFSCK status:

status: scanning-phase2
flags: scanned-once,inconsistent

In dmesg on MDS01,

INFO: task lfsck_namespace:542 blocked for more than 120 seconds.
      Not tainted 2.6.32-431.29.2.el6_lustre.gffd1fc2.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
lfsck_namespa D 0000000000000001     0   542      2 0x00000080
 ffff880a556f3ba0 0000000000000046 0000000000000000 ffff880597ae2360
 ffff880597ae2360 ffff880832e27000 ffff880a556f3ba0 ffffffffa06d2bc9
 ffff880a56a285f8 ffff880a556f3fd8 000000000000fbc8 ffff880a56a285f8
Call Trace:
 [<ffffffffa06d2bc9>] ? lu_object_find_try+0x99/0x2b0 [obdclass]
 [<ffffffffa06d2e1d>] lu_object_find_at+0x3d/0xe0 [obdclass]
 [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
 [<ffffffffa06d1e69>] ? lu_object_put+0x1d9/0x350 [obdclass]
 [<ffffffffa06d2eff>] lu_object_find_slice+0x1f/0x80 [obdclass]
 [<ffffffffa0f2ad19>] lfsck_namespace_assistant_handler_p1+0x2d9/0x1f70 [lfsck]
 [<ffffffffa057b442>] ? cfs_hash_bd_from_key+0x42/0xd0 [libcfs]
 [<ffffffffa0f1f9f7>] lfsck_assistant_engine+0x497/0x1e00 [lfsck]
 [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
 [<ffffffffa0f1f560>] ? lfsck_assistant_engine+0x0/0x1e00 [lfsck]
 [<ffffffff8109abf6>] kthread+0x96/0xa0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109ab60>] ? kthread+0x0/0xa0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20

which differs from the stack trace in the description of this ticket in that it has the added call ‘lu_object_put’.

When I issue the following on MDS01:

# lctl lfsck_stop -A -M scratch-MDT0000

LFSCK on MDS02 does stop, but LFSCK does not stop on MDS01. I can’t kill the lfsck_namespace process on MDS01.

Comment by nasf (Inactive) [ 23/Jan/15 ]

James, would you please to upload the MDS debug log to the Jira? Is there any file deleted during your test? Thanks!

Comment by James Nunez (Inactive) [ 23/Jan/15 ]

There are no files deleted during this test.

I did not capture kernel logs. So, let me run again and I'll upload loads when I have them.

Comment by nasf (Inactive) [ 23/Jan/15 ]

Thanks James! If possible, please upload the log to Jira directly, that is more easy for me to access.

Comment by nasf (Inactive) [ 23/Jan/15 ]

To be confirmed with you: before LFSCK run, you injected failure stub fail_loc=1603 on both MDTs, so all the files/directories/remote_directories/striped_directories have crashed linkEA, then run namespace LFSCK to repair them. Right?

Comment by Gerrit Updater [ 23/Jan/15 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/13511
Subject: LU-6109 lfsck: check FID validity before locating object
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: e248ffbf7d27417b3b85e930b0266d4719cb75ce

Comment by nasf (Inactive) [ 23/Jan/15 ]

The 13511 is based on the patch 13493. It gives more fix for the possible hung during locating object.
James, would you please to try this one when you have time. Thanks!

Comment by James Nunez (Inactive) [ 23/Jan/15 ]

Here are the logs from the MDSs with only the first patch, 13493, applied. I stopped collecting logs after a few minutes of MDS01 stuck in scanning-phase1.

The script, sets fail_loc to 1603 on the MDSs, writes/creates objects, sets fail_loc to 0 on the MDSs and then runs LFSCK namespace.

I will test with the new patch next.

Comment by James Nunez (Inactive) [ 26/Jan/15 ]

I've attached logs from the MDSs with the build for patch 13511; lfsck_log_mds01_13511.txt and lfsck_log_mds02_13511.txt.

The first time I ran test 3.3.2, it completed with no problems. I ran it a second time and LFSCK namespace on MDS01 is stuck in 'scanning-phase1' as described in this ticket.

Comment by nasf (Inactive) [ 29/Jan/15 ]

James,

Thanks for the verification. I found more clew and updated the patch (http://review.whamcloud.com/#/c/13511/). Would you please to retry the latest patch with -1 debug_log enabled? BTW, please tell me what files/subdirs you created under the striped directory for test? Thanks!

Comment by James Nunez (Inactive) [ 29/Jan/15 ]

I ran with your latest patch and uploaded a tar gzipped file at lfsck_logs.tgz.

The directories/files that are create during this test are, in order:
create a top level directory called test_dir
create another directory test_dir/sdir-0
in sdir_0, create 87 files test_dir/sdir-0/z.N, where N=0.{0..28}, 1.{29..57}, 2.{58..86}
create one remote directory at test_dir/sdir-0/rdir-1 (left empty)
create 10 local directory test_dir/sdir-0/ldir-{1..10} (left empty)
create one striped directory at test_dir/sdir-0/rdir-1 (left empty)
create 1 hard links at test_dir/sdir-0/ln.1 to z.0.0

Then the directory test_dir/sdir-1 is created with the same files and directories with the same names as above.

Comment by nasf (Inactive) [ 31/Jan/15 ]

There is memory leak in former LFSCK run, as to the subsequent LFSCK was blocked by some stale data. The root reason for the memory leak is because the LFSCK used OSD device to handle OSP (remote MDT) inconsistency by wrong. Such issues have been fixed by LU-5791 patch: http://review.whamcloud.com/13392.

So I enhanced the 13511 patch (http://review.whamcloud.com/#/c/13511/) against the 13392 patch.
James, would you please to retry the latest patch? Thanks!

BTW, I have verified with your method, but cannot reproduce the failures any longer.

Comment by James Nunez (Inactive) [ 03/Feb/15 ]

I've run with the latest version of the 13511 patch and cannot reproduce this error any more. I've run the test several times in a row and increased the number of objects written and LFSCK completes each time.

Comment by nasf (Inactive) [ 03/Feb/15 ]

Thanks James. I think the original issue should have been fixed via above two patches (13392, 13511).

Comment by Gerrit Updater [ 08/Feb/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13511/
Subject: LU-6109 lfsck: check FID validity before locating object
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 4d408c9aed9adaf1f4e2ea87851728a1cf662594

Comment by Peter Jones [ 08/Feb/15 ]

Landed for 2.7

Comment by Gerrit Updater [ 05/Mar/15 ]

Yang Sheng (yang.sheng@intel.com) uploaded a new patch: http://review.whamcloud.com/13987
Subject: LU-6109 osd-ldiskfs: fix previous patch issue
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: defae4053b7bbe1e013af1a9ec99090d6e7abe1c

Comment by Gerrit Updater [ 08/Jul/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13987/
Subject: LU-6109 osd-ldiskfs: handle no fid-in-dirent correctly
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: a27201d13828d4b9bae9261dcc552c1f767aff21

Generated at Sat Feb 10 01:57:17 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.