[LU-6109] LFSCK gets "inconsistent" flag and won't stop Created: 12/Jan/15 Updated: 16/Mar/16 Resolved: 08/Feb/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0 |
| Fix Version/s: | Lustre 2.7.0, Lustre 2.8.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | James Nunez (Inactive) | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | HB, lfsck | ||
| Environment: |
OpenSFS cluster with two MDSs with one MDT each, three OSSs with two OSTs each, and three clients running Lustre (master) build #2812 |
||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 17016 |
| Description |
|
While running the performance test (3.3.2) from the LFSCK Phase 3 test plan, I’ve run into a problem. I installed Lustre on all nodes and created a new file system. I don’t know if this matters, but the file system is empty when I run this test. I then run test 3.3.2 from the test plan: set fail_loc on the MDSs, create a small number of objects, and run LFSCK (a sketch of these commands appears at the end of this description). For some reason, LFSCK has the flag "inconsistent" on both MDSs.

On mds01:

# cat /proc/fs/lustre/mdd/scratch-MDT0000/lfsck_namespace
name: lfsck_namespace
magic: 0xa0629d03
version: 2
status: scanning-phase1
flags: inconsistent
param: all_targets,create_ostobj,create_mdtobj
time_since_last_completed: 1155 seconds
time_since_latest_start: 27 seconds
time_since_last_checkpoint: N/A
latest_start_position: 77, N/A, N/A
last_checkpoint_position: N/A, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 461
checked_phase2: 0
updated_phase1: 208
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 27
dirent_repaired: 0
linkea_repaired: 208
nlinks_repaired: 0
multiple_linked_checked: 187
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 2
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 0
name_hash_repaired: 0
success_count: 1
run_time_phase1: 28 seconds
run_time_phase2: 0 seconds
average_speed_phase1: 16 items/sec
average_speed_phase2: N/A
real_time_speed_phase1: 16 items/sec
real_time_speed_phase2: N/A
current_position: 238551041, N/A, N/A

There is nothing in dmesg except confirmation that fail_loc was set on the MDTs:

# dmesg
Lustre: ctl-scratch-MDT0000: super-sequence allocation rc = 0 [0x0000000380000400-0x00000003c0000400):0:mdt
Lustre: *** cfs_fail_loc=1603, val=0***
Lustre: *** cfs_fail_loc=1603, val=0***
Lustre: ctl-scratch-MDT0000: super-sequence allocation rc = 0 [0x00000003c0000400-0x0000000400000400):1:mdt
Lustre: *** cfs_fail_loc=1603, val=0***
Lustre: Skipped 101 previous similar messages
Lustre: *** cfs_fail_loc=1603, val=0***
Lustre: Skipped 3 previous similar messages
Lustre: *** cfs_fail_loc=1603, val=0***
Lustre: Skipped 100 previous similar messages
Lustre: *** cfs_fail_loc=1603, val=0***
Lustre: *** cfs_fail_loc=1603, val=0***
Lustre: Skipped 101 previous similar messages
Lustre: *** cfs_fail_loc=1603, val=0***
Lustre: Skipped 3 previous similar messages

If I wait a few minutes, the following is in dmesg:

INFO: task lfsck_namespace:469 blocked for more than 120 seconds.
Not tainted 2.6.32-431.29.2.el6_lustre.gefdacb7.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
lfsck_namespa D 0000000000000006 0 469 2 0x00000080
ffff880478e87ba0 0000000000000046 0000000000000000 ffff880a9849b7e0
ffff880a9849b7e0 ffff880d7d69f000 ffff880478e87ba0 ffffffffa06d57e9
ffff880478e66638 ffff880478e87fd8 000000000000fbc8 ffff880478e66638
Call Trace:
[<ffffffffa06d57e9>] ? lu_object_find_try+0x99/0x2b0 [obdclass]
[<ffffffffa06d5a3d>] lu_object_find_at+0x3d/0xe0 [obdclass]
[<ffffffff81061d00>] ? default_wake_function+0x0/0x20
[<ffffffffa06d5b1f>] lu_object_find_slice+0x1f/0x80 [obdclass]
[<ffffffffa0f33619>] lfsck_namespace_assistant_handler_p1+0x2d9/0x1f70 [lfsck]
[<ffffffffa057d472>] ? cfs_hash_bd_from_key+0x42/0xd0 [libcfs]
[<ffffffffa0f23347>] lfsck_assistant_engine+0x497/0x1e00 [lfsck]
[<ffffffff81061d00>] ? default_wake_function+0x0/0x20
[<ffffffffa0f22eb0>] ? lfsck_assistant_engine+0x0/0x1e00 [lfsck]
[<ffffffff8109abf6>] kthread+0x96/0xa0
[<ffffffff8100c20a>] child_rip+0xa/0x20
[<ffffffff8109ab60>] ? kthread+0x0/0xa0
[<ffffffff8100c200>] ? child_rip+0x0/0x20
INFO: task lfsck_namespace:469 blocked for more than 120 seconds.
Not tainted 2.6.32-431.29.2.el6_lustre.gefdacb7.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
lfsck_namespa D 0000000000000006 0 469 2 0x00000080
ffff880478e87ba0 0000000000000046 0000000000000000 ffff880a9849b7e0
ffff880a9849b7e0 ffff880d7d69f000 ffff880478e87ba0 ffffffffa06d57e9
ffff880478e66638 ffff880478e87fd8 000000000000fbc8 ffff880478e66638
Call Trace:
[<ffffffffa06d57e9>] ? lu_object_find_try+0x99/0x2b0 [obdclass]
[<ffffffffa06d5a3d>] lu_object_find_at+0x3d/0xe0 [obdclass]
[<ffffffff81061d00>] ? default_wake_function+0x0/0x20
[<ffffffffa06d5b1f>] lu_object_find_slice+0x1f/0x80 [obdclass]
[<ffffffffa0f33619>] lfsck_namespace_assistant_handler_p1+0x2d9/0x1f70 [lfsck]
[<ffffffffa057d472>] ? cfs_hash_bd_from_key+0x42/0xd0 [libcfs]
[<ffffffffa0f23347>] lfsck_assistant_engine+0x497/0x1e00 [lfsck]
[<ffffffff81061d00>] ? default_wake_function+0x0/0x20
[<ffffffffa0f22eb0>] ? lfsck_assistant_engine+0x0/0x1e00 [lfsck]
[<ffffffff8109abf6>] kthread+0x96/0xa0
[<ffffffff8100c20a>] child_rip+0xa/0x20
[<ffffffff8109ab60>] ? kthread+0x0/0xa0
[<ffffffff8100c200>] ? child_rip+0x0/0x20
On mds02, the following is the state of LFSCK:

# cat /proc/fs/lustre/mdd/scratch-MDT0001/lfsck_namespace
name: lfsck_namespace
magic: 0xa0629d03
version: 2
status: scanning-phase2
flags: scanned-once,inconsistent
param: all_targets,create_ostobj,create_mdtobj
time_since_last_completed: 2533 seconds
time_since_latest_start: 1405 seconds
time_since_last_checkpoint: 1405 seconds
latest_start_position: 77, N/A, N/A
last_checkpoint_position: 260571137, [0x3c0000401:0x4:0x0], 0x5d2c710b01284458
first_failure_position: N/A, N/A, N/A
checked_phase1: 16
checked_phase2: 0
updated_phase1: 0
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 6
dirent_repaired: 0
linkea_repaired: 4
nlinks_repaired: 0
multiple_linked_checked: 0
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 2
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 0
name_hash_repaired: 0
success_count: 1
run_time_phase1: 0 seconds
run_time_phase2: 1405 seconds
average_speed_phase1: 16 items/sec
average_speed_phase2: 0 objs/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: 0 objs/sec
current_position: [0x0:0x0:0x0]

I cannot stop nor start LFSCK again, so I’m not sure what state it is in. Trying to start LFSCK again fails:

# lctl lfsck_start -A -M scratch-MDT0000 -c -C --type namespace
Fail to start LFSCK: Operation already in progress

Stopping LFSCK works on mds02:

# cat /proc/fs/lustre/mdd/scratch-MDT0001/lfsck_namespace
name: lfsck_namespace
magic: 0xa0629d03
version: 2
status: stopped
flags: scanned-once,inconsistent
param: all_targets,create_ostobj,create_mdtobj
time_since_last_completed: 2890 seconds
time_since_latest_start: 1762 seconds
time_since_last_checkpoint: 10 seconds
latest_start_position: 77, N/A, N/A
last_checkpoint_position: 260571137, [0x3c0000401:0x4:0x0], 0x5d2c710b01284458
first_failure_position: N/A, N/A, N/A
checked_phase1: 16
checked_phase2: 0
updated_phase1: 0
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 6
dirent_repaired: 0
linkea_repaired: 4
nlinks_repaired: 0
multiple_linked_checked: 0
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 2
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 0
name_hash_repaired: 0
success_count: 1
run_time_phase1: 0 seconds
run_time_phase2: 1753 seconds
average_speed_phase1: 16 items/sec
average_speed_phase2: 0 objs/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A

but does not stop LFSCK on mds01:

# cat /proc/fs/lustre/mdd/scratch-MDT0000/lfsck_namespace
name: lfsck_namespace
magic: 0xa0629d03
version: 2
status: scanning-phase1
flags: inconsistent
param: all_targets,create_ostobj,create_mdtobj
time_since_last_completed: 2953 seconds
time_since_latest_start: 1825 seconds
time_since_last_checkpoint: N/A
latest_start_position: 77, N/A, N/A
last_checkpoint_position: N/A, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 461
checked_phase2: 0
updated_phase1: 208
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 27
dirent_repaired: 0
linkea_repaired: 208
nlinks_repaired: 0
multiple_linked_checked: 187
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 2
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 0
name_hash_repaired: 0
success_count: 1
run_time_phase1: 1826 seconds
run_time_phase2: 0 seconds
average_speed_phase1: 0 items/sec
average_speed_phase2: N/A
real_time_speed_phase1: 0 items/sec
real_time_speed_phase2: N/A
current_position: 238551041, N/A, N/A |
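For reference, a minimal sketch of the reproduction flow described above, assuming the file system is named scratch and is mounted at /mnt/scratch on a client; the object-creation loop is a placeholder, since the exact test 3.3.2 workload is not reproduced in this ticket:

# On both MDSs: inject the linkEA corruption stub before creating objects
# (value as reported in the dmesg output above)
lctl set_param fail_loc=0x1603

# On a client: create a small number of files and directories (placeholder workload)
mkdir -p /mnt/scratch/test_dir
for i in $(seq 1 100); do touch /mnt/scratch/test_dir/file-$i; done

# On both MDSs: clear the failure injection
lctl set_param fail_loc=0

# On the primary MDS: start namespace LFSCK on all targets (command as used above)
lctl lfsck_start -A -M scratch-MDT0000 -c -C --type namespace

# Monitor status (MDT0000 on mds01, MDT0001 on mds02)
lctl get_param -n mdd.scratch-MDT0000.lfsck_namespace
lctl get_param -n mdd.scratch-MDT0001.lfsck_namespace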
| Comments |
| Comment by James Nunez (Inactive) [ 12/Jan/15 ] |
|
Unfortunately, this issue is very reproducible. |
| Comment by Jodi Levi (Inactive) [ 12/Jan/15 ] |
|
Fan Yong, |
| Comment by James Nunez (Inactive) [ 12/Jan/15 ] |
|
The Performance test 3.3.2 was run on the previous three lustre-master tags, but this problem was not discovered earlier due to |
| Comment by Gerrit Updater [ 22/Jan/15 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/13493 |
| Comment by nasf (Inactive) [ 22/Jan/15 ] |
|
James, would you please verify this patch in your environment? Thanks! |
| Comment by James Nunez (Inactive) [ 23/Jan/15 ] |
|
The proposed patch, http://review.whamcloud.com/#/c/13493/4 , does not fix the problem reported in this ticket. I ran test 3.3.2 against:

# lctl get_param version
version=
lustre: 2.6.92
kernel: patchless_client
build: jenkins-arch=x86_64,build_type=server,distro=el6,ib_stack=inkernel-29887-g2fb04bd-PRISTINE-2.6.32-431.29.2.el6_lustre.gffd1fc2.x86_64

The primary MDS, MDS01, is stuck in:

status: scanning-phase1
flags: inconsistent

On MDS02, the LFSCK status is:

status: scanning-phase2
flags: scanned-once,inconsistent

In dmesg on MDS01:

INFO: task lfsck_namespace:542 blocked for more than 120 seconds.
Not tainted 2.6.32-431.29.2.el6_lustre.gffd1fc2.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
lfsck_namespa D 0000000000000001 0 542 2 0x00000080
ffff880a556f3ba0 0000000000000046 0000000000000000 ffff880597ae2360
ffff880597ae2360 ffff880832e27000 ffff880a556f3ba0 ffffffffa06d2bc9
ffff880a56a285f8 ffff880a556f3fd8 000000000000fbc8 ffff880a56a285f8
Call Trace:
[<ffffffffa06d2bc9>] ? lu_object_find_try+0x99/0x2b0 [obdclass]
[<ffffffffa06d2e1d>] lu_object_find_at+0x3d/0xe0 [obdclass]
[<ffffffff81061d00>] ? default_wake_function+0x0/0x20
[<ffffffffa06d1e69>] ? lu_object_put+0x1d9/0x350 [obdclass]
[<ffffffffa06d2eff>] lu_object_find_slice+0x1f/0x80 [obdclass]
[<ffffffffa0f2ad19>] lfsck_namespace_assistant_handler_p1+0x2d9/0x1f70 [lfsck]
[<ffffffffa057b442>] ? cfs_hash_bd_from_key+0x42/0xd0 [libcfs]
[<ffffffffa0f1f9f7>] lfsck_assistant_engine+0x497/0x1e00 [lfsck]
[<ffffffff81061d00>] ? default_wake_function+0x0/0x20
[<ffffffffa0f1f560>] ? lfsck_assistant_engine+0x0/0x1e00 [lfsck]
[<ffffffff8109abf6>] kthread+0x96/0xa0
[<ffffffff8100c20a>] child_rip+0xa/0x20
[<ffffffff8109ab60>] ? kthread+0x0/0xa0
[<ffffffff8100c200>] ? child_rip+0x0/0x20
which differs from the stack trace in the description of this ticket in that it has the added call ‘lu_object_put’. When I issue the following on MDS01:

# lctl lfsck_stop -A -M scratch-MDT0000

LFSCK on MDS02 does stop, but LFSCK does not stop on MDS01. I can’t kill the lfsck_namespace process on MDS01. |
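Not part of the original report, but one way to see exactly where a stuck lfsck_namespace kernel thread is blocked (a sketch; the PID 542 is taken from the hung-task message above and will differ per run):

# Dump the in-kernel stack of the blocked thread
cat /proc/542/stack

# Or ask the kernel to log all D-state (uninterruptible) tasks
echo w > /proc/sysrq-trigger
dmesg | tail -100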
| Comment by nasf (Inactive) [ 23/Jan/15 ] |
|
James, would you please upload the MDS debug log to Jira? Were any files deleted during your test? Thanks! |
| Comment by James Nunez (Inactive) [ 23/Jan/15 ] |
|
No files were deleted during this test. I did not capture kernel logs, so let me run again and I'll upload logs when I have them. |
| Comment by nasf (Inactive) [ 23/Jan/15 ] |
|
Thanks, James! If possible, please upload the logs to Jira directly; that is easier for me to access. |
| Comment by nasf (Inactive) [ 23/Jan/15 ] |
|
To confirm with you: before the LFSCK run, you injected the failure stub fail_loc=1603 on both MDTs so that all the files/directories/remote_directories/striped_directories had corrupted linkEA, and then ran namespace LFSCK to repair them. Right? |
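As a quick check after such a run, the repair counters in the LFSCK status output show whether the injected linkEA corruption was actually handled, e.g. (a sketch, assuming the scratch file system name used above):

lctl get_param -n mdd.scratch-MDT0000.lfsck_namespace | egrep 'status|flags|linkea_repaired|failed_phase'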
| Comment by Gerrit Updater [ 23/Jan/15 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/13511 |
| Comment by nasf (Inactive) [ 23/Jan/15 ] |
|
Patch 13511 is based on patch 13493. It adds further fixes for the possible hang while locating objects. |
| Comment by James Nunez (Inactive) [ 23/Jan/15 ] |
|
Here are the logs from the MDSs with only the first patch, 13493, applied. I stopped collecting logs after a few minutes of MDS01 being stuck in scanning-phase1. The script sets fail_loc to 1603 on the MDSs, writes/creates objects, sets fail_loc to 0 on the MDSs, and then runs LFSCK namespace. I will test with the new patch next. |
| Comment by James Nunez (Inactive) [ 26/Jan/15 ] |
|
I've attached logs from the MDSs with the build for patch 13511: lfsck_log_mds01_13511.txt and lfsck_log_mds02_13511.txt. The first time I ran test 3.3.2, it completed with no problems. I ran it a second time and LFSCK namespace on MDS01 is stuck in 'scanning-phase1' as described in this ticket. |
| Comment by nasf (Inactive) [ 29/Jan/15 ] |
|
James, thanks for the verification. I found more clues and updated the patch (http://review.whamcloud.com/#/c/13511/). Would you please retry the latest patch with -1 debug log enabled? BTW, please tell me what files/subdirs you created under the striped directory for the test. Thanks! |
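A sketch of how the full (-1) debug log could be captured on each MDS around the LFSCK run; the buffer size and output path below are arbitrary choices, not values from this ticket:

# Enable all debug flags and enlarge the trace buffer
lctl set_param debug=-1
lctl set_param debug_mb=1024
lctl clear

# ... start LFSCK and reproduce the hang ...

# Dump the kernel debug buffer to a file for upload
lctl dk > /tmp/lfsck-debug-$(hostname).log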
| Comment by James Nunez (Inactive) [ 29/Jan/15 ] |
|
I ran with your latest patch and uploaded a tar gzipped file at lfsck_logs.tgz. The directories/files that are created during this test are, in order: Then the directory test_dir/sdir-1 is created with the same files and directories, with the same names as above. |
| Comment by nasf (Inactive) [ 31/Jan/15 ] |
|
There was a memory leak in the former LFSCK run, so the subsequent LFSCK was blocked by some stale data. The root cause of the memory leak is that LFSCK wrongly used the OSD device to handle OSP (remote MDT) inconsistency. Such issues have been fixed by So I enhanced the 13511 patch (http://review.whamcloud.com/#/c/13511/) on top of the 13392 patch. BTW, I have verified with your method, but cannot reproduce the failures any longer. |
| Comment by James Nunez (Inactive) [ 03/Feb/15 ] |
|
I've run with the latest version of the 13511 patch and cannot reproduce this error any more. I've run the test several times in a row and increased the number of objects written and LFSCK completes each time. |
| Comment by nasf (Inactive) [ 03/Feb/15 ] |
|
Thanks, James. I think the original issue should now be fixed by the above two patches (13392, 13511). |
| Comment by Gerrit Updater [ 08/Feb/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13511/ |
| Comment by Peter Jones [ 08/Feb/15 ] |
|
Landed for 2.7 |
| Comment by Gerrit Updater [ 05/Mar/15 ] |
|
Yang Sheng (yang.sheng@intel.com) uploaded a new patch: http://review.whamcloud.com/13987 |
| Comment by Gerrit Updater [ 08/Jul/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13987/ |