Details
- Bug
- Resolution: Fixed
- Critical
- Lustre 2.7.0
- OpenSFS cluster with 2 MDSs (1 MDT each), 3 OSSs (2 OSTs each), and 3 clients running Lustre (master) build #2812
- 3
- 17016
Description
While running the performance test (3.3.2) from the LFSCK Phase 3 test plan, I’ve run into a problem. I installed Lustre on all nodes and created a new file system. I don’t know if this matters, but the file system is empty when I run this test. I then run test 3.3.2 from the test plan: set fail_loc on the MDSs, create a small number of objects, and run LFSCK. For some reason, LFSCK shows the “inconsistent” flag on both MDSs.
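For reference, the steps boil down to roughly the following. This is only a sketch, not the exact test script: the mount point and file count are illustrative, and the fail_loc value is taken from the dmesg output below (assumed to be hex, as cfs_fail_loc normally prints):

# on each MDS, inject the fault seen in dmesg below
mds# lctl set_param fail_loc=0x1603
# on a client, create a small number of objects in the empty file system (mount point assumed)
client# mkdir -p /mnt/scratch/d0 && for i in $(seq 200); do touch /mnt/scratch/d0/f.$i; done
# start namespace LFSCK on all targets; -c/-C match the create_ostobj/create_mdtobj params shown below
mds01# lctl lfsck_start -A -M scratch-MDT0000 -c -C --type namespace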
On mds01:
# cat /proc/fs/lustre/mdd/scratch-MDT0000/lfsck_namespace
name: lfsck_namespace
magic: 0xa0629d03
version: 2
status: scanning-phase1
flags: inconsistent
param: all_targets,create_ostobj,create_mdtobj
time_since_last_completed: 1155 seconds
time_since_latest_start: 27 seconds
time_since_last_checkpoint: N/A
latest_start_position: 77, N/A, N/A
last_checkpoint_position: N/A, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 461
checked_phase2: 0
updated_phase1: 208
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 27
dirent_repaired: 0
linkea_repaired: 208
nlinks_repaired: 0
multiple_linked_checked: 187
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 2
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 0
name_hash_repaired: 0
success_count: 1
run_time_phase1: 28 seconds
run_time_phase2: 0 seconds
average_speed_phase1: 16 items/sec
average_speed_phase2: N/A
real_time_speed_phase1: 16 items/sec
real_time_speed_phase2: N/A
current_position: 238551041, N/A, N/A
There is nothing in dmesg except confirmation that fail_loc was set on the MDTs:
# dmesg
Lustre: ctl-scratch-MDT0000: super-sequence allocation rc = 0 [0x0000000380000400-0x00000003c0000400):0:mdt
Lustre: *** cfs_fail_loc=1603, val=0***
Lustre: *** cfs_fail_loc=1603, val=0***
Lustre: ctl-scratch-MDT0000: super-sequence allocation rc = 0 [0x00000003c0000400-0x0000000400000400):1:mdt
Lustre: *** cfs_fail_loc=1603, val=0***
Lustre: Skipped 101 previous similar messages
Lustre: *** cfs_fail_loc=1603, val=0***
Lustre: Skipped 3 previous similar messages
Lustre: *** cfs_fail_loc=1603, val=0***
Lustre: Skipped 100 previous similar messages
Lustre: *** cfs_fail_loc=1603, val=0***
Lustre: *** cfs_fail_loc=1603, val=0***
Lustre: Skipped 101 previous similar messages
Lustre: *** cfs_fail_loc=1603, val=0***
Lustre: Skipped 3 previous similar messages
If I wait a few minutes, the following is in dmesg:
INFO: task lfsck_namespace:469 blocked for more than 120 seconds.
      Not tainted 2.6.32-431.29.2.el6_lustre.gefdacb7.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
lfsck_namespa D 0000000000000006 0 469 2 0x00000080
 ffff880478e87ba0 0000000000000046 0000000000000000 ffff880a9849b7e0
 ffff880a9849b7e0 ffff880d7d69f000 ffff880478e87ba0 ffffffffa06d57e9
 ffff880478e66638 ffff880478e87fd8 000000000000fbc8 ffff880478e66638
Call Trace:
 [<ffffffffa06d57e9>] ? lu_object_find_try+0x99/0x2b0 [obdclass]
 [<ffffffffa06d5a3d>] lu_object_find_at+0x3d/0xe0 [obdclass]
 [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
 [<ffffffffa06d5b1f>] lu_object_find_slice+0x1f/0x80 [obdclass]
 [<ffffffffa0f33619>] lfsck_namespace_assistant_handler_p1+0x2d9/0x1f70 [lfsck]
 [<ffffffffa057d472>] ? cfs_hash_bd_from_key+0x42/0xd0 [libcfs]
 [<ffffffffa0f23347>] lfsck_assistant_engine+0x497/0x1e00 [lfsck]
 [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
 [<ffffffffa0f22eb0>] ? lfsck_assistant_engine+0x0/0x1e00 [lfsck]
 [<ffffffff8109abf6>] kthread+0x96/0xa0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109ab60>] ? kthread+0x0/0xa0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20
INFO: task lfsck_namespace:469 blocked for more than 120 seconds.
      Not tainted 2.6.32-431.29.2.el6_lustre.gefdacb7.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
lfsck_namespa D 0000000000000006 0 469 2 0x00000080
 ffff880478e87ba0 0000000000000046 0000000000000000 ffff880a9849b7e0
 ffff880a9849b7e0 ffff880d7d69f000 ffff880478e87ba0 ffffffffa06d57e9
 ffff880478e66638 ffff880478e87fd8 000000000000fbc8 ffff880478e66638
Call Trace:
 [<ffffffffa06d57e9>] ? lu_object_find_try+0x99/0x2b0 [obdclass]
 [<ffffffffa06d5a3d>] lu_object_find_at+0x3d/0xe0 [obdclass]
 [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
 [<ffffffffa06d5b1f>] lu_object_find_slice+0x1f/0x80 [obdclass]
 [<ffffffffa0f33619>] lfsck_namespace_assistant_handler_p1+0x2d9/0x1f70 [lfsck]
 [<ffffffffa057d472>] ? cfs_hash_bd_from_key+0x42/0xd0 [libcfs]
 [<ffffffffa0f23347>] lfsck_assistant_engine+0x497/0x1e00 [lfsck]
 [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
 [<ffffffffa0f22eb0>] ? lfsck_assistant_engine+0x0/0x1e00 [lfsck]
 [<ffffffff8109abf6>] kthread+0x96/0xa0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109ab60>] ? kthread+0x0/0xa0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20
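When the thread is stuck like this, its state and kernel stack can be double-checked directly. This is only illustrative: pid 469 is the one shown in the trace above, and /proc/<pid>/stack is available only if the kernel exposes it:

mds01# ps -eo pid,stat,wchan:30,comm | grep lfsck
mds01# cat /proc/469/stack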
On mds02, the state of LFSCK is the following:
# cat /proc/fs/lustre/mdd/scratch-MDT0001/lfsck_namespace
name: lfsck_namespace
magic: 0xa0629d03
version: 2
status: scanning-phase2
flags: scanned-once,inconsistent
param: all_targets,create_ostobj,create_mdtobj
time_since_last_completed: 2533 seconds
time_since_latest_start: 1405 seconds
time_since_last_checkpoint: 1405 seconds
latest_start_position: 77, N/A, N/A
last_checkpoint_position: 260571137, [0x3c0000401:0x4:0x0], 0x5d2c710b01284458
first_failure_position: N/A, N/A, N/A
checked_phase1: 16
checked_phase2: 0
updated_phase1: 0
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 6
dirent_repaired: 0
linkea_repaired: 4
nlinks_repaired: 0
multiple_linked_checked: 0
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 2
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 0
name_hash_repaired: 0
success_count: 1
run_time_phase1: 0 seconds
run_time_phase2: 1405 seconds
average_speed_phase1: 16 items/sec
average_speed_phase2: 0 objs/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: 0 objs/sec
current_position: [0x0:0x0:0x0]
I can neither stop LFSCK nor start it again, so I’m not sure what state it is in.
Trying to start LFSCK again fails:
# lctl lfsck_start -A -M scratch-MDT0000 -c -C --type namespace
Fail to start LFSCK: Operation already in progress
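The exact stop invocation is not shown above; with the standard syntax, per-MDT stops would presumably look something like:

mds02# lctl lfsck_stop -M scratch-MDT0001
mds01# lctl lfsck_stop -M scratch-MDT0000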
Stopping LFSCK works on mds02:
# cat /proc/fs/lustre/mdd/scratch-MDT0001/lfsck_namespace
name: lfsck_namespace
magic: 0xa0629d03
version: 2
status: stopped
flags: scanned-once,inconsistent
param: all_targets,create_ostobj,create_mdtobj
time_since_last_completed: 2890 seconds
time_since_latest_start: 1762 seconds
time_since_last_checkpoint: 10 seconds
latest_start_position: 77, N/A, N/A
last_checkpoint_position: 260571137, [0x3c0000401:0x4:0x0], 0x5d2c710b01284458
first_failure_position: N/A, N/A, N/A
checked_phase1: 16
checked_phase2: 0
updated_phase1: 0
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 6
dirent_repaired: 0
linkea_repaired: 4
nlinks_repaired: 0
multiple_linked_checked: 0
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 2
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 0
name_hash_repaired: 0
success_count: 1
run_time_phase1: 0 seconds
run_time_phase2: 1753 seconds
average_speed_phase1: 16 items/sec
average_speed_phase2: 0 objs/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A
but stopping LFSCK does not work on mds01:
# cat /proc/fs/lustre/mdd/scratch-MDT0000/lfsck_namespace
name: lfsck_namespace
magic: 0xa0629d03
version: 2
status: scanning-phase1
flags: inconsistent
param: all_targets,create_ostobj,create_mdtobj
time_since_last_completed: 2953 seconds
time_since_latest_start: 1825 seconds
time_since_last_checkpoint: N/A
latest_start_position: 77, N/A, N/A
last_checkpoint_position: N/A, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 461
checked_phase2: 0
updated_phase1: 208
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 27
dirent_repaired: 0
linkea_repaired: 208
nlinks_repaired: 0
multiple_linked_checked: 187
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 2
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 0
name_hash_repaired: 0
success_count: 1
run_time_phase1: 1826 seconds
run_time_phase2: 0 seconds
average_speed_phase1: 0 items/sec
average_speed_phase2: N/A
real_time_speed_phase1: 0 items/sec
real_time_speed_phase2: N/A
current_position: 238551041, N/A, N/A
There was a memory leak in the former LFSCK run, so the subsequent LFSCK was blocked by some stale data. The root cause of the memory leak is that LFSCK wrongly used the OSD device to handle the OSP (remote MDT) inconsistency. Such issues have been fixed by the
LU-5791 patch: http://review.whamcloud.com/13392. So I have updated the 13511 patch (http://review.whamcloud.com/#/c/13511/) to be based on the 13392 patch.
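For anyone who wants to confirm the leak independently, the usual Lustre malloc-tracing approach should show the unreleased allocations. This is only a sketch; the leak_finder.pl path comes from the lustre-tests package and may differ on your nodes:

mds01# lctl set_param debug=+malloc
(reproduce the LFSCK run as above)
mds01# lctl dk /tmp/lustre-debug.log
mds01# perl /usr/lib64/lustre/tests/leak_finder.pl /tmp/lustre-debug.log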
James, would you please retry with the latest patch? Thanks!
BTW, I have verified with your method and can no longer reproduce the failure.
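With the patched build in place, a quick way to confirm the retest succeeds would be to repeat the steps from the description and check that both MDTs finish cleanly, e.g.:

mds01# grep -E '^(status|flags):' /proc/fs/lustre/mdd/scratch-MDT0000/lfsck_namespace
mds02# grep -E '^(status|flags):' /proc/fs/lustre/mdd/scratch-MDT0001/lfsck_namespace

Once phase 2 finishes, both should report status: completed and should no longer show the inconsistent flag.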