Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-6109

LFSCK gets "inconsistent" flag and won't stop

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.7.0, Lustre 2.8.0
    • Lustre 2.7.0
    • OpenSFS cluster with 2 MDSs with 1MDT each, 3 OSSs with two OSTs each and three clients running Lustre (master) build # 2812
    • 3
    • 17016

    Description

      While running the performance test (3.3.2) from the LFSCK Phase 3 test plan, I’ve run into a problem. I installed Lustre on all nodes and created a new file system. I don’t know if this matters, but the file system is empty when I run this test. I then run test 3.3.2 from the test plan; set fail_loc on the MDSs, create a small number of objects and run LFSCK. For some reason, LFSCk has flag “inconsistent” on both MDSs.

      On mds01:

      # cat /proc/fs/lustre/mdd/scratch-MDT0000/lfsck_namespace 
      name: lfsck_namespace
      magic: 0xa0629d03
      version: 2
      status: scanning-phase1
      flags: inconsistent
      param: all_targets,create_ostobj,create_mdtobj
      time_since_last_completed: 1155 seconds
      time_since_latest_start: 27 seconds
      time_since_last_checkpoint: N/A
      latest_start_position: 77, N/A, N/A
      last_checkpoint_position: N/A, N/A, N/A
      first_failure_position: N/A, N/A, N/A
      checked_phase1: 461
      checked_phase2: 0
      updated_phase1: 208
      updated_phase2: 0
      failed_phase1: 0
      failed_phase2: 0
      directories: 27
      dirent_repaired: 0
      linkea_repaired: 208
      nlinks_repaired: 0
      multiple_linked_checked: 187
      multiple_linked_repaired: 0
      unknown_inconsistency: 0
      unmatched_pairs_repaired: 0
      dangling_repaired: 0
      multiple_referenced_repaired: 0
      bad_file_type_repaired: 0
      lost_dirent_repaired: 0
      local_lost_found_scanned: 0
      local_lost_found_moved: 0
      local_lost_found_skipped: 0
      local_lost_found_failed: 0
      striped_dirs_scanned: 0
      striped_dirs_repaired: 0
      striped_dirs_failed: 0
      striped_dirs_disabled: 0
      striped_dirs_skipped: 0
      striped_shards_scanned: 2
      striped_shards_repaired: 0
      striped_shards_failed: 0
      striped_shards_skipped: 0
      name_hash_repaired: 0
      success_count: 1
      run_time_phase1: 28 seconds
      run_time_phase2: 0 seconds
      average_speed_phase1: 16 items/sec
      average_speed_phase2: N/A
      real_time_speed_phase1: 16 items/sec
      real_time_speed_phase2: N/A
      current_position: 238551041, N/A, N/A
      

      There is nothing in dmesg except confirmation that fail_loc was set on the MDTs:

      # dmesg
      Lustre: ctl-scratch-MDT0000: super-sequence allocation rc = 0 [0x0000000380000400-0x00000003c0000400):0:mdt
      Lustre: *** cfs_fail_loc=1603, val=0***
      Lustre: *** cfs_fail_loc=1603, val=0***
      Lustre: ctl-scratch-MDT0000: super-sequence allocation rc = 0 [0x00000003c0000400-0x0000000400000400):1:mdt
      Lustre: *** cfs_fail_loc=1603, val=0***
      Lustre: Skipped 101 previous similar messages
      Lustre: *** cfs_fail_loc=1603, val=0***
      Lustre: Skipped 3 previous similar messages
      Lustre: *** cfs_fail_loc=1603, val=0***
      Lustre: Skipped 100 previous similar messages
      Lustre: *** cfs_fail_loc=1603, val=0***
      Lustre: *** cfs_fail_loc=1603, val=0***
      Lustre: Skipped 101 previous similar messages
      Lustre: *** cfs_fail_loc=1603, val=0***
      Lustre: Skipped 3 previous similar messages
      

      If I wait a few minutes, the following is in dmesg:

      INFO: task lfsck_namespace:469 blocked for more than 120 seconds.
            Not tainted 2.6.32-431.29.2.el6_lustre.gefdacb7.x86_64 #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      lfsck_namespa D 0000000000000006     0   469      2 0x00000080
       ffff880478e87ba0 0000000000000046 0000000000000000 ffff880a9849b7e0
       ffff880a9849b7e0 ffff880d7d69f000 ffff880478e87ba0 ffffffffa06d57e9
       ffff880478e66638 ffff880478e87fd8 000000000000fbc8 ffff880478e66638
      Call Trace:
       [<ffffffffa06d57e9>] ? lu_object_find_try+0x99/0x2b0 [obdclass]
       [<ffffffffa06d5a3d>] lu_object_find_at+0x3d/0xe0 [obdclass]
       [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
       [<ffffffffa06d5b1f>] lu_object_find_slice+0x1f/0x80 [obdclass]
       [<ffffffffa0f33619>] lfsck_namespace_assistant_handler_p1+0x2d9/0x1f70 [lfsck]
       [<ffffffffa057d472>] ? cfs_hash_bd_from_key+0x42/0xd0 [libcfs]
       [<ffffffffa0f23347>] lfsck_assistant_engine+0x497/0x1e00 [lfsck]
       [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
       [<ffffffffa0f22eb0>] ? lfsck_assistant_engine+0x0/0x1e00 [lfsck]
       [<ffffffff8109abf6>] kthread+0x96/0xa0
       [<ffffffff8100c20a>] child_rip+0xa/0x20
       [<ffffffff8109ab60>] ? kthread+0x0/0xa0
       [<ffffffff8100c200>] ? child_rip+0x0/0x20
      INFO: task lfsck_namespace:469 blocked for more than 120 seconds.
            Not tainted 2.6.32-431.29.2.el6_lustre.gefdacb7.x86_64 #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      lfsck_namespa D 0000000000000006     0   469      2 0x00000080
       ffff880478e87ba0 0000000000000046 0000000000000000 ffff880a9849b7e0
       ffff880a9849b7e0 ffff880d7d69f000 ffff880478e87ba0 ffffffffa06d57e9
       ffff880478e66638 ffff880478e87fd8 000000000000fbc8 ffff880478e66638
      Call Trace:
       [<ffffffffa06d57e9>] ? lu_object_find_try+0x99/0x2b0 [obdclass]
       [<ffffffffa06d5a3d>] lu_object_find_at+0x3d/0xe0 [obdclass]
       [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
       [<ffffffffa06d5b1f>] lu_object_find_slice+0x1f/0x80 [obdclass]
       [<ffffffffa0f33619>] lfsck_namespace_assistant_handler_p1+0x2d9/0x1f70 [lfsck]
       [<ffffffffa057d472>] ? cfs_hash_bd_from_key+0x42/0xd0 [libcfs]
       [<ffffffffa0f23347>] lfsck_assistant_engine+0x497/0x1e00 [lfsck]
       [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
       [<ffffffffa0f22eb0>] ? lfsck_assistant_engine+0x0/0x1e00 [lfsck]
       [<ffffffff8109abf6>] kthread+0x96/0xa0
       [<ffffffff8100c20a>] child_rip+0xa/0x20
       [<ffffffff8109ab60>] ? kthread+0x0/0xa0
       [<ffffffff8100c200>] ? child_rip+0x0/0x20
      

      On mds02, the following is the state of LFSCK on mds02:

      # cat /proc/fs/lustre/mdd/scratch-MDT0001/lfsck_namespace 
      name: lfsck_namespace
      magic: 0xa0629d03
      version: 2
      status: scanning-phase2
      flags: scanned-once,inconsistent
      param: all_targets,create_ostobj,create_mdtobj
      time_since_last_completed: 2533 seconds
      time_since_latest_start: 1405 seconds
      time_since_last_checkpoint: 1405 seconds
      latest_start_position: 77, N/A, N/A
      last_checkpoint_position: 260571137, [0x3c0000401:0x4:0x0], 0x5d2c710b01284458
      first_failure_position: N/A, N/A, N/A
      checked_phase1: 16
      checked_phase2: 0
      updated_phase1: 0
      updated_phase2: 0
      failed_phase1: 0
      failed_phase2: 0
      directories: 6
      dirent_repaired: 0
      linkea_repaired: 4
      nlinks_repaired: 0
      multiple_linked_checked: 0
      multiple_linked_repaired: 0
      unknown_inconsistency: 0
      unmatched_pairs_repaired: 0
      dangling_repaired: 0
      multiple_referenced_repaired: 0
      bad_file_type_repaired: 0
      lost_dirent_repaired: 0
      local_lost_found_scanned: 0
      local_lost_found_moved: 0
      local_lost_found_skipped: 0
      local_lost_found_failed: 0
      striped_dirs_scanned: 0
      striped_dirs_repaired: 0
      striped_dirs_failed: 0
      striped_dirs_disabled: 0
      striped_dirs_skipped: 0
      striped_shards_scanned: 2
      striped_shards_repaired: 0
      striped_shards_failed: 0
      striped_shards_skipped: 0
      name_hash_repaired: 0
      success_count: 1
      run_time_phase1: 0 seconds
      run_time_phase2: 1405 seconds
      average_speed_phase1: 16 items/sec
      average_speed_phase2: 0 objs/sec
      real_time_speed_phase1: N/A
      real_time_speed_phase2: 0 objs/sec
      current_position: [0x0:0x0:0x0]
      

      I cannot stop nor start LFSCK again. So, I’m not sure what state it is in.

      Trying to start LFSCK again fails:

      # lctl lfsck_start -A -M scratch-MDT0000 -c -C --type namespace 
      Fail to start LFSCK: Operation already in progress
      

      Stopping LFSCK works on mds02:

      # cat /proc/fs/lustre/mdd/scratch-MDT0001/lfsck_namespace 
      name: lfsck_namespace
      magic: 0xa0629d03
      version: 2
      status: stopped
      flags: scanned-once,inconsistent
      param: all_targets,create_ostobj,create_mdtobj
      time_since_last_completed: 2890 seconds
      time_since_latest_start: 1762 seconds
      time_since_last_checkpoint: 10 seconds
      latest_start_position: 77, N/A, N/A
      last_checkpoint_position: 260571137, [0x3c0000401:0x4:0x0], 0x5d2c710b01284458
      first_failure_position: N/A, N/A, N/A
      checked_phase1: 16
      checked_phase2: 0
      updated_phase1: 0
      updated_phase2: 0
      failed_phase1: 0
      failed_phase2: 0
      directories: 6
      dirent_repaired: 0
      linkea_repaired: 4
      nlinks_repaired: 0
      multiple_linked_checked: 0
      multiple_linked_repaired: 0
      unknown_inconsistency: 0
      unmatched_pairs_repaired: 0
      dangling_repaired: 0
      multiple_referenced_repaired: 0
      bad_file_type_repaired: 0
      lost_dirent_repaired: 0
      local_lost_found_scanned: 0
      local_lost_found_moved: 0
      local_lost_found_skipped: 0
      local_lost_found_failed: 0
      striped_dirs_scanned: 0
      striped_dirs_repaired: 0
      striped_dirs_failed: 0
      striped_dirs_disabled: 0
      striped_dirs_skipped: 0
      striped_shards_scanned: 2
      striped_shards_repaired: 0
      striped_shards_failed: 0
      striped_shards_skipped: 0
      name_hash_repaired: 0
      success_count: 1
      run_time_phase1: 0 seconds
      run_time_phase2: 1753 seconds
      average_speed_phase1: 16 items/sec
      average_speed_phase2: 0 objs/sec
      real_time_speed_phase1: N/A
      real_time_speed_phase2: N/A
      current_position: N/A
      

      but does not stop LFSCK on mds01

      # cat /proc/fs/lustre/mdd/scratch-MDT0000/lfsck_namespace 
      name: lfsck_namespace
      magic: 0xa0629d03
      version: 2
      status: scanning-phase1
      flags: inconsistent
      param: all_targets,create_ostobj,create_mdtobj
      time_since_last_completed: 2953 seconds
      time_since_latest_start: 1825 seconds
      time_since_last_checkpoint: N/A
      latest_start_position: 77, N/A, N/A
      last_checkpoint_position: N/A, N/A, N/A
      first_failure_position: N/A, N/A, N/A
      checked_phase1: 461
      checked_phase2: 0
      updated_phase1: 208
      updated_phase2: 0
      failed_phase1: 0
      failed_phase2: 0
      directories: 27
      dirent_repaired: 0
      linkea_repaired: 208
      nlinks_repaired: 0
      multiple_linked_checked: 187
      multiple_linked_repaired: 0
      unknown_inconsistency: 0
      unmatched_pairs_repaired: 0
      dangling_repaired: 0
      multiple_referenced_repaired: 0
      bad_file_type_repaired: 0
      lost_dirent_repaired: 0
      local_lost_found_scanned: 0
      local_lost_found_moved: 0
      local_lost_found_skipped: 0
      local_lost_found_failed: 0
      striped_dirs_scanned: 0
      striped_dirs_repaired: 0
      striped_dirs_failed: 0
      striped_dirs_disabled: 0
      striped_dirs_skipped: 0
      striped_shards_scanned: 2
      striped_shards_repaired: 0
      striped_shards_failed: 0
      striped_shards_skipped: 0
      name_hash_repaired: 0
      success_count: 1
      run_time_phase1: 1826 seconds
      run_time_phase2: 0 seconds
      average_speed_phase1: 0 items/sec
      average_speed_phase2: N/A
      real_time_speed_phase1: 0 items/sec
      real_time_speed_phase2: N/A
      current_position: 238551041, N/A, N/A
      

      Attachments

        1. lfsck_log_mds01_13511.txt
          402 kB
        2. lfsck_log_mds01.txt
          403 kB
        3. lfsck_log_mds02_13511.txt
          11 kB
        4. lfsck_log_mds02.txt
          10 kB
        5. lfsck_logs.tgz
          0.2 kB

        Activity

          [LU-6109] LFSCK gets "inconsistent" flag and won't stop

          There is memory leak in former LFSCK run, as to the subsequent LFSCK was blocked by some stale data. The root reason for the memory leak is because the LFSCK used OSD device to handle OSP (remote MDT) inconsistency by wrong. Such issues have been fixed by LU-5791 patch: http://review.whamcloud.com/13392.

          So I enhanced the 13511 patch (http://review.whamcloud.com/#/c/13511/) against the 13392 patch.
          James, would you please to retry the latest patch? Thanks!

          BTW, I have verified with your method, but cannot reproduce the failures any longer.

          yong.fan nasf (Inactive) added a comment - There is memory leak in former LFSCK run, as to the subsequent LFSCK was blocked by some stale data. The root reason for the memory leak is because the LFSCK used OSD device to handle OSP (remote MDT) inconsistency by wrong. Such issues have been fixed by LU-5791 patch: http://review.whamcloud.com/13392 . So I enhanced the 13511 patch ( http://review.whamcloud.com/#/c/13511/ ) against the 13392 patch. James, would you please to retry the latest patch? Thanks! BTW, I have verified with your method, but cannot reproduce the failures any longer.
          jamesanunez James Nunez (Inactive) added a comment - - edited

          I ran with your latest patch and uploaded a tar gzipped file at lfsck_logs.tgz.

          The directories/files that are create during this test are, in order:
          create a top level directory called test_dir
          create another directory test_dir/sdir-0
          in sdir_0, create 87 files test_dir/sdir-0/z.N, where N=0.{0..28}, 1.{29..57}, 2.{58..86}
          create one remote directory at test_dir/sdir-0/rdir-1 (left empty)
          create 10 local directory test_dir/sdir-0/ldir-{1..10} (left empty)
          create one striped directory at test_dir/sdir-0/rdir-1 (left empty)
          create 1 hard links at test_dir/sdir-0/ln.1 to z.0.0

          Then the directory test_dir/sdir-1 is created with the same files and directories with the same names as above.

          jamesanunez James Nunez (Inactive) added a comment - - edited I ran with your latest patch and uploaded a tar gzipped file at lfsck_logs.tgz. The directories/files that are create during this test are, in order: create a top level directory called test_dir create another directory test_dir/sdir-0 in sdir_0, create 87 files test_dir/sdir-0/z.N, where N=0.{0..28}, 1.{29..57}, 2.{58..86} create one remote directory at test_dir/sdir-0/rdir-1 (left empty) create 10 local directory test_dir/sdir-0/ldir-{1..10} (left empty) create one striped directory at test_dir/sdir-0/rdir-1 (left empty) create 1 hard links at test_dir/sdir-0/ln.1 to z.0.0 Then the directory test_dir/sdir-1 is created with the same files and directories with the same names as above.

          James,

          Thanks for the verification. I found more clew and updated the patch (http://review.whamcloud.com/#/c/13511/). Would you please to retry the latest patch with -1 debug_log enabled? BTW, please tell me what files/subdirs you created under the striped directory for test? Thanks!

          yong.fan nasf (Inactive) added a comment - James, Thanks for the verification. I found more clew and updated the patch ( http://review.whamcloud.com/#/c/13511/ ). Would you please to retry the latest patch with -1 debug_log enabled? BTW, please tell me what files/subdirs you created under the striped directory for test? Thanks!

          I've attached logs from the MDSs with the build for patch 13511; lfsck_log_mds01_13511.txt and lfsck_log_mds02_13511.txt.

          The first time I ran test 3.3.2, it completed with no problems. I ran it a second time and LFSCK namespace on MDS01 is stuck in 'scanning-phase1' as described in this ticket.

          jamesanunez James Nunez (Inactive) added a comment - I've attached logs from the MDSs with the build for patch 13511; lfsck_log_mds01_13511.txt and lfsck_log_mds02_13511.txt. The first time I ran test 3.3.2, it completed with no problems. I ran it a second time and LFSCK namespace on MDS01 is stuck in 'scanning-phase1' as described in this ticket.

          Here are the logs from the MDSs with only the first patch, 13493, applied. I stopped collecting logs after a few minutes of MDS01 stuck in scanning-phase1.

          The script, sets fail_loc to 1603 on the MDSs, writes/creates objects, sets fail_loc to 0 on the MDSs and then runs LFSCK namespace.

          I will test with the new patch next.

          jamesanunez James Nunez (Inactive) added a comment - Here are the logs from the MDSs with only the first patch, 13493, applied. I stopped collecting logs after a few minutes of MDS01 stuck in scanning-phase1. The script, sets fail_loc to 1603 on the MDSs, writes/creates objects, sets fail_loc to 0 on the MDSs and then runs LFSCK namespace. I will test with the new patch next.

          The 13511 is based on the patch 13493. It gives more fix for the possible hung during locating object.
          James, would you please to try this one when you have time. Thanks!

          yong.fan nasf (Inactive) added a comment - The 13511 is based on the patch 13493. It gives more fix for the possible hung during locating object. James, would you please to try this one when you have time. Thanks!

          Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/13511
          Subject: LU-6109 lfsck: check FID validity before locating object
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: e248ffbf7d27417b3b85e930b0266d4719cb75ce

          gerrit Gerrit Updater added a comment - Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/13511 Subject: LU-6109 lfsck: check FID validity before locating object Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: e248ffbf7d27417b3b85e930b0266d4719cb75ce

          To be confirmed with you: before LFSCK run, you injected failure stub fail_loc=1603 on both MDTs, so all the files/directories/remote_directories/striped_directories have crashed linkEA, then run namespace LFSCK to repair them. Right?

          yong.fan nasf (Inactive) added a comment - To be confirmed with you: before LFSCK run, you injected failure stub fail_loc=1603 on both MDTs, so all the files/directories/remote_directories/striped_directories have crashed linkEA, then run namespace LFSCK to repair them. Right?

          Thanks James! If possible, please upload the log to Jira directly, that is more easy for me to access.

          yong.fan nasf (Inactive) added a comment - Thanks James! If possible, please upload the log to Jira directly, that is more easy for me to access.

          There are no files deleted during this test.

          I did not capture kernel logs. So, let me run again and I'll upload loads when I have them.

          jamesanunez James Nunez (Inactive) added a comment - There are no files deleted during this test. I did not capture kernel logs. So, let me run again and I'll upload loads when I have them.

          James, would you please to upload the MDS debug log to the Jira? Is there any file deleted during your test? Thanks!

          yong.fan nasf (Inactive) added a comment - James, would you please to upload the MDS debug log to the Jira? Is there any file deleted during your test? Thanks!

          People

            yong.fan nasf (Inactive)
            jamesanunez James Nunez (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: