Lustre / LU-4917

LFSCK run time reported is incorrect during check


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Affects Version/s: Lustre 2.6.0
    • Fix Version/s: Lustre 2.6.0
    • Environment: OpenSFS cluster with four MDTs, three OSTs with two OSSs each, and six clients
    • Severity: 3
    • Rank (Obsolete): 13582

    Description

      While LFSCK is running, its entire run time is attributed to phase 1, even after LFSCK has moved on to phase 2. Only when LFSCK completes or is stopped is the time broken out into phase 1 and phase 2.
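The accounting we would expect can be sketched as follows, assuming a hypothetical tracker that records when each phase starts and freezes the phase-1 total at the moment phase 2 begins. The class PhaseClock, its fields, and its methods are illustrative only and are not taken from the Lustre source.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PhaseClock:
    """Hypothetical per-phase run-time tracker (not Lustre's actual structure)."""
    start: int                          # time LFSCK started, in seconds
    phase2_start: Optional[int] = None  # set when phase 2 begins
    end: Optional[int] = None           # set when LFSCK completes or is stopped

    def enter_phase2(self, now: int) -> None:
        self.phase2_start = now

    def finish(self, now: int) -> None:
        self.end = now

    def run_times(self, now: int) -> Tuple[int, int]:
        """Return (run_time_phase1, run_time_phase2) as of `now`."""
        stop = self.end if self.end is not None else now
        if self.phase2_start is None:
            # Still in phase 1: all elapsed time belongs to phase 1.
            return stop - self.start, 0
        # Phase 1 is frozen at the transition; phase 2 accumulates from there.
        return self.phase2_start - self.start, stop - self.phase2_start
```

With the phase transition placed at 112 seconds (the phase-1 time in the completed report below), this gives (112, 102) at the 214-second mark instead of the (214, 0) that was displayed, and (112, 117) once LFSCK completes 229 seconds after starting.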

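      For example, while LFSCK is running, the status field shows that it is in phase 2 (scanning-phase2), but all of the elapsed time is attributed to phase 1: run_time_phase1 equals time_since_latest_start (214 seconds) and run_time_phase2 is still 0 seconds.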

      # lctl get_param -n mdd.scratch-MDT0000.lfsck_layout
      name: lfsck_layout
      magic: 0xb173ae14
      version: 2
      status: scanning-phase2
      flags: scanned-once
      param: all_targets,orphan
      time_since_last_completed: 321 seconds
      time_since_latest_start: 214 seconds
      time_since_last_checkpoint: 102 seconds
      latest_start_position: 0
      last_checkpoint_position: 47185921
      first_failure_position: 0
      success_count: 120
      repaired_dangling: 0
      repaired_unmatched_pair: 0
      repaired_multiple_referenced: 0
      repaired_orphan: 0
      repaired_inconsistent_owner: 0
      repaired_others: 0
      skipped: 0
      failed_phase1: 64
      failed_phase2: 25
      checked_phase1: 8668585
      checked_phase2: 0
      run_time_phase1: 214 seconds
      run_time_phase2: 0 seconds
      average_speed_phase1: 40507 items/sec
      average_speed_phase2: N/A
      real-time_speed_phase1: 22165 items/sec
      real-time_speed_phase2: N/A
      current_position: [0x100070000:0x71dde4:0x0]
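One way to spot this symptom in captured output is to parse the "key: value" lines and check whether status reports phase 2 while run_time_phase2 is still zero and run_time_phase1 equals time_since_latest_start. The helper below is a hypothetical sketch for reading saved lctl get_param -n output; the function names are illustrative, it assumes every field line has the "key: value" form shown above, and the check is only a heuristic (right at the phase transition the same values can occur legitimately).

```python
def parse_lfsck(text: str) -> dict:
    """Parse 'key: value' lines from lctl get_param -n output into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as FIDs keep their colons.
        key, sep, value = line.strip().partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def seconds(value: str) -> int:
    """Convert a value such as '214 seconds' to an int."""
    return int(value.split()[0])


def phase2_time_misattributed(fields: dict) -> bool:
    """Heuristic: in phase 2, yet all elapsed time is still charged to phase 1."""
    return (fields.get("status") == "scanning-phase2"
            and seconds(fields["run_time_phase2"]) == 0
            and seconds(fields["run_time_phase1"])
            == seconds(fields["time_since_latest_start"]))
```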
      

      When LFSCK completes, the report shows that phase 1 took only 112 seconds, not the 214 seconds reported above, and that the remaining 117 seconds were spent in phase 2.

      # lctl get_param -n mdd.scratch-MDT0000.lfsck_layout
      name: lfsck_layout
      magic: 0xb173ae14
      version: 2
      status: completed
      flags:
      param: all_targets,orphan
      time_since_last_completed: 10 seconds
      time_since_latest_start: 239 seconds
      time_since_last_checkpoint: 10 seconds
      latest_start_position: 0
      last_checkpoint_position: 47185921
      first_failure_position: 0
      success_count: 121
      repaired_dangling: 0
      repaired_unmatched_pair: 0
      repaired_multiple_referenced: 0
      repaired_orphan: 0
      repaired_inconsistent_owner: 0
      repaired_others: 0
      skipped: 0
      failed_phase1: 64
      failed_phase2: 27
      checked_phase1: 6400022
      checked_phase2: 2592717
      run_time_phase1: 112 seconds
      run_time_phase2: 117 seconds
      average_speed_phase1: 57143 items/sec
      average_speed_phase2: 22159 objs/sec
      real-time_speed_phase1: N/A
      real-time_speed_phase2: N/A
      current_position: N/A
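The completed report is internally consistent, which fits the description above that only the in-progress accounting is off. The arithmetic below re-derives it from the reported values, assuming (as the numbers suggest, not confirmed from the source) that each average speed is checked items divided by run time, truncated to an integer.

```python
# Values copied from the completed lctl report above.
final = {
    "time_since_latest_start": 239,
    "time_since_last_completed": 10,
    "run_time_phase1": 112,
    "run_time_phase2": 117,
    "checked_phase1": 6400022,
    "checked_phase2": 2592717,
}

# LFSCK finished 239 - 10 = 229 seconds after it started; the two phases add up to that.
elapsed = final["time_since_latest_start"] - final["time_since_last_completed"]
phase_total = final["run_time_phase1"] + final["run_time_phase2"]

# Assumed formula: average speed = checked items // run time (integer division).
avg1 = final["checked_phase1"] // final["run_time_phase1"]
avg2 = final["checked_phase2"] // final["run_time_phase2"]

# The in-progress average_speed_phase1 matches checked_phase1 divided by
# the full 214 seconds that were charged to phase 1.
in_progress_avg1 = 8668585 // 214

print(elapsed, phase_total, avg1, avg2, in_progress_avg1)
# prints: 229 229 57143 22159 40507
```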
      


          People

            Assignee: nasf (yong.fan)
            Reporter: James Nunez (jamesanunez)
