[LU-4917] LFSCK run time reported is incorrect during check Created: 16/Apr/14  Updated: 18/Apr/14  Resolved: 18/Apr/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: Lustre 2.6.0

Type: Bug
Priority: Critical
Reporter: James Nunez (Inactive)
Assignee: nasf (Inactive)
Resolution: Fixed
Votes: 0
Labels: lfsck
Environment:

OpenSFS cluster with four MDTs, three OSTs with two OSSs each, and six clients


Severity: 3
Rank (Obsolete): 13582

 Description   

While LFSCK is running, its entire elapsed run time is attributed to phase 1 (run_time_phase1), even after LFSCK has entered phase 2. Only when LFSCK completes or is stopped is the time split between phase 1 and phase 2.

For example, while LFSCK is running, the status field shows that we are in phase 2, but all of the time is attributed to phase 1 (run_time_phase1):

# lctl get_param -n mdd.scratch-MDT0000.lfsck_layout
name: lfsck_layout
magic: 0xb173ae14
version: 2
status: scanning-phase2
flags: scanned-once
param: all_targets,orphan
time_since_last_completed: 321 seconds
time_since_latest_start: 214 seconds
time_since_last_checkpoint: 102 seconds
latest_start_position: 0
last_checkpoint_position: 47185921
first_failure_position: 0
success_count: 120
repaired_dangling: 0
repaired_unmatched_pair: 0
repaired_multiple_referenced: 0
repaired_orphan: 0
repaired_inconsistent_owner: 0
repaired_others: 0
skipped: 0
failed_phase1: 64
failed_phase2: 25
checked_phase1: 8668585
checked_phase2: 0
run_time_phase1: 214 seconds
run_time_phase2: 0 seconds
average_speed_phase1: 40507 items/sec
average_speed_phase2: N/A
real-time_speed_phase1: 22165 items/sec
real-time_speed_phase2: N/A
current_position: [0x100070000:0x71dde4:0x0]
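
The symptom in the output above can be checked mechanically. The following is a minimal sketch, assuming the `lctl get_param` output has already been captured as text; `parse_lfsck`, `seconds`, and `phase2_time_missing` are hypothetical helper names for illustration, not part of Lustre.

```python
def parse_lfsck(text):
    """Parse 'key: value' lines of lfsck_layout output into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values that contain colons
        # (such as current_position) stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def seconds(value):
    """Convert a value such as '214 seconds' to an int."""
    return int(value.split()[0])


def phase2_time_missing(fields):
    """True when status is phase 2 yet all run time is charged to phase 1."""
    return (
        fields["status"] == "scanning-phase2"
        and seconds(fields["run_time_phase2"]) == 0
        and seconds(fields["run_time_phase1"])
        == seconds(fields["time_since_latest_start"])
    )


# Subset of the fields reported while LFSCK was running.
sample = """\
status: scanning-phase2
time_since_latest_start: 214 seconds
run_time_phase1: 214 seconds
run_time_phase2: 0 seconds
"""
print(phase2_time_missing(parse_lfsck(sample)))  # True
```

This check only matches the pattern reported here; a run that has just entered phase 2 could legitimately show 0 seconds for run_time_phase2.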

When LFSCK ends, we see that phase 1 only took 112 seconds, not the 214 seconds reported above, and the rest of the time was spent in phase 2.

# lctl get_param -n mdd.scratch-MDT0000.lfsck_layout
name: lfsck_layout
magic: 0xb173ae14
version: 2
status: completed
flags:
param: all_targets,orphan
time_since_last_completed: 10 seconds
time_since_latest_start: 239 seconds
time_since_last_checkpoint: 10 seconds
latest_start_position: 0
last_checkpoint_position: 47185921
first_failure_position: 0
success_count: 121
repaired_dangling: 0
repaired_unmatched_pair: 0
repaired_multiple_referenced: 0
repaired_orphan: 0
repaired_inconsistent_owner: 0
repaired_others: 0
skipped: 0
failed_phase1: 64
failed_phase2: 27
checked_phase1: 6400022
checked_phase2: 2592717
run_time_phase1: 112 seconds
run_time_phase2: 117 seconds
average_speed_phase1: 57143 items/sec
average_speed_phase2: 22159 objs/sec
real-time_speed_phase1: N/A
real-time_speed_phase2: N/A
current_position: N/A
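
The arithmetic behind the two samples can be spelled out as a short sketch. The values are copied from the outputs in this report; the relationships checked (the phase times summing to the elapsed time, and the averages matching integer division of checked items by run time) are observations about these numbers, not documented LFSCK guarantees.

```python
# Values copied from the two samples in this report.
running = {
    "time_since_latest_start": 214,
    "time_since_last_checkpoint": 102,
    "checked_phase1": 8668585,
    "run_time_phase1": 214,
    "run_time_phase2": 0,
}
completed = {
    "time_since_latest_start": 239,
    "time_since_last_completed": 10,
    "checked_phase1": 6400022,
    "checked_phase2": 2592717,
    "run_time_phase1": 112,
    "run_time_phase2": 117,
}


def total_run_time(sample):
    """Sum of the two per-phase run times, in seconds."""
    return sample["run_time_phase1"] + sample["run_time_phase2"]


# After completion the per-phase times account for the whole run.
elapsed = (completed["time_since_latest_start"]
           - completed["time_since_last_completed"])
print(total_run_time(completed), elapsed)  # 229 229

# Seconds wrongly charged to phase 1 in the running sample.
overcount = running["run_time_phase1"] - completed["run_time_phase1"]
print(overcount)  # 102

# Average speeds as integer division of items checked by run time.
print(running["checked_phase1"] // running["run_time_phase1"])      # 40507
print(completed["checked_phase1"] // completed["run_time_phase1"])  # 57143
print(completed["checked_phase2"] // completed["run_time_phase2"])  # 22159
```

The 102-second overcount equals time_since_last_checkpoint in the running sample, and the running checked_phase1 value (8668585) is larger than the final one (6400022). Both suggest that phase 2 work was being counted under phase 1 while the scan was in progress.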


 Comments   
Comment by nasf (Inactive) [ 17/Apr/14 ]

Here is the patch:

http://review.whamcloud.com/9984

Comment by nasf (Inactive) [ 18/Apr/14 ]

The patch has landed on master.
