[LU-4917] LFSCK run time reported is incorrect during check Created: 16/Apr/14 Updated: 18/Apr/14 Resolved: 18/Apr/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0 |
| Fix Version/s: | Lustre 2.6.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | James Nunez (Inactive) | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | lfsck | ||
| Environment: |
OpenSFS cluster with four MDTs, three OSTs with two OSSs each, and six clients |
||
| Severity: | 3 |
| Rank (Obsolete): | 13582 |
| Description |
|
While LFSCK is running, its total run time is attributed to phase 1 even when LFSCK is in phase 2. Only when LFSCK ends or is stopped is the time broken out into phase 1 and phase 2.

For example, while LFSCK is running, the status shows that we are in phase 2 (status: scanning-phase2), but all of the time is attributed to phase 1 (run_time_phase1):

# lctl get_param -n mdd.scratch-MDT0000.lfsck_layout
name: lfsck_layout
magic: 0xb173ae14
version: 2
status: scanning-phase2
flags: scanned-once
param: all_targets,orphan
time_since_last_completed: 321 seconds
time_since_latest_start: 214 seconds
time_since_last_checkpoint: 102 seconds
latest_start_position: 0
last_checkpoint_position: 47185921
first_failure_position: 0
success_count: 120
repaired_dangling: 0
repaired_unmatched_pair: 0
repaired_multiple_referenced: 0
repaired_orphan: 0
repaired_inconsistent_owner: 0
repaired_others: 0
skipped: 0
failed_phase1: 64
failed_phase2: 25
checked_phase1: 8668585
checked_phase2: 0
run_time_phase1: 214 seconds
run_time_phase2: 0 seconds
average_speed_phase1: 40507 items/sec
average_speed_phase2: N/A
real-time_speed_phase1: 22165 items/sec
real-time_speed_phase2: N/A
current_position: [0x100070000:0x71dde4:0x0]

When LFSCK ends, we see that phase 1 took only 112 seconds, not the 214 seconds reported above, and the rest of the time was spent in phase 2:

# lctl get_param -n mdd.scratch-MDT0000.lfsck_layout
name: lfsck_layout
magic: 0xb173ae14
version: 2
status: completed
flags:
param: all_targets,orphan
time_since_last_completed: 10 seconds
time_since_latest_start: 239 seconds
time_since_last_checkpoint: 10 seconds
latest_start_position: 0
last_checkpoint_position: 47185921
first_failure_position: 0
success_count: 121
repaired_dangling: 0
repaired_unmatched_pair: 0
repaired_multiple_referenced: 0
repaired_orphan: 0
repaired_inconsistent_owner: 0
repaired_others: 0
skipped: 0
failed_phase1: 64
failed_phase2: 27
checked_phase1: 6400022
checked_phase2: 2592717
run_time_phase1: 112 seconds
run_time_phase2: 117 seconds
average_speed_phase1: 57143 items/sec
average_speed_phase2: 22159 objs/sec
real-time_speed_phase1: N/A
real-time_speed_phase2: N/A
current_position: N/A |
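The symptom in the two outputs above can be checked mechanically. The following is a minimal sketch, not part of Lustre or of the landed patch: it parses an excerpt of the `lfsck_layout` output and flags the case where the status is `scanning-phase2` while `run_time_phase2` is still 0 and `run_time_phase1` equals `time_since_latest_start`. The helper names (`parse_lfsck`, `seconds`, `phase_time_misattributed`, `average_speed`) are hypothetical, and the assumption that the average speed is the checked count divided by the run time with truncation is only inferred from the figures in this ticket.

```python
def parse_lfsck(text):
    """Parse 'key: value' lines of lfsck_layout output into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values containing ':' stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def seconds(value):
    """Convert a value such as '214 seconds' to an int."""
    return int(value.split()[0])


def phase_time_misattributed(fields):
    """Return True when LFSCK reports phase 2 but all run time sits in phase 1."""
    return (
        fields["status"] == "scanning-phase2"
        and seconds(fields["run_time_phase2"]) == 0
        and seconds(fields["run_time_phase1"])
        == seconds(fields["time_since_latest_start"])
    )


def average_speed(checked, run_time):
    """Assumed formula: checked items per second, truncated to an integer."""
    return checked // run_time


# Excerpts of the two outputs quoted in the description.
running = """\
status: scanning-phase2
time_since_latest_start: 214 seconds
checked_phase1: 8668585
run_time_phase1: 214 seconds
run_time_phase2: 0 seconds
"""

completed = """\
status: completed
time_since_latest_start: 239 seconds
checked_phase1: 6400022
checked_phase2: 2592717
run_time_phase1: 112 seconds
run_time_phase2: 117 seconds
"""

print(phase_time_misattributed(parse_lfsck(running)))    # True
print(phase_time_misattributed(parse_lfsck(completed)))  # False
```

Under the assumed formula, `average_speed(8668585, 214)` gives 40507, `average_speed(6400022, 112)` gives 57143, and `average_speed(2592717, 117)` gives 22159, matching the reported `average_speed_phase1` and `average_speed_phase2` values. That is consistent with the checked count, as well as the run time, being attributed to phase 1 while phase 2 is still in progress.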
| Comments |
| Comment by nasf (Inactive) [ 17/Apr/14 ] |
|
Here is the patch: |
| Comment by nasf (Inactive) [ 18/Apr/14 ] |
|
The patch has been landed to master. |