[LU-7255] sanity-lfsck test_10 FAIL: (16) unexpected status
Created: 05/Oct/15  Updated: 15/Jun/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0, Lustre 2.11.0, Lustre 2.10.7, Lustre 2.12.1, Lustre 2.10.8, Lustre 2.12.8
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: James Nunez (Inactive)
Assignee: WC Triage
Resolution: Unresolved
Votes: 0
Labels: None

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

Description

sanity-lfsck test 10 fails with ' (16) unexpected status'. Logs are at: https://testing.hpdd.intel.com/test_sets/0b832922-6b10-11e5-ae45-5254006e85c2

From the test_log, it may be that the LFSCK namespace scan simply did not have enough time to complete: when the wait for 'completed' timed out, the status was still 'scanning-phase2'. The other possibility is that the scan is hung in scanning-phase2.

The LFSCK namespace output from the test_log:

Update not seen after 32s: wanted 'completed' got 'scanning-phase2'
CMD: shadow-4vm8 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: scanning-phase2
flags: scanned-once,inconsistent
param:
last_completed_time: 1443996764
time_since_last_completed: 105 seconds
latest_start_time: 1443996809
time_since_latest_start: 60 seconds
last_checkpoint_time: 1443996844
time_since_last_checkpoint: 25 seconds
latest_start_position: 12, N/A, N/A
last_checkpoint_position: 26097, [0x2000013a2:0x2:0x0], 0x2b5be1c15cced998
first_failure_position: N/A, N/A, N/A
checked_phase1: 8077
checked_phase2: 4652
updated_phase1: 3990
updated_phase2: 4142
failed_phase1: 0
failed_phase2: 0
directories: 1004
dirent_repaired: 0
linkea_repaired: 3990
nlinks_repaired: 0
multiple_linked_checked: 6988
multiple_linked_repaired: 1
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 0
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 0
name_hash_repaired: 0
success_count: 1
run_time_phase1: 35 seconds
run_time_phase2: 24 seconds
average_speed_phase1: 230 items/sec
average_speed_phase2: 193 objs/sec
average_speed_total: 215 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: 197 objs/sec
current_position: [0x2000013a1:0x9da:0x0]
 sanity-lfsck test_10: @@@@@@ FAIL: (16) unexpected status 
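
One way to tell "not enough time" apart from "hung in scanning-phase2" is to take two samples of the lfsck_namespace output a few seconds apart and check whether checked_phase2 has advanced. The following is only a minimal sketch in Python, not part of sanity-lfsck: the helper names (parse_lfsck_status, leading_int, phase2_progressed) are hypothetical, the first sample reuses values from the output above, and the second sample is invented purely for illustration.

```python
import re

def parse_lfsck_status(text):
    """Parse 'key: value' lines of lfsck_namespace output into a dict of strings.

    Lines whose key is not a plain lowercase identifier (for example the
    'CMD:' line or the FAIL banner) are skipped.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and re.fullmatch(r"[a-z0-9_]+", key):
            fields[key] = value.strip()
    return fields

def leading_int(value):
    """Return the leading integer of a value such as '24 seconds', else None."""
    match = re.match(r"\d+", value or "")
    return int(match.group()) if match else None

def phase2_progressed(before, after):
    """True if checked_phase2 is larger in the second sample than in the first."""
    a = leading_int(before.get("checked_phase2"))
    b = leading_int(after.get("checked_phase2"))
    return a is not None and b is not None and b > a

# First sample: values copied from the test_log output above.
sample = """\
status: scanning-phase2
checked_phase2: 4652
run_time_phase2: 24 seconds
average_speed_phase2: 193 objs/sec
current_position: [0x2000013a1:0x9da:0x0]
"""
first = parse_lfsck_status(sample)

# Second sample: HYPOTHETICAL value, only to exercise the comparison.
second = dict(first, checked_phase2="5100")

print(first["status"], phase2_progressed(first, second))
```

If checked_phase2 keeps growing between samples, the scan is probably just slow rather than hung. In the single sample above, checked_phase2 divided by run_time_phase2 (4652 / 24, rounded down) is consistent with the reported average_speed_phase2 of 193 objs/sec, but one sample alone cannot rule out a later stall.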


Comments
Comment by Saurabh Tandan (Inactive) [ 16/Dec/15 ]

Server: Master, Build# 3266, Tag 2.7.64 , RHEL 7
Client: 2.5.5, b2_5_fe/62
https://testing.hpdd.intel.com/test_sets/f0681b28-9fff-11e5-a33d-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 19/Jan/16 ]

Another instance found for interop: EL6.7 Server/2.5.5 Client
Server: master, build# 3303, RHEL 6.7
Client: 2.5.5, b2_5_fe/62
https://testing.hpdd.intel.com/test_sets/2b1ea868-bad6-11e5-9137-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 10/Feb/16 ]

Another instance found for interop tag 2.7.66 - EL6.7 Server/2.5.5 Client, build# 3316
https://testing.hpdd.intel.com/test_sets/aeccad56-cc9f-11e5-963e-5254006e85c2

Comment by Andreas Dilger [ 24/Mar/22 ]

This subtest is failing very intermittently - 38 failures with this "unexpected status" message in the past 9 months, 8 failures in the past 4 weeks.

Comment by Alexander Zarochentsev [ 15/Dec/22 ]

+1 hit https://testing.whamcloud.com/test_sets/9163c092-9b77-4b94-a28f-6f95570f4a9d

Comment by Serguei Smirnov [ 15/Jun/23 ]

+1 on master: https://testing.whamcloud.com/test_sets/50222a23-a8ca-43d7-9401-35c9c70002d4
