[LU-7793] sanity-lfsck test_23a fails with '(10) unexpected status' Created: 18/Feb/16  Updated: 08/Dec/16  Resolved: 19/Feb/16

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0, Lustre 2.9.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: WC Triage
Resolution: Duplicate Votes: 0
Labels: None
Environment:

autotest review-dne-part-2


Issue Links:
Duplicate
duplicates LU-7256 sanity-lfsck TIMEOUT on umount /mnt/mds4 Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

sanity-lfsck test_23a fails because the LFSCK status is still 'partial', not 'completed', after 32 seconds. From the test_log:

Update not seen after 32s: wanted 'completed' got 'partial'
CMD: shadow-39vm7 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: partial
flags: incomplete
param: all_targets,create_mdtobj
last_completed_time: 1455743680
time_since_last_completed: 41 seconds
latest_start_time: 1455743680
time_since_latest_start: 41 seconds
last_checkpoint_time: 1455743680
time_since_last_checkpoint: 41 seconds
latest_start_position: 12, N/A, N/A
last_checkpoint_position: 25143, [0x200000405:0x1:0x0], 0x51d748fbf7f3ae2b
first_failure_position: N/A, N/A, N/A
checked_phase1: 7
checked_phase2: 15
updated_phase1: 1
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 4
dirent_repaired: 0
linkea_repaired: 0
nlinks_repaired: 0
multiple_linked_checked: 0
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 1
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 0
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 0
name_hash_repaired: 0
success_count: 7
run_time_phase1: 0 seconds
run_time_phase2: 0 seconds
average_speed_phase1: 7 items/sec
average_speed_phase2: 15 objs/sec
average_speed_total: 11 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A
 sanity-lfsck test_23a: @@@@@@ FAIL: (10) unexpected status 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:4670:error_noexit()
  = /usr/lib64/lustre/tests/test-framework.sh:4704:error()
  = /usr/lib64/lustre/tests/sanity-lfsck.sh:3061:test_23a()
  = /usr/lib64/lustre/tests/test-framework.sh:4951:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:4988:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:4853:run_test()
  = /usr/lib64/lustre/tests/sanity-lfsck.sh:3072:main()
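
For triage, the `lfsck_namespace` dump above can be reduced to the two fields the test cares about, `status` and `flags`. The following is a minimal sketch, not part of the Lustre test framework: the function names `parse_lfsck_dump` and `check_lfsck_status` are hypothetical, and the sample text is an excerpt of the output quoted in this ticket.

```python
def parse_lfsck_dump(text):
    """Parse 'key: value' lines from an lfsck_namespace dump into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as
        # '25143, [0x200000405:0x1:0x0], ...' stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def check_lfsck_status(fields, wanted="completed"):
    """Return (matches_wanted, status, flags) for a parsed dump."""
    status = fields.get("status")
    flags = [f for f in fields.get("flags", "").split(",") if f]
    return status == wanted, status, flags


# Excerpt of the dump quoted in this ticket.
SAMPLE = """\
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: partial
flags: incomplete
param: all_targets,create_mdtobj
last_checkpoint_position: 25143, [0x200000405:0x1:0x0], 0x51d748fbf7f3ae2b
"""

fields = parse_lfsck_dump(SAMPLE)
ok, status, flags = check_lfsck_status(fields)
print(ok, status, flags)
```

With the excerpt above, the check reports that the wanted status was not reached and that the run is flagged `incomplete`, which matches the "wanted 'completed' got 'partial'" message.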

Here are some logs from recent autotest sessions that have failed with this error:
2015-12-12 05:53:50 - https://testing.hpdd.intel.com/test_sets/3d1e8ce4-a0bf-11e5-8bbb-5254006e85c2
2016-01-18 07:17:43 - https://testing.hpdd.intel.com/test_sets/0a93dca6-bde4-11e5-b446-5254006e85c2
2016-02-16 15:33:34 - https://testing.hpdd.intel.com/test_sets/e7437c0c-d4e0-11e5-8530-5254006e85c2
2016-02-17 20:49:39 - https://testing.hpdd.intel.com/test_sets/800ec35c-d5df-11e5-afbd-5254006e85c2



 Comments   
Comment by nasf (Inactive) [ 19/Feb/16 ]

This happens because the previous LFSCK instance on MDT1 had not completed before test_23a started, so the new LFSCK instance could not start on MDT1.

00100000:10000000:1.0:1449900936.689141:0:17840:0:(lfsck_lib.c:2061:lfsck_async_interpret_common()) lustre-MDT0000-osd: fail to notify MDT 1 for lfsck_namespace start: rc = -114
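
On Linux, errno 114 is EALREADY ("Operation already in progress"), so `rc = -114` in the line above is consistent with an LFSCK instance already running on the target MDT. The sketch below pulls the return code out of such a debug log line and names it. The helper `decode_rc` and the small lookup table are illustrative assumptions, not Lustre code; the table hardcodes Linux values because errno numbers differ between platforms.

```python
import re

# Linux errno values (asm-generic/errno.h); hardcoded on the assumption
# that the Lustre servers run Linux. Only the codes relevant here are listed.
LINUX_ERRNO = {
    114: "EALREADY",     # Operation already in progress
    115: "EINPROGRESS",  # Operation now in progress
}

LOG_LINE = (
    "00100000:10000000:1.0:1449900936.689141:0:17840:0:"
    "(lfsck_lib.c:2061:lfsck_async_interpret_common()) "
    "lustre-MDT0000-osd: fail to notify MDT 1 for lfsck_namespace start: "
    "rc = -114"
)


def decode_rc(line):
    """Return (rc, errno_name) from a log line, or None if no 'rc = N' is found."""
    match = re.search(r"rc = (-?\d+)", line)
    if match is None:
        return None
    rc = int(match.group(1))
    return rc, LINUX_ERRNO.get(abs(rc), "unknown")


print(decode_rc(LOG_LINE))
```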

This is a known issue and has been resolved by the patch http://review.whamcloud.com/#/c/17406/

Comment by nasf (Inactive) [ 19/Feb/16 ]

This is another instance of the LU-7256 failure.
