[LU-10732] sanity-lfsck test_9a: FAIL: (7) Failed to get expected 'completed' Created: 27/Feb/18  Updated: 19/Apr/18  Resolved: 19/Apr/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0
Fix Version/s: Lustre 2.12.0

Type: Bug Priority: Minor
Reporter: Jian Yu Assignee: Alex Zhuravlev
Resolution: Fixed Votes: 0
Labels: zfs
Environment:

FSTYPE=zfs


Issue Links:
Related
is related to LU-8856 ZFS-MDT 100% full. Cannot delete files. Resolved
Severity: 3

 Description   

sanity-lfsck test_9a - (7) Failed to get expected 'completed'

This issue was created by maloo for jianyu <jian.yu@intel.com>

This issue relates to the following test suite run:

test_9a failed with the following error:

Update not seen after 30s: wanted 'completed' got 'scanning-phase2'
(7) Failed to get expected 'completed'
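
The "Update not seen after 30s" message above is the kind of error a poll-until-expected wait produces when the LFSCK status never leaves 'scanning-phase2'. The following is a minimal Python sketch of that pattern, not the actual test-framework code: the function name wait_for_status, its parameters, and the injectable clock and sleep arguments are all hypothetical and exist only for illustration.

```python
import time
from typing import Callable


def wait_for_status(get_status: Callable[[], str], expected: str,
                    max_wait: float = 30.0, interval: float = 1.0,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_status() until it returns `expected` or max_wait elapses.

    Hypothetical helper for illustration only. Returns the matching status,
    or raises TimeoutError with a message shaped like the one in this ticket.
    """
    deadline = clock() + max_wait
    status = get_status()
    while status != expected:
        if clock() >= deadline:
            # The last observed status is reported so the log shows
            # where the scan was stuck.
            raise TimeoutError(
                f"Update not seen after {max_wait:g}s: "
                f"wanted '{expected}' got '{status}'")
        sleep(interval)
        status = get_status()
    return status
```

With a status source that keeps returning 'scanning-phase2', the sketch raises after 30 seconds with the same wording as the failure reported here. The clock and sleep parameters are only there so the behaviour can be exercised without real waiting.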

Maloo reports:
https://testing.hpdd.intel.com/test_sets/9b5dd5d6-1bc3-11e8-a7cd-52540065bddc
https://testing.hpdd.intel.com/test_sets/2bcb4f88-1bdb-11e8-bd00-52540065bddc
https://testing.hpdd.intel.com/test_sets/a1a3d500-1bd5-11e8-a10a-52540065bddc



 Comments   
Comment by Jian Yu [ 27/Feb/18 ]

The failure occurred about 7 times in the last 3 days and is affecting patch testing on the master branch.

Comment by Peter Jones [ 27/Feb/18 ]

This seems to be due to the LU-8856 landing, so that patch has been reverted.

Comment by Alex Zhuravlev [ 27/Feb/18 ]

I just tried locally w/o LU-8856:
sanity-lfsck test_9a: @@@@@@ FAIL: (7) Failed to get expected 'completed'

Comment by Alex Zhuravlev [ 27/Feb/18 ]

https://testing.hpdd.intel.com/sub_tests/query?utf8=✓&test_set%5Btest_set_script_id%5D=4f25830c-64fe-11e2-bfb2-52540035b04c&sub_test%5Bsub_test_script_id%5D=52f5ae12-64fe-11e2-bfb2-52540035b04c&status%5B%5D=FAIL&query_bugs=&warn%5Bnotice%5D=false&builds=&hosts=&gerrit=&test_session%5Btest_group%5D%5B%5D=&test_session%5Buser_id%5D%5B%5D=&test_session%5Bquery_recent_period%5D=2332800&test_session%5Bstart_date%5D=&test_session%5Bend_date%5D=&test_node%5Bos_type_id%5D=&test_node%5Bdistribution_type_id%5D=&test_node%5Barchitecture_type_id%5D=&test_node%5Bfile_system_type_id%5D=&test_node%5Blustre_branch_id%5D=&test_node_network%5Bnetwork_type_id%5D=&commit=Update+results&num_results=50

It had been failing long before that patch.

Comment by Bob Glossman (Inactive) [ 27/Feb/18 ]

another on master:
https://testing.hpdd.intel.com/test_sets/dea2fe62-1bfe-11e8-a7cd-52540065bddc

Comment by nasf (Inactive) [ 28/Feb/18 ]

The logs show that the layout LFSCK on the OST ran very slowly, so the LFSCK engine on the MDT had to wait for the OST when moving to the second-phase scanning. However, the test scripts do NOT set a speed limit on the OST. This problem only happened on ZFS-based backends.

Comment by Alex Zhuravlev [ 02/Mar/18 ]

https://review.whamcloud.com/#/c/31488/ – this helped me locally, but I am not sure about autotest.

Comment by Gerrit Updater [ 19/Apr/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31488/
Subject: LU-10732 tests: sanity-lfsck to reset speed limit
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 08342ffa94f559a767c9affbad75440edb6ba024

Comment by Peter Jones [ 19/Apr/18 ]

Landed for 2.12
