[LU-9437] sanity-lfsck test_33: only 0 of 4 MDTs are in completed
Created: 02/May/17 | Updated: 03/Nov/18 | Resolved: 03/Mar/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.0, Lustre 2.10.1, Lustre 2.11.0, Lustre 2.10.2, Lustre 2.10.3 |
| Fix Version/s: | Lustre 2.11.0, Lustre 2.10.4 |
| Type: | Bug | Priority: | Critical |
| Reporter: | James Casper | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | dne, zfs | ||
| Environment: | trevis-50, full, DNE+ZFS |
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
https://testing.hpdd.intel.com/test_sessions/30cc75b6-594f-4255-accf-24fe11bdd565
Just before the failure occurred, from test_log:
Started LFSCK on the device lustre-MDT0000: scrub namespace
CMD: trevis-50vm7 /usr/sbin/lctl lfsck_query -t namespace -M lustre-MDT0000 -w |
awk '/^namespace_mdts_completed/ { print \$2 }'
CMD: trevis-50vm7 /usr/sbin/lctl lfsck_query -t namespace -M lustre-MDT0000
namespace_mdts_init: 0
namespace_mdts_scanning-phase1: 0
namespace_mdts_scanning-phase2: 0
namespace_mdts_completed: 0
namespace_mdts_failed: 3
namespace_mdts_stopped: 0
namespace_mdts_paused: 0
namespace_mdts_crashed: 0
namespace_mdts_partial: 0
namespace_mdts_co-failed: 1
namespace_mdts_co-stopped: 0
namespace_mdts_co-paused: 0
namespace_mdts_unknown: 0
And then the same output for namespace_osts, but all are 0.
Then the failure was seen:
sanity-lfsck test_33: @@@@@@ FAIL: (5) only 0 of 4 MDTs are in completed
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:4931:error()
= /usr/lib64/lustre/tests/sanity-lfsck.sh:142:wait_all_targets_blocked()
= /usr/lib64/lustre/tests/sanity-lfsck.sh:5046:test_33()
= /usr/lib64/lustre/tests/test-framework.sh:5207:run_one()
= /usr/lib64/lustre/tests/test-framework.sh:5246:run_one_logged()
= /usr/lib64/lustre/tests/test-framework.sh:5093:run_test()
= /usr/lib64/lustre/tests/sanity-lfsck.sh:5052:main() |
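For anyone triaging similar logs, the check behind the failure message can be approximated with the short sketch below. It is not the actual wait_all_targets_blocked() code from sanity-lfsck.sh. The helper names lfsck_status and lfsck_counter are hypothetical, the status text is copied from this ticket rather than read from a live `lctl lfsck_query`, and the expected MDT count of 4 is taken from the failure message.

```shell
#!/bin/sh
# Hypothetical stand-in for the query output. On a live system this text would
# come from: lctl lfsck_query -t namespace -M lustre-MDT0000
lfsck_status() {
cat <<'EOF'
namespace_mdts_init: 0
namespace_mdts_scanning-phase1: 0
namespace_mdts_scanning-phase2: 0
namespace_mdts_completed: 0
namespace_mdts_failed: 3
namespace_mdts_stopped: 0
namespace_mdts_paused: 0
namespace_mdts_crashed: 0
namespace_mdts_partial: 0
namespace_mdts_co-failed: 1
namespace_mdts_co-stopped: 0
namespace_mdts_co-paused: 0
namespace_mdts_unknown: 0
EOF
}

# Print the value of a single counter. The exact match on the first field keeps
# "namespace_mdts_failed" from also matching "namespace_mdts_co-failed".
lfsck_counter() {
    lfsck_status | awk -v key="$1:" '$1 == key { print $2 }'
}

expected_mdts=4   # assumption: MDT count taken from the failure message
completed=$(lfsck_counter namespace_mdts_completed)
failed=$(lfsck_counter namespace_mdts_failed)
co_failed=$(lfsck_counter namespace_mdts_co-failed)

# Report a mismatch in the same wording as the test failure.
if [ "$completed" -ne "$expected_mdts" ]; then
    echo "only $completed of $expected_mdts MDTs are in completed (failed=$failed, co-failed=$co_failed)"
fi
```

With the counters captured above, the sketch reports 0 completed MDTs, and the 3 failed plus 1 co-failed entries account for all four MDTs.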
| Comments |
| Comment by James Nunez (Inactive) [ 01/Feb/18 ] |
|
This test continues to fail in full test sessions for DNE with ZFS, but there are also some hangs during this test and right after the test fails. I can’t find much information about the hang in the logs, but here are a few links to recent test_33 hangs: I can open a new ticket if the hang is a separate issue from the test failure. |
| Comment by Minh Diep [ 06/Feb/18 ] |
|
We have started dne-zfs-part-2 and hit this bug: https://testing.hpdd.intel.com/test_sets/1e81caf6-0b54-11e8-a7cd-52540065bddc |
| Comment by Emoly Liu [ 08/Feb/18 ] |
|
+1 on master: |
| Comment by Jian Yu [ 08/Feb/18 ] |
|
This failure occurred more than 50 times in one week, which is affecting patch testing on master branch: |
| Comment by Gerrit Updater [ 12/Feb/18 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/31266 |
| Comment by Gerrit Updater [ 03/Mar/18 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31266/ |
| Comment by Gerrit Updater [ 05/Mar/18 ] |
|
Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/31518 |
| Comment by Gerrit Updater [ 05/Apr/18 ] |
|
John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/31518/ |