[LU-9437] sanity-lfsck test_33: only 0 of 4 MDTs are in completed
Created: 02/May/17  Updated: 03/Nov/18  Resolved: 03/Mar/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0, Lustre 2.10.1, Lustre 2.11.0, Lustre 2.10.2, Lustre 2.10.3
Fix Version/s: Lustre 2.11.0, Lustre 2.10.4

Type: Bug Priority: Critical
Reporter: James Casper Assignee: nasf (Inactive)
Resolution: Fixed Votes: 0
Labels: dne, zfs
Environment:

trevis-50, full, DNE+ZFS
EL7, master branch, v2.9.56.11, b3565


Issue Links:
Related
is related to LU-11151 sanity-lfsck test_33: (5) only 0 of 4... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

https://testing.hpdd.intel.com/test_sessions/30cc75b6-594f-4255-accf-24fe11bdd565

Just before the failure occurred:

From test_log:

Started LFSCK on the device lustre-MDT0000: scrub namespace
CMD: trevis-50vm7 /usr/sbin/lctl lfsck_query -t namespace -M lustre-MDT0000 -w |
		      awk '/^namespace_mdts_completed/ { print \$2 }'
CMD: trevis-50vm7 /usr/sbin/lctl lfsck_query -t namespace -M lustre-MDT0000
namespace_mdts_init: 0
namespace_mdts_scanning-phase1: 0
namespace_mdts_scanning-phase2: 0
namespace_mdts_completed: 0
namespace_mdts_failed: 3
namespace_mdts_stopped: 0
namespace_mdts_paused: 0
namespace_mdts_crashed: 0
namespace_mdts_partial: 0
namespace_mdts_co-failed: 1
namespace_mdts_co-stopped: 0
namespace_mdts_co-paused: 0
namespace_mdts_unknown: 0
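The counters above are easier to read once tallied. Here is a minimal Python sketch that parses this `lfsck_query` output and keeps only the non-zero states. It assumes the output is one `name: value` pair per line, as in the log. `parse_counters` and `nonzero_states` are hypothetical helpers, not part of Lustre. If each MDT is reported in exactly one state, the tally shows that all 4 MDTs ended in `failed` (3) or `co-failed` (1), and none reached `completed`.

```python
# Counter output of `lctl lfsck_query -t namespace -M lustre-MDT0000`,
# copied from the test log above.
QUERY_OUTPUT = """\
namespace_mdts_init: 0
namespace_mdts_scanning-phase1: 0
namespace_mdts_scanning-phase2: 0
namespace_mdts_completed: 0
namespace_mdts_failed: 3
namespace_mdts_stopped: 0
namespace_mdts_paused: 0
namespace_mdts_crashed: 0
namespace_mdts_partial: 0
namespace_mdts_co-failed: 1
namespace_mdts_co-stopped: 0
namespace_mdts_co-paused: 0
namespace_mdts_unknown: 0
"""


def parse_counters(text: str, prefix: str = "namespace_mdts_") -> dict[str, int]:
    """Hypothetical helper: map each status name (e.g. 'completed') to its MDT count."""
    counters: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.startswith(prefix):
            # int() tolerates the leading space after the colon.
            counters[key[len(prefix):]] = int(value)
    return counters


def nonzero_states(counters: dict[str, int]) -> dict[str, int]:
    """Hypothetical helper: keep only the states at least one MDT is in."""
    return {state: n for state, n in counters.items() if n}


if __name__ == "__main__":
    counters = parse_counters(QUERY_OUTPUT)
    print(nonzero_states(counters))  # {'failed': 3, 'co-failed': 1}
    print(sum(counters.values()))    # 4
```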

This was followed by the same output for namespace_osts, with every counter at 0. The test then failed:

 sanity-lfsck test_33: @@@@@@ FAIL: (5) only 0 of 4 MDTs are in completed 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:4931:error()
  = /usr/lib64/lustre/tests/sanity-lfsck.sh:142:wait_all_targets_blocked()
  = /usr/lib64/lustre/tests/sanity-lfsck.sh:5046:test_33()
  = /usr/lib64/lustre/tests/test-framework.sh:5207:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:5246:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:5093:run_test()
  = /usr/lib64/lustre/tests/sanity-lfsck.sh:5052:main()
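The log suggests the shape of the check in `wait_all_targets_blocked()`. A waiting query (`-w`) is piped through `awk` to extract the `namespace_mdts_completed` count. When that count differs from the number of MDTs, the full query is dumped and the test errors out. The following is a Python approximation inferred from the log, not the actual shell code in sanity-lfsck.sh. `awk_print_field2`, `check_all_targets`, and `LfsckCheckError` are hypothetical names. The sample input assumes the waiting query returned the same counters as the dump shown above.

```python
import re


class LfsckCheckError(Exception):
    """Hypothetical: raised when fewer MDTs than expected reach a state."""


def awk_print_field2(text: str, regex: str) -> list[str]:
    """Rough equivalent of awk '/regex/ { print $2 }' on whitespace-split lines."""
    out = []
    for line in text.splitlines():
        if re.search(regex, line):
            fields = line.split()
            out.append(fields[1] if len(fields) > 1 else "")
    return out


def check_all_targets(text: str, com: str, status: str,
                      mdscount: int, err: int) -> None:
    """Raise unless exactly `mdscount` MDTs report `status` for component `com`."""
    values = awk_print_field2(text, "^" + re.escape(f"{com}_mdts_{status}"))
    # Treat a missing or empty field as 0 (an assumption about the shell check).
    count = int(values[0]) if values and values[0] else 0
    if count != mdscount:
        raise LfsckCheckError(
            f"({err}) only {count} of {mdscount} MDTs are in {status}")


# Assumed output of the waiting query; only the relevant lines are shown.
SAMPLE = """\
namespace_mdts_completed: 0
namespace_mdts_failed: 3
namespace_mdts_co-failed: 1
"""

if __name__ == "__main__":
    try:
        check_all_targets(SAMPLE, "namespace", "completed", mdscount=4, err=5)
    except LfsckCheckError as e:
        print(e)  # (5) only 0 of 4 MDTs are in completed
```

With 4 MDTs and a `completed` count of 0, the sketch reproduces the `(5) only 0 of 4 MDTs are in completed` message from the trace.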


 Comments   
Comment by James Nunez (Inactive) [ 01/Feb/18 ]

This test continues to fail in full test sessions for DNE with ZFS, and there are also hangs both during this test and immediately after it fails. I can’t find much information about the hangs in the logs, but here are links to a few recent test_33 hangs:
https://testing.hpdd.intel.com/test_sets/14b87b72-004c-11e8-bd00-52540065bddc
https://testing.hpdd.intel.com/test_sets/ea7624b0-fd83-11e7-a7cd-52540065bddc
https://testing.hpdd.intel.com/sub_tests/b2d23068-f710-11e7-a6ad-52540065bddc

I can open a new ticket if the hang is a separate issue from the test failure.

Comment by Minh Diep [ 06/Feb/18 ]

We have started running dne-zfs-part-2 and hit this bug:

https://testing.hpdd.intel.com/test_sets/1e81caf6-0b54-11e8-a7cd-52540065bddc

Comment by Emoly Liu [ 08/Feb/18 ]

+1 on master:
https://testing.hpdd.intel.com/test_sets/11e161d6-0ca7-11e8-a10a-52540065bddc

Comment by Jian Yu [ 08/Feb/18 ]

This failure occurred more than 50 times in one week and is affecting patch testing on the master branch:
https://testing.hpdd.intel.com/test_sets/8844fac6-0ce4-11e8-a6ad-52540065bddc
https://testing.hpdd.intel.com/test_sets/d13951c6-0ceb-11e8-a6ad-52540065bddc
https://testing.hpdd.intel.com/test_sets/5d7a4c72-0cdc-11e8-a7cd-52540065bddc

Comment by Gerrit Updater [ 12/Feb/18 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/31266
Subject: LU-9437 lfsck: handle LMV EA for migrating directory
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 19f5edf2f0a52686e58c420168f5a88559812115

Comment by Gerrit Updater [ 03/Mar/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31266/
Subject: LU-9437 lfsck: handle LMV EA for migrating directory
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 05cfe91c2714a77f5ad3de4a7e58e20b6df17b83

Comment by Gerrit Updater [ 05/Mar/18 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/31518
Subject: LU-9437 lfsck: handle LMV EA for migrating directory
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 17527a603219098225c31a9f9887ddca3e72aeb5

Comment by Gerrit Updater [ 05/Apr/18 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/31518/
Subject: LU-9437 lfsck: handle LMV EA for migrating directory
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 251cce39e896dc668c1338ea15fa145a6b262362

Generated at Sat Feb 10 02:26:11 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.