[LU-13825] sanity-lfsck test_18b: (2) MDS4 is not the expected 'completed' Created: 28/Jul/20  Updated: 25/Feb/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.6
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3

 Description   

This issue was created by maloo for Olaf Faaland <faaland1@llnl.gov>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/8f623a2c-dce5-440b-a641-3b30e4b1b1e1

test_18b failed with the following error:

(2) MDS4 is not the expected 'completed'

Leading up to that in the logs:

Trigger layout LFSCK on all devices to find out orphan OST-object
CMD: trevis-25vm4 /usr/sbin/lctl lfsck_start -M lustre-MDT0000 -t layout -r -o
Started LFSCK on the device lustre-MDT0000: scrub layout

followed by

Update not seen after 120s: wanted 'completed' got 'scanning-phase2'
 sanity-lfsck test_18b: @@@@@@ FAIL: (2) MDS4 is not the expected 'completed' 

Bash trace dump is:

  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:5907:error()
  = /usr/lib64/lustre/tests/sanity-lfsck.sh:2197:test_18b()
  = /usr/lib64/lustre/tests/test-framework.sh:6210:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:6259:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:6099:run_test()
  = /usr/lib64/lustre/tests/sanity-lfsck.sh:2260:main()
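
For readers unfamiliar with where the "Update not seen after 120s" message comes from, the wait that times out here can be pictured as a simple polling loop. The following is only a minimal sketch of that idea, not the test-framework.sh implementation: the function name `wait_for_value` and its arguments are hypothetical, and the real helper may poll at a different interval or run the command on a remote node.

```shell
#!/bin/bash
# Hypothetical helper (NOT taken from test-framework.sh): run a command
# once per second until its output equals the wanted value, or give up
# after the timeout and report the last value seen.
wait_for_value() {
    local cmd=$1      # command whose output is compared
    local wanted=$2   # value to wait for, e.g. completed
    local timeout=$3  # maximum number of seconds to wait
    local waited=0
    local got

    while true; do
        got=$(eval "$cmd")
        if [ "$got" = "$wanted" ]; then
            return 0
        fi
        if [ "$waited" -ge "$timeout" ]; then
            echo "Update not seen after ${timeout}s: wanted '$wanted' got '$got'"
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

On an MDS node this could be called with the status query quoted in the comments on this ticket, for example `wait_for_value "lctl get_param -n mdd.lustre-MDT0002.lfsck_layout | awk '/^status/ { print \$2 }'" completed 120`. If phase 2 of the layout scan merely runs slowly (as the ZFS reports suggest it might), a loop like this fails even though LFSCK would eventually complete, so the failure alone does not distinguish a hang from a slow scan.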

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity-lfsck test_18b - (2) MDS4 is not the expected 'completed'



 Comments   
Comment by James Nunez (Inactive) [ 08/Dec/20 ]

We have a similar failure in sanity-lfsck test_18c: LFSCK is still in the 'scanning-phase2' stage after the 120s wait, by which time the scan should have completed: https://testing.whamcloud.com/test_sets/1918c84b-4ba7-4f12-a8ab-68fc9cbca110 . This failure was seen in 2.12.6 testing.

CMD: trevis-209vm4 /usr/sbin/lctl get_param -n mdd.lustre-MDT0002.lfsck_layout | awk '/^status/ { print \$2 }'
Update not seen after 120s: wanted 'completed' got 'scanning-phase2'
 sanity-lfsck test_18c: @@@@@@ FAIL: (2) MDS3 is not the expected 'completed' 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:5907:error()
  = /usr/lib64/lustre/tests/sanity-lfsck.sh:2329:test_18c()

Comment by Andreas Dilger [ 26/Jan/21 ]

+2 similar failures on test_18e in the past week, all of them on ZFS:

https://testing.whamcloud.com/sub_tests/91a11d27-ffaa-4384-823b-bd0273368622
https://testing.whamcloud.com/sub_tests/15c0d51a-214e-4703-900d-9398e2daa681

Comment by Andreas Dilger [ 25/Feb/22 ]

+1 on b2_12+zfs: https://testing.whamcloud.com/test_sets/c61d02ed-b7b2-4610-9ccd-81f59aa7dec5
