[LU-8810] sanity-lfsck test_18d: @@@@@@ FAIL: (3.0) MDS1 is not the expected 'scanning-phase2'
Created: 08/Nov/16  Updated: 17/Dec/16  Resolved: 17/Dec/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.10.0

Type: Bug
Priority: Minor
Reporter: nasf (Inactive)
Assignee: nasf (Inactive)
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Duplicate
Related
Severity: 3

 Description   
== sanity-lfsck test 18d: Find out orphan OST-object and repair it (4) =============================== 16:03:13 (1477411393)
#####
The target MDT-object layout EA slot is occupied by some new
created OST-object when repair dangling reference case. Such
conflict OST-object has never been modified. Then when found
the orphan OST-object, LFSCK will replace it with the orphan
OST-object.
#####
[0x280000400:0x4:0x0]
/mnt/lustre/d18d.sanity-lfsck/a1/f1
lmm_stripe_count:   1
lmm_stripe_size:    1048576
lmm_pattern:        1
lmm_layout_gen:     0
lmm_stripe_offset:  0
	obdidx		 objid		 objid		 group
	     0	             2	          0x2	             0

[0x280000400:0x5:0x0]
/mnt/lustre/d18d.sanity-lfsck/a1/f2
lmm_stripe_count:   1
lmm_stripe_size:    1048576
lmm_pattern:        1
lmm_layout_gen:     0
lmm_stripe_offset:  0
	obdidx		 objid		 objid		 group
	     0	             3	          0x3	             0

Inject failure to make /mnt/lustre/d18d.sanity-lfsck/a1/f1 and /mnt/lustre/d18d.sanity-lfsck/a1/f2
to reference the same OST-object (which is f1's OST-object).
Then drop /mnt/lustre/d18d.sanity-lfsck/a1/f1 and its OST-object, so f2 becomes
dangling reference case, but f2's old OST-object is there.

fail_loc=0x1618
fail_loc=0
stopall to cleanup object cache
setupall
pdsh@fre0127: fre0125: ssh exited with exit code 1
pdsh@fre0127: fre0125: ssh exited with exit code 1
pdsh@fre0127: fre0125: ssh exited with exit code 1
pdsh@fre0127: fre0125: ssh exited with exit code 1
pdsh@fre0127: fre0125: ssh exited with exit code 1
pdsh@fre0127: fre0125: ssh exited with exit code 1
pdsh@fre0127: fre0126: ssh exited with exit code 1
pdsh@fre0127: fre0126: ssh exited with exit code 1
pdsh@fre0127: fre0126: ssh exited with exit code 1
pdsh@fre0127: fre0126: ssh exited with exit code 1
The file size should be incorrect since dangling referenced
ls: cannot access /mnt/lustre/d18d.sanity-lfsck/a1/f2: No such file or directory
fail_val=5
fail_loc=0x1602
Trigger layout LFSCK on all devices to find out orphan OST-object
Started LFSCK on the device lustre-MDT0000: scrub layout
Waiting 120 secs for update
Waiting 110 secs for update
Waiting 100 secs for update
Waiting 90 secs for update
Waiting 80 secs for update
Waiting 70 secs for update
Waiting 60 secs for update
Waiting 50 secs for update
Waiting 40 secs for update
Waiting 30 secs for update
Waiting 20 secs for update
Waiting 10 secs for update
Update not seen after 120s: wanted 'scanning-phase2' got 'completed'
 sanity-lfsck test_18d: @@@@@@ FAIL: (3.0) MDS1 is not the expected 'scanning-phase2' 
...
Resetting fail_loc on all nodes...done.
FAIL 18d (214s)
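
The log above shows the status check polling 12 times at 10-second intervals for 'scanning-phase2' while the MDS already reported 'completed'. The following is a minimal Python sketch of that race, not the Lustre test-framework code: wait_for_status and make_reader are hypothetical names, and accepting the terminal state is only one way such a check might be relaxed. The merged patch's actual change is not reproduced here.

```python
from typing import Callable, Iterable, List, Optional


def wait_for_status(read_status: Callable[[], Optional[str]],
                    accepted: Iterable[str],
                    attempts: int) -> Optional[str]:
    """Poll read_status() up to `attempts` times.

    Returns the first status found in `accepted`, or None on timeout.
    (Hypothetical helper; a real test would sleep between polls.)
    """
    accepted_set = set(accepted)
    for _ in range(attempts):
        status = read_status()
        if status in accepted_set:
            return status
    return None


def make_reader(states: List[str]) -> Callable[[], Optional[str]]:
    """Return a reader that yields each state once, then repeats the last one."""
    it = iter(states)
    last: List[Optional[str]] = [None]

    def read() -> Optional[str]:
        try:
            last[0] = next(it)
        except StopIteration:
            pass  # keep reporting the final state
        return last[0]

    return read


# Assumed scenario: LFSCK finished before the first poll, so the
# intermediate 'scanning-phase2' state is never observed.
observed = ["completed"]

strict = wait_for_status(make_reader(observed),
                         {"scanning-phase2"}, attempts=12)
relaxed = wait_for_status(make_reader(observed),
                          {"scanning-phase2", "completed"}, attempts=12)

print(strict)   # None: the strict check times out, as in the log
print(relaxed)  # completed
```

A check that requires an intermediate state can only pass if the poller samples the status while the scan is still in that state, which is why it is sensitive to how quickly the scan finishes.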



 Comments   
Comment by Gerrit Updater [ 08/Nov/16 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/23650
Subject: LU-8810 tests: skip non-crucial LFSCK intermediateness check
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 9e9eb8a3082a97a94712e1bed6883718763ed3bb

Comment by Gerrit Updater [ 17/Dec/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/23650/
Subject: LU-8810 tests: skip non-crucial LFSCK intermediateness check
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 790b56d82cf82dbaf30c1d1788e647d1e4a8dee0
