[LU-3041] Interop 1.8.9<->2.4 failure on test suite lfsck
Created: 27/Mar/13  Updated: 13/Jun/13  Resolved: 13/Jun/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 1.8.9

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Emoly Liu
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-3132 lfsck: failed to recreate /mnt/lustre... Closed
Severity: 3
Rank (Obsolete): 7421

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/d2c16ae8-948b-11e2-93c6-52540035b04c.

lustre-OST0000: ********** WARNING: Filesystem still has errors **********


         187 inodes used (0.14%, out of 131072)
           1 non-contiguous file (0.5%)
           0 non-contiguous directories (0.0%)
             # of inodes with ind/dind/tind blocks: 0/0/0
             Extent depth histogram: 181
       35296 blocks used (6.73%, out of 524288)
           0 bad blocks
           1 large file

         139 regular files
          39 directories
           0 character device files
           0 block device files
           0 fifos
           0 links
           0 symbolic links (0 fast symbolic links)
           0 sockets
------------
         178 files
Memory used: 2428k/21180k (738k/1691k), time:  0.15/ 0.05/ 0.01
I/O read: 6MB, write: 0MB, rate: 39.22MB/s
 lfsck : @@@@@@ FAIL: e2fsck -d -v -t -t -f -n --mdsdb /home/autotest/.autotest/shared_dir/2013-03-22/182230-70011888563720/mdsdb --ostdb /home/autotest/.autotest/shared_dir/2013-03-22/182230-70011888563720/ostdb-0 /dev/mapper/lvm--OSS-P1 returned 4, should be <= 1 
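The failure above says e2fsck returned 4 while the lfsck test accepts at most 1. Per e2fsck(8), the exit status is a sum of flags: 1 errors corrected, 2 errors corrected and reboot recommended, 4 errors left uncorrected, 8 operational error, 16 usage or syntax error, 32 canceled by user, 128 shared library error. Because the test runs e2fsck with -n (read-only, answering "no" to every question), any error it finds stays uncorrected, which matches the "Filesystem still has errors" warning. The sketch below decodes such a status; `decode_fsck_rc` and `check_fsck_rc` are hypothetical helpers for illustration, not functions from lfsck.sh, and the "<= 1" threshold is taken from the log message.

```shell
#!/bin/sh
# Hypothetical helper (not part of lfsck.sh): decode an e2fsck exit status.
# e2fsck(8) documents the status as the bitwise OR of these flags.
decode_fsck_rc() {
	rc=$1
	if [ "$rc" -eq 0 ]; then
		echo "no errors"
		return
	fi
	out=""
	[ $((rc & 1)) -ne 0 ] && out="$out,errors corrected"
	[ $((rc & 2)) -ne 0 ] && out="$out,reboot needed"
	[ $((rc & 4)) -ne 0 ] && out="$out,errors left uncorrected"
	[ $((rc & 8)) -ne 0 ] && out="$out,operational error"
	[ $((rc & 16)) -ne 0 ] && out="$out,usage or syntax error"
	[ $((rc & 32)) -ne 0 ] && out="$out,canceled by user"
	[ $((rc & 128)) -ne 0 ] && out="$out,shared library error"
	# Strip the leading comma added by the first matching flag.
	echo "${out#,}"
}

# Hypothetical check mirroring the log message: any status above 1 fails.
check_fsck_rc() {
	if [ "$1" -le 1 ]; then
		echo "PASS"
	else
		echo "FAIL: returned $1, should be <= 1 ($(decode_fsck_rc "$1"))"
	fi
}
```

For the run above, `check_fsck_rc 4` prints "FAIL: returned 4, should be <= 1 (errors left uncorrected)".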


 Comments   
Comment by Niu Yawei (Inactive) [ 27/Mar/13 ]

I think this is a problem in the 1.8 test script:

00:45:47:CMD: client-27vm4 debugfs -w -f /home/autotest/.autotest/shared_dir/2013-03-22/182230-70011888563720/debugfs.0VwKusgYux /dev/mapper/lvm--OSS-P1
00:45:47:client-27vm4: debugfs 1.42.6.wc2 (10-Dec-2012)
00:45:47:debugfs: rm O/0/d10/10
00:45:47:
00:45:47:debugfs: rm O/0/d11/11

The 1.8 lfsck.sh still uses debugfs to remove the objects, which results in a quota accounting inconsistency on the OST:

00:46:04:client-27vm4: [QUOTA WARNING] Usage inconsistent for ID 0:actual (1814528, 178) != expected (2830336, 180)
00:46:04:client-27vm4: [QUOTA WARNING] Usage inconsistent for ID 0:actual (1814528, 178) != expected (2830336, 180)
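As a worked check of this explanation, the gap between the expected and actual usage can be computed from the warning line. Assuming the two numbers in each pair are (space used, inode count), which the log itself does not label, the expected inode count exceeds the actual one by 2, matching the two objects (O/0/d10/10 and O/0/d11/11) removed with debugfs above. The `quota_delta` function is a hypothetical parser for this one message format only.

```shell
#!/bin/sh
# Hypothetical helper: print expected-minus-actual usage from a
# "[QUOTA WARNING] Usage inconsistent" line. Assumes each pair is
# (space, inodes); the log does not label the fields.
quota_delta() {
	echo "$1" |
	sed -n 's/.*actual (\([0-9]*\), \([0-9]*\)) != expected (\([0-9]*\), \([0-9]*\)).*/\1 \2 \3 \4/p' |
	while read -r a_space a_inodes e_space e_inodes; do
		echo "space=$((e_space - a_space)) inodes=$((e_inodes - a_inodes))"
	done
}
```

For the warning above this prints "space=1015808 inodes=2"; the inode delta lines up with the two `rm` commands issued through debugfs.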

To run the 1.8 test script lfsck.sh (the same applies to 2.1) against 2.4 servers, we have to backport the LU-2663 script fix to 1.8 (and 2.1); see: http://review.whamcloud.com/#change,5186

BTW: The LU-2694 fix should be backported to 1.8 as well (it's already in 2.1); see: http://review.whamcloud.com/#change,5762

Comment by Peter Jones [ 28/Mar/13 ]

Emoly

Could you please take care of this one?

Thanks

Peter

Comment by Emoly Liu [ 29/Mar/13 ]

OK, I will backport these patches and test them.

Comment by Emoly Liu [ 07/Apr/13 ]

I created a patch, http://review.whamcloud.com/#change,5968, per Niu's advice.

However, I now find that lfsck fails on my local VM even on the original b1_8 branch. I hit the same error as in https://maloo.whamcloud.com/test_sessions/3acb15f6-9f83-11e2-9f27-52540035b04c

I will create another, test-only patch to see whether the lfsck script has any problem on b1_8. The result is at https://maloo.whamcloud.com/test_sessions/61ecab6c-a051-11e2-898f-52540035b04c

Comment by Niu Yawei (Inactive) [ 08/Apr/13 ]

However, I now find that lfsck fails on my local VM even on the original b1_8 branch. I hit the same error as in https://maloo.whamcloud.com/test_sessions/3acb15f6-9f83-11e2-9f27-52540035b04c

Emoly, can the patch fix this failure?

Comment by Emoly Liu [ 08/Apr/13 ]

We have now discussed two lfsck issues: one is the 1.8<->2.4 interop failure, the other is the b1_8 lfsck failure.

This patch can't fix the latter.

I just added serverjob and serverbuildno to Test-Parameters in the patch commit message. Let's see whether this patch fixes the interop failure.

Comment by Emoly Liu [ 09/Apr/13 ]

The original b1_8 lfsck test log shows:

lfsck 1.42.6.wc2 (10-Dec-2012)
lfsck: ost_idx 0: pass1: check for duplicate objects
lfsck: ost_idx 0: pass1 OK (20 files total)
lfsck: ost_idx 0: pass2: check for missing inode objects
[0]: failed to recreate /mnt/lustre/d0.lfsck/testfile.7 missing obj 0:12
lfsck: ost_idx 0: pass2 ERROR: 1 dangling inodes found (21 files total)

I can reproduce this error easily. I will file it in another ticket and look into it.

Comment by Emoly Liu [ 13/Jun/13 ]

Patch landed for b1_8

Generated at Sat Feb 10 01:30:27 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.