[LU-2694] lfsck in e2fsprogs is out of date Created: 28/Jan/13  Updated: 22/Mar/13  Resolved: 12/Mar/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.0, Lustre 2.1.5

Type: Bug Priority: Blocker
Reporter: Niu Yawei (Inactive) Assignee: Niu Yawei (Inactive)
Resolution: Fixed Votes: 0
Labels: LB

Issue Links:
Related
is related to LU-2663 lfsck: e2fsck [QUOTA WARNING] Usage i... Resolved
Severity: 3
Rank (Obsolete): 6283

 Description   

It looks like the old lfsck in e2fsprogs hasn't been actively maintained for quite some time, and it no longer works with the latest Lustre.

One obvious defect: Lustre changed the object directory on the OST when FID-on-OST landed, but lfsck still searches for objects under the old O/0 directory.
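To make the directory change concrete, here is a minimal Python sketch of how an object's on-disk path might be derived on an ldiskfs OST. Everything beyond the O/0 directory named above is an assumption: the 32 hash subdirectories (d0..d31), the IDIF sequence-range constants, and the hex naming of non-IDIF sequence directories. The sequence 0x280000400 in the usage note is hypothetical, and the helper names (object_path, visible_to_old_lfsck) are illustrative, not Lustre or e2fsprogs APIs.

```python
# Hypothetical sketch of OST object placement on an ldiskfs OST.
# ASSUMPTIONS (not stated in this issue): 32 hash subdirectories d0..d31
# keyed by object id; legacy and IDIF objects live under O/0; any other
# sequence gets its own directory named by the sequence in hex.

NUM_SUBDIRS = 32                  # assumed d0..d31 fan-out
FID_SEQ_IDIF = 0x100000000        # start of the IDIF sequence range
FID_SEQ_IDIF_MAX = 0x1FFFFFFFF    # end of the IDIF sequence range


def is_idif(seq: int) -> bool:
    return FID_SEQ_IDIF <= seq <= FID_SEQ_IDIF_MAX


def object_path(seq: int, oid: int) -> str:
    """Return the assumed path of object (seq, oid) relative to the OST root."""
    if seq == 0 or is_idif(seq):
        # IDIF keeps the upper 16 bits of the 48-bit object id in the sequence.
        objid = ((seq & 0xFFFF) << 32) | oid if is_idif(seq) else oid
        seq_dir = "0"
    else:
        objid = oid
        seq_dir = format(seq, "x")  # hypothetical hex naming for new sequences
    return f"O/{seq_dir}/d{objid % NUM_SUBDIRS}/{objid}"


def visible_to_old_lfsck(path: str) -> bool:
    # The old lfsck only scanned O/0, so objects anywhere else are missed.
    return path.startswith("O/0/")
```

For example, object_path(0x100000000, 0x26) yields O/0/d6/38, which the old scan would still find, whereas an object in the hypothetical sequence 0x280000400 lands outside O/0 and would be invisible to it.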



 Comments   
Comment by Niu Yawei (Inactive) [ 04/Feb/13 ]

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/baa5681c-6c5c-11e2-91d6-52540035b04c.

Comment by Jian Yu [ 25/Feb/13 ]

Lustre b2_1 client build: http://build.whamcloud.com/job/lustre-b2_1/176
Lustre master server build: http://build.whamcloud.com/job/lustre-master/1269
Distro/Arch: RHEL6.3/x86_64

lfsck also failed with the same issue: https://maloo.whamcloud.com/test_sets/6a4f7e3e-7d78-11e2-85d0-52540035b04c

Comment by Peter Jones [ 27/Feb/13 ]

Niu is going to look into this

Comment by Niu Yawei (Inactive) [ 04/Mar/13 ]

First, we need to fix is_empty_fs in test-framework (t-f), which can cause an unexpected error in lfsck.sh: http://review.whamcloud.com/5576

Comment by Niu Yawei (Inactive) [ 05/Mar/13 ]

LU-2775 (commit 2adc20013d9c2a5969a3154b0ca93ac007b1a4e2) fixed the compatibility problem, so we no longer need to fix e2fsprogs.

However, I still occasionally hit a failure when cleaning up Lustre after running lfsck:

LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) header@ffff880078391ec0[0x0, 3, [0x100000000:0x26:0x0] hash]{
LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) ....lovsub@ffff880078391f58[0]
LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) ....osc@ffff880078390e28id: 38 gr: 0 idx: 0 gen: 0 kms_valid: 1 kms 1048576 rc: 0 force_sync: 0 min_xid: 0 size: 1048576 mtime: 1362472487 atime: 0 ctime: 1362472487 blocks: 2048
LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) } header@ffff880078391ec0
LustreError: 5539:0:(lov_object.c:184:lov_init_sub())
LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) } header@ffff880075173070
LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) owned.
LustreError: 5539:0:(lov_object.c:185:lov_init_sub()) header@ffff8800773987a8[0x0, 1, [0x6c:0xc68285fb:0x0]]
LustreError: 5539:0:(lov_object.c:185:lov_init_sub()) try to own.
LustreError: 5539:0:(lcommon_cl.c:1211:cl_file_inode_init()) Failure to initialize cl object [0x6c:0xc68285fb:0x0]: -5
LustreError: 5539:0:(llite_lib.c:2161:ll_prep_inode()) new_inode -fatal: rc -5
LustreError: 5539:0:(lov_object.c:183:lov_init_sub()) header@ffff880078391920[0x0, 3, [0x100010000:0x28:0x0] hash]{
LustreError: 5539:0:(lov_object.c:183:lov_init_sub()) ....lovsub@ffff8800783919b8[0]
LustreError: 5539:0:(lov_object.c:183:lov_init_sub()) ....osc@ffff880078390768id: 40 gr: 0 idx: 1 gen: 0 kms_valid: 1 kms 1048576 rc: 0 force_sync: 0 min_xid: 0 size: 1048576 mtime: 1362472487 atime: 0 ctime: 1362472487 blocks: 2048
LustreError: 5539:0:(lov_object.c:183:lov_init_sub()) } header@ffff880078391920
LustreError: 5539:0:(lov_object.c:183:lov_init_sub()) stripe 0 is already owned.
LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) header@ffff8800773987a8[0x0, 1, [0xc6:0xc6828600:0x0] hash]{
LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) ....vvp@ffff880077398840(- 0 0) inode: ffff8800675d0ab8 198/3330442752 100644 1 1 ffff880077398840 [0xc6:0xc6828600:0x0]
LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) ....lov@ffff880074430830stripes: 1, valid, lsm{ffff880065b44240 0x0BD10BD0 1 1 0}:
LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) header@ffff880078391920[0x0, 3, [0x100010000:0x28:0x0] hash]{
LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) ....lovsub@ffff8800783919b8[0]
LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) ....osc@ffff880078390768id: 40 gr: 0 idx: 1 gen: 0 kms_valid: 1 kms 1048576 rc: 0 force_sync: 0 min_xid: 0 size: 1048576 mtime: 1362472487 atime: 0 ctime: 1362472487 blocks: 2048
LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) } header@ffff880078391920
LustreError: 5539:0:(lov_object.c:184:lov_init_sub())
LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) } header@ffff8800773987a8
LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) owned.
LustreError: 5539:0:(lov_object.c:185:lov_init_sub()) header@ffff88004f4f2dd8[0x0, 1, [0x200000400:0x66:0x0]]
LustreError: 5539:0:(lov_object.c:185:lov_init_sub()) try to own.
LustreError: 5539:0:(lov_object.c:183:lov_init_sub()) header@ffff88004f4a0470[0x0, 3, [0x100000000:0x2a:0x0] hash]{
LustreError: 5539:0:(lov_object.c:183:lov_init_sub()) ....lovsub@ffff88004f4a0508[0]
LustreError: 5539:0:(lov_object.c:183:lov_init_sub()) ....osc@ffff880049da2d08id: 42 gr: 0 idx: 0 gen: 0 kms_valid: 1 kms 1048576 rc: 0 force_sync: 0 min_xid: 0 size: 1048576 mtime: 1362472487 atime: 0 ctime: 1362472487 blocks: 2048
LustreError: 5539:0:(lov_object.c:183:lov_init_sub()) } header@ffff88004f4a0470
LustreError: 5539:0:(lov_object.c:183:lov_init_sub()) stripe 0 is already owned.
LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) header@ffff88004f4f2cd0[0x0, 1, [0xc9:0xc6828603:0x0] hash]{
LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) ....vvp@ffff88004f4f2d68(- 0 0) inode: ffff88007286cb78 201/3330442755 100644 1 1 ffff88004f4f2d68 [0xc9:0xc6828603:0x0]
LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) ....lov@ffff88007286de80stripes: 1, valid, lsm{ffff88006f83cbc0 0x0BD10BD0 1 1 0}:
LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) header@ffff88004f4a0470[0x0, 3, [0x100000000:0x2a:0x0] hash]{
LustreError: 5538:0:(lov_object.c:183:lov_init_sub()) header@ffff8800751719d0[0x0, 3, [0x100010000:0x29:0x0] hash]{
LustreError: 5538:0:(lov_object.c:183:lov_init_sub()) ....lovsub@ffff880075171a68[0]
LustreError: 5538:0:(lov_object.c:183:lov_init_sub()) ....osc@ffff8800751708c8id: 41 gr: 0 idx: 1 gen: 0 kms_valid: 1 kms 1048576 rc: 0 force_sync: 0 min_xid: 0 size: 1048576 mtime: 1362472487 atime: 0 ctime: 1362472487 blocks: 2048
LustreError: 5538:0:(lov_object.c:183:lov_init_sub()) } header@ffff8800751719d0
LustreError: 5538:0:(lov_object.c:183:lov_init_sub()) stripe 0 is already owned.
LustreError: 5538:0:(lov_object.c:184:lov_init_sub()) header@ffff880075173388[0x0, 1, [0x200000400:0x68:0x0] hash]{
LustreError: 5538:0:(lov_object.c:184:lov_init_sub()) ....vvp@ffff880075173420(- 0 0) inode: ffff88007ad1f678 144115205255725160/33554436 100644 1 1 ffff880075173420 [0x200000400:0x68:0x0]
LustreError: 5538:0:(lov_object.c:184:lov_init_sub()) ....lov@ffff880075172430stripes: 1, valid, lsm{ffff88006e9419c0 0x0BD10BD0 1 1 0}:
LustreError: 5538:0:(lov_object.c:184:lov_init_sub()) header@ffff8800751719d0[0x0, 3, [0x100010000:0x29:0x0] hash]{
LustreError: 5538:0:(lov_object.c:184:lov_init_sub()) ....lovsub@ffff880075171a68[0]
LustreError: 5538:0:(lov_object.c:184:lov_init_sub()) ....osc@ffff8800751708c8id: 41 gr: 0 idx: 1 gen: 0 kms_valid: 1 kms 1048576 rc: 0 force_sync: 0 min_xid: 0 size: 1048576 mtime: 1362472487 atime: 0 ctime: 1362472487 blocks: 2048
LustreError: 5538:0:(lov_object.c:184:lov_init_sub()) } header@ffff8800751719d0
LustreError: 5538:0:(lov_object.c:184:lov_init_sub())
LustreError: 5538:0:(lov_object.c:184:lov_init_sub()) } header@ffff880075173388
LustreError: 5538:0:(lov_object.c:184:lov_init_sub()) owned.
LustreError: 5538:0:(lov_object.c:185:lov_init_sub()) header@ffff88007a5b7ee0[0x0, 1, [0xc8:0xc6828602:0x0]]
LustreError: 5538:0:(lov_object.c:185:lov_init_sub()) try to own.
LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) owned.
LustreError: 5539:0:(lov_object.c:185:lov_init_sub()) header@ffff88004f4f2ac0[0x0, 1, [0x200000400:0x69:0x0]]
LustreError: 5539:0:(lov_object.c:185:lov_init_sub()) try to own.
Lustre: DEBUG MARKER: lfsck : @@@@@@ FAIL: remove sub-test dirs failed

This seems related to LU-2765. Jinshan, any idea about this failure? Thanks.

Comment by Niu Yawei (Inactive) [ 06/Mar/13 ]

The error messages above seem to be caused by duplicate files referencing the same object in the test directory; they are not related to LU-2765.
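The stripe FIDs in the log appear to be IDIF FIDs, which pack the OST index and object id into the sequence: for example, [0x100010000:0x28:0x0] is printed next to "id: 40 ... idx: 1". The Python sketch below, assuming the standard IDIF encoding (sequences 0x100000000 to 0x1ffffffff, OST index in bits 16-31, upper object-id bits in bits 0-15, version 0), decodes such FIDs and groups files by the OST object they reference. The layouts mapping reads one "stripe 0 is already owned" group from the log as an owner/contender pair; that reading, the extra file [0x200000400:0x1:0x0], and the helper names are assumptions for illustration.

```python
from collections import defaultdict

FID_SEQ_IDIF = 0x100000000      # first IDIF sequence (assumed encoding)
FID_SEQ_IDIF_MAX = 0x1FFFFFFFF  # last IDIF sequence (assumed encoding)


def idif_to_ost_object(seq: int, oid: int) -> tuple[int, int]:
    """Decode an IDIF FID (version 0) into (OST index, object id)."""
    if not FID_SEQ_IDIF <= seq <= FID_SEQ_IDIF_MAX:
        raise ValueError(f"not an IDIF sequence: {seq:#x}")
    ost_idx = (seq >> 16) & 0xFFFF          # bits 16-31 hold the OST index
    objid = ((seq & 0xFFFF) << 32) | oid    # bits 0-15 extend the object id
    return ost_idx, objid


def find_shared_objects(
    layouts: dict[str, list[tuple[int, int]]]
) -> dict[tuple[int, int], list[str]]:
    """Map each OST object referenced by more than one file to those files."""
    owners = defaultdict(list)
    for file_fid, stripes in layouts.items():
        for seq, oid in stripes:
            owners[idif_to_ost_object(seq, oid)].append(file_fid)
    return {obj: files for obj, files in owners.items() if len(files) > 1}


if __name__ == "__main__":
    layouts = {
        # Owner and contender of one "already owned" stripe in the log above.
        "[0xc6:0xc6828600:0x0]": [(0x100010000, 0x28)],
        "[0x200000400:0x66:0x0]": [(0x100010000, 0x28)],
        # Hypothetical file with a private stripe; it should be filtered out.
        "[0x200000400:0x1:0x0]": [(0x100000000, 0x1)],
    }
    for (ost_idx, objid), files in find_shared_objects(layouts).items():
        print(f"OST{ost_idx:04x} object {objid}: {', '.join(files)}")
```

Keying on the decoded (OST index, object id) pair rather than the raw FID string means two stripe references match even if they were printed in different forms, which is the situation that makes the second file's layout initialization fail.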

Comment by Niu Yawei (Inactive) [ 06/Mar/13 ]

Remove the test directory before cleanup on master; otherwise, the duplicate files (which share the same object) could cause trouble when the directory is removed: http://review.whamcloud.com/5606

Comment by Peter Jones [ 12/Mar/13 ]

Landed for 2.4

Generated at Sat Feb 10 01:27:24 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.