Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.4.0, Lustre 2.1.5
    • Lustre 2.4.0
    • 3
    • 6283

    Description

      It looks like the old lfsck in e2fsprogs hasn't been actively maintained for quite some time, and it no longer works with the latest Lustre.

      One obvious defect: Lustre changed the objects directory on the OST after FID-on-OST landed, but lfsck still searches for objects under the old O/0.

          Activity

            [LU-2694] lfsck in e2fsprogs is out of date
            pjones Peter Jones added a comment -

            Landed for 2.4

            niu Niu Yawei (Inactive) added a comment -

            Remove the test directory before cleanup on master; otherwise, the duplicated files (which reference the same object) could cause trouble when removing the directory: http://review.whamcloud.com/5606

            niu Niu Yawei (Inactive) added a comment -

            It seems the above error message is caused by duplicated files referencing the same object in the test dir; it is not related to LU-2765.
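
            To illustrate what "duplicated files referencing the same object" means, here is a minimal sketch in Python. It assumes the file layouts have already been collected into a dict; the `layouts` mapping, the `(ost_idx, object_id)` tuples, and the function name are hypothetical and are not part of lfsck or the test framework.

```python
from collections import defaultdict


def find_shared_objects(layouts):
    """Return OST objects that are referenced by more than one file.

    `layouts` is a hypothetical dict: file path -> list of
    (ost_idx, object_id) tuples describing the file's stripes.
    """
    owners = defaultdict(list)
    for path, objects in layouts.items():
        for obj in objects:
            owners[obj].append(path)
    # Keep only objects that have two or more referencing files.
    return {obj: sorted(paths) for obj, paths in owners.items() if len(paths) > 1}
```

            Any object returned by such a check would be one that two files try to own at the same time, which matches the "already owned" / "try to own" messages quoted elsewhere in this thread.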

            niu Niu Yawei (Inactive) added a comment -

            LU-2775 (2adc20013d9c2a5969a3154b0ca93ac007b1a4e2) fixed the compatibility problem, so we don't need to fix e2fsprogs anymore.

            However, I still occasionally hit a failure when cleaning up Lustre after running lfsck:

            LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) header@ffff880078391ec0[0x0, 3, [0x100000000:0x26:0x0] hash]{
            LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) ....lovsub@ffff880078391f58[0]
            LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) ....osc@ffff880078390e28id: 38 gr: 0 idx: 0 gen: 0 kms_valid: 1 kms 1048576 rc: 0 force_sync: 0 min_xid: 0 size: 1048576 mtime: 1362472487 atime: 0 ctime: 1362472487 blocks: 2048
            LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) } header@ffff880078391ec0
            LustreError: 5539:0:(lov_object.c:184:lov_init_sub())
            LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) } header@ffff880075173070
            LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) owned.
            LustreError: 5539:0:(lov_object.c:185:lov_init_sub()) header@ffff8800773987a8[0x0, 1, [0x6c:0xc68285fb:0x0]]
            LustreError: 5539:0:(lov_object.c:185:lov_init_sub()) try to own.
            LustreError: 5539:0:(lcommon_cl.c:1211:cl_file_inode_init()) Failure to initialize cl object [0x6c:0xc68285fb:0x0]: -5
            LustreError: 5539:0:(llite_lib.c:2161:ll_prep_inode()) new_inode -fatal: rc -5
            LustreError: 5539:0:(lov_object.c:183:lov_init_sub()) header@ffff880078391920[0x0, 3, [0x100010000:0x28:0x0] hash]{
            LustreError: 5539:0:(lov_object.c:183:lov_init_sub()) ....lovsub@ffff8800783919b8[0]
            LustreError: 5539:0:(lov_object.c:183:lov_init_sub()) ....osc@ffff880078390768id: 40 gr: 0 idx: 1 gen: 0 kms_valid: 1 kms 1048576 rc: 0 force_sync: 0 min_xid: 0 size: 1048576 mtime: 1362472487 atime: 0 ctime: 1362472487 blocks: 2048
            LustreError: 5539:0:(lov_object.c:183:lov_init_sub()) } header@ffff880078391920
            LustreError: 5539:0:(lov_object.c:183:lov_init_sub()) stripe 0 is already owned.
            LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) header@ffff8800773987a8[0x0, 1, [0xc6:0xc6828600:0x0] hash]{
            LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) ....vvp@ffff880077398840(- 0 0) inode: ffff8800675d0ab8 198/3330442752 100644 1 1 ffff880077398840 [0xc6:0xc6828600:0x0]
            LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) ....lov@ffff880074430830stripes: 1, valid, lsm{ffff880065b44240 0x0BD10BD0 1 1 0}:
            LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) header@ffff880078391920[0x0, 3, [0x100010000:0x28:0x0] hash]{
            LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) ....lovsub@ffff8800783919b8[0]
            LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) ....osc@ffff880078390768id: 40 gr: 0 idx: 1 gen: 0 kms_valid: 1 kms 1048576 rc: 0 force_sync: 0 min_xid: 0 size: 1048576 mtime: 1362472487 atime: 0 ctime: 1362472487 blocks: 2048
            LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) } header@ffff880078391920
            LustreError: 5539:0:(lov_object.c:184:lov_init_sub())
            LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) } header@ffff8800773987a8
            LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) owned.
            LustreError: 5539:0:(lov_object.c:185:lov_init_sub()) header@ffff88004f4f2dd8[0x0, 1, [0x200000400:0x66:0x0]]
            LustreError: 5539:0:(lov_object.c:185:lov_init_sub()) try to own.
            LustreError: 5539:0:(lov_object.c:183:lov_init_sub()) header@ffff88004f4a0470[0x0, 3, [0x100000000:0x2a:0x0] hash]{
            LustreError: 5539:0:(lov_object.c:183:lov_init_sub()) ....lovsub@ffff88004f4a0508[0]
            LustreError: 5539:0:(lov_object.c:183:lov_init_sub()) ....osc@ffff880049da2d08id: 42 gr: 0 idx: 0 gen: 0 kms_valid: 1 kms 1048576 rc: 0 force_sync: 0 min_xid: 0 size: 1048576 mtime: 1362472487 atime: 0 ctime: 1362472487 blocks: 2048
            LustreError: 5539:0:(lov_object.c:183:lov_init_sub()) } header@ffff88004f4a0470
            LustreError: 5539:0:(lov_object.c:183:lov_init_sub()) stripe 0 is already owned.
            LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) header@ffff88004f4f2cd0[0x0, 1, [0xc9:0xc6828603:0x0] hash]{
            LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) ....vvp@ffff88004f4f2d68(- 0 0) inode: ffff88007286cb78 201/3330442755 100644 1 1 ffff88004f4f2d68 [0xc9:0xc6828603:0x0]
            LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) ....lov@ffff88007286de80stripes: 1, valid, lsm{ffff88006f83cbc0 0x0BD10BD0 1 1 0}:
            LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) header@ffff88004f4a0470[0x0, 3, [0x100000000:0x2a:0x0] hash]{
            LustreError: 5538:0:(lov_object.c:183:lov_init_sub()) header@ffff8800751719d0[0x0, 3, [0x100010000:0x29:0x0] hash]{
            LustreError: 5538:0:(lov_object.c:183:lov_init_sub()) ....lovsub@ffff880075171a68[0]
            LustreError: 5538:0:(lov_object.c:183:lov_init_sub()) ....osc@ffff8800751708c8id: 41 gr: 0 idx: 1 gen: 0 kms_valid: 1 kms 1048576 rc: 0 force_sync: 0 min_xid: 0 size: 1048576 mtime: 1362472487 atime: 0 ctime: 1362472487 blocks: 2048
            LustreError: 5538:0:(lov_object.c:183:lov_init_sub()) } header@ffff8800751719d0
            LustreError: 5538:0:(lov_object.c:183:lov_init_sub()) stripe 0 is already owned.
            LustreError: 5538:0:(lov_object.c:184:lov_init_sub()) header@ffff880075173388[0x0, 1, [0x200000400:0x68:0x0] hash]{
            LustreError: 5538:0:(lov_object.c:184:lov_init_sub()) ....vvp@ffff880075173420(- 0 0) inode: ffff88007ad1f678 144115205255725160/33554436 100644 1 1 ffff880075173420 [0x200000400:0x68:0x0]
            LustreError: 5538:0:(lov_object.c:184:lov_init_sub()) ....lov@ffff880075172430stripes: 1, valid, lsm{ffff88006e9419c0 0x0BD10BD0 1 1 0}:
            LustreError: 5538:0:(lov_object.c:184:lov_init_sub()) header@ffff8800751719d0[0x0, 3, [0x100010000:0x29:0x0] hash]{
            LustreError: 5538:0:(lov_object.c:184:lov_init_sub()) ....lovsub@ffff880075171a68[0]
            LustreError: 5538:0:(lov_object.c:184:lov_init_sub()) ....osc@ffff8800751708c8id: 41 gr: 0 idx: 1 gen: 0 kms_valid: 1 kms 1048576 rc: 0 force_sync: 0 min_xid: 0 size: 1048576 mtime: 1362472487 atime: 0 ctime: 1362472487 blocks: 2048
            LustreError: 5538:0:(lov_object.c:184:lov_init_sub()) } header@ffff8800751719d0
            LustreError: 5538:0:(lov_object.c:184:lov_init_sub())
            LustreError: 5538:0:(lov_object.c:184:lov_init_sub()) } header@ffff880075173388
            LustreError: 5538:0:(lov_object.c:184:lov_init_sub()) owned.
            LustreError: 5538:0:(lov_object.c:185:lov_init_sub()) header@ffff88007a5b7ee0[0x0, 1, [0xc8:0xc6828602:0x0]]
            LustreError: 5538:0:(lov_object.c:185:lov_init_sub()) try to own.
            LustreError: 5539:0:(lov_object.c:184:lov_init_sub()) owned.
            LustreError: 5539:0:(lov_object.c:185:lov_init_sub()) header@ffff88004f4f2ac0[0x0, 1, [0x200000400:0x69:0x0]]
            LustreError: 5539:0:(lov_object.c:185:lov_init_sub()) try to own.
            Lustre: DEBUG MARKER: lfsck : @@@@@@ FAIL: remove sub-test dirs failed

            This seems related to LU-2765. Jinshan, any idea on this failure? Thanks.
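
            The console output quoted in the comment above is long and interleaves two threads (5538 and 5539), so it can help to pull out the interesting fields mechanically. The following is a minimal sketch in Python, not an official Lustre tool. The regular expressions are assumptions inferred only from the lines quoted in this ticket (`LustreError: <pid>:<n>:(<file>:<line>:<function>()) <message>`, with FIDs printed as `[0xSEQ:0xOID:0xVER]`), and the helper names are hypothetical.

```python
import re

# Assumed shape of one console line, inferred from the log in this ticket:
#   LustreError: <pid>:<n>:(<file>:<line>:<function>()) <message>
LINE_RE = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s?(?P<msg>.*)"
)
# FIDs appear inside messages as [0xSEQ:0xOID:0xVER].
FID_RE = re.compile(r"\[(0x[0-9a-f]+):(0x[0-9a-f]+):(0x[0-9a-f]+)\]")


def parse_line(text):
    """Return a dict for one LustreError line, or None if it does not match."""
    m = LINE_RE.match(text.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["pid"] = int(rec["pid"])
    rec["line"] = int(rec["line"])
    # Each FID becomes a (seq, oid, ver) tuple of integers.
    rec["fids"] = [tuple(int(p, 16) for p in f) for f in FID_RE.findall(rec["msg"])]
    return rec


def failed_fids(lines):
    """Collect FIDs from 'Failure to initialize cl object' messages."""
    out = []
    for text in lines:
        rec = parse_line(text)
        if rec and "Failure to initialize cl object" in rec["msg"]:
            out.extend(rec["fids"])
    return out
```

            Running `failed_fids` over the saved console log lists the FIDs whose cl object failed to initialize. Filtering the `parse_line` results on `pid` separates the output of the two threads.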

            niu Niu Yawei (Inactive) added a comment -

            First, we should fix is_empty_fs in t-f, which can cause unexpected errors for lfsck.sh: http://review.whamcloud.com/5576
            pjones Peter Jones added a comment -

            Niu is going to look into this

            yujian Jian Yu added a comment -

            Lustre b2_1 client build: http://build.whamcloud.com/job/lustre-b2_1/176
            Lustre master server build: http://build.whamcloud.com/job/lustre-master/1269
            Distro/Arch: RHEL6.3/x86_64

            lfsck also failed with the same issue: https://maloo.whamcloud.com/test_sets/6a4f7e3e-7d78-11e2-85d0-52540035b04c

            niu Niu Yawei (Inactive) added a comment -

            This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/baa5681c-6c5c-11e2-91d6-52540035b04c

            People

              niu Niu Yawei (Inactive)
              niu Niu Yawei (Inactive)
              Votes: 0
              Watchers: 3
