[LU-3180] Test failure on test suite lfsck: Failed to find fid

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.4.1, Lustre 2.5.0
    • Affects Version/s: Lustre 2.4.0, Lustre 2.5.0
    • Environment: server and client: tag-2.3.64 build #1411
    • Severity: 3
    • Rank: 7749

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/ff811592-a66a-11e2-90ad-52540035b04c.

      03:53:10:Memory used: 2436k/21180k (745k/1692k), time:  0.20/ 0.07/ 0.02
      03:53:10:I/O read: 10MB, write: 0MB, rate: 50.10MB/s
      03:53:10:CMD: client-19vm1.lab.whamcloud.com PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin: NAME=autotest_config sh rpc.sh _check_progs_installed lfsck 
      03:53:10:CMD: client-19vm1.lab.whamcloud.com PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin: NAME=autotest_config sh rpc.sh is_mounted /mnt/lustre 
      03:53:10:lfsck -c -l --mdsdb /home/autotest/.autotest/shared_dir/2013-04-14/224508-70192991849440/mdsdb --ostdb /home/autotest/.autotest/shared_dir/2013-04-14/224508-70192991849440/ostdb-0 /home/autotest/.autotest/shared_dir/2013-04-14/224508-70192991849440/ostdb-1 /home/autotest/.autotest/shared_dir/2013-04-14/224508-70192991849440/ostdb-2 /home/autotest/.autotest/shared_dir/2013-04-14/224508-70192991849440/ostdb-3 /home/autotest/.autotest/shared_dir/2013-04-14/224508-70192991849440/ostdb-4 /home/autotest/.autotest/shared_dir/2013-04-14/224508-70192991849440/ostdb-5 /home/autotest/.autotest/shared_dir/2013-04-14/224508-70192991849440/ostdb-6 /mnt/lustre
      03:53:11:CMD: client-19vm1.lab.whamcloud.com lfsck -c -l --mdsdb /home/autotest/.autotest/shared_dir/2013-04-14/224508-70192991849440/mdsdb --ostdb /home/autotest/.autotest/shared_dir/2013-04-14/224508-70192991849440/ostdb-0 /home/autotest/.autotest/shared_dir/2013-04-14/224508-70192991849440/ostdb-1 /home/autotest/.autotest/shared_dir/2013-04-14/224508-70192991849440/ostdb-2 /home/autotest/.autotest/shared_dir/2013-04-14/224508-70192991849440/ostdb-3 /home/autotest/.autotest/shared_dir/2013-04-14/224508-70192991849440/ostdb-4 /home/autotest/.autotest/shared_dir/2013-04-14/224508-70192991849440/ostdb-5 /home/autotest/.autotest/shared_dir/2013-04-14/224508-70192991849440/ostdb-6 /mnt/lustre
      03:53:11:lfsck 1.42.6.wc2 (10-Dec-2012)
      03:53:11:lfsck: ost_idx 0: pass1: check for duplicate objects
      03:53:11:lfsck: ost_idx 0: pass1 OK (12 files total)
      03:53:11:lfsck: ost_idx 0: pass2: check for missing inode objects
      03:53:11:Failed to find fid [0x2000013a1:0xda11:0x0]: DB_NOTFOUND: No matching key/data pair found
      03:53:11:Failed to find fid [0x2000013a1:0xda15:0x0]: DB_NOTFOUND: No matching key/data pair found
      03:53:11:Failed to find fid [0x2000013a1:0xda16:0x0]: DB_NOTFOUND: No matching key/data pair found
      03:53:11:Failed to find fid [0x2000013a1:0xda13:0x0]: DB_NOTFOUND: No matching key/data pair found
      03:53:11:Failed to find fid [0x2000013a1:0xda12:0x0]: DB_NOTFOUND: No matching key/data pair found
      03:53:11:Failed to find fid [0x2000013a1:0xda14:0x0]: DB_NOTFOUND: No matching key/data pair found
      03:53:11:Failed to find fid [0x2000013a1:0xda17:0x0]: DB_NOTFOUND: No matching key/data pair found
      03:53:11:Failed to find fid [0x2000013a1:0xda18:0x0]: DB_NOTFOUND: No matching key/data pair found
      03:53:11:Failed to find fid [0x2000013a1:0xda19:0x0]: DB_NOTFOUND: No matching key/data pair found
      03:53:11:Failed to find fid [0x2000013a1:0xda1b:0x0]: DB_NOTFOUND: No matching key/data pair found
      03:53:11:Failed to find fid [0x2000013a1:0xda1a:0x0]: DB_NOTFOUND: No matching key/data pair found
      03:53:11:Failed to find fid [0x2000013a1:0xda1c:0x0]: DB_NOTFOUND: No matching key/data pair found
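The "Failed to find fid" lines above carry FIDs in the bracketed [sequence:object id:version] form, all in hexadecimal. As a rough triage aid, the following Python sketch pulls those FIDs out of a log. The names `Fid`, `FID_ERROR` and `missing_fids` are made up for this illustration and are not part of lfsck or the Lustre test framework.

```python
import re
from typing import List, NamedTuple


class Fid(NamedTuple):
    """Illustrative container for the three hexadecimal FID fields."""
    seq: int
    oid: int
    ver: int


# Matches the error text exactly as it appears in the log excerpt.
FID_ERROR = re.compile(
    r"Failed to find fid "
    r"\[(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+)\]"
)


def missing_fids(log_text: str) -> List[Fid]:
    """Return every FID reported as not found, in log order."""
    return [
        Fid(*(int(field, 16) for field in match.groups()))
        for match in FID_ERROR.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = (
        "03:53:11:Failed to find fid [0x2000013a1:0xda11:0x0]: DB_NOTFOUND: "
        "No matching key/data pair found\n"
        "03:53:11:Failed to find fid [0x2000013a1:0xda15:0x0]: DB_NOTFOUND: "
        "No matching key/data pair found\n"
    )
    for fid in missing_fids(sample):
        print(f"[{fid.seq:#x}:{fid.oid:#x}:{fid.ver:#x}]")
```

Grouping the result by `seq` shows quickly whether all missing FIDs belong to one sequence, as they do in this run.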
      

          Activity

            niu Niu Yawei (Inactive) added a comment -

            The patch landed on b2_4 and master.

            niu Niu Yawei (Inactive) added a comment -

            I created LU-3838 for the two remaining issues; this ticket can be closed.

            niu Niu Yawei (Inactive) added a comment -

            I created LU-3837 for the old lfsck on DNE problem. I think we should keep this ticket open (for the "Failed to find fid" problem) but lower its priority, because the lfsck test does not fail on this error message.
            pjones Peter Jones added a comment -

            I certainly think that it makes sense to create new LU tickets for the remaining issues and to consider the priority of those separately.

            niu Niu Yawei (Inactive) added a comment -

            Hi Andreas,

            I don't have any idea about these two problems ("Failed to find fid" and "always return 1 on second lfsck run") so far. To fix them, I would probably need to read most of the lfsck code, which is not a small task.

            Given that it will be replaced with the new LFSCK soon, and no customer has complained about these two problems, I tend to think we should leave them as they are. Maybe we only need to fix the DNE problem you mentioned?
            adilger Andreas Dilger added a comment - edited

            Niu, how much effort do you think it would take to fix these problems? It concerns me that we would need to spend time fixing the old lfsck when the new LFSCK is going to replace it soon. Also, if this has been broken since 2.1, I don't think users can be depending on it very heavily.

            Finally, I'm also concerned that if the old lfsck is run on a DNE filesystem with multiple MDTs and/or OSTs with FID-on-OST enabled, it will do completely the wrong thing, possibly deleting a large number of "unused" objects that are not referenced by MDT0000.

            At a minimum, a check should be added to the old lfsck so that it refuses to run if it finds signs of DNE (e.g. O/seq, seq > 2) on the OSTs.
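A minimal sketch of the kind of guard suggested in the comment above (refuse to run when an OST shows O/seq with seq > 2), written in Python for illustration rather than as a patch to lfsck itself. It assumes the OST backing filesystem is mounted locally, that object sequences appear as directories named O/<seq>, and that a directory name may be written in decimal or hexadecimal. `parse_seq` and `find_dne_sequences` are hypothetical helper names.

```python
import os
from typing import List, Optional


def parse_seq(name: str) -> Optional[int]:
    """Interpret a directory name as a sequence number (assumed decimal or hex)."""
    for base in (10, 16):
        try:
            return int(name, base)
        except ValueError:
            continue
    return None


def find_dne_sequences(ost_root: str) -> List[str]:
    """Return the names of O/<seq> directories whose sequence is greater than 2."""
    obj_dir = os.path.join(ost_root, "O")
    if not os.path.isdir(obj_dir):
        return []
    suspicious = []
    for name in sorted(os.listdir(obj_dir)):
        if not os.path.isdir(os.path.join(obj_dir, name)):
            continue  # only sequence directories are of interest
        seq = parse_seq(name)
        if seq is not None and seq > 2:
            suspicious.append(name)
    return suspicious


def refuse_if_dne(ost_root: str) -> None:
    """Raise an error instead of continuing when DNE-style sequences are present."""
    found = find_dne_sequences(ost_root)
    if found:
        raise RuntimeError(
            "refusing to run: object sequences > 2 found under O/: "
            + ", ".join(found)
        )
```

Calling `refuse_if_dne("/path/to/ost/mount")` before any database is built would stop the run early; the path shown is a placeholder.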

            People

              Assignee: niu Niu Yawei (Inactive)
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 9
