Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-3696

sanity test_17m, test_17n: e2fsck unattached inodes failure

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.6.0, Lustre 2.5.4
    • Lustre 2.5.0, Lustre 2.6.0
    • 3
    • 9541

    Description

      I'm trying to find what test/environment/circumstances fills an OST during autotest. I ran sanity three times in a row on Toro; https://maloo.whamcloud.com/test_sessions/90d23e6c-fbe4-11e2-aaad-52540035b04c . I didn't hit the full OST problem, but I did run into sanity test 17m failures.

      On the second and, not surprisingly, third run of sanity, test 17m failed with:
      sanity test_17m: @@@@@@ FAIL: e2fsck should not report error upon short/long symlink MDT: rc=4

      This first, successful, run of test 17m has the following output:

      01:55:18:stop and checking mds1: e2fsck -fnvd /dev/lvm-MDS/P1
      01:55:18:CMD: client-24vm3 grep -c /mnt/mds1' ' /proc/mounts
      01:55:18:Stopping /mnt/mds1 (opts:-f) on client-24vm3
      01:55:18:CMD: client-24vm3 umount -d -f /mnt/mds1
      01:55:18:CMD: client-24vm3 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
      01:55:18:CMD: client-24vm3 e2fsck -fnvd /dev/lvm-MDS/P1
      01:55:18:client-24vm3: e2fsck 1.42.7.wc1 (12-Apr-2013)
      01:55:18:Pass 1: Checking inodes, blocks, and sizes
      01:55:18:Pass 2: Checking directory structure
      01:55:18:Pass 3: Checking directory connectivity
      01:55:18:Pass 4: Checking reference counts
      01:55:18:Pass 5: Checking group summary information
      01:55:18:
      01:55:18:        1324 inodes used (0.13%, out of 1048576)
      01:55:18:           7 non-contiguous files (0.5%)
      01:55:18:           1 non-contiguous directory (0.1%)
      01:55:18:             # of inodes with ind/dind/tind blocks: 2/0/0
      01:55:18:      154573 blocks used (29.48%, out of 524288)
      01:55:18:           0 bad blocks
      01:55:18:           1 large file
      01:55:18:
      01:55:18:         127 regular files
      01:55:18:         137 directories
      01:55:18:           0 character device files
      01:55:18:           0 block device files
      01:55:18:           0 fifos
      01:55:18:           0 links
      01:55:18:        1051 symbolic links (526 fast symbolic links)
      01:55:18:           0 sockets
      01:55:18:------------
      01:55:18:        1315 files
      

      This second run of test 17m has the following output:

      == sanity test 17m: run e2fsck against MDT which contains short/long symlink == 04:23:23 (1375442603)
      CMD: client-24vm3 /usr/sbin/lctl get_param -n version
      CMD: client-24vm3 /usr/sbin/lctl get_param -n version
      create 512 short and long symlink files under /mnt/lustre/d0.sanity/d17m
      erase them
      Waiting for local destroys to complete
      recreate the 512 symlink files with a shorter string
      stop and checking mds1: e2fsck -fnvd /dev/lvm-MDS/P1
      CMD: client-24vm3 grep -c /mnt/mds1' ' /proc/mounts
      Stopping /mnt/mds1 (opts:-f) on client-24vm3
      CMD: client-24vm3 umount -d -f /mnt/mds1
      CMD: client-24vm3 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
      CMD: client-24vm3 e2fsck -fnvd /dev/lvm-MDS/P1
      client-24vm3: e2fsck 1.42.7.wc1 (12-Apr-2013)
      client-24vm3: e2fsck_pass1:1500: increase inode 32773 badness 0 to 2
      Pass 1: Checking inodes, blocks, and sizes
      Pass 2: Checking directory structure
      Pass 3: Checking directory connectivity
      Pass 4: Checking reference counts
      Unattached inode 635
      Connect to /lost+found? no
      
      Unattached inode 636
      Connect to /lost+found? no
      
      Unattached inode 638
      Connect to /lost+found? no
      
      Unattached inode 639
      Connect to /lost+found? no
      
      Unattached inode 641
      Connect to /lost+found? no
      
      Unattached inode 645
      Connect to /lost+found? no
      
      Unattached inode 1841
      Connect to /lost+found? no
      
      Unattached inode 1842
      Connect to /lost+found? no
      
      Unattached inode 1843
      Connect to /lost+found? no
      
      Unattached inode 1844
      Connect to /lost+found? no
      
      Unattached inode 1845
      Connect to /lost+found? no
      
      Unattached inode 1846
      Connect to /lost+found? no
      
      Unattached inode 1847
      Connect to /lost+found? no
      
      Unattached inode 1848
      Connect to /lost+found? no
      
      Unattached inode 1849
      Connect to /lost+found? no
      
      Unattached inode 1850
      Connect to /lost+found? no
      
      Unattached inode 1851
      Connect to /lost+found? no
      
      Unattached inode 1852
      Connect to /lost+found? no
      
      Unattached inode 1855
      Connect to /lost+found? no
      
      Unattached inode 1894
      Connect to /lost+found? no
      
      Unattached inode 1895
      Connect to /lost+found? no
      
      Unattached inode 1896
      Connect to /lost+found? no
      
      Unattached inode 1897
      Connect to /lost+found? no
      
      Unattached inode 1898
      Connect to /lost+found? no
      
      Unattached inode 1899
      Connect to /lost+found? no
      
      Unattached inode 1900
      Connect to /lost+found? no
      
      Unattached inode 1901
      Connect to /lost+found? no
      
      Unattached inode 1902
      Connect to /lost+found? no
      
      Unattached inode 1903
      Connect to /lost+found? no
      
      Unattached inode 1904
      Connect to /lost+found? no
      
      Unattached inode 1905
      Connect to /lost+found? no
      
      Unattached inode 1908
      Connect to /lost+found? no
      
      Pass 5: Checking group summary information
      
      lustre-MDT0000: ********** WARNING: Filesystem still has errors **********
      
      
              1396 inodes used (0.13%, out of 1048576)
                40 non-contiguous files (2.9%)
                 2 non-contiguous directories (0.1%)
                   # of inodes with ind/dind/tind blocks: 18/2/0
            158490 blocks used (30.23%, out of 524288)
                 0 bad blocks
                 1 large file
      
               197 regular files
               139 directories
                 0 character device files
                 0 block device files
                 0 fifos
                 0 links
              1051 symbolic links (526 fast symbolic links)
                 0 sockets
      ------------
              1355 files
      

      Because the same test names were run in one test session, if looks like Maloo is confusing the output of one run with another and is a little confusing when looking at the logs. The time stamps also seem to be in the future of when the results were reported. Hopefully, I'm just misreading the logs and time stamps.

      Attachments

        Issue Links

          Activity

            [LU-3696] sanity test_17m, test_17n: e2fsck unattached inodes failure

            What's common to all of these tests is that they create volatile files. This is related to or is a case of LU-4140.

            jamesanunez James Nunez (Inactive) added a comment - What's common to all of these tests is that they create volatile files. This is related to or is a case of LU-4140 .

            For sanity test 185 and 187b, the calls to multiop with the V option, open a volatile file, are causing the remaining unattached inodes.

            jamesanunez James Nunez (Inactive) added a comment - For sanity test 185 and 187b, the calls to multiop with the V option, open a volatile file, are causing the remaining unattached inodes.

            For sanity test 56x, I've confirmed that the migrate, call to "lfs migrate", from a two stripe file to a single stripe file creates one unattached inode. I suspect this is the same for test 56w since it calls lfs migrate on a file and directory.

            Also, migrating from a single striped file to a file with two strips creates an unattached inode also.

            jamesanunez James Nunez (Inactive) added a comment - For sanity test 56x, I've confirmed that the migrate, call to "lfs migrate", from a two stripe file to a single stripe file creates one unattached inode. I suspect this is the same for test 56w since it calls lfs migrate on a file and directory. Also, migrating from a single striped file to a file with two strips creates an unattached inode also.

            I started with a fresh file system and ran sanity. After sanity completed, there were 16 unattached inodes found by e2fsck. The offending tests are 56w, 56x, 185, and 187b.

            jamesanunez James Nunez (Inactive) added a comment - I started with a fresh file system and ran sanity. After sanity completed, there were 16 unattached inodes found by e2fsck. The offending tests are 56w, 56x, 185, and 187b.

            If you add something like:

            lctl get_param seq.*.fid
            

            after each test in run_one() you will know which test generated the FIDs for the orphan files. Just comparing the FID sequence numbers, the test_17m sequence is (0x2000061c0 - 0x2000000400) = 24000 higher than the starting sequence, while the orphan sequence is (0x200001b71 - 0x2000000400) = 6001 higher, and by linear extrapolation this puts it 25% of the way to test 17m, which is around test_5 (or test_4 which is the likely candidate).

            It is a bit confusing that the client would be getting a new sequence so often, since that should only happen every 128000 files or if the client is remounted, which I don't think should be the case? In any case, you could also try running:

            ONLY=4 sh sanity.sh
            

            on an already-mounted filesystem and then run e2fsck -f on the MDS to check if this test is causing the leak.

            adilger Andreas Dilger added a comment - If you add something like: lctl get_param seq.*.fid after each test in run_one() you will know which test generated the FIDs for the orphan files. Just comparing the FID sequence numbers, the test_17m sequence is (0x2000061c0 - 0x2000000400) = 24000 higher than the starting sequence, while the orphan sequence is (0x200001b71 - 0x2000000400) = 6001 higher, and by linear extrapolation this puts it 25% of the way to test 17m, which is around test_5 (or test_4 which is the likely candidate). It is a bit confusing that the client would be getting a new sequence so often, since that should only happen every 128000 files or if the client is remounted, which I don't think should be the case? In any case, you could also try running: ONLY=4 sh sanity.sh on an already-mounted filesystem and then run e2fsck -f on the MDS to check if this test is causing the leak.

            I looked at all the files in the Lustre file system and, on the client, got the FID for each file:

            # lfs path2fid /lustre/lscratch
            [0x200000007:0x1:0x0]
            
            # lfs path2fid /lustre/lscratch/d17m.sanitym
            [0x2000061c0:0xc09:0x0]
            
            # lfs path2fid /lustre/lscratch/d17m.sanitym/short-*
            /lustre/lscratch/d17m.sanitym/short-0: [0x2000061c0:0xe0b:0x0]
            /lustre/lscratch/d17m.sanitym/short-1: [0x2000061c0:0xe0d:0x0]
            /lustre/lscratch/d17m.sanitym/short-2: [0x2000061c0:0xe0f:0x0]
            …
            /lustre/lscratch/d17m.sanitym/short-510: [0x2000061c0:0x1207:0x0]
            /lustre/lscratch/d17m.sanitym/short-511: [0x2000061c0:0x1209:0x0]
            
            # lfs path2fid /lustre/lscratch/d17m.sanitym/long-* 
            /lustre/lscratch/d17m.sanitym/long-0: [0x2000061c0:0xe0a:0x0]
            /lustre/lscratch/d17m.sanitym/long-1: [0x2000061c0:0xe0c:0x0]
            /lustre/lscratch/d17m.sanitym/long-2: [0x2000061c0:0xe0e:0x0]
            …
            /lustre/lscratch/d17m.sanitym/long-510: [0x2000061c0:0x1206:0x0]
            /lustre/lscratch/d17m.sanitym/long-511: [0x2000061c0:0x1208:0x0]
            

            Those are all the files and directories in the file system.

            So the FIDs for the Unattached inodes found by running e2fsck do not belong to files in the file system. Is this the correct interpretation? If so, maybe those inodes are left over from the first run of sanity that didn't get cleaned up?

            jamesanunez James Nunez (Inactive) added a comment - I looked at all the files in the Lustre file system and, on the client, got the FID for each file: # lfs path2fid /lustre/lscratch [0x200000007:0x1:0x0] # lfs path2fid /lustre/lscratch/d17m.sanitym [0x2000061c0:0xc09:0x0] # lfs path2fid /lustre/lscratch/d17m.sanitym/short-* /lustre/lscratch/d17m.sanitym/short-0: [0x2000061c0:0xe0b:0x0] /lustre/lscratch/d17m.sanitym/short-1: [0x2000061c0:0xe0d:0x0] /lustre/lscratch/d17m.sanitym/short-2: [0x2000061c0:0xe0f:0x0] … /lustre/lscratch/d17m.sanitym/short-510: [0x2000061c0:0x1207:0x0] /lustre/lscratch/d17m.sanitym/short-511: [0x2000061c0:0x1209:0x0] # lfs path2fid /lustre/lscratch/d17m.sanitym/long-* /lustre/lscratch/d17m.sanitym/long-0: [0x2000061c0:0xe0a:0x0] /lustre/lscratch/d17m.sanitym/long-1: [0x2000061c0:0xe0c:0x0] /lustre/lscratch/d17m.sanitym/long-2: [0x2000061c0:0xe0e:0x0] … /lustre/lscratch/d17m.sanitym/long-510: [0x2000061c0:0x1206:0x0] /lustre/lscratch/d17m.sanitym/long-511: [0x2000061c0:0x1208:0x0] Those are all the files and directories in the file system. So the FIDs for the Unattached inodes found by running e2fsck do not belong to files in the file system. Is this the correct interpretation? If so, maybe those inodes are left over from the first run of sanity that didn't get cleaned up?
            jamesanunez James Nunez (Inactive) added a comment - - edited

            Running the sanity suite once and then test 17m, 17m fails with "e2fsck should not report error upon short/long symlink MDT: rc=4 ". The output from e2fsck starts with:

            # e2fsck -fnvd /dev/sda3
            e2fsck 1.42.7.wc2 (07-Nov-2013)
            device /dev/sda3 mounted by lustre per /proc/fs/lustre/mgs/MGS/mntdev
            Warning!  /dev/sda3 is mounted.
            Warning: skipping journal recovery because doing a read-only filesystem check.
            Pass 1: Checking inodes, blocks, and sizes
            e2fsck_pass1:1500: increase inode 50043 badness 0 to 2
            Pass 2: Checking directory structure
            Pass 3: Checking directory connectivity
            Pass 4: Checking reference counts
            Unattached inode 605
            Connect to /lost+found? no
            
            Unattached inode 606
            Connect to /lost+found? no
            
            Unattached inode 607
            Connect to /lost+found? no
            
            Unattached inode 608
            Connect to /lost+found? no
            
            Unattached inode 609
            Connect to /lost+found? no
            
            Unattached inode 611
            Connect to /lost+found? no
            
            Unattached inode 2071
            Connect to /lost+found? no
            
            Unattached inode 2072
            Connect to /lost+found? no
            
            Unattached inode 2073
            Connect to /lost+found? no
            

            Following Andreas' comments from 27/Feb/14, I got the FIDs for the inodes that were reported by e2fsck:

            # debugfs -c /dev/sda3 -R "stat <605>"
            debugfs 1.42.7.wc2 (07-Nov-2013)
            /dev/sda3: catastrophic mode - not reading inode or group bitmaps
            Inode: 605   Type: regular    Mode:  0600   Flags: 0x0
            Generation: 728859788    Version: 0x00000008:00006684
            User:     0   Group:     0   Size: 0
            File ACL: 0    Directory ACL: 0
            Links: 1   Blockcount: 0
            Fragment:  Address: 0    Number: 0    Size: 0
             ctime: 0x5329d6db:00000000 -- Wed Mar 19 10:41:47 2014
             atime: 0x5329d6db:00000000 -- Wed Mar 19 10:41:47 2014
             mtime: 0x5329d6db:00000000 -- Wed Mar 19 10:41:47 2014
            crtime: 0x5329d6db:4bb5e868 -- Wed Mar 19 10:41:47 2014
            Size of extra inode fields: 28
            Extended attributes stored in inode body: 
              lma = "00 00 00 00 00 00 00 00 71 1b 00 00 02 00 00 00 c5 1f 00 00 00 00 00 00
             " (24)
              lma: fid=[0x200001b71:0x1fc5:0x0] compat=0 incompat=0
              lov = "d0 0b d1 0b 01 00 00 00 c5 1f 00 00 00 00 00 00 71 1b 00 00 02 00 00 00
             00 00 10 00 01 00 00 00 6b 56 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0
            0 00 00 00 00 00 " (56)
            BLOCKS:
            
            # debugfs -c /dev/sda3 -R "stat <606>"
            debugfs 1.42.7.wc2 (07-Nov-2013)
            /dev/sda3: catastrophic mode - not reading inode or group bitmaps
            Inode: 606   Type: regular    Mode:  0600   Flags: 0x0
            Generation: 728859789    Version: 0x00000008:00006688
            User:     0   Group:     0   Size: 0
            File ACL: 0    Directory ACL: 0
            Links: 1   Blockcount: 0
            Fragment:  Address: 0    Number: 0    Size: 0
             ctime: 0x5329d6db:00000000 -- Wed Mar 19 10:41:47 2014
             atime: 0x5329d6db:00000000 -- Wed Mar 19 10:41:47 2014
             mtime: 0x5329d6db:00000000 -- Wed Mar 19 10:41:47 2014
            crtime: 0x5329d6db:4e554a00 -- Wed Mar 19 10:41:47 2014
            Size of extra inode fields: 28
            Extended attributes stored in inode body: 
              lma = "00 00 00 00 00 00 00 00 71 1b 00 00 02 00 00 00 c6 1f 00 00 00 00 00 00
             " (24)
              lma: fid=[0x200001b71:0x1fc6:0x0] compat=0 incompat=0
              lov = "d0 0b d1 0b 01 00 00 00 c6 1f 00 00 00 00 00 00 71 1b 00 00 02 00 00 00
             00 00 10 00 01 00 00 00 34 4e 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0
            0 00 01 00 00 00 " (56)
            BLOCKS:
            
            ...
            

            Note that the Type is "regular". Then tried to get the file name using "lfs fid2path":

            # lfs fid2path /lustre/lscratch [0x200001b71:0x1fc6:0x0]
            fid2path: error on FID [0x200001b71:0x1fc6:0x0]: No such file or directory
            # lfs fid2path /lustre/lscratch 0x200001b71:0x1fc5:0x0
            fid2path: error on FID 0x200001b71:0x1fc6:0x0: No such file or directory
            

            So, there is no file name available? I tried this for the first 10 inodes listed in the e2fsck output and got the same result; "No such file or directory".

            Just to make sure I got the commands right, I used an inode that did not come up on the e2fsck list:

            # debugfs -c /dev/sda3 -R "stat <604>"
            debugfs 1.42.7.wc2 (07-Nov-2013)
            /dev/sda3: catastrophic mode - not reading inode or group bitmaps
            Inode: 604   Type: symlink    Mode:  0777   Flags: 0x0
            Generation: 596733023    Version: 0x00000017:000009fa
            User:     0   Group:     0   Size: 10
            File ACL: 0    Directory ACL: 0
            Links: 1   Blockcount: 0
            Fragment:  Address: 0    Number: 0    Size: 0
             ctime: 0x5329ff53:00000000 -- Wed Mar 19 13:34:27 2014
             atime: 0x5329ff53:00000000 -- Wed Mar 19 13:34:27 2014
             mtime: 0x5329ff53:00000000 -- Wed Mar 19 13:34:27 2014
            crtime: 0x5329ff53:b6e3205c -- Wed Mar 19 13:34:27 2014
            Size of extra inode fields: 28
            Extended attributes stored in inode body: 
              lma = "00 00 00 00 00 00 00 00 c0 61 00 00 02 00 00 00 f5 0f 00 00 00 00 00 00
             " (24)
              lma: fid=[0x2000061c0:0xff5:0x0] compat=0 incompat=0
              link = "df f1 ea 11 01 00 00 00 33 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0
            0 00 1b 00 00 00 02 00 00 61 c0 00 00 0c 09 00 00 00 00 73 68 6f 72 74 2d 32 34 
            35 " (51)
            Fast_link_dest: 0123456789
            
            # lfs fid2path /lustre/lscratch [0x2000061c0:0xff5:0x0]
            /lustre/lscratch/d17m.sanitym/short-245
            

            Note that the type for the "good" inode is "symlink".

            So, what does it mean that there is no path information for the Unattached inodes?

            jamesanunez James Nunez (Inactive) added a comment - - edited Running the sanity suite once and then test 17m, 17m fails with "e2fsck should not report error upon short/long symlink MDT: rc=4 ". The output from e2fsck starts with: # e2fsck -fnvd /dev/sda3 e2fsck 1.42.7.wc2 (07-Nov-2013) device /dev/sda3 mounted by lustre per /proc/fs/lustre/mgs/MGS/mntdev Warning! /dev/sda3 is mounted. Warning: skipping journal recovery because doing a read-only filesystem check. Pass 1: Checking inodes, blocks, and sizes e2fsck_pass1:1500: increase inode 50043 badness 0 to 2 Pass 2: Checking directory structure Pass 3: Checking directory connectivity Pass 4: Checking reference counts Unattached inode 605 Connect to /lost+found? no Unattached inode 606 Connect to /lost+found? no Unattached inode 607 Connect to /lost+found? no Unattached inode 608 Connect to /lost+found? no Unattached inode 609 Connect to /lost+found? no Unattached inode 611 Connect to /lost+found? no Unattached inode 2071 Connect to /lost+found? no Unattached inode 2072 Connect to /lost+found? no Unattached inode 2073 Connect to /lost+found? no Following Andreas' comments from 27/Feb/14, I got the FIDs for the inodes that were reported by e2fsck: # debugfs -c /dev/sda3 -R "stat <605>" debugfs 1.42.7.wc2 (07-Nov-2013) /dev/sda3: catastrophic mode - not reading inode or group bitmaps Inode: 605 Type: regular Mode: 0600 Flags: 0x0 Generation: 728859788 Version: 0x00000008:00006684 User: 0 Group: 0 Size: 0 File ACL: 0 Directory ACL: 0 Links: 1 Blockcount: 0 Fragment: Address: 0 Number: 0 Size: 0 ctime: 0x5329d6db:00000000 -- Wed Mar 19 10:41:47 2014 atime: 0x5329d6db:00000000 -- Wed Mar 19 10:41:47 2014 mtime: 0x5329d6db:00000000 -- Wed Mar 19 10:41:47 2014 crtime: 0x5329d6db:4bb5e868 -- Wed Mar 19 10:41:47 2014 Size of extra inode fields: 28 Extended attributes stored in inode body: lma = "00 00 00 00 00 00 00 00 71 1b 00 00 02 00 00 00 c5 1f 00 00 00 00 00 00 " (24) lma: fid=[0x200001b71:0x1fc5:0x0] compat=0 incompat=0 lov = "d0 0b d1 0b 01 00 00 00 c5 1f 00 00 00 00 00 00 71 1b 00 00 02 00 00 00 00 00 10 00 01 00 00 00 6b 56 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0 0 00 00 00 00 00 " (56) BLOCKS: # debugfs -c /dev/sda3 -R "stat <606>" debugfs 1.42.7.wc2 (07-Nov-2013) /dev/sda3: catastrophic mode - not reading inode or group bitmaps Inode: 606 Type: regular Mode: 0600 Flags: 0x0 Generation: 728859789 Version: 0x00000008:00006688 User: 0 Group: 0 Size: 0 File ACL: 0 Directory ACL: 0 Links: 1 Blockcount: 0 Fragment: Address: 0 Number: 0 Size: 0 ctime: 0x5329d6db:00000000 -- Wed Mar 19 10:41:47 2014 atime: 0x5329d6db:00000000 -- Wed Mar 19 10:41:47 2014 mtime: 0x5329d6db:00000000 -- Wed Mar 19 10:41:47 2014 crtime: 0x5329d6db:4e554a00 -- Wed Mar 19 10:41:47 2014 Size of extra inode fields: 28 Extended attributes stored in inode body: lma = "00 00 00 00 00 00 00 00 71 1b 00 00 02 00 00 00 c6 1f 00 00 00 00 00 00 " (24) lma: fid=[0x200001b71:0x1fc6:0x0] compat=0 incompat=0 lov = "d0 0b d1 0b 01 00 00 00 c6 1f 00 00 00 00 00 00 71 1b 00 00 02 00 00 00 00 00 10 00 01 00 00 00 34 4e 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0 0 00 01 00 00 00 " (56) BLOCKS: ... Note that the Type is "regular". Then tried to get the file name using "lfs fid2path": # lfs fid2path /lustre/lscratch [0x200001b71:0x1fc6:0x0] fid2path: error on FID [0x200001b71:0x1fc6:0x0]: No such file or directory # lfs fid2path /lustre/lscratch 0x200001b71:0x1fc5:0x0 fid2path: error on FID 0x200001b71:0x1fc6:0x0: No such file or directory So, there is no file name available? I tried this for the first 10 inodes listed in the e2fsck output and got the same result; "No such file or directory". Just to make sure I got the commands right, I used an inode that did not come up on the e2fsck list: # debugfs -c /dev/sda3 -R "stat <604>" debugfs 1.42.7.wc2 (07-Nov-2013) /dev/sda3: catastrophic mode - not reading inode or group bitmaps Inode: 604 Type: symlink Mode: 0777 Flags: 0x0 Generation: 596733023 Version: 0x00000017:000009fa User: 0 Group: 0 Size: 10 File ACL: 0 Directory ACL: 0 Links: 1 Blockcount: 0 Fragment: Address: 0 Number: 0 Size: 0 ctime: 0x5329ff53:00000000 -- Wed Mar 19 13:34:27 2014 atime: 0x5329ff53:00000000 -- Wed Mar 19 13:34:27 2014 mtime: 0x5329ff53:00000000 -- Wed Mar 19 13:34:27 2014 crtime: 0x5329ff53:b6e3205c -- Wed Mar 19 13:34:27 2014 Size of extra inode fields: 28 Extended attributes stored in inode body: lma = "00 00 00 00 00 00 00 00 c0 61 00 00 02 00 00 00 f5 0f 00 00 00 00 00 00 " (24) lma: fid=[0x2000061c0:0xff5:0x0] compat=0 incompat=0 link = "df f1 ea 11 01 00 00 00 33 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0 0 00 1b 00 00 00 02 00 00 61 c0 00 00 0c 09 00 00 00 00 73 68 6f 72 74 2d 32 34 35 " (51) Fast_link_dest: 0123456789 # lfs fid2path /lustre/lscratch [0x2000061c0:0xff5:0x0] /lustre/lscratch/d17m.sanitym/short-245 Note that the type for the "good" inode is "symlink". So, what does it mean that there is no path information for the Unattached inodes?

            Once sanity test_4 was disabled, test_17n stopped failing. So, although 17n isn't failing anymore, there may still be issues here.

            jamesanunez James Nunez (Inactive) added a comment - Once sanity test_4 was disabled, test_17n stopped failing. So, although 17n isn't failing anymore, there may still be issues here.

            One option for debugging this is to do something like "do_facet $SINGLEMDS debugfs -c -R "stat <644>" $MDSDEV" to print out the FID information, and then use "lfs fid2path $FID" to figure out where the file was supposed to be linked.

            adilger Andreas Dilger added a comment - One option for debugging this is to do something like " do_facet $SINGLEMDS debugfs -c -R "stat <644>" $MDSDEV " to print out the FID information, and then use " lfs fid2path $FID " to figure out where the file was supposed to be linked.

            Hit this again on master on test_17n: https://maloo.whamcloud.com/test_sets/4c95bf56-9fce-11e3-ab91-52540035b04c

            This has failed 10x in the past week on master, so I suspect some regression.

            adilger Andreas Dilger added a comment - Hit this again on master on test_17n: https://maloo.whamcloud.com/test_sets/4c95bf56-9fce-11e3-ab91-52540035b04c This has failed 10x in the past week on master, so I suspect some regression.
            sarah Sarah Liu added a comment -

            I also hit this error when running rolling upgrade test; after all of the servers and clients upgrade from 2.4.1 to 2.5, run sanity and hit this:

            https://maloo.whamcloud.com/test_sets/2a999af6-3e0a-11e3-83f5-52540035b04c
            test log shows:

            == sanity test 17m: run e2fsck against MDT which contains short/long symlink == 22:14:48 (1382764488)
            create 512 short and long symlink files under /mnt/lustre/d0.sanity/d17m
            erase them
            Waiting for local destroys to complete
            recreate the 512 symlink files with a shorter string
            stop and checking mds1: e2fsck -fnvd /dev/sdb1
            Stopping /mnt/mds1 (opts:-f) on fat-amd-1
            fat-amd-1: e2fsck 1.42.7.wc1 (12-Apr-2013)
            fat-amd-1: e2fsck_pass1:1500: increase inode 204 badness 0 to 2
            Pass 1: Checking inodes, blocks, and sizes
            Pass 2: Checking directory structure
            Pass 3: Checking directory connectivity
            Pass 4: Checking reference counts
            Unattached inode 632
            Connect to /lost+found? no
            
            Unattached inode 633
            Connect to /lost+found? no
            
            Unattached inode 644
            Connect to /lost+found? no
            
            Unattached inode 645
            Connect to /lost+found? no
            
            Unattached inode 652
            Connect to /lost+found? no
            
            Unattached inode 653
            Connect to /lost+found? no
            
            Unattached inode 1689
            Connect to /lost+found? no
            
            Unattached inode 1690
            Connect to /lost+found? no
            
            Unattached inode 1691
            Connect to /lost+found? no
            
            Unattached inode 1692
            Connect to /lost+found? no
            
            Unattached inode 1693
            Connect to /lost+found? no
            
            Unattached inode 1694
            Connect to /lost+found? no
            
            Unattached inode 1695
            Connect to /lost+found? no
            
            Unattached inode 1696
            Connect to /lost+found? no
            
            Unattached inode 1697
            Connect to /lost+found? no
            
            Unattached inode 1698
            Connect to /lost+found? no
            
            Unattached inode 1699
            Connect to /lost+found? no
            
            Unattached inode 1700
            Connect to /lost+found? no
            
            Unattached inode 1701
            Connect to /lost+found? no
            
            Unattached inode 1702
            Connect to /lost+found? no
            
            Unattached inode 1703
            Connect to /lost+found? no
            
            Unattached inode 1704
            Connect to /lost+found? no
            
            Unattached inode 1705
            Connect to /lost+found? no
            
            Unattached inode 1709
            Connect to /lost+found? no
            
            Unattached inode 1710
            Connect to /lost+found? no
            
            Unattached inode 1711
            Connect to /lost+found? no
            
            Unattached inode 1712
            Connect to /lost+found? no
            
            Unattached inode 1713
            Connect to /lost+found? no
            
            Unattached inode 1714
            Connect to /lost+found? no
            
            Unattached inode 1715
            Connect to /lost+found? no
            
            Unattached inode 1716
            Connect to /lost+found? no
            
            Unattached inode 1717
            Connect to /lost+found? no
            
            Unattached inode 1718
            Connect to /lost+found? no
            
            Unattached inode 1719
            Connect to /lost+found? no
            
            Unattached inode 1720
            Connect to /lost+found? no
            
            Unattached inode 1721
            Connect to /lost+found? no
            
            Unattached inode 1722
            Connect to /lost+found? no
            
            Unattached inode 1723
            Connect to /lost+found? no
            
            Unattached inode 1732
            Connect to /lost+found? no
            
            Unattached inode 1733
            Connect to /lost+found? no
            
            Unattached inode 1734
            Connect to /lost+found? no
            
            Unattached inode 1735
            Connect to /lost+found? no
            
            Unattached inode 1736
            Connect to /lost+found? no
            
            Unattached inode 1737
            Connect to /lost+found? no
            
            Unattached inode 1738
            Connect to /lost+found? no
            
            Unattached inode 1739
            Connect to /lost+found? no
            
            Unattached inode 1740
            Connect to /lost+found? no
            
            Unattached inode 1741
            Connect to /lost+found? no
            
            Unattached inode 1742
            Connect to /lost+found? no
            
            Unattached inode 1743
            Connect to /lost+found? no
            
            Unattached inode 1744
            Connect to /lost+found? no
            
            Unattached inode 1745
            Connect to /lost+found? no
            
            Unattached inode 1746
            Connect to /lost+found? no
            
            Unattached inode 1747
            Connect to /lost+found? no
            
            Pass 5: Checking group summary information
            
            lustre-MDT0000: ********** WARNING: Filesystem still has errors **********
            
            
                    1396 inodes used (0.14%, out of 1000184)
                      46 non-contiguous files (3.3%)
                       2 non-contiguous directories (0.1%)
                         # of inodes with ind/dind/tind blocks: 23/3/0
                  152902 blocks used (30.58%, out of 500000)
                       0 bad blocks
                       1 large file
            
                     197 regular files
                     139 directories
                       0 character device files
                       0 block device files
                       0 fifos
                       0 links
                    1051 symbolic links (526 fast symbolic links)
                       0 sockets
            ------------
                    1333 files
            Starting mds1: -o user_xattr,acl  /dev/sdb1 /mnt/mds1
            Started lustre-MDT0000
             sanity test_17m: @@@@@@ FAIL: e2fsck should not report error upon  short/long symlink MDT: rc=4 
              Trace dump:
            
            sarah Sarah Liu added a comment - I also hit this error when running rolling upgrade test; after all of the servers and clients upgrade from 2.4.1 to 2.5, run sanity and hit this: https://maloo.whamcloud.com/test_sets/2a999af6-3e0a-11e3-83f5-52540035b04c test log shows: == sanity test 17m: run e2fsck against MDT which contains short/long symlink == 22:14:48 (1382764488) create 512 short and long symlink files under /mnt/lustre/d0.sanity/d17m erase them Waiting for local destroys to complete recreate the 512 symlink files with a shorter string stop and checking mds1: e2fsck -fnvd /dev/sdb1 Stopping /mnt/mds1 (opts:-f) on fat-amd-1 fat-amd-1: e2fsck 1.42.7.wc1 (12-Apr-2013) fat-amd-1: e2fsck_pass1:1500: increase inode 204 badness 0 to 2 Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Pass 3: Checking directory connectivity Pass 4: Checking reference counts Unattached inode 632 Connect to /lost+found? no Unattached inode 633 Connect to /lost+found? no Unattached inode 644 Connect to /lost+found? no Unattached inode 645 Connect to /lost+found? no Unattached inode 652 Connect to /lost+found? no Unattached inode 653 Connect to /lost+found? no Unattached inode 1689 Connect to /lost+found? no Unattached inode 1690 Connect to /lost+found? no Unattached inode 1691 Connect to /lost+found? no Unattached inode 1692 Connect to /lost+found? no Unattached inode 1693 Connect to /lost+found? no Unattached inode 1694 Connect to /lost+found? no Unattached inode 1695 Connect to /lost+found? no Unattached inode 1696 Connect to /lost+found? no Unattached inode 1697 Connect to /lost+found? no Unattached inode 1698 Connect to /lost+found? no Unattached inode 1699 Connect to /lost+found? no Unattached inode 1700 Connect to /lost+found? no Unattached inode 1701 Connect to /lost+found? no Unattached inode 1702 Connect to /lost+found? no Unattached inode 1703 Connect to /lost+found? no Unattached inode 1704 Connect to /lost+found? no Unattached inode 1705 Connect to /lost+found? no Unattached inode 1709 Connect to /lost+found? no Unattached inode 1710 Connect to /lost+found? no Unattached inode 1711 Connect to /lost+found? no Unattached inode 1712 Connect to /lost+found? no Unattached inode 1713 Connect to /lost+found? no Unattached inode 1714 Connect to /lost+found? no Unattached inode 1715 Connect to /lost+found? no Unattached inode 1716 Connect to /lost+found? no Unattached inode 1717 Connect to /lost+found? no Unattached inode 1718 Connect to /lost+found? no Unattached inode 1719 Connect to /lost+found? no Unattached inode 1720 Connect to /lost+found? no Unattached inode 1721 Connect to /lost+found? no Unattached inode 1722 Connect to /lost+found? no Unattached inode 1723 Connect to /lost+found? no Unattached inode 1732 Connect to /lost+found? no Unattached inode 1733 Connect to /lost+found? no Unattached inode 1734 Connect to /lost+found? no Unattached inode 1735 Connect to /lost+found? no Unattached inode 1736 Connect to /lost+found? no Unattached inode 1737 Connect to /lost+found? no Unattached inode 1738 Connect to /lost+found? no Unattached inode 1739 Connect to /lost+found? no Unattached inode 1740 Connect to /lost+found? no Unattached inode 1741 Connect to /lost+found? no Unattached inode 1742 Connect to /lost+found? no Unattached inode 1743 Connect to /lost+found? no Unattached inode 1744 Connect to /lost+found? no Unattached inode 1745 Connect to /lost+found? no Unattached inode 1746 Connect to /lost+found? no Unattached inode 1747 Connect to /lost+found? no Pass 5: Checking group summary information lustre-MDT0000: ********** WARNING: Filesystem still has errors ********** 1396 inodes used (0.14%, out of 1000184) 46 non-contiguous files (3.3%) 2 non-contiguous directories (0.1%) # of inodes with ind/dind/tind blocks: 23/3/0 152902 blocks used (30.58%, out of 500000) 0 bad blocks 1 large file 197 regular files 139 directories 0 character device files 0 block device files 0 fifos 0 links 1051 symbolic links (526 fast symbolic links) 0 sockets ------------ 1333 files Starting mds1: -o user_xattr,acl /dev/sdb1 /mnt/mds1 Started lustre-MDT0000 sanity test_17m: @@@@@@ FAIL: e2fsck should not report error upon short/long symlink MDT: rc=4 Trace dump:

            People

              adilger Andreas Dilger
              jamesanunez James Nunez (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: