Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version: Lustre 2.10.7
    • Affects Version: Lustre 2.10.4
    • Severity: 2

    Description

      A directory has an entry for subdirectory "2fe", but the object ID stored for that entry does not exist:

      alias ll="ls -l"
      [root@catalyst101:~]# ll /p/lustre3/videousr/YLI/mmcommons/data/images_v1
      ls: cannot access /p/lustre3/videousr/YLI/mmcommons/data/images_v1/2fe: No such file or directory
      total 0
      d????????? ? ? ? ?            ? 2fe

      And when using zdb on the MDT to examine images_v1, one sees that 2fe refers to an object ID that is invalid:

      [root@porter81:snap]# zdb -ddddd porter81/mdt0 533741247
      Dataset porter81/mdt0 [ZPL], ID 148, cr_txg 98, 910G, 61852198 objects, rootbp DVA[0]=<4:88d9c400:200> DVA[1]=<5:25ca03c200:200> [L0 DMU objset] fletcher4 lz4 LE contiguous unique double size=800L/200P birth=1214040L/1214040P fill=61852198 cksum=139cf672b7:5dc8d6146f6:f8e6add4f57c:1e27e38477f5c0                                                                                
      
          Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
       533741247    2   128K    16K   231K     512   528K  100.00  ZFS directory
                                                     192   bonus  System attributes
              dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED SPILL_BLKPTR
              dnode maxblkid: 32                                                           
              path    ???<object#533741247>                                                
              uid     0                                                                    
              gid     2093                                                                 
              atime   Mon Oct  8 11:01:28 2018                                             
              mtime   Wed Oct  3 15:53:08 2018                                             
              ctime   Wed Oct  3 15:53:08 2018                                             
              crtime  Mon Oct  1 20:53:54 2018                                             
              gen     1090081                                                              
              mode    42700                                                                
              size    2                                                                    
              parent  533740502                                                            
              links   3                                                                    
              pflags  0                                                                    
              rdev    0x0000000000000000                                                   
              SA xattrs: 204 bytes, 3 entries                                              
      
                      trusted.lma = \000\000\000\000\000\000\000\0002@\000\000\002\000\000\000\245\037\001\000\000\000\000\000                                                                    
                      trusted.link = \337\361\352\021\001\000\000\0003\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\033\000\000\000\002\000\000@F\000\0001\213\000\000\000\000images_v1                                                                                      
                      trusted.version = \022\231\236+\011\000\000\000                               
              Fat ZAP stats:                                                                        
                      Pointer table:                                                                
                              1024 elements                                                         
                              zt_blk: 0                                                             
                              zt_numblks: 0                                                         
                              zt_shift: 10                                                          
                              zt_blks_copied: 0                                                     
                              zt_nextblk: 0                                                         
                      ZAP entries: 1                                                                
                      Leaf blocks: 32                                                               
                      Total blocks: 33                                                              
                      zap_block_type: 0x8000000000000001                                            
                      zap_magic: 0x2f52ab2ab                                                        
                      zap_salt: 0x3e3cbee7f                                                         
                      Leafs with 2^n pointers:                                                      
                                5:     32 ********************************                          
                      Blocks with n*5 entries:                                                      
                                0:     32 ********************************                          
                      Blocks n/10 full:                                                             
                                1:     32 ********************************                          
                      Entries with n chunks:                                                        
                                4:      1 *                                                         
                      Buckets with n entries:                                                       
                                0:  16383 ****************************************                  
                                1:      1 *                                                         
      
                      2fe = 533742980 (type: Directory)
      Indirect blocks:
                     0 L1  6:1a0095d000:a00 20000L/a00P F=33 B=1133009/1133009
                     0  L0 4:d99372200:200 4000L/200P F=1 B=1133009/1133009
                  4000  L0 4:2b78affa00:e00 4000L/e00P F=1 B=1132989/1132989
                  8000  L0 4:1a409fa00:e00 4000L/e00P F=1 B=1133008/1133008
                  c000  L0 4:dbecc8800:e00 4000L/e00P F=1 B=1133003/1133003
                 10000  L0 4:2d07544a00:e00 4000L/e00P F=1 B=1132997/1132997
                 14000  L0 5:11130c9600:e00 4000L/e00P F=1 B=1133005/1133005
                 18000  L0 5:1053a11c00:e00 4000L/e00P F=1 B=1132991/1132991
                 1c000  L0 4:2d07545800:e00 4000L/e00P F=1 B=1132997/1132997
                 20000  L0 6:1a41dd7c00:e00 4000L/e00P F=1 B=1133002/1133002
                 24000  L0 5:112ca4cc00:e00 4000L/e00P F=1 B=1133007/1133007
                 28000  L0 5:559e31000:e00 4000L/e00P F=1 B=1133000/1133000
                 2c000  L0 4:d91a7e000:e00 4000L/e00P F=1 B=1133004/1133004
                 30000  L0 4:d99372400:e00 4000L/e00P F=1 B=1133009/1133009
                 34000  L0 4:265bf62800:e00 4000L/e00P F=1 B=1132993/1132993
                 38000  L0 6:134c5fcc00:e00 4000L/e00P F=1 B=1132992/1132992
                 3c000  L0 5:559e31e00:e00 4000L/e00P F=1 B=1133000/1133000
                 40000  L0 5:11130ca400:e00 4000L/e00P F=1 B=1133005/1133005
                 44000  L0 4:dbeccac00:e00 4000L/e00P F=1 B=1133003/1133003
                 48000  L0 4:2b78b02200:e00 4000L/e00P F=1 B=1132989/1132989
                 4c000  L0 6:134c5ff400:e00 4000L/e00P F=1 B=1132992/1132992
                 50000  L0 4:1a40a2400:e00 4000L/e00P F=1 B=1133008/1133008
                 54000  L0 5:11130cb200:e00 4000L/e00P F=1 B=1133005/1133005
                 58000  L0 6:19f0f10c00:e00 4000L/e00P F=1 B=1132991/1132991
                 5c000  L0 4:1a40a3200:e00 4000L/e00P F=1 B=1133008/1133008
                 60000  L0 7:b97b6aa00:e00 4000L/e00P F=1 B=1133004/1133004
                 64000  L0 5:112ca4f400:e00 4000L/e00P F=1 B=1133007/1133007
                 68000  L0 4:17f825800:e00 4000L/e00P F=1 B=1132999/1132999
                 6c000  L0 6:1a2429de00:e00 4000L/e00P F=1 B=1132995/1132995
                 70000  L0 6:1a41dd9a00:e00 4000L/e00P F=1 B=1133002/1133002
                 74000  L0 7:129d29e800:e00 4000L/e00P F=1 B=1133007/1133007
                 78000  L0 4:dbeccca00:e00 4000L/e00P F=1 B=1133003/1133003
                 7c000  L0 4:17f826600:e00 4000L/e00P F=1 B=1132999/1132999
                 80000  L0 5:569fa5000:e00 4000L/e00P F=1 B=1132994/1132994
      
                      segment [0000000000000000, 0000000000084000) size  528K
      
      [root@porter81:snap]# zdb -ddddd porter81/mdt0 533742980
      Dataset porter81/mdt0 [ZPL], ID 148, cr_txg 98, 910G, 61852198 objects, rootbp DVA[0]=<4:88d9c400:200> DVA[1]=<5:25ca03c200:200> [L0 DMU objset] fletcher4 lz4 LE contiguous unique double size=800L/200P birth=1214040L/1214040P fill=61852198 cksum=139cf672b7:5dc8d6146f6:f8e6add4f57c:1e27e38477f5c0
      
          Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
      zdb: dmu_bonus_hold(533742980) failed, errno 2
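      The check made by hand above (read the directory's ZAP entries, then ask zdb about each referenced object) can be sketched in Python. This is only an illustration, not part of any Lustre or ZFS tool: it parses text in the format shown in the two zdb dumps above, and the function names (zap_entries, missing_objects, dangling_entries) are hypothetical.

```python
import errno
import re

# ZAP directory entry as printed in the dump above, e.g.
#   "2fe = 533742980 (type: Directory)"
ZAP_ENTRY = re.compile(r"^\s*(?P<name>.+?) = (?P<objid>\d+) \(type: (?P<type>[^)]*)\)\s*$")

# Failure line zdb printed for the missing object in this ticket, e.g.
#   "zdb: dmu_bonus_hold(533742980) failed, errno 2"
HOLD_FAILED = re.compile(r"dmu_bonus_hold\((?P<objid>\d+)\) failed, errno (?P<errno>\d+)")


def zap_entries(dump):
    """Return {entry name: object ID} for every ZAP entry line in a zdb dump."""
    entries = {}
    for line in dump.splitlines():
        m = ZAP_ENTRY.match(line)
        if m:
            entries[m.group("name")] = int(m.group("objid"))
    return entries


def missing_objects(dump):
    """Return the object IDs zdb reported as nonexistent (ENOENT) in a dump."""
    return {
        int(m.group("objid"))
        for m in HOLD_FAILED.finditer(dump)
        if int(m.group("errno")) == errno.ENOENT
    }


def dangling_entries(parent_dump, child_dumps):
    """Return the parent's entries whose object ID zdb could not open."""
    missing = set()
    for dump in child_dumps:
        missing |= missing_objects(dump)
    return {name: objid for name, objid in zap_entries(parent_dump).items()
            if objid in missing}


# Lines copied from the two dumps in this ticket.
parent = "                2fe = 533742980 (type: Directory)\n"
child = "zdb: dmu_bonus_hold(533742980) failed, errno 2\n"
print(dangling_entries(parent, [child]))
```

      On the two excerpts from this ticket, the sketch reports the single entry 2fe as dangling.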
      
      

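      The trusted.lma value in the first dump can be decoded to recover the Lustre FID of the images_v1 directory. The sketch below makes two assumptions that should be checked against the Lustre headers for the version in use: that the xattr begins with two 32-bit flag words (compat, incompat) followed by the FID as a 64-bit sequence, 32-bit object ID and 32-bit version, all little-endian; and that zdb prints non-printable bytes as three-digit octal escapes. The helper names are hypothetical.

```python
import re
import struct


def unescape_zdb(value):
    """Turn zdb's printed xattr value (octal escapes plus literal chars) into bytes."""
    raw = value.encode("latin-1")
    return re.sub(rb"\\([0-7]{3})", lambda m: bytes([int(m.group(1), 8)]), raw)


def decode_lma(value):
    """Return (compat, incompat, FID string) from a printed trusted.lma value.

    Assumed layout: u32 compat, u32 incompat, u64 seq, u32 oid, u32 ver,
    little-endian, 24 bytes in total.
    """
    data = unescape_zdb(value)
    if len(data) < 24:
        raise ValueError("trusted.lma value shorter than 24 bytes")
    compat, incompat, seq, oid, ver = struct.unpack("<IIQII", data[:24])
    return compat, incompat, f"[{seq:#x}:{oid:#x}:{ver:#x}]"


# Value copied from the dump of object 533741247 above.
LMA = r"\000\000\000\000\000\000\000\0002@\000\000\002\000\000\000\245\037\001\000\000\000\000\000"
print(decode_lma(LMA))
```

      Under those assumptions the directory's own FID decodes to [0x200004032:0x11fa5:0x0].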
      This is on a new file system that has not been used by end-users yet, but which we attempted to copy data to. More specifically:
      1. We copied about 500 million files/dirs to it
      2. We tried to use lfs migrate -M to move some large subtrees from one MDT to another, but that failed due to a Lustre 2.8 bug with lfs migrate
      3. We deleted most of the files/dirs

      • The servers did not crash, as far as I can recall, while we were performing all the copy and delete operations. But I cannot be certain of that.
      • We inspected the console logs on the servers and clients but found nothing that sounded like it indicated object creation or destruction failing.

      Attachments

        1. console.porter81.gz
          161 kB
          Olaf Faaland
        2. console.porter82.gz
          78 kB
          Olaf Faaland

        Activity

          [LU-11481] corrupt directory
          pjones Peter Jones added a comment -

          Landed for 2.10.7. Not needed on master


          gerrit Gerrit Updater added a comment -

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33960/
          Subject: LU-11481 utils: disable lfs migrate -m
          Project: fs/lustre-release
          Branch: b2_10
          Current Patch Set:
          Commit: 3b7e4ac3bb896d66613e9a6bafbcf6c01a1ac63d

          gerrit Gerrit Updater added a comment - edited

          Pushed against Master by mistake.  This one will be abandoned.

          Olaf Faaland-LLNL (faaland1@llnl.gov) uploaded a new patch: https://review.whamcloud.com/34130
          Subject: LU-11481 utils: disable lfs migrate -m
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 459ba774583997e616a04715709fc2f671dbe0bb

          pjones Peter Jones added a comment -

          I re-triggered it


          ofaaland Olaf Faaland added a comment -

          Hello Lai,

          I've added you as a reviewer on my patch, which at last update passed tests except sanity-scrub.test_9 which seems to me like it's unrelated to my patch - but maybe I'm mistaken.  Can you kick it so that the review-dne-part-2, which includes sanity-scrub, is re-tested?

          thanks

          laisiyao Lai Siyao added a comment -

          Yes, Olaf.


          ofaaland Olaf Faaland added a comment -

          In case (a) is "yes", I've uploaded a patch for b2_10.


          gerrit Gerrit Updater added a comment -

          Olaf Faaland-LLNL (faaland1@llnl.gov) uploaded a new patch: https://review.whamcloud.com/33960
          Subject: LU-11481 utils: disable lfs migrate -m
          Project: fs/lustre-release
          Branch: b2_10
          Current Patch Set: 1
          Commit: d08d4a3b232c0e1a6a1fb9d2ee6f315fd26ae498

          ofaaland Olaf Faaland added a comment -

          "Directory migration is rewritten, which fixed many issues in migration, but it's in 2.12."

          OK, then you're saying:
          (a) directory migration in 2.10 is unsafe - risks data loss - and should not be used,
          and
          (b) there is nothing more to do on this issue and no additional debug code is necessary

          Is that correct?
          Thanks

          laisiyao Lai Siyao added a comment -

          Directory migration is rewritten, which fixed many issues in migration, but it's in 2.12.
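          Given the exchange in this thread (the directory migration rewrite is in 2.12, and the reply "Yes" to the question of whether migration in 2.10 is unsafe), a site-level wrapper script could refuse to attempt directory migration on older releases. The following is a minimal sketch of such a guard under that reading; it is not the landed patch, which disables lfs migrate -m in the b2_10 utilities themselves. The function names and the (2, 12, 0) threshold constant are hypothetical, and the sketch does not invoke lfs at all.

```python
import re

# Assumed threshold: first release line containing the rewritten
# directory migration, per the comment in this ticket.
MIN_SAFE_MIGRATION_VERSION = (2, 12, 0)


def parse_lustre_version(text):
    """Parse a version string such as '2.10.4' or '2.10.7_1.chaos' into a tuple.

    Only the leading major.minor[.patch] numbers are used; any vendor
    suffix is ignored.
    """
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text.strip())
    if m is None:
        raise ValueError(f"unrecognised Lustre version string: {text!r}")
    return tuple(int(part or 0) for part in m.groups())


def directory_migration_allowed(version_text):
    """Return True only for versions at or above the assumed safe threshold."""
    return parse_lustre_version(version_text) >= MIN_SAFE_MIGRATION_VERSION


for version in ("2.10.4", "2.10.7", "2.12.0"):
    verdict = "allowed" if directory_migration_allowed(version) else "refused"
    print(version, verdict)
```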


          People

            Assignee: laisiyao Lai Siyao
            Reporter: ofaaland Olaf Faaland
            Votes: 0
            Watchers: 5
