Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.10.7
    • Affects Version/s: Lustre 2.10.4
    • Severity: 2

    Description

      A directory has an entry for the subdirectory "2fe", but the object whose ID is stored in that entry does not exist:

      alias ll="ls -l"
      [root@catalyst101:~]# ll /p/lustre3/videousr/YLI/mmcommons/data/images_v1
      ls: cannot access /p/lustre3/videousr/YLI/mmcommons/data/images_v1/2fe: No such file or directory
      total 0
      d????????? ? ? ? ?            ? 2fe

      Examining images_v1 (object 533741247) with zdb on the MDT shows that the entry 2fe refers to object ID 533742980, and a second zdb run on that ID fails with errno 2, meaning the object does not exist:

      [root@porter81:snap]# zdb -ddddd porter81/mdt0 533741247
      Dataset porter81/mdt0 [ZPL], ID 148, cr_txg 98, 910G, 61852198 objects, rootbp DVA[0]=<4:88d9c400:200> DVA[1]=<5:25ca03c200:200> [L0 DMU objset] fletcher4 lz4 LE contiguous unique double size=800L/200P birth=1214040L/1214040P fill=61852198 cksum=139cf672b7:5dc8d6146f6:f8e6add4f57c:1e27e38477f5c0                                                                                
      
          Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
       533741247    2   128K    16K   231K     512   528K  100.00  ZFS directory
                                                     192   bonus  System attributes
              dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED SPILL_BLKPTR
              dnode maxblkid: 32                                                           
              path    ???<object#533741247>                                                
              uid     0                                                                    
              gid     2093                                                                 
              atime   Mon Oct  8 11:01:28 2018                                             
              mtime   Wed Oct  3 15:53:08 2018                                             
              ctime   Wed Oct  3 15:53:08 2018                                             
              crtime  Mon Oct  1 20:53:54 2018                                             
              gen     1090081                                                              
              mode    42700                                                                
              size    2                                                                    
              parent  533740502                                                            
              links   3                                                                    
              pflags  0                                                                    
              rdev    0x0000000000000000                                                   
              SA xattrs: 204 bytes, 3 entries                                              
      
                      trusted.lma = \000\000\000\000\000\000\000\0002@\000\000\002\000\000\000\245\037\001\000\000\000\000\000                                                                    
                      trusted.link = \337\361\352\021\001\000\000\0003\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\033\000\000\000\002\000\000@F\000\0001\213\000\000\000\000images_v1                                                                                      
                      trusted.version = \022\231\236+\011\000\000\000                               
              Fat ZAP stats:                                                                        
                      Pointer table:                                                                
                              1024 elements                                                         
                              zt_blk: 0                                                             
                              zt_numblks: 0                                                         
                              zt_shift: 10                                                          
                              zt_blks_copied: 0                                                     
                              zt_nextblk: 0                                                         
                      ZAP entries: 1                                                                
                      Leaf blocks: 32                                                               
                      Total blocks: 33                                                              
                      zap_block_type: 0x8000000000000001                                            
                      zap_magic: 0x2f52ab2ab                                                        
                      zap_salt: 0x3e3cbee7f                                                         
                      Leafs with 2^n pointers:                                                      
                                5:     32 ********************************                          
                      Blocks with n*5 entries:                                                      
                                0:     32 ********************************                          
                      Blocks n/10 full:                                                             
                                1:     32 ********************************                          
                      Entries with n chunks:                                                        
                                4:      1 *                                                         
                      Buckets with n entries:                                                       
                                0:  16383 ****************************************                  
                                1:      1 *                                                         
      
                      2fe = 533742980 (type: Directory)
      Indirect blocks:
                     0 L1  6:1a0095d000:a00 20000L/a00P F=33 B=1133009/1133009
                     0  L0 4:d99372200:200 4000L/200P F=1 B=1133009/1133009
                  4000  L0 4:2b78affa00:e00 4000L/e00P F=1 B=1132989/1132989
                  8000  L0 4:1a409fa00:e00 4000L/e00P F=1 B=1133008/1133008
                  c000  L0 4:dbecc8800:e00 4000L/e00P F=1 B=1133003/1133003
                 10000  L0 4:2d07544a00:e00 4000L/e00P F=1 B=1132997/1132997
                 14000  L0 5:11130c9600:e00 4000L/e00P F=1 B=1133005/1133005
                 18000  L0 5:1053a11c00:e00 4000L/e00P F=1 B=1132991/1132991
                 1c000  L0 4:2d07545800:e00 4000L/e00P F=1 B=1132997/1132997
                 20000  L0 6:1a41dd7c00:e00 4000L/e00P F=1 B=1133002/1133002
                 24000  L0 5:112ca4cc00:e00 4000L/e00P F=1 B=1133007/1133007
                 28000  L0 5:559e31000:e00 4000L/e00P F=1 B=1133000/1133000
                 2c000  L0 4:d91a7e000:e00 4000L/e00P F=1 B=1133004/1133004
                 30000  L0 4:d99372400:e00 4000L/e00P F=1 B=1133009/1133009
                 34000  L0 4:265bf62800:e00 4000L/e00P F=1 B=1132993/1132993
                 38000  L0 6:134c5fcc00:e00 4000L/e00P F=1 B=1132992/1132992
                 3c000  L0 5:559e31e00:e00 4000L/e00P F=1 B=1133000/1133000
                 40000  L0 5:11130ca400:e00 4000L/e00P F=1 B=1133005/1133005
                 44000  L0 4:dbeccac00:e00 4000L/e00P F=1 B=1133003/1133003
                 48000  L0 4:2b78b02200:e00 4000L/e00P F=1 B=1132989/1132989
                 4c000  L0 6:134c5ff400:e00 4000L/e00P F=1 B=1132992/1132992
                 50000  L0 4:1a40a2400:e00 4000L/e00P F=1 B=1133008/1133008
                 54000  L0 5:11130cb200:e00 4000L/e00P F=1 B=1133005/1133005
                 58000  L0 6:19f0f10c00:e00 4000L/e00P F=1 B=1132991/1132991
                 5c000  L0 4:1a40a3200:e00 4000L/e00P F=1 B=1133008/1133008
                 60000  L0 7:b97b6aa00:e00 4000L/e00P F=1 B=1133004/1133004
                 64000  L0 5:112ca4f400:e00 4000L/e00P F=1 B=1133007/1133007
                 68000  L0 4:17f825800:e00 4000L/e00P F=1 B=1132999/1132999
                 6c000  L0 6:1a2429de00:e00 4000L/e00P F=1 B=1132995/1132995
                 70000  L0 6:1a41dd9a00:e00 4000L/e00P F=1 B=1133002/1133002
                 74000  L0 7:129d29e800:e00 4000L/e00P F=1 B=1133007/1133007
                 78000  L0 4:dbeccca00:e00 4000L/e00P F=1 B=1133003/1133003
                 7c000  L0 4:17f826600:e00 4000L/e00P F=1 B=1132999/1132999
                 80000  L0 5:569fa5000:e00 4000L/e00P F=1 B=1132994/1132994
      
                      segment [0000000000000000, 0000000000084000) size  528K
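
The trusted.lma value in the listing above is printed by zdb as a mix of printable characters and octal escapes. As a rough aid for reading it, the sketch below converts that text back to bytes and unpacks it as a Lustre FID. It assumes the 24-byte LMA layout of two 32-bit flag words (lma_compat, lma_incompat) followed by the object's own FID (f_seq as 64 bits, f_oid and f_ver as 32 bits each), all little-endian; the helper names unescape_zdb and decode_lma are made up for this illustration and are not part of any Lustre or ZFS tool.

```python
import re
import struct

OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def unescape_zdb(value: str) -> bytes:
    """Convert zdb's printable/octal-escaped xattr text into raw bytes."""
    out = bytearray()
    i = 0
    while i < len(value):
        m = OCTAL_ESCAPE.match(value, i)
        if m:
            # A backslash followed by three octal digits encodes one byte.
            out.append(int(m.group(1), 8))
            i = m.end()
        else:
            # Any other character stands for itself.
            out.append(ord(value[i]))
            i += 1
    return bytes(out)


def decode_lma(raw: bytes):
    """Unpack an LMA xattr under the assumed layout described above."""
    compat, incompat, seq, oid, ver = struct.unpack_from("<IIQII", raw)
    return compat, incompat, f"[{seq:#x}:{oid:#x}:{ver:#x}]"


# Value copied from the trusted.lma line of the zdb output for object 533741247.
lma_text = (
    r"\000\000\000\000\000\000\000\0002@\000\000\002\000\000\000"
    r"\245\037\001\000\000\000\000\000"
)

raw = unescape_zdb(lma_text)
compat, incompat, fid = decode_lma(raw)
print(len(raw), compat, incompat, fid)
```

Under those assumptions, the directory images_v1 decodes to FID [0x200004032:0x11fa5:0x0] with both flag words zero.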
      
      [root@porter81:snap]# zdb -ddddd porter81/mdt0 533742980
      Dataset porter81/mdt0 [ZPL], ID 148, cr_txg 98, 910G, 61852198 objects, rootbp DVA[0]=<4:88d9c400:200> DVA[1]=<5:25ca03c200:200> [L0 DMU objset] fletcher4 lz4 LE contiguous unique double size=800L/200P birth=1214040L/1214040P fill=61852198 cksum=139cf672b7:5dc8d6146f6:f8e6add4f57c:1e27e38477f5c0
      
          Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
      zdb: dmu_bonus_hold(533742980) failed, errno 2
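
The manual check above (read the ZAP entry from the parent directory, then ask zdb for the child object) could be scripted when many directories need to be inspected. The following is only a sketch that works on captured zdb text. The regular expressions are inferred from the two outputs shown in this ticket rather than from any documented zdb output format, and the function names are hypothetical.

```python
import re

# Matches directory ZAP entry lines such as "2fe = 533742980 (type: Directory)".
# Assumption: entry names contain no whitespace.
ZAP_ENTRY = re.compile(r"^\s*(\S+) = (\d+) \(type: ([^)]+)\)\s*$")

# Matches the failure line zdb printed for the missing object.
MISSING = re.compile(r"dmu_bonus_hold\((\d+)\) failed, errno (\d+)")


def zap_entries(zdb_text: str) -> dict:
    """Return {entry name: (object ID, type)} found in a zdb directory dump."""
    entries = {}
    for line in zdb_text.splitlines():
        m = ZAP_ENTRY.match(line)
        if m:
            entries[m.group(1)] = (int(m.group(2)), m.group(3))
    return entries


def missing_objects(zdb_text: str) -> set:
    """Return object IDs for which zdb reported errno 2 (ENOENT)."""
    return {
        int(m.group(1))
        for m in MISSING.finditer(zdb_text)
        if int(m.group(2)) == 2
    }


# Excerpts copied from the two zdb runs in this description.
parent_out = (
    "                trusted.version = \\022\\231\\236+\\011\\000\\000\\000\n"
    "                2fe = 533742980 (type: Directory)\n"
)
child_out = "zdb: dmu_bonus_hold(533742980) failed, errno 2\n"

entries = zap_entries(parent_out)
gone = missing_objects(child_out)
dangling = {name: obj for name, (obj, _type) in entries.items() if obj in gone}
print(dangling)
```

In practice the two text inputs would come from running zdb -ddddd against the parent object and against each child object ID, as in the transcript.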
      
      

      This is on a new file system that end users have not used yet, but to which we attempted to copy data. More specifically:
      1. We copied about 500 million files/dirs to it.
      2. We tried to use lfs migrate -m to move some large subtrees from one MDT to another, but that failed due to a Lustre 2.8 bug with lfs migrate.
      3. We deleted most of the files/dirs.

      • As far as I can recall, the servers did not crash while we were performing the copy and delete operations, but I cannot be certain of that.
      • We inspected the console logs on the servers and clients but found nothing that appeared to indicate a failure of object creation or destruction.

        Activity

          [LU-11481] corrupt directory
          pjones Peter Jones added a comment -

          Landed for 2.10.7. Not needed on master


          gerrit Gerrit Updater added a comment -

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33960/
          Subject: LU-11481 utils: disable lfs migrate -m
          Project: fs/lustre-release
          Branch: b2_10
          Current Patch Set:
          Commit: 3b7e4ac3bb896d66613e9a6bafbcf6c01a1ac63d

          gerrit Gerrit Updater added a comment - edited

          Pushed against Master by mistake.  This one will be abandoned.

          Olaf Faaland-LLNL (faaland1@llnl.gov) uploaded a new patch: https://review.whamcloud.com/34130
          Subject: LU-11481 utils: disable lfs migrate -m
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 459ba774583997e616a04715709fc2f671dbe0bb

          pjones Peter Jones added a comment -

          I re-triggered it


          ofaaland Olaf Faaland added a comment -

          Hello Lai,

          I've added you as a reviewer on my patch. At last update it passed all tests except sanity-scrub.test_9, which seems to me to be unrelated to my patch, but maybe I'm mistaken. Can you kick it so that review-dne-part-2, which includes sanity-scrub, is re-tested?

          Thanks

          laisiyao Lai Siyao added a comment -

          Yes, Olaf.


          ofaaland Olaf Faaland added a comment -

          In case (a) is "yes", I've uploaded a patch for b2_10.


          gerrit Gerrit Updater added a comment -

          Olaf Faaland-LLNL (faaland1@llnl.gov) uploaded a new patch: https://review.whamcloud.com/33960
          Subject: LU-11481 utils: disable lfs migrate -m
          Project: fs/lustre-release
          Branch: b2_10
          Current Patch Set: 1
          Commit: d08d4a3b232c0e1a6a1fb9d2ee6f315fd26ae498

          People

            Assignee: laisiyao Lai Siyao
            Reporter: ofaaland Olaf Faaland
            Votes: 0
            Watchers: 5
