Lustre / LU-2240

implement index range lookup for osd-zfs.

Details


    Description

      ZFS needs an index range lookup for DNE.
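For context, "index range lookup" means iterating an index's entries starting at an arbitrary key rather than always from the beginning. A minimal, hypothetical Python sketch of those semantics (this is only an illustration of what DNE needs, not the osd-zfs implementation, which would sit on top of the ZAP API):

```python
import bisect

def index_range_lookup(index, start_key):
    """Yield (key, value) pairs with key >= start_key, in key order.

    `index` is a plain dict standing in for an on-disk index (e.g. a ZAP);
    names here are invented for illustration.
    """
    keys = sorted(index)
    # Find the first key at or after start_key.
    pos = bisect.bisect_left(keys, start_key)
    for key in keys[pos:]:
        yield key, index[key]

entries = {"0x1": "a", "0x3": "c", "0x2": "b", "0x5": "e"}
print(list(index_range_lookup(entries, "0x2")))
# -> [('0x2', 'b'), ('0x3', 'c'), ('0x5', 'e')]
```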


          Activity


            I'd say post-2.4.0 would be a bit safer. But yes, we don't need to keep it around too long.

            morrone Christopher Morrone (Inactive) added a comment
            prakash Prakash Surya (Inactive) added a comment - edited

            Well, we still need to upgrade our production side of things which needs the conversion code. But since it landed in a tag already (2.3.63), I'm personally OK with dropping it from master. We can upgrade using a 2.3.63-based tag which will fix the FIDs, and then later upgrade to a newer tag which wouldn't have the conversion code. I'd imagine that would work just fine, and then the conversion code won't be in the actual 2.4 release.

            morrone, how does that sound to you?


            Prakash, do you think we need to keep this conversion code around for a while? My preference is to drop it as soon as possible.

            bzzz Alex Zhuravlev added a comment

            I reformatted our Grove-Test file system using our 2.3.62-4chaos tag. Our Grove-Production file system doesn't have any entries in oi.7/0x200000007* so we should be OK to simply upgrade that side of things without a reformat (as far as I can tell). So I'll go ahead and resolve this ticket.

            prakash Prakash Surya (Inactive) added a comment

            I can make another patch to remove those objects, but frankly this isn't a nice way to go (we've made a number of changes to the on-disk format from the beginning), so if this is possible, it'd be much better to start from a released version.
            To some extent we do check on-disk consistency, though only with ldiskfs. The good thing is that attributes like nlink are manipulated the same way on zfs.

            bzzz Alex Zhuravlev added a comment

            After talking with Brian some more, I definitely think the issue is the improper handling of the "links" field. The first "rm" actually deleted the object from the dataset, and the subsequent removes got ENOENT because the object was already deleted. So I think the only path forward is to either hack the ZPL or Lustre to remove the entries we're interested in from the ZAPs, or reformat the filesystem. Assuming we won't have this problem on our production FS (which I need to verify, still), I'm going to pursue a reformat of our test FS to get around this.
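The suspected failure mode can be illustrated with a toy model (my own sketch, not ZFS code): if an object's recorded link count says 1 while the directory actually holds several names for it, the first unlink frees the object and every later unlink of the remaining names hits ENOENT.

```python
import errno

class ToyFS:
    """Tiny model of unlink semantics: an object is freed when its
    recorded link count drops to zero, regardless of how many
    directory entries still point at it. Names are invented."""

    def __init__(self, entries, links):
        self.entries = dict(entries)   # name -> object id
        self.links = dict(links)       # object id -> recorded link count

    def unlink(self, name):
        obj = self.entries.pop(name)
        if obj not in self.links:
            return -errno.ENOENT       # object already freed
        self.links[obj] -= 1
        if self.links[obj] == 0:
            del self.links[obj]        # free the object
        return 0

# Three names all reference one object, but its link count was
# (incorrectly) recorded as 1 -- the suspected bug.
fs = ToyFS({"a": 414209, "b": 414209, "c": 414209}, {414209: 1})
print(fs.unlink("a"))  # 0: succeeds and frees the object
print(fs.unlink("b"))  # -2 (-ENOENT), like the rm failures below
```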

            prakash Prakash Surya (Inactive) added a comment

            Sigh... Well, it let me remove files oi.7/0x200000007:0x3:0x0, oi.7/0x200000007:0x4:0x0, and oi.7/0x200000007:0x1:0x0 (inode numbers 414211, 414213, and 414209 respectively), but I'm getting ENOENT when removing the others. Using systemtap, I can see it failing in zfs_zget:

            # grove-mds2 /mnt/grove-mds2/mdt0 > stap /usr/share/doc/systemtap-1.6/examples/general/para-callgraph.stp 'module("zfs").function("*")' -c "rm ./oi.7/0x200000007:0x2:0x0/0x1010000"
            
            ... [snip] ...
            
               677 rm(94074):    ->dmu_buf_get_user db_fake=0xffff880d717f1e40
               679 rm(94074):    <-dmu_buf_get_user return=0xffff880d52c28478
               684 rm(94074):    ->sa_get_userdata hdl=0xffff880d52c28478
               687 rm(94074):    <-sa_get_userdata return=0xffff880e6030ba70
               691 rm(94074):    ->sa_buf_rele db=0xffff880d717f1e40 tag=0x0
               694 rm(94074):     ->dbuf_rele db=0xffff880d717f1e40 tag=0x0
               696 rm(94074):      ->dbuf_rele_and_unlock db=0xffff880d717f1e40 tag=0x0
               698 rm(94074):      <-dbuf_rele_and_unlock 
               699 rm(94074):     <-dbuf_rele 
               701 rm(94074):    <-sa_buf_rele 
               703 rm(94074):   <-zfs_zget return=0x2
               707 rm(94074):   ->zfs_dirent_unlock dl=0xffff880f521949c0
               710 rm(94074):   <-zfs_dirent_unlock 
               712 rm(94074):  <-zfs_dirent_lock return=0x2
               714 rm(94074):  ->rrw_exit rrl=0xffff880d5a100290 tag=0xffffffffa0505727
               716 rm(94074):  <-rrw_exit 
               718 rm(94074): <-zfs_remove return=0x2
               720 rm(94074):<-zpl_unlink return=0xfffffffffffffffe
            

            I tried removing the files in the order that they were listed in the "find" command in my previous comment. So the first "rm" for each distinct inode number succeeded, but the following calls for files referencing the same inode number failed. Perhaps due to incorrect accounting of the number of links for a given inode?
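As a reading aid for the trace above: zfs_zget, zfs_dirent_lock, and zfs_remove return 0x2, which is errno ENOENT, and zpl_unlink's 0xfffffffffffffffe is the same value printed as an unsigned 64-bit quantity, i.e. the negative kernel return code -2. A quick check:

```python
import ctypes
import errno

# ENOENT is errno 2, matching the return=0x2 lines in the trace.
assert errno.ENOENT == 2

# zpl_unlink returned 0xfffffffffffffffe; reinterpret as signed 64-bit.
ret = ctypes.c_int64(0xfffffffffffffffe).value
print(ret)  # -2, i.e. -ENOENT
```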

            In case it's useful, the zdb info regarding these objects is below (AFAIK the inode numbers correspond to their DMU object numbers):

            # grove-mds2 /mnt/grove-mds2/mdt0 > zdb grove-mds2/mdt0 414209 414211 414213
            Dataset grove-mds2/mdt0 [ZPL], ID 45, cr_txg 110, 4.05G, 2088710 objects
            
                Object  lvl   iblk   dblk  dsize  lsize   %full  type
                414209    1    16K   128K   128K   128K  100.00  ZFS plain file
                414211    2     4K     4K     4K     8K  100.00  ZFS directory
                414213    2     4K     4K     4K     8K  100.00  ZFS directory
            

            I'm beginning to think a reformat is our best option moving forward...

            prakash Prakash Surya (Inactive) added a comment

            People

              bzzz Alex Zhuravlev
              di.wang Di Wang (Inactive)
              Votes: 0
              Watchers: 11
