Details


    Description

      Observed on the MDT during weekend testing with a mixed I/O workload. I agree that failure here shouldn't be fatal; however, I'm not sure we should even bother telling the administrator. If it's true that there's nothing that needs to be (or can be) done, then there's no point in logging this to the console, and changing it to a CDEBUG() would be preferable. If, however, the administrator needs to do something (which I don't believe is the case), that should be part of the message.

      2012-05-04 17:12:17 LustreError: 14242:0:(osd_handler.c:997:osd_object_destroy()) lcz-MDT0000: failed to remove [0x2000023a1:0xf3ed:0x0] from accounting ZAP for grp 36427 (-2)
      2012-05-04 17:12:17 LustreError: 14242:0:(osd_handler.c:997:osd_object_destroy()) Skipped 1 previous similar message
      2012-05-04 17:16:22 LustreError: 14244:0:(osd_handler.c:991:osd_object_destroy()) lcz-MDT0000: failed to remove [0x20000239a:0x991a:0x0] from accounting ZAP for usr 36427 (-2)
      2012-05-04 17:16:30 LustreError: 5618:0:(osd_handler.c:2003:osd_object_create()) lcz-MDT0000: failed to add [0x20000239a:0x9d46:0x0] to accounting ZAP for usr 36427 (-2)
      2012-05-04 17:16:34 LustreError: 14241:0:(osd_handler.c:991:osd_object_destroy()) lcz-MDT0000: failed to remove [0x2000013ca:0x1952b:0x0] from accounting ZAP for usr 36427 (-2)
      2012-05-04 17:16:53 LustreError: 6778:0:(osd_handler.c:997:osd_object_destroy()) lcz-MDT0000: failed to remove [0x2000013ca:0x19e3f:0x0] from accounting ZAP for grp 36427 (-2)
      2012-05-04 17:31:21 LustreError: 6778:0:(osd_handler.c:2003:osd_object_create()) lcz-MDT0000: failed to add [0x200002393:0x1ee18:0x0] to accounting ZAP for usr 36427 (-2)
      2012-05-04 17:31:32 LustreError: 6770:0:(osd_handler.c:991:osd_object_destroy()) lcz-MDT0000: failed to remove [0x200002395:0x167e0:0x0] from accounting ZAP for usr 36427 (-2)
      

          Activity

            [LU-2435] inode accounting in osd-zfs is racy

            bzzz Alex Zhuravlev added a comment -

            It depends on how exactly we call dmu_tx_hold_zap(): the more specific the hold, the fewer credits. Unfortunately, when the key is specified, the declaration becomes very expensive due to the hash lock/lookup. Also, the final calculation depends on the inflation factor:

            asize = spa_get_asize(tx->tx_pool->dp_spa, towrite + tooverwrite);

            uint64_t
            spa_get_asize(spa_t *spa, uint64_t lsize)
            {
                    return (lsize * spa_asize_inflation);
            }

            int spa_asize_inflation = 24;

            jay Jinshan Xiong (Inactive) added a comment -

            Is ~0.5 MB the worst case or the average? Can you please describe this in detail?

            bzzz Alex Zhuravlev added a comment -

            Each ZAP declaration adds ~0.5 MB to the space/memory reservation, so dnode accounting adds ~1 MB, which is 20% of a single-stripe object creation.

            adilger Andreas Dilger added a comment -

            The latest version of the patch is actually https://github.com/zfsonlinux/zfs/pull/3983 and not 3723.

            jay Jinshan Xiong (Inactive) added a comment -

            I will work on this.
            pjones Peter Jones added a comment -

            Jinshan,

            The link #15180 does not seem to work. Could you please confirm whether that is correct?

            Peter

            jay Jinshan Xiong (Inactive) added a comment -

            The proposal made by Andreas worked. This is what I did:

            0. prepare an 'old' Lustre ZFS backend and copy some files into the file system as user 'tstusr';
            1. apply patch http://review.whamcloud.com/15180 to the ZFS repo and patch 15294 to Lustre;
            2. recompile and install ZFS and Lustre;
            3. upgrade the pool with 'zpool upgrade lustre-mdt1';
            4. mount Lustre;
            5. upgrade the MDT to use native dnode accounting with 'lctl set_param osd-zfs.lustre-MDT0000.quota_native_dnused_upgrade';
            6. and it worked: I get correct dnode usage accounting from 'lfs quota -u tstusr /mnt/lustre'.

            behlendorf Brian Behlendorf added a comment -

            To my knowledge there is no existing code for dnode accounting in ZFS. How to implement it in a reasonable way was described above, but no one has yet done that work. However, once it is done and merged we'll have to maintain it forever, since old filesystems may always need to be upgraded.

            adilger Andreas Dilger added a comment -

            There is already existing code to do the dnode iteration that enables user/group block quota on an existing filesystem. I suspect this is still part of the ZFS code, and it may never be deleted, so hopefully little new code should be needed beyond the existing block quota iterator.

            jay Jinshan Xiong (Inactive) added a comment -

            I'm going to start the second phase of this work. At the time the dnode accounting feature is enabled, it will check whether the file system is a legacy one. In that case, it will launch a thread to iterate all dnodes in all datasets and repair the dnode accounting. I have two concerns about this task:

            1. this piece of code will become dead once the scanning task is done, and new file systems won't need it at all. Will the ZFS upstream team accept this code? Based on the same concern, I will make an independent patch based on https://github.com/zfsonlinux/zfs/pull/2577;
            2. does it need to do anything special for snapshots and clones?

            Thanks.

            jay Jinshan Xiong (Inactive) added a comment -

            I pushed patch 15294 to Gerrit to use ZFS dnode accounting for Lustre inode accounting. This is phase one of the work. The second phase is to launch a thread to iterate all dnodes in the objset of the MDT to repair the accounting of legacy file systems.

            People

              jay Jinshan Xiong (Inactive)
              behlendorf Brian Behlendorf
              Votes: 0
              Watchers: 15
