Lustre LU-7899: osd_xattr_set() to batch actual EA update

Details

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.10.1, Lustre 2.11.0

    Description

      Moving EAs from the nvlist into the bonus/spill buffer is quite expensive. We can save some of that cost by collecting changes in the nvlist (which we already do) and calling sa_update() once from osd_trans_stop(), instead of on every xattr set.
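      The batching idea can be sketched as follows. This is a simplified, hypothetical model, not the actual osd-zfs code: a fixed-size array stands in for the nvlist, and demo_sa_update() only counts how often the expensive bonus/spill packing would run. All demo_* names are illustrative assumptions.

```c
/*
 * Hypothetical sketch of deferred (batched) EA updates: xattr sets only
 * modify an in-memory cache, and the expensive "pack into bonus/spill"
 * step runs once per transaction at trans-stop time.
 * The demo_* names are illustrative only; they are not Lustre or ZFS APIs.
 */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_XATTRS 8
#define DEMO_NAME_LEN   32
#define DEMO_VAL_LEN    64

struct demo_xattr {
	char   name[DEMO_NAME_LEN];
	char   value[DEMO_VAL_LEN];
	size_t len;
};

struct demo_obj {
	struct demo_xattr xattrs[DEMO_MAX_XATTRS]; /* stands in for the nvlist */
	int               count;
	int               dirty;           /* cache differs from on-disk SA */
	int               sa_update_calls; /* times the costly step ran */
};

static void demo_obj_init(struct demo_obj *o)
{
	memset(o, 0, sizeof(*o));
}

/* Stand-in for the expensive step (sa_update() in the description). */
static void demo_sa_update(struct demo_obj *o)
{
	o->sa_update_calls++;
}

/* Add or overwrite an xattr in the cache; no SA update happens here. */
static int demo_xattr_set(struct demo_obj *o, const char *name,
			  const void *buf, size_t len)
{
	int i;

	if (strlen(name) >= DEMO_NAME_LEN || len > DEMO_VAL_LEN)
		return -EINVAL;
	for (i = 0; i < o->count; i++)
		if (strcmp(o->xattrs[i].name, name) == 0)
			break;
	if (i == o->count) {
		if (o->count == DEMO_MAX_XATTRS)
			return -ENOSPC;
		o->count++;
		strcpy(o->xattrs[i].name, name); /* length checked above */
	}
	memcpy(o->xattrs[i].value, buf, len);
	o->xattrs[i].len = len;
	o->dirty = 1;
	return 0;
}

/* At transaction stop, flush all pending changes with a single update. */
static int demo_trans_stop(struct demo_obj *o)
{
	if (o->dirty) {
		demo_sa_update(o);
		o->dirty = 0;
	}
	assert(!o->dirty);
	return 0;
}
```

      Three demo_xattr_set() calls in one transaction (including an overwrite of the same name) leave sa_update_calls at 0; the first demo_trans_stop() raises it to 1, and a second stop with nothing dirty leaves it unchanged. The cost of packing is paid once per transaction rather than once per xattr.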

          Activity

            gerrit Gerrit Updater added a comment:
            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/21893/
            Subject: LU-7899 osd: batch EA updates
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 2c9ff6dffdf4320af95c9db9af07a416529275f0

            cliffw Cliff White (Inactive) added a comment - Tested the latest version of the patch; it has run for 24 hours on soak so far with no significant errors.

            cliffw Cliff White (Inactive) added a comment - What are you looking for from slabtop? No hard crashes yet; restarting this morning.

            bzzz Alex Zhuravlev added a comment - Cliff, any more details?

            cliffw Cliff White (Inactive) added a comment - With this patch, I am seeing more jobs fail than succeed.

            cliffw Cliff White (Inactive) added a comment:
            I am seeing more odd job failures, for example:

            07/21/2017 19:10:37: Process 0(soak-16.spirit.hpdd.intel.com): FAILED in mdtest_stat, unable to stat file: Input/output error
            34395-simul.out:19:32:08: Process 10(soak-26.spirit.hpdd.intel.com): FAILED in simul_file_stat, stat failed: Input/output error
            34409-simul.out:19:31:50: Process 12(soak-26.spirit.hpdd.intel.com): FAILED in simul_truncate, truncate failed: Cannot send after transport endpoint shutdown
            34442-simul.out:19:10:37: Process 0(soak-16.spirit.hpdd.intel.com): FAILED in create_files, write in file /mnt/soaked/soaktest/test/simul/34442/simul_write.12: Input/output error
            34454-mdtestssf.out:07/21/2017 19:10:37: Process 0(soak-16.spirit.hpdd.intel.com): FAILED in mdtest_stat, unable to stat file: Input/output error
            34454-mdtestssf.out:07/21/2017 19:10:37: Process 1(soak-16.spirit.hpdd.intel.com): FAILED in mdtest_stat, unable to stat file: Input/output error
            34462-simul.out:19:25:28: Process 26(soak-29.spirit.hpdd.intel.com): FAILED in simul_truncate, truncate failed: Input/output error

            Investigating.

            bzzz Alex Zhuravlev added a comment - Cliff, the patch has been rebased. Please give it a run. Thanks in advance.

            cliffw Cliff White (Inactive) added a comment - Since we have landed LU-9504, can this patch be rebased? I will be able to run it on soak.

            cliffw Cliff White (Inactive) added a comment - We need a decent working baseline to apply this patch to; that is the problem. LU-9504 currently kills the system too quickly, so we must have a fix for it in the baseline.

            bzzz Alex Zhuravlev added a comment - Hmm, I still can't reproduce it, Cliff. Is it possible to run with just this patch and watch the slabtop output during the run?

            bzzz Alex Zhuravlev added a comment - thanks... working on that.

            People

              Assignee: bzzz Alex Zhuravlev
              Reporter: bzzz Alex Zhuravlev
              Votes: 0
              Watchers: 6

              Dates

                Created:
                Updated:
                Resolved: