[LU-10048] osd-ldiskfs to truncate outside of main transaction

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.13.0
    • Lustre 2.12.0, Lustre 2.10.4, Lustre 2.10.5

    Description

      This is needed to implement the "transaction first, locking next" ordering, which unifies locking across MDT, OST, and OUT.

          Activity

            eaujames Etienne Aujames added a comment:

            Hello,

            We have applied https://review.whamcloud.com/43277 ("LU-10048 ofd: take local locks within transaction") and https://review.whamcloud.com/43278 ("LU-13093 osd: fix osd_attr_set race") on the problematic filesystem. The issue has not occurred since.

            We were able to reproduce this issue within 5 to 10 minutes by creating many small files while several threads were migrating files between OSTs ("lfs migrate").
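The workload described above (many small-file creations running alongside several "lfs migrate" threads) could be approximated with a script along the following lines. This is only a sketch under stated assumptions: the file size, the file naming, the use of the `-i` stripe-index option to pick a target OST, and the `LFS` override variable are all hypothetical and were not given in the ticket.

```shell
#!/bin/sh
# Hypothetical reproducer sketch; every path, count, and option here is an
# assumption, not the workload actually used in the ticket.
LFS="${LFS:-lfs}"   # overridable so the script can be dry-run without Lustre

# Create COUNT small files (4 KiB each, an assumed size) under DIR.
create_small_files() {
    dir="$1"; count="$2"
    i=0
    while [ "$i" -lt "$count" ]; do
        dd if=/dev/zero of="$dir/small.$i" bs=4096 count=1 2>/dev/null
        i=$((i + 1))
    done
}

# Migrate every regular file under DIR to the OST with index OST_IDX.
# Using "-i" to select the target OST is an assumption about the setup.
migrate_files() {
    dir="$1"; ost_idx="$2"
    for f in "$dir"/*; do
        [ -f "$f" ] && "$LFS" migrate -i "$ost_idx" "$f"
    done
    return 0
}
```

To mimic the concurrent load, one could start `create_small_files` in the background and then launch several `migrate_files` calls in the background with different OST indices, followed by `wait`. This should only be attempted on a test filesystem.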

            eaujames Etienne Aujames added a comment:

            Hello Alex,

            Did you have time to look at our backtrace?

            If you need more data from the crash dump, I can get it for you.

            eaujames Etienne Aujames added a comment:

            I have added our backtrace to this ticket: crash_bt_jbd2_locked_20210412.log

            eaujames Etienne Aujames added a comment:

            The issue seems to occur relatively often with many migrate threads (4 times today, on different OSTs).

            We will test the backports quickly on a production environment (maybe this week).

            I will try to get a backtrace from the crash dump (manually triggered) tomorrow.

            bzzz Alex Zhuravlev added a comment:

            Please generate a full backtrace with

            echo t >/proc/sysrq-trigger

            and attach it to the ticket.
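The request above can be wrapped in a small helper that triggers the dump and immediately saves the kernel log to a file for attaching. This is a minimal sketch, assuming it runs as root on the affected node; the output file name and the `SYSRQ_TRIGGER` and `DMESG` override variables are hypothetical and exist only so the function can be dry-run.

```shell
#!/bin/sh
# Sketch: ask the kernel to dump every task's stack into the kernel log,
# then save that log to a file that can be attached to the ticket.
# Must run as root. The default output file name is a hypothetical placeholder.
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"  # overridable for dry runs
DMESG="${DMESG:-dmesg}"                                # overridable for dry runs

capture_task_backtraces() {
    out="${1:-sysrq_t_backtrace.log}"
    # 't' requests a dump of all current tasks and their stacks.
    echo t > "$SYSRQ_TRIGGER" || return 1
    # The dump lands in the kernel ring buffer; copy it out right away,
    # since a busy node can overwrite older entries.
    "$DMESG" > "$out"
}
```

On a node with many threads the dump can be larger than the kernel ring buffer, in which case the console or system log may hold a more complete copy.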

            eaujames Etienne Aujames added a comment:

            The "LU-10048 osd: async truncate" patch has already landed on b2_12.

            bzzz Alex Zhuravlev added a comment:

            As for the fast journal: we have a number of nodes running all the tests on RAM-backed devices 24 hours a day.

            People

              Assignee: bzzz Alex Zhuravlev
              Reporter: bzzz Alex Zhuravlev