  1. Lustre
  2. LU-8714

Too many update logs during soak test

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.10.0
    • None
    • 3
    • 9223372036854775807

    Description

      In the last DNE soak test, recovery was stuck for a very long time (> 4 hours). It looks like too many update log records were left over to be processed during recovery.

      Each MDT has around 80k-100k records, which seems like too many:

      wangdi-mac01:~ wangdi$ grep -r "mdt_index 1" /tmp/records  | wc
         76104  913248 15353624
      wangdi-mac01:~ wangdi$ grep -r "mdt_index 0" /tmp/records  | wc
         91589 1099068 19376239
      wangdi-mac01:~ wangdi$ grep -r "mdt_index 2" /tmp/records  | wc
        102798 1233576 21763151
      wangdi-mac01:~ wangdi$ grep -r "mdt_index 3" /tmp/records  | wc
         98332 1179984 20821847
      

      Unfortunately, there are not enough logs to help me understand why so many update records were left behind.

      But it seems we can make cancellation smarter. In the current implementation, when one batchid is committed, only the update records for that batchid are cancelled. We could instead cancel all update records whose batchid is lower than the currently committed batchid. Then, even if some update records are left behind for some reason, they would still be deleted when a later batchid is committed.
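
      The difference between the two cancellation policies can be sketched with a toy Python model. This is not Lustre code: `UpdateLog`, `cancel_exact` and `cancel_up_to` are hypothetical names made up for illustration, and the model assumes that a committed batchid implies every lower batchid is safe to cancel, as proposed above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UpdateLog:
    """Toy stand-in for one MDT's update log (hypothetical, not a Lustre structure)."""

    # Maps batchid -> update records written under that batchid.
    records: Dict[int, List[str]] = field(default_factory=dict)

    def add(self, batchid: int, record: str) -> None:
        """Append one update record under the given batchid."""
        self.records.setdefault(batchid, []).append(record)

    def cancel_exact(self, committed: int) -> int:
        """Current behaviour as described: cancel only the committed batchid.

        Returns the number of records cancelled.
        """
        return len(self.records.pop(committed, []))

    def cancel_up_to(self, committed: int) -> int:
        """Proposed behaviour: cancel every record with batchid <= committed.

        Assumes a lower batchid never needs to outlive a higher committed one.
        Returns the number of records cancelled.
        """
        stale = [b for b in self.records if b <= committed]
        return sum(len(self.records.pop(b)) for b in stale)
```

      With records under batchids 1, 2 and 3, `cancel_exact(3)` removes only the batchid 3 record and leaves 1 and 2 in the log, while `cancel_up_to(3)` removes all three, so stragglers are cleaned up by any later commit.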

            People

              di.wang Di Wang
              di.wang Di Wang