  1. Lustre
  2. LU-3719

divide error in ldiskfs_mb_normalize_request on MDT

Details

    • 3
    • 9578

    Description

      IU ran into an issue on their MDT where it would constantly crash after recovery. We finally got a good core dump and were able to get this bt:
      #6 [ffff882ff8ded040] divide_error at ffffffff8100bdfb
      [exception RIP: ldiskfs_mb_normalize_request+244]

      This looks almost identical to LU-2480, except that it is occurring on the MDT.

      We were able to get the MDT mounted after running e2fsck and then tune2fs -E stripe_width=0,stride=0. Apparently mke2fs had set them based on values from LVM.

      I checked the line at ldiskfs_mb_normalize_request+244 in mballoc.c:
      wind = sbi->s_mb_prealloc_table[i - 1];
      tstart = ac->ac_o_ex.fe_logical;
      do_div(tstart, wind);

      It looks like the s_mb_prealloc_table isn't getting fully populated. I inspected it with crash, and that looks to be the case:
      crash> x/3xg 0xffff881822cc4d40
      0xffff881822cc4d40: 0x0000000000000c00 0x0000000000001800
      0xffff881822cc4d50: 0x0000000000000000

      0xc00 (3072) was the stripe_width reported by dumpe2fs. It appears that ldiskfs_mb_init attempts to create three entries in the table: stripe * 1, stripe * 2, and stripe * 4. However, ldiskfs_mb_prealloc_table_add can silently fail if the entry value is > (sbi->s_blocks_per_group - 1 - 1 - sbi->s_itb_per_group). This can leave the table size at 3 while one or more entries are zero.
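The silent failure described above can be reproduced with a small userspace model. This is only a sketch: the struct, the function names, and the geometry used in the usage note are hypothetical and are not taken from the ldiskfs source; the only part drawn from the report is the size check and the stripe * 1, * 2, * 4 sequence.

```c
#include <stddef.h>

#define MODEL_TABLE_SIZE 3

/* Hypothetical stand-in for the few superblock-info fields involved. */
struct model_sbi {
    unsigned long blocks_per_group;
    unsigned long itb_per_group;
    unsigned long prealloc_table[MODEL_TABLE_SIZE];
    int prealloc_table_size;
};

/*
 * Models the behaviour described in the report: a value above the
 * per-group limit is dropped without any error, so its slot stays zero.
 */
static void model_table_add(struct model_sbi *sbi, unsigned long value)
{
    int i;

    if (value > sbi->blocks_per_group - 1 - 1 - sbi->itb_per_group)
        return; /* silent failure: the caller never learns of it */

    for (i = 0; i < sbi->prealloc_table_size; i++) {
        if (sbi->prealloc_table[i] == 0) {
            sbi->prealloc_table[i] = value;
            return;
        }
    }
}

/* Models the init step: three entries at stripe * 1, * 2 and * 4. */
static void model_populate(struct model_sbi *sbi, unsigned long stripe)
{
    size_t i;

    for (i = 0; i < MODEL_TABLE_SIZE; i++)
        sbi->prealloc_table[i] = 0;

    sbi->prealloc_table_size = MODEL_TABLE_SIZE; /* size is fixed up front */
    model_table_add(sbi, stripe);
    model_table_add(sbi, stripe * 2);
    model_table_add(sbi, stripe * 4);
}
```

With an assumed geometry of blocks_per_group = 16384 and itb_per_group = 4096 (illustrative numbers, not the values of the affected MDT), the limit is 12286, so model_populate(&sbi, 3072) stores 3072 and 6144 but drops 12288. The table size still reads 3 while the last slot is zero, which matches the shape of the crash output.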

      I'm not sure what the best fix is. It seems as if returning an error from ldiskfs_mb_prealloc_table_add and adjusting the table size would be ideal. Alternatively, ldiskfs_mb_normalize_request could check to make sure the table doesn't have a zero, something like:
      - for (i = 0; i < sbi->s_mb_prealloc_table_size; i++) {
      + for (i = 0; i < sbi->s_mb_prealloc_table_size && sbi->s_mb_prealloc_table[i]; i++) {
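To show what the guarded loop would change, here is a minimal userspace sketch of the large-request path, where the code falls back to table[i - 1] and divides by it. The function names are hypothetical, and the loop body is simplified: the assumption is that the request is larger than every table entry, so the loop always runs to its end condition.

```c
#include <stdint.h>

/*
 * Hypothetical model of the window selection with the proposed guard.
 * Returns the largest populated window, or 0 if the table has none.
 */
static uint64_t model_pick_window(const uint64_t *table, int table_size)
{
    int i;

    /* Proposed guard: stop at the first unpopulated (zero) slot. */
    for (i = 0; i < table_size && table[i]; i++)
        ;

    if (i == 0)
        return 0; /* no usable entry; the caller must not divide */

    return table[i - 1];
}

/*
 * Models the division at the faulting line: tstart = logical / wind.
 * Returns 0 on success, -1 if no non-zero window is available.
 */
static int model_window_index(uint64_t logical, const uint64_t *table,
                              int table_size, uint64_t *index_out)
{
    uint64_t wind = model_pick_window(table, table_size);

    if (wind == 0)
        return -1;

    *index_out = logical / wind;
    return 0;
}
```

For the table seen in the dump, {0xc00, 0x1800, 0}, the unguarded loop ends with i = 3 and divides by table[2] = 0, while the guarded loop stops at i = 2 and uses 0x1800. The sketch also checks for i == 0, since a table whose first slot is zero would otherwise index table[-1]; whether that case can occur in practice is not established in this report.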

      Thanks.

            People

              bobijam Zhenyu Xu
              kitwestneat Kit Westneat (Inactive)