LU-16981: LBUG in lod_striped_create, fewer stripes allocated than requested


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Lustre 2.16.0

    Description

      This relates to these crashes in sanity test 270a: https://knox.linuxhacker.ru/crashdb_ui_external.py.cgi?newid=68931

       

      A file is created that should have 4 stripes, but one OST gets deactivated and only 3 stripes are created. There is a race condition: if the OST is deactivated at the wrong moment, the stripe count is not updated correctly, and the stale count later causes a crash.

      Here are relevant lines from the debug log prior to this crash:

      00020000:00000001:4.0:1689017245.043107:0:7535:0:(lod_qos.c:2686:lod_qos_prep_create()) Process entered
      ...
      00020000:00000001:4.0:1689017245.043110:0:7535:0:(lod_qos.c:2088:lod_get_stripe_count()) Process leaving (rc=4 : 4 : 4)
      00020000:00000010:4.0:1689017245.043118:0:7535:0:(lod_qos.c:2723:lod_qos_prep_create()) kmalloced '(stripe)': 32 at ffff880295611e38.
      00020000:00000010:4.0:1689017245.043124:0:7535:0:(lod_qos.c:2726:lod_qos_prep_create()) kmalloced '(ost_indices)': 16 at ffff8802d8267868.
      00020000:00001000:4.0:1689017245.043125:0:7535:0:(lod_qos.c:2734:lod_qos_prep_create()) tgt_count 4 stripe_count 4
      ...
      00020000:00000001:4.0:1689017245.043136:0:7535:0:(lod_qos.c:1533:lod_ost_alloc_qos()) Process entered
      ...
      00020000:00000001:4.0:1689017245.043147:0:7535:0:(lod_qos.c:109:lod_statfs_and_check()) Process entered
      00000004:00000001:4.0:1689017245.043149:0:7535:0:(osp_dev.c:795:osp_statfs()) Process entered
      00000004:00001000:4.0:1689017245.043150:0:7535:0:(osp_dev.c:815:osp_statfs()) lustre-OST0000-osc-MDT0000: blocks=61184, bfree=1024, bavail=0, bsize=4096, reserved_mb_low=1, reserved_mb_high=3, files=35818, ffree=128, state=20
      00000004:00000001:4.0:1689017245.043153:0:7535:0:(osp_dev.c:833:osp_statfs()) Process leaving (rc=0 : 0 : 0)
      00020000:01000000:4.0:1689017245.043154:0:7535:0:(lod_qos.c:141:lod_statfs_and_check()) lustre-OST0000-osc-MDT0000: turns inactive
      00020000:00000001:4.0:1689017245.043155:0:7535:0:(lod_qos.c:168:lod_statfs_and_check()) Process leaving (rc=18446744073709551588 : -28 : ffffffffffffffe4)
      ...
      00020000:00001000:4.0:1689017245.043173:0:7535:0:(lod_qos.c:1639:lod_ost_alloc_qos()) found 3 good osts
      ...
      # there are only 3 of these lines that actually allocated stripes...
      00000004:00000010:4.0:1689017245.043188:0:7535:0:(osp_dev.c:118:osp_object_alloc()) slab-alloced 'o': 456 at ffff8801a38115b0.
      ...
      00020000:00000001:4.0:1689017245.043460:0:7535:0:(lod_qos.c:1771:lod_ost_alloc_qos()) Process leaving (rc=0 : 0 : 0)
      00020000:00000001:4.0:1689017245.043462:0:7535:0:(lod_qos.c:2820:lod_qos_prep_create()) Process leaving (rc=0 : 0 : 0)

      In most cases, lod_ost_alloc_qos() returns -EAGAIN when it cannot allocate enough stripes, and lod_ost_alloc_rr() is then called as a fallback.
      lod_ost_alloc_rr() adjusts lod_comp->llc_stripe_count downward when fewer stripes are allocated than requested.
      However, in this case, if an OST is deactivated after the call to ltd_qos_is_usable() on line 1592 but before lod_statfs_and_check() on line 1615, lod_ost_alloc_qos() can allocate fewer stripes than requested yet still return 0 rather than -EAGAIN, so llc_stripe_count is never reduced to the actual number of stripes.

      This can happen as long as the number of available OSTs is greater than stripe_count_min (but less than stripe_count).
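
      To make the race window concrete, here is a minimal user-space sketch of the logic described above. This is not Lustre code: qos_is_usable(), statfs_and_check(), and the ost_active[] table are simplified stand-ins for ltd_qos_is_usable(), lod_statfs_and_check(), and the LOD target table, and the deactivation is forced deterministically instead of being raced in from user space.

      /* Simplified model of lod_ost_alloc_qos(); all names are stand-ins. */
      #include <errno.h>
      #include <stdbool.h>
      #include <stdio.h>

      #define OST_COUNT        4
      #define STRIPE_COUNT     4  /* stripes requested by the layout */
      #define STRIPE_COUNT_MIN 2  /* minimum the allocator will accept */

      static bool ost_active[OST_COUNT] = { true, true, true, true };

      /* stand-in for ltd_qos_is_usable() (lod_qos.c:1592): all OSTs still look usable */
      static bool qos_is_usable(void)
      {
              return true;
      }

      /* stand-in for lod_statfs_and_check() (lod_qos.c:1615): by the time the
       * allocation loop runs, OST0000 has turned inactive (max_create_count=0) */
      static bool statfs_and_check(int idx)
      {
              if (idx == 0)
                      ost_active[0] = false;  /* the race window */
              return ost_active[idx];
      }

      /* simplified lod_ost_alloc_qos(): reports success despite a short allocation */
      static int alloc_qos(int stripe_count, int *nfound)
      {
              if (!qos_is_usable())
                      return -EAGAIN;

              *nfound = 0;
              for (int i = 0; i < OST_COUNT && *nfound < stripe_count; i++) {
                      if (!statfs_and_check(i))  /* OST0000 is skipped here */
                              continue;
                      (*nfound)++;               /* stripe object allocated */
              }

              /* only 3 stripes, but that still satisfies the minimum, so no
               * -EAGAIN, no RR fallback, and llc_stripe_count is never trimmed */
              return *nfound >= STRIPE_COUNT_MIN ? 0 : -EAGAIN;
      }

      int main(void)
      {
              int llc_stripe_count = STRIPE_COUNT;  /* stays at 4 */
              int nfound;

              if (alloc_qos(STRIPE_COUNT, &nfound) == 0)
                      printf("rc=0, llc_stripe_count=%d, stripes allocated=%d\n",
                             llc_stripe_count, nfound);
              return 0;
      }

      The sketch prints "rc=0, llc_stripe_count=4, stripes allocated=3", which is the same mismatch as the log's "found 3 good osts" against the earlier "stripe_count 4".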

      Here is how to reproduce it. There might be a more elegant way, but this works for me:

       

      # this is just to make one OST usage higher so that the QOS algorithm is used instead of RR
      lfs setstripe -i0 -c1 /mnt/lustre/bigfile && head --bytes=$((1024 * 100000)) /dev/zero > /mnt/lustre/bigfile
      # on my setup (single VM, 2 MDT, 4 OST), this triggers the LBUG pretty reliably within a few hundred loops
      for i in {0..500}; do
              lctl set_param osp.lustre-OST0000-osc-MDT0000.max_create_count=0 & lfs setstripe -c -1 /mnt/lustre/f$i
              lctl set_param osp.lustre-OST0000-osc-MDT0000.max_create_count=1000 & lfs setstripe -c -1 /mnt/lustre/g$i
      done
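
      As context for the LBUG named in the summary, below is an illustrative, self-contained sketch of why the stale count crashes later. It is not the actual Lustre code path: the struct, the field layout, and the assert() are stand-ins. It only models the point that code trusting llc_stripe_count walks 4 stripe slots while just 3 stripe objects were ever allocated, and trips over the empty one.

      #include <assert.h>
      #include <stddef.h>
      #include <stdio.h>

      struct stripe_obj { int ost_idx; };

      /* stand-in for a layout component whose stripe count went stale */
      struct layout_comp {
              int                llc_stripe_count;  /* still 4 */
              struct stripe_obj *llc_stripe[4];     /* only 3 slots filled */
      };

      /* rough stand-in for a consumer such as lod_striped_create() */
      static void striped_create(struct layout_comp *comp)
      {
              for (int i = 0; i < comp->llc_stripe_count; i++) {
                      /* the real code asserts on / dereferences the stripe
                       * object; with the stale count the 4th slot is empty */
                      assert(comp->llc_stripe[i] != NULL);
                      printf("creating stripe %d on OST%04x\n",
                             i, comp->llc_stripe[i]->ost_idx);
              }
      }

      int main(void)
      {
              struct stripe_obj s1 = { 1 }, s2 = { 2 }, s3 = { 3 };
              struct layout_comp comp = {
                      .llc_stripe_count = 4,                  /* requested */
                      .llc_stripe = { &s1, &s2, &s3, NULL },  /* allocated */
              };

              striped_create(&comp);  /* aborts on the missing 4th stripe */
              return 0;
      }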
      

       


            People

              Assignee: Thomas Bertschinger (bertschinger)
              Reporter: Thomas Bertschinger (bertschinger)
              Votes: 0
              Watchers: 4
