  1. Lustre
  2. LU-19078

Fix stripe_width and stride check bug introduced in LU-12158

Details

    • Bug
    • Resolution: Unresolved
    • Minor

    Description

      LU-12158 introduced logic to limit stride and stripe_width values during filesystem creation, but also introduced two issues:

      Issue 1: Incorrect units used for IO size checks

      LU-12158 compares the device-reported IO sizes (in bytes) directly against the hardcoded thresholds OPTIMIZED_STRIPE_WIDTH and OPTIMIZED_STRIDE, which are meant as counts of file system blocks but are never converted to bytes. As a result, the RAID parameters are never set unless they are explicitly specified during mkfs. (Andreas also mentioned this unit mismatch in LU-18514.)

      Details

      During ext4/ldiskfs filesystem creation, misc/mke2fs.c retrieves:

      • /sys/block/<device>/queue/minimum_io_size
      • /sys/block/<device>/queue/optimal_io_size

      These values are reported in bytes. For example, a typical RAID 6 (8+2) configuration might return:

      minimum_io_size:    131072   # 128 KB
      optimal_io_size:   1048576   # 1 MB (8 * 128 KB)

      However, the stride and stripe_width parameters for ldiskfs are specified in file system blocks (typically 4 KB). Thus, the effective sizes tested in LU-12158 were:

      512  blocks = 2 MB
      1024 blocks = 4 MB
      2048 blocks = 8 MB
      4096 blocks = 16 MB
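
      The unit relationship above can be sketched in C. This is only an illustration, not code from mke2fs: the helper names io_bytes_to_blocks() and blocks_to_bytes() are hypothetical, and a 4 KB block size is assumed as in the example.

```c
#include <assert.h>

/* Hypothetical helper: convert an IO size in bytes (as reported by sysfs)
 * into file system blocks, the unit used for stride and stripe_width. */
static unsigned long io_bytes_to_blocks(unsigned long io_bytes,
					unsigned int blocksize)
{
	assert(blocksize > 0);
	return io_bytes / blocksize;
}

/* Hypothetical helper: convert a block count back into bytes so that it
 * can be compared against minimum_io_size / optimal_io_size. */
static unsigned long blocks_to_bytes(unsigned long blocks,
				     unsigned int blocksize)
{
	return blocks * blocksize;
}
```

      With 4 KB blocks, the RAID 6 (8+2) example corresponds to a stride of 32 blocks and a stripe_width of 256 blocks, while a limit of 512 blocks corresponds to 2097152 bytes (2 MB).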

      LU-12158 intended to limit stripe-related values to 2 MB, but mistakenly used raw values:

      #define OPTIMIZED_STRIPE_WIDTH  512
      #define OPTIMIZED_STRIDE        512

      These values are compared directly to dev_param->{min,opt_io} sizes reported in bytes:

      	dev_param->min_io = blkid_topology_get_minimum_io_size(tp);
      	if (dev_param->min_io > OPTIMIZED_STRIDE) {
      		fprintf(stdout,
      			"detected raid stride %lu too large, use optimum %u\n",
      			dev_param->min_io, OPTIMIZED_STRIDE);
      		dev_param->min_io = OPTIMIZED_STRIDE;
      	}
      	dev_param->opt_io = blkid_topology_get_optimal_io_size(tp);
      	if (dev_param->opt_io > OPTIMIZED_STRIPE_WIDTH) {
      		fprintf(stdout,
      			"detected raid stripe width %lu too large, use optimum %u\n",
      			dev_param->opt_io, OPTIMIZED_STRIPE_WIDTH);
      		dev_param->opt_io = OPTIMIZED_STRIPE_WIDTH;
      	}
      

      Since even modest IO sizes (e.g., 128 KB) exceed 512, any detected value larger than 512 bytes is clamped to 512 bytes. As a result, fs_param.s_raid_stride and fs_param.s_raid_stripe_width are never set, because of this code in misc/mke2fs.c:

      		/* setting stripe/stride to blocksize is pointless */
      		if (dev_param.min_io > (unsigned) blocksize)
      			fs_param.s_raid_stride = dev_param.min_io / blocksize;
      		if (dev_param.opt_io > (unsigned) blocksize) {
      			fs_param.s_raid_stripe_width =
      						dev_param.opt_io / blocksize;
      		}
      

      Because 512 < 4096, the condition fails, and no RAID parameters are applied unless explicitly specified via mkfs.
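
      The combined effect of the two excerpts can be reproduced with a small stand-alone model. This is a simplified sketch, not the actual mke2fs code path: buggy_raid_blocks() is a hypothetical function that only mirrors the clamp followed by the blocksize check, and returns 0 to mean "parameter left unset".

```c
#include <assert.h>

#define OPTIMIZED_STRIPE_WIDTH  512	/* as defined by LU-12158 */
#define OPTIMIZED_STRIDE        512	/* as defined by LU-12158 */

/* Hypothetical model of the current behavior: clamp the detected IO size
 * (bytes) to a limit that is really a block count, then apply the
 * "larger than blocksize" check from misc/mke2fs.c.
 * Returns the resulting RAID parameter in blocks, or 0 if it stays unset. */
static unsigned long buggy_raid_blocks(unsigned long io_bytes,
				       unsigned long limit,
				       unsigned int blocksize)
{
	assert(blocksize > 0);
	if (io_bytes > limit)
		io_bytes = limit;	/* bytes clamped to a block count */
	if (io_bytes > blocksize)
		return io_bytes / blocksize;
	return 0;			/* 512 <= blocksize: never set */
}
```

      For the RAID 6 (8+2) example with 4 KB blocks, both buggy_raid_blocks(131072, OPTIMIZED_STRIDE, 4096) and buggy_raid_blocks(1048576, OPTIMIZED_STRIPE_WIDTH, 4096) return 0, although the device values (32 and 256 blocks) are well below the intended 2 MB limit.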

      Issue 2: Misleading warnings even when custom values are used

      The patch also introduces warnings:

      detected raid stride %lu too large, use optimum %u
      detected raid stripe width %lu too large, use optimum %u

      These warnings are printed whenever a detected value exceeds the threshold, even when custom values for stride and stripe_width are explicitly provided via mkfs. This is misleading because, in that case, the user-supplied values are used, not the "optimum" ones.

      Resolution Plan

      This LU will fix both issues:

      • Issue 1: Correct the unit comparison by converting OPTIMIZED_STRIDE and OPTIMIZED_STRIPE_WIDTH to bytes (e.g., 512 * 4096 = 2 MB) to match the units of min_io and opt_io.
      • Issue 2: Suppress or conditionally display the warning messages only when the parameters are being auto-detected, not when they are explicitly specified by the user. This will fix LU-18514.
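
      One possible shape of the fix is sketched below. It is not the actual patch: clamp_io_size(), OPTIMIZED_STRIDE_BLOCKS and the user_set flag are hypothetical names, and multiplying the limit by the real blocksize (rather than a fixed 4096) is an assumption made for this sketch.

```c
#include <assert.h>
#include <stdio.h>

#define OPTIMIZED_STRIDE_BLOCKS  512UL	/* hypothetical name: limit in blocks */

/* Hypothetical helper: clamp a detected IO size (bytes) to a limit given in
 * file system blocks. The limit is converted to bytes before comparing, and
 * the warning is printed only when the value is auto-detected, i.e. when
 * the user did not specify it (user_set == 0). */
static unsigned long clamp_io_size(unsigned long io_bytes,
				   unsigned long limit_blocks,
				   unsigned int blocksize,
				   int user_set, const char *what)
{
	unsigned long limit_bytes;

	assert(blocksize > 0);
	limit_bytes = limit_blocks * blocksize;	/* e.g. 512 * 4096 = 2 MB */
	if (io_bytes <= limit_bytes)
		return io_bytes;		/* within limit: keep as detected */
	if (!user_set)
		fprintf(stdout,
			"detected raid %s %lu too large, use optimum %lu\n",
			what, io_bytes, limit_bytes);
	return limit_bytes;
}
```

      Under these assumptions, the 128 KB and 1 MB values from the RAID 6 (8+2) example pass through unchanged, and only values above 2 MB are clamped. The clamped result remains in bytes, so the existing division by blocksize in misc/mke2fs.c would still produce block counts.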

            People

              markus.hilger Markus Hilger