Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor

    Description

      The Lustre Manual has a section, "Tuning Linux Storage Devices", with suggested tuning for testing. All of the settings have suggested values except /sys/block/sdN/queue/scheduler. It would be nice to have a suggestion there.

      I think we are likely still using the Linux default (probably CFQ) everywhere at LLNL, and that may be a problem. I remember a recent discussion at LUG that suggested that was bad. ZFS certainly attempts to change the scheduler off of CFQ to noop (if ZFS believes that it owns the entire disk).

      For ldiskfs, perhaps the deadline scheduler is what we should recommend?
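For reference, the sysfs knob in question can be read and written as sketched below. This is an illustration only: sdN is a placeholder device name, the helper function is made up, and writing the file requires root on a real system.

```shell
# Sketch: reading the active scheduler out of the sysfs format.
# Reading /sys/block/sdN/queue/scheduler lists every available scheduler
# with the active one in square brackets, e.g. "noop deadline [cfq]".
active_scheduler() {
    # Pull out the bracketed token.
    echo "$1" | sed 's/.*\[\(.*\)\].*/\1/'
}

active_scheduler "noop deadline [cfq]"   # prints: cfq

# Switching is a plain write of the bare scheduler name (root required):
#   echo deadline > /sys/block/sdN/queue/scheduler
```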

      Attachments

        Activity

          [LUDOC-109] Missing block scheduler tuning suggestion

          linda Linda Bebernes (Inactive) added a comment - Change has been approved and merged. Resolving ticket.
          linda Linda Bebernes (Inactive) added a comment - edited

          Added note about scheduler default (deadline) and recommendations (deadline, noop). Patch is ready for review at http://review.whamcloud.com/#change,6486.


          adilger Andreas Dilger added a comment - Note that LU-2498 has a patch (http://review.whamcloud.com/4853) to automatically change the default scheduler for Lustre block devices from CFQ to deadline, unless it is already set to noop. This behavior should also be documented.
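The policy that patch describes could be sketched roughly as follows. This is a hedged illustration only, not the actual LU-2498 code; choose_scheduler is a made-up name, and its input is assumed to be the contents of /sys/block/<dev>/queue/scheduler with the active scheduler in brackets.

```shell
# Rough sketch of the policy described above (not the actual LU-2498 code).
# Given the current scheduler line, decide which scheduler should be active:
# keep noop if it is already selected, otherwise switch to deadline.
choose_scheduler() {
    case "$1" in
        *"[noop]"*)     echo noop ;;      # already noop: leave it alone
        *"[deadline]"*) echo deadline ;;  # already deadline: nothing to change
        *)              echo deadline ;;  # anything else (e.g. CFQ): switch to deadline
    esac
}

choose_scheduler "noop [cfq] deadline"   # prints: deadline
```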

          jlevi Jodi Levi (Inactive) added a comment - Brett, would you mind taking a look at this and seeing whether it is something you could work on with Linda as part of the Lustre Manual project?

          adilger Andreas Dilger added a comment - IIRC, while ZFS allocates the IO in order, there is some jitter in the processing times of the IO requests between threads, and this causes slightly out-of-order IO submission to the queue. At least I recall Brian (or someone) commenting about the slightly non-linear IO ordering from ZFS at the disk level. That's why I suggest deadline over noop, since it isn't guaranteed that only front/back merging is enough.

          morrone Christopher Morrone (Inactive) added a comment -

          > I thought it was old and established knowledge that CFQ sucks for high-performance workloads

          Well, that common knowledge appears to have been missed both in the documentation and at LLNL as a whole.

          > but I think it needs to be done internally by ZFS for its constituent block devices

          That was the intent with ZFS, but apparently Brian was worried about setting the device's scheduler unilaterally, since the drive might be shared with other filesystems in other partitions. But they are talking that out right now in the hallway.

          Brian tells me that even the noop scheduler does front/back merging. He might have said that the merging happens at a layer before the scheduling, or something along those lines. That isn't to say that deadline might not help too, but at least we should get merging even with noop. And in theory ZFS's scheduler will make things easy to merge. We need to verify that theory with block traces, though.

          People

            cliffw Cliff White (Inactive)
            morrone Christopher Morrone (Inactive)
            Votes: 0
            Watchers: 5
