[LU-15043] OST spill pools should not allow spill pool loops

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.16.0
    • Affects Version/s: Lustre 2.15.0
    • Labels: None
    • Severity: 3

    Description

      Using the latest build of Lustre, 2.14.54_92 build # 4421, I created a spill pool loop and encountered some unexpected behavior.
      I created three pools and set up a loop of spill pools where pool1.spill=pool2, pool2.spill=pool3, and pool3.spill=pool1. I then created a file on pool1, but the file was created on pool2. The same thing happened when I created files on pool2 and pool3: they were created on pool3 and pool1, respectively.

      I think we should not allow spill pool loops to be created.

      Here are more details:
      Created three pools:

      # lfs pool_list scratch.pool1
      Pool: scratch.pool1
      scratch-OST0000_UUID
      # lfs pool_list scratch.pool2
      Pool: scratch.pool2
      scratch-OST0001_UUID
      # lfs pool_list scratch.pool3
      Pool: scratch.pool3
      scratch-OST0002_UUID
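
      For context, OST pools like these are created on the MGS with lctl pool_new/pool_add. The commands would have looked roughly like the sketch below, which is reconstructed from the pool_list output above rather than taken from the actual session:

      # run on the MGS node; one OST per pool, matching the lfs pool_list output
      mgs# lctl pool_new scratch.pool1
      mgs# lctl pool_add scratch.pool1 scratch-OST0000
      mgs# lctl pool_new scratch.pool2
      mgs# lctl pool_add scratch.pool2 scratch-OST0001
      mgs# lctl pool_new scratch.pool3
      mgs# lctl pool_add scratch.pool3 scratch-OST0002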
      

      Set the spill targets and thresholds on both MDSs; lctl get_param shows the resulting configuration:

      mds1# lctl get_param lod.scratch-MDT*.pool.*.spill*
      lod.scratch-MDT0000-mdtlov.pool.pool1.spill_is_active=1
      lod.scratch-MDT0000-mdtlov.pool.pool1.spill_target=pool2
      lod.scratch-MDT0000-mdtlov.pool.pool1.spill_threshold_pct=5
      lod.scratch-MDT0000-mdtlov.pool.pool2.spill_is_active=1
      lod.scratch-MDT0000-mdtlov.pool.pool2.spill_target=pool3
      lod.scratch-MDT0000-mdtlov.pool.pool2.spill_threshold_pct=5
      lod.scratch-MDT0000-mdtlov.pool.pool3.spill_is_active=1
      lod.scratch-MDT0000-mdtlov.pool.pool3.spill_target=pool1
      lod.scratch-MDT0000-mdtlov.pool.pool3.spill_threshold_pct=5
      lod.scratch-MDT0002-mdtlov.pool.pool1.spill_is_active=1
      lod.scratch-MDT0002-mdtlov.pool.pool1.spill_target=pool2
      lod.scratch-MDT0002-mdtlov.pool.pool1.spill_threshold_pct=5
      lod.scratch-MDT0002-mdtlov.pool.pool2.spill_is_active=1
      lod.scratch-MDT0002-mdtlov.pool.pool2.spill_target=pool3
      lod.scratch-MDT0002-mdtlov.pool.pool2.spill_threshold_pct=5
      lod.scratch-MDT0002-mdtlov.pool.pool3.spill_is_active=1
      lod.scratch-MDT0002-mdtlov.pool.pool3.spill_target=pool1
      lod.scratch-MDT0002-mdtlov.pool.pool3.spill_threshold_pct=5
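
      For reference, the settings above would have been applied with lctl set_param, roughly as in the sketch below (reconstructed from the get_param output rather than copied from the actual session; spill_is_active appears to be reported status rather than a value that is set directly):

      # on the MDS serving MDT0000
      mds1# lctl set_param lod.scratch-MDT0000-mdtlov.pool.pool1.spill_threshold_pct=5
      mds1# lctl set_param lod.scratch-MDT0000-mdtlov.pool.pool1.spill_target=pool2
      mds1# lctl set_param lod.scratch-MDT0000-mdtlov.pool.pool2.spill_threshold_pct=5
      mds1# lctl set_param lod.scratch-MDT0000-mdtlov.pool.pool2.spill_target=pool3
      mds1# lctl set_param lod.scratch-MDT0000-mdtlov.pool.pool3.spill_threshold_pct=5
      mds1# lctl set_param lod.scratch-MDT0000-mdtlov.pool.pool3.spill_target=pool1
      # likewise for the lod.scratch-MDT0002-mdtlov.pool.* parameters on the node serving MDT0002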
      

      We see the following in dmesg on mds1, but not on mds2:

      [ 9046.643396] LustreError: 5659:0:(qmt_pool.c:1390:qmt_pool_add_rem()) add to: can't scratch-QMT0000 scratch-OST0000_UUID pool pool1: rc = -17
      [ 9056.957864] LustreError: 5666:0:(qmt_pool.c:1390:qmt_pool_add_rem()) add to: can't scratch-QMT0000 scratch-OST0001_UUID pool pool2: rc = -17
      [ 9065.980468] LustreError: 5674:0:(qmt_pool.c:1390:qmt_pool_add_rem()) add to: can't scratch-QMT0000 scratch-OST0002_UUID pool pool3: rc = -17
      

      Create files on specific OST pools:

      # lfs setstripe -p pool1 -c -1 /lustre/scratch/file1
      # lfs getstripe -p /lustre/scratch/file1
      pool2
      # lfs setstripe -p pool2 -c -1 /lustre/scratch/file2
      # lfs getstripe -p /lustre/scratch/file2
      pool3
      # lfs setstripe -p pool3 -c -1 /lustre/scratch/file3
      # lfs getstripe -p /lustre/scratch/file3
      pool1
      

      We see the following on MDS0:

      [10198.677195] Lustre: 1506:0:(lod_pool.c:799:lod_check_and_spill_pool()) scratch-MDT0000-mdtlov: more than 10 levels of pool spill for 'pool1->pool2'
      [10223.616652] Lustre: 1506:0:(lod_pool.c:799:lod_check_and_spill_pool()) scratch-MDT0000-mdtlov: more than 10 levels of pool spill for 'pool2->pool3'
      [10234.693511] Lustre: 1538:0:(lod_pool.c:799:lod_check_and_spill_pool()) scratch-MDT0000-mdtlov: more than 10 levels of pool spill for 'pool3->pool1' 
      

    Attachments

    Issue Links

    Activity

            [LU-15043] OST spill pools should not allow spill pool loops
            pjones Peter Jones added a comment -

            Landed for 2.16


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45083/
            Subject: LU-15043 lod: check for spilling loops
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: c9c842d678e38345c890c1514e9b922fe496dba7

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45083/ Subject: LU-15043 lod: check for spilling loops Project: fs/lustre-release Branch: master Current Patch Set: Commit: c9c842d678e38345c890c1514e9b922fe496dba7
            adilger Andreas Dilger added a comment - - edited

            The qmt_pool_add_rem() message started appearing on 2021-08-18, and there have been a few thousand hits per day. Patches landed on that day are listed below (no other patches landed between 2021-08-10 and 2021-08-25):

            $ git log --oneline --before 2021-08-22  --after 2021-08-16
            d8204f903a (tag: v2_14_54, tag: 2.14.54) New tag 2.14.54
            5220160648 LU-14093 lutf: fix build with gcc10
            a205334da5 LU-14903 doc: update lfs-setdirstripe man page
            1313cad7a1 LU-14899 ldiskfs: Add 5.4.136 mainline kernel support
            c44afcfb72 LU-12815 socklnd: set conns_per_peer based on link speed
            6e30cd0844 LU-14871 kernel: kernel update RHEL7.9 [3.10.0-1160.36.2.el7]
            14b8276e06 LU-14865 utils: llog_reader.c printf type mismatch
            aa5d081237 LU-9859 lnet: fold lprocfs_call_handler functionality into lnet_debugfs_*
            e423a0bd7a LU-14787 libcfs: Proved an abstraction for AS_EXITING
            76c71a167b LU-14775 kernel: kernel update SLES12 SP5 [4.12.14-122.74.1]
            67752f6db2 LU-14773 tests: skip check_network() on working node
            024f9303bc LU-14668 lnet: Lock primary NID logic
            684943e2d0 LU-14668 lnet: peer state to lock primary nid
            16321de596 LU-14661 obdclass: Add peer/peer NI when processing llog
            ac201366ad LU-14661 lnet: Provide kernel API for adding peers
            51350e9b73 LU-14531 osd: serialize access to object vs object destroy
            a5cbe7883d LU-12815 socklnd: allow dynamic setting of conns_per_peer
            d13d8158e8 LU-14093 mgc: rework mgc_apply_recover_logs() for gcc10
            8dd4488a07 LU-6142 tests: remove iam_ut binary
            301d76a711 LU-14876 out: don't connect to busy MDS-MDS export
            

            The graph shows occurrences by subtest; it looks like this happens in any subtest that adds a pool.


            "Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/45083
            Subject: LU-15043 lod: check for spilling loops
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 14d441c9bdb4873e4b0255658873ee963828548e

            gerrit Gerrit Updater added a comment - "Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/45083 Subject: LU-15043 lod: check for spilling loops Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 14d441c9bdb4873e4b0255658873ee963828548e

            James, the "more than 10 levels of pool spill" logic in the code is to prevent an infinite loop in the kernel as it follows the circular linked list of spill targets. It looks like it stops at the 10th level of pool spilling (3 full loops plus 1), which explains the -p pool1 creating a file in pool2.

            Detecting a loop at the time spill_target is set should be relatively simple to implement. In lod_spill_target_seq_write() it should first copy the specified target into a temporary buffer, rather than pool_spill_target, and follow the specified target pool until it hits a pool with no spill_target set (the normal case), or the pool_spill_target is the same as pool->pool_name.
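
            Until such a check exists in lod_spill_target_seq_write(), the same walk can be illustrated from userspace. The sketch below is a hypothetical helper script (not part of Lustre) that uses only the lod.*.pool.*.spill_target parameters shown earlier in this ticket: it follows the proposed chain and refuses to set a target that would lead back to the starting pool.

            #!/bin/bash
            # check_spill_loop.sh LOV POOL TARGET
            # Userspace illustration of the loop check proposed above for
            # lod_spill_target_seq_write(): walk the spill_target chain starting
            # at TARGET and refuse the update if it reaches POOL again.
            lov=$1; pool=$2; target=$3

            if [ "$target" = "$pool" ]; then
                    echo "refusing $pool -> $target: a pool cannot spill to itself" >&2
                    exit 1
            fi

            cur=$target
            for i in $(seq 1 16); do        # bound the walk defensively
                    next=$(lctl get_param -n lod.$lov.pool.$cur.spill_target 2>/dev/null)
                    # an unset spill_target is assumed to read back empty or as "none"
                    if [ -z "$next" ] || [ "$next" = "none" ]; then
                            break
                    fi
                    if [ "$next" = "$pool" ]; then
                            echo "refusing $pool -> $target: would create a spill loop" >&2
                            exit 1
                    fi
                    cur=$next
            done

            lctl set_param lod.$lov.pool.$pool.spill_target=$target

            With the configuration shown in the description, for example, "check_spill_loop.sh scratch-MDT0000-mdtlov pool3 pool1" would refuse the update, because pool1 already spills (via pool2) back to pool3.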


            People

              Assignee: bzzz Alex Zhuravlev
              Reporter: jamesanunez James Nunez (Inactive)
              Votes: 0
              Watchers: 4

              Dates

                Created:
                Updated:
                Resolved: