Details

    • New Feature
    • Resolution: Fixed
    • Major
    • Lustre 2.14.0
    • None
    • 9223372036854775807

    Description

      The OST (or MDT) pool feature enables users to group OSTs together to make object placement more flexible, which is a very useful mechanism for system management. However, quota support for pools is not yet complete, which limits their usefulness. Fortunately, the current quota framework is powerful and flexible enough to make such an extension possible.

          Activity

            [LU-11023] OST Pool Quotas

            adilger, thank you for the advice. However, my last attempt using generate_logname also failed, and the reason is still not clear to me. At first glance it doesn't seem related to my patch; the crash dump doesn't contain the reason for the panic:

            crash> dmesg | tail -n 2
            [ 1593.570869] Lustre: lustre-OST0001-osc-ffff8800a60bc800: disconnect after 21s idle
            [ 1593.573338] Lustre: Skipped 19 previous similar messages
            crash> sys | grep PANIC
                   PANIC: "" 

            On the other hand, the panic occurred in sanity-quota_69 while the test was calling lctl dk: https://testing-archive.whamcloud.com/gerrit-janitor/7821/results.html

            Can someone assist me here?

            (comment above by scherementsev Sergey Cheremencev)

            Poking around a bit further, I see that lustre/tests/auster is uploading all of the logs from its $LOGDIR, and within test-framework.sh the generate_logname() function is using $LOGDIR/$TESTSUITE.$TESTNAME.$1.<hostname>.log for the individual logfiles. It looks like you could use "lctl dk $(generate_logname $(date +%s))" to dump the logs (similar to what gather_logs() does if an error is hit) and then they will be uploaded.

            James, Minh, Charlie, please correct me if the above is not correct for log files to be included in the Maloo report for a test session.
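A minimal shell sketch of the suggestion above, assuming a test script run under auster. The helper names `sketch_generate_logname` and `dump_debug_log` are hypothetical and introduced only for illustration; the naming pattern is rebuilt from the description above, not copied from test-framework.sh, so a real test should call the framework's own generate_logname() instead. Taking `<hostname>` from `uname -n` is also an assumption.

```shell
# Hypothetical stand-in for test-framework.sh's generate_logname(), rebuilt
# only from the naming pattern described above:
#   $LOGDIR/$TESTSUITE.$TESTNAME.$1.<hostname>.log
# In a real test script, call the framework's own generate_logname() instead.
sketch_generate_logname() {
	local suffix=$1
	# <hostname> is taken from `uname -n` here (an assumption).
	echo "$LOGDIR/$TESTSUITE.$TESTNAME.$suffix.$(uname -n).log"
}

# Hypothetical helper: dump the Lustre debug buffer into $LOGDIR, where
# auster picks up log files for upload, and print the file name used.
dump_debug_log() {
	local log
	log=$(sketch_generate_logname "$(date +%s)")
	if command -v lctl >/dev/null 2>&1; then
		# Send lctl's own status output to stderr so only the name is printed.
		lctl dk "$log" >&2
	fi
	echo "$log"
}
```

Inside a test, `dump_debug_log` would be called just before the step that hangs, so that the buffer contents land in a file under $LOGDIR rather than in $TMP.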

            (comment above by adilger Andreas Dilger)

            "sergey you could add debugging to the test script in your patch to dump the debug logs sooner"

            I tried this approach but didn't succeed. The latest sanity-quota_69 failure doesn't contain any of the debug logs I saved in $TMP as "$TMP/lustre-log-client-$(date +%s).log". Should the name perhaps be similar to "sanity-quota.test_69.test_log.onyx-49vm1.log"? If not, please advise another way.

            Thanks.

            (comment above by scherementsev Sergey Cheremencev)
            mdiep Minh Diep added a comment -

            spitzcor, I am not sure what you're talking about.

            sergey you could add debugging to the test script in your patch to dump the debug logs sooner (e.g. a background thread that calls "lctl dk /tmp/lustre-log-$(date +%s).log" every 5s for some time). I believe that Maloo will attach all "/tmp/*.log" files to the test results.
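A rough sketch of that background dumper, assuming a POSIX shell test script. `start_debug_dumper` is a hypothetical helper, and its interval, dump count, and output directory are parameters rather than fixed values: Sergey's reply in this thread reports that logs saved under $TMP were not attached to the results, so pointing the dumper at $LOGDIR may be the safer choice.

```shell
# Hypothetical helper: periodically dump the Lustre kernel debug buffer in the
# background so that early test activity is captured even if the test later
# hangs. Arguments: interval in seconds, number of dumps, output directory.
# Prints the PID of the background job.
start_debug_dumper() {
	local interval=${1:-5} count=${2:-12} dir=${3:-/tmp}
	(
		i=0
		while [ "$i" -lt "$count" ]; do
			# Skip the dump on nodes where lctl is not installed.
			if command -v lctl >/dev/null 2>&1; then
				lctl dk "$dir/lustre-log-$(date +%s).log"
			fi
			sleep "$interval"
			i=$((i + 1))
		done
	) >/dev/null 2>&1 &
	# The job's output is detached above, so $(start_debug_dumper) returns
	# immediately instead of waiting for the loop to finish.
	echo $!
}
```

A caller would capture the PID with `dumper_pid=$(start_debug_dumper 5 12 "$LOGDIR")` and run `kill "$dumper_pid"` once the interesting window has passed.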

            (comment above by adilger Andreas Dilger)
            spitzcor Cory Spitz added a comment -

            mdiep, I heard that you might be able to assist with Sergey's request. Can you?

            I am stuck investigating the sanity-quota_69 failure. It fails only on the configuration with 8 OSTs, 4 MDTs and 2 clients (review-dne-part-4).
            The test times out 423 minutes after it starts, so I don't have the logs I need, which should cover the first few minutes after the test starts. Is it possible to rerun this test with a reduced timeout to capture that period? I propose setting it to 4 minutes.

            (comment above by scherementsev Sergey Cheremencev)

            "MDT pools and pool quotas? This is becoming increasingly important for limiting DoM space usage"

            From my side, I did everything I could to make implementing MDT quota pools as simple as possible. MDT pools look like a distinct feature, so I suggest discussing them in another ticket. Possibly we could implement MDT pools only for DoM. In any case, I believe Cray is interested in having pool quotas on MDT pools, and I may have the opportunity (pending management approval) to be involved in that development. Let's start discussing!

            "integrating quota with OST object allocation on the MDS. It doesn't make sense to allocate objects on OSTs for which the user has no space. With SEL (and PFL when LU-13058 is landed) it would be possible to skip intermediate components on pools for which the user has no quota."

            The key thing here is to provide the quota pool state for usr/grp/prj to the LOD layer. If an OST belongs to a pool, LOD could ask the QMT whether this user has quota in that pool. It looks like we just need to find the lqe in the global pool (qmt_pool_lqe_lookup(env, qmt, pooltype, qtype, id, NULL)) and check each entry in the lqe global array for edquot. So if it is a simple patch, I can help implement it.
            However, the current patch is already quite big, and I'd like to keep it as small and simple as possible. So I vote to do this in another ticket once quota pools (QP) are ready.

            (comment above by scherementsev Sergey Cheremencev)

            Since this work is already nearing completion, I'm wondering if there are additional developments in this area that you will pursue:

            • MDT pools and pool quotas? This is becoming increasingly important for limiting DoM space usage
            • integrating quota with OST object allocation on the MDS. It doesn't make sense to allocate objects on OSTs for which the user has no space. With SEL (and PFL when LU-13058 is landed) it would be possible to skip intermediate components on pools for which the user has no quota.
            (comment above by adilger Andreas Dilger)

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34389/
            Subject: LU-11023 quota: remove quota pool ID
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: f6819c90c8532e017646c8173337a9c92250e60f

            (comment above by gerrit Gerrit Updater)

            Sergey Cheremencev (c17829@cray.com) uploaded a new patch: https://review.whamcloud.com/35615
            Subject: LU-11023 quota: quota pools for OSTs
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 2d983ab779d203d73b01f132cb991253855af51a

            (comment above by gerrit Gerrit Updater)

            People

              scherementsev Sergey Cheremencev
              adilger Andreas Dilger