Details

    • Type: New Feature
    • Resolution: Fixed
    • Priority: Major
    • Fix Version: Lustre 2.14.0

    Description

      The OST (or MDT) pool feature enables users to group OSTs together to make object placement more flexible, which is a very useful mechanism for system management. However, quota support for pools has not been implemented yet, which limits the usefulness of the feature. Fortunately, the current quota framework is powerful and flexible enough to make it possible to add such an extension.
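
      A minimal usage sketch of how this might look once pool quotas are available (the pool name "flash", user "bob", filesystem name "lustre", and mount point "/mnt/lustre" are illustrative, and the exact option syntax may differ; see the Lustre manual and lfs-setquota(1)):

        # create an OST pool on the MGS and add two OSTs to it
        mgs# lctl pool_new lustre.flash
        mgs# lctl pool_add lustre.flash lustre-OST[0000-0001]
        # set and check a per-pool block limit for a user from a client
        client# lfs setquota -u bob -B 100G --pool flash /mnt/lustre
        client# lfs quota -u bob --pool flash /mnt/lustre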

    Attachments

    Issue Links

    Activity

            [LU-11023] OST Pool Quotas
            scherementsev Sergey Cheremencev added a comment - edited

            There is no separate ticket for the Pool Quotas testing results,
            so I am leaving a link to the test report here: https://wiki.lustre.org/OST_Pool_Quotas_Test_Report.


            gerrit Gerrit Updater added a comment -

            Sergey Cheremencev (sergey.cheremencev@hpe.com) uploaded a new patch: https://review.whamcloud.com/40175
            Subject: LU-11023 tests: test quota interop
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 760d9be975dca2370a0cda558289818868c801c0

            spitzcor Cory Spitz added a comment -

            pjones, I'm afraid I didn't have the proper attention to detail after all!
            I said to rename it to "OST Quota Pools" above, but I also said, "probably better to say OST pool quotas because we have user quotas, project quotas and pool quotas, not quota pools."
            I'm sorry about the confusion. Let's call it "OST Pool Quotas" per that rationale.

            pjones Peter Jones added a comment -

            I agree that this is more clear as to what is being provided in 2.14. Thanks for your attention to detail on this!

            spitzcor Cory Spitz added a comment -

            pjones and adilger, can we rename this ticket from "Add OST/MDT pool quota feature" to "OST Quota Pools"? The landed code doesn't include MDT pools and it is probably better to say OST pool quotas because we have user quotas, project quotas and pool quotas, not quota pools.

            pjones Peter Jones added a comment -

            Landed for 2.14


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35615/
            Subject: LU-11023 quota: quota pools for OSTs
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 09f9fb3211cd998c87e26df5217cc4ad84e6ce0b


            scherementsev Sergey Cheremencev added a comment -

            > so it may be that the code tries to register this same proc entry multiple times, and then crashes during cleanup when it is freed multiple times?

            In that case I would expect to see the reason for the failure in the crash dump, something like "BUG: unable to handle kernel NULL pointer".

            Anyway, the reason is clear - I lost "dk" in my script, causing the timeout error:

            do_facet mds1 $LCTL > $(generate_logname $(date +%s)) 
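
            For reference, the intended line with the missing "dk" (lctl debug_kernel) restored would presumably be:

            do_facet mds1 $LCTL dk > $(generate_logname $(date +%s))
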
            adilger Andreas Dilger added a comment - edited

            Looking earlier in the test logs, I see a few other stack traces in the oleg308-server-console.txt from a special test run for this patch:

            [ 4326.625102] WARNING: CPU: 2 PID: 3431 at fs/proc/generic.c:399 proc_register+0x94/0xb0
            [ 4326.627740] proc_dir_entry 'lustre-QMT0000/dt-qpool1' already registered
            [ 4326.640806] CPU: 2 PID: 3431 Comm: llog_process_th Kdump: loaded Tainted: P        W  OE  ------------   3.10.0-7.7-debug #1
            [ 4326.644194] Call Trace:
            [ 4326.644610]  [<ffffffff817d1711>] dump_stack+0x19/0x1b
            [ 4326.645525]  [<ffffffff8108ba58>] __warn+0xd8/0x100
            [ 4326.646338]  [<ffffffff8108badf>] warn_slowpath_fmt+0x5f/0x80
            [ 4326.649833]  [<ffffffff812c2434>] proc_register+0x94/0xb0
            [ 4326.650741]  [<ffffffff812c2576>] proc_mkdir_data+0x66/0xa0
            [ 4326.651683]  [<ffffffff812c25e5>] proc_mkdir+0x15/0x20
            [ 4326.652710]  [<ffffffffa0315374>] lprocfs_register+0x24/0x80 [obdclass]
            [ 4326.653941]  [<ffffffffa0aa2385>] qmt_pool_alloc+0x175/0x570 [lquota]
            [ 4326.655347]  [<ffffffffa0aa74a4>] qmt_pool_new+0x224/0x4d0 [lquota]
            [ 4326.656901]  [<ffffffffa032c83b>] class_process_config+0x22eb/0x2ee0 [obdclass]
            [ 4326.660700]  [<ffffffffa032eec9>] class_config_llog_handler+0x819/0x14b0 [obdclass]
            [ 4326.662767]  [<ffffffffa02f2582>] llog_process_thread+0x7d2/0x1a20 [obdclass]
            [ 4326.665703]  [<ffffffffa02f4292>] llog_process_thread_daemonize+0xa2/0xe0 [obdclass]
            [ 4326.676370] LustreError: 3431:0:(qmt_pool.c:208:qmt_pool_alloc()) lustre-QMT0000: failed to create proc entry for pool dt-qpool1 (-12)
            [ 4326.680007] LustreError: 3431:0:(qmt_pool.c:935:qmt_pool_new()) Can't alloc pool qpool1
            [ 4336.217899] LustreError: 3774:0:(qmt_pool.c:1343:qmt_pool_add_rem()) Can't add to lustre-OST0001_UUID pool qpool1, err -17
            [ 4336.223934] LustreError: 3774:0:(qmt_pool.c:1343:qmt_pool_add_rem()) Skipped 5 previous similar messages
            

            so it may be that the code tries to register this same proc entry multiple times, and then crashes during cleanup when it is freed multiple times?


            adilger, thank you for the advice. However, my last attempt, where I used generate_logname, also failed. The reason is not entirely clear to me. At first glance it doesn't seem related to my patch - the crash dump doesn't contain the reason for the panic:

            crash> dmesg | tail -n 2
            [ 1593.570869] Lustre: lustre-OST0001-osc-ffff8800a60bc800: disconnect after 21s idle
            [ 1593.573338] Lustre: Skipped 19 previous similar messages
            crash> sys | grep PANIC
                   PANIC: "" 
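
            (For reference, when PANIC is empty, other crash commands can sometimes still surface the reason, for example the backtrace of the crashed task or the tail of the kernel ring buffer; these are generic crash-utility commands, shown here only as a suggestion:)

            crash> bt
            crash> log | tail -n 40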

            On the other hand, it occurred in sanity-quota_69 when it calls lctl dk - https://testing-archive.whamcloud.com/gerrit-janitor/7821/results.html

            Can someone assist me here?


            People

              Assignee: scherementsev Sergey Cheremencev
              Reporter: adilger Andreas Dilger
              Votes: 0
              Watchers: 26

              Dates

                Created:
                Updated:
                Resolved: