Details

    • New Feature
    • Resolution: Fixed
    • Major
    • Lustre 2.14.0

    Description

      The OST (or MDT) pool feature enables users to group OSTs together to make object placement more flexible, which is a very useful mechanism for system management. However, quota support for pools is currently incomplete, which limits the usefulness of pools. Fortunately, the current quota framework is powerful and flexible enough to make adding such an extension possible.
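
      For context, a hedged usage sketch of what pool quotas look like once supported (illustrative only; the pool name "flash", user "bob", the limits, and the mount point are made up): the pre-existing lctl pool_new/pool_add commands group OSTs into a named pool, and this feature is what lets lfs setquota/quota apply and report limits per pool via a --pool option.

      # create an OST pool and add some OSTs to it (existing pool commands)
      lctl pool_new lustre.flash
      lctl pool_add lustre.flash OST[0-3]
      # set and query a per-pool block quota for a user (what this feature adds)
      lfs setquota -u bob -B 100G --pool flash /mnt/lustre
      lfs quota -u bob --pool flash /mnt/lustre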

      Attachments

        Issue Links

          Activity

            [LU-11023] OST Pool Quotas
            spitzcor Cory Spitz added a comment -

            pjones and adilger, can we rename this ticket from "Add OST/MDT pool quota feature" to "OST Quota Pools"? The landed code doesn't include MDT pools and it is probably better to say OST pool quotas because we have user quotas, project quotas and pool quotas, not quota pools.

            pjones Peter Jones added a comment -

            Landed for 2.14


            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35615/
            Subject: LU-11023 quota: quota pools for OSTs
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 09f9fb3211cd998c87e26df5217cc4ad84e6ce0b


            scherementsev Sergey Cheremencev added a comment -

            > so it may be that the code tries to register this same proc entry multiple times, and then crashes during cleanup when it is freed multiple times?

            In that case I'd expect to see the reason for the failure in the crash dump, something like "BUG: unable to handle kernel NULL pointer".

            Anyway, the reason is clear - I dropped "dk" from my script, which caused the timeout error:

            do_facet mds1 $LCTL > $(generate_logname $(date +%s)) 
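
            (For reference, a sketch of what the command was presumably meant to be - the same line with the missing "dk" restored; illustrative, not taken from the actual script:)

            do_facet mds1 $LCTL dk > $(generate_logname $(date +%s))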
            adilger Andreas Dilger added a comment - - edited

            Looking earlier in the test logs, I see a few other stack traces in the oleg308-server-console.txt from a special test run for this patch:

            [ 4326.625102] WARNING: CPU: 2 PID: 3431 at fs/proc/generic.c:399 proc_register+0x94/0xb0
            [ 4326.627740] proc_dir_entry 'lustre-QMT0000/dt-qpool1' already registered
            [ 4326.640806] CPU: 2 PID: 3431 Comm: llog_process_th Kdump: loaded Tainted: P        W  OE  ------------   3.10.0-7.7-debug #1
            [ 4326.644194] Call Trace:
            [ 4326.644610]  [<ffffffff817d1711>] dump_stack+0x19/0x1b
            [ 4326.645525]  [<ffffffff8108ba58>] __warn+0xd8/0x100
            [ 4326.646338]  [<ffffffff8108badf>] warn_slowpath_fmt+0x5f/0x80
            [ 4326.649833]  [<ffffffff812c2434>] proc_register+0x94/0xb0
            [ 4326.650741]  [<ffffffff812c2576>] proc_mkdir_data+0x66/0xa0
            [ 4326.651683]  [<ffffffff812c25e5>] proc_mkdir+0x15/0x20
            [ 4326.652710]  [<ffffffffa0315374>] lprocfs_register+0x24/0x80 [obdclass]
            [ 4326.653941]  [<ffffffffa0aa2385>] qmt_pool_alloc+0x175/0x570 [lquota]
            [ 4326.655347]  [<ffffffffa0aa74a4>] qmt_pool_new+0x224/0x4d0 [lquota]
            [ 4326.656901]  [<ffffffffa032c83b>] class_process_config+0x22eb/0x2ee0 [obdclass]
            [ 4326.660700]  [<ffffffffa032eec9>] class_config_llog_handler+0x819/0x14b0 [obdclass]
            [ 4326.662767]  [<ffffffffa02f2582>] llog_process_thread+0x7d2/0x1a20 [obdclass]
            [ 4326.665703]  [<ffffffffa02f4292>] llog_process_thread_daemonize+0xa2/0xe0 [obdclass]
            [ 4326.676370] LustreError: 3431:0:(qmt_pool.c:208:qmt_pool_alloc()) lustre-QMT0000: failed to create proc entry for pool dt-qpool1 (-12)
            [ 4326.680007] LustreError: 3431:0:(qmt_pool.c:935:qmt_pool_new()) Can't alloc pool qpool1
            [ 4336.217899] LustreError: 3774:0:(qmt_pool.c:1343:qmt_pool_add_rem()) Can't add to lustre-OST0001_UUID pool qpool1, err -17
            [ 4336.223934] LustreError: 3774:0:(qmt_pool.c:1343:qmt_pool_add_rem()) Skipped 5 previous similar messages
            

            so it may be that the code tries to register this same proc entry multiple times, and then crashes during cleanup when it is freed multiple times?


            scherementsev Sergey Cheremencev added a comment -

            adilger, thank you for the advice. However, my last attempt, where I used generate_logname, also failed. The reason is still not clear to me. At first glance it doesn't seem related to my patch - the crash dump doesn't contain the reason for the panic:

            crash> dmesg | tail -n 2
            [ 1593.570869] Lustre: lustre-OST0001-osc-ffff8800a60bc800: disconnect after 21s idle
            [ 1593.573338] Lustre: Skipped 19 previous similar messages
            crash> sys | grep PANIC
                   PANIC: "" 

            On the other hand, it occurred in sanity-quota_69 when it calls lctl dk - https://testing-archive.whamcloud.com/gerrit-janitor/7821/results.html

            Can someone assist me here?


            adilger Andreas Dilger added a comment -

            Poking around a bit further, I see that lustre/tests/auster is uploading all of the logs from its $LOGDIR, and within test-framework.sh the generate_logname() function is using $LOGDIR/$TESTSUITE.$TESTNAME.$1.<hostname>.log for the individual logfiles. It looks like you could use "lctl dk $(generate_logname $(date +%s))" to dump the logs (similar to what gather_logs() does if an error is hit) and then they will be uploaded.

            James, Minh, Charlie, please correct me if the above is not correct for getting log files included in the Maloo report for a test session.
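
            For illustration, a minimal sketch of the approach described above wrapped in a small helper (assuming test-framework.sh is already sourced so that do_facet, $LCTL and generate_logname exist; the helper name dump_debug_log and the facet/timestamp naming are made up for this sketch):

            # dump a facet's Lustre debug log into $LOGDIR so auster uploads it
            dump_debug_log() {
                local facet=$1
                do_facet $facet $LCTL dk > $(generate_logname $facet.$(date +%s))
            }
            dump_debug_log mds1   # -> $LOGDIR/$TESTSUITE.$TESTNAME.mds1.<ts>.<hostname>.log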


            scherementsev Sergey Cheremencev added a comment -

            > sergey you could add debugging to the test script in your patch to dump the debug logs sooner

            I tried this approach but didn't have success. The latest sanity-quota_69 failure doesn't contain any of the debug logs that I saved under $TMP with the name "$TMP/lustre-log-client-$(date +%s).log". Probably the name should be similar to "sanity-quota.test_69.test_log.onyx-49vm1.log"? If not, please advise another way.

            Thanks.

            mdiep Minh Diep added a comment -

            spitzcor, I am not sure what you're talking about.


            adilger Andreas Dilger added a comment -

            sergey you could add debugging to the test script in your patch to dump the debug logs sooner (e.g. a background thread that calls "lctl dk /tmp/lustre-log-$(date +%s).log" every 5s for some time). I believe that Maloo will attach all "/tmp/*.log" files to the test results.
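
            For illustration, a rough sketch of such a background dumper (the 5-second interval and /tmp/lustre-log-* naming follow the suggestion above; the 60-iteration bound and the DUMP_PID variable are arbitrary choices for this sketch):

            # periodically snapshot the kernel debug log so it survives a hang/crash
            (
                for i in $(seq 1 60); do
                    lctl dk /tmp/lustre-log-$(date +%s).log
                    sleep 5
                done
            ) &
            DUMP_PID=$!
            # ... run the test under investigation here ...
            kill $DUMP_PID 2>/dev/null || true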

            spitzcor Cory Spitz added a comment -

            mdiep, I heard that you might be able to assist with Sergey's request. Can you?


            People

              scherementsev Sergey Cheremencev
              adilger Andreas Dilger
              Votes: 0
              Watchers: 26
