Lustre / LU-12018

deadlock on OSS: quota reintegration vs memory release

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: Lustre 2.7.0, Lustre 2.12.0, Lustre 2.13.0, Lustre 2.10.6
    • Fix Version/s: Lustre 2.13.0, Lustre 2.12.1
    • Labels:
      None
    • Environment:
      kernel 3.10.0-514.26.2.el7_lustre.2.7.21.1.ddn4.g3b21639.x86_64
      lustre 2.7.21.3-ddn36
    • Severity:
      3
    • Rank (Obsolete):
      9223372036854775807

      Description

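      This looks like a deadlock on the OSS: many threads were blocked while there was almost no I/O activity. In the vmstat 1 output below, after the first line (which reports averages since boot), 265 tasks stay in uninterruptible sleep (b), bi stays at 0, and the CPUs spend about 99% of their time in iowait (wa).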

      [root@foss22 ~]# vmstat 1
      procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
       r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
       2 265      0 4842020 1118532 57555660    0    0   752   983    8    5  0  1 89 10  0
       0 265      0 4840220 1118532 57557244    0    0     0    12 12538 20455  0  1  0 99  0
       0 265      0 4841724 1118532 57555684    0    0     0     4 17744 25132  0  1  0 99  0
       0 265      0 4839416 1118532 57554168    0    0     0    16 13079 20067  0  1  0 99  0
       2 265      0 4839468 1118532 57553992    0    0     0     4 12368 20798  0  1  0 99  0
       0 265      0 4838772 1118532 57553444    0    0     0     8 11571 18796  0  1  0 99  0
       0 265      0 4846112 1118540 57552180    0    0     0    24 12476 18410  0  1  0 99  0
       2 265      0 4846332 1118540 57549600    0    0     0     4 12562 18163  0  1  0 99  0
       0 265      0 4844584 1118540 57550976    0    0     0    16 10843 18739  0  1  0 99  0
       1 265      0 4846072 1118540 57546488    0    0     0     8 20940 27705  0  1  0 98  0
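
      To spot this pattern in other vmstat captures, a minimal C sketch like the one below can flag rows where many tasks are blocked, nothing is being read, and the CPUs are almost entirely in iowait. The struct, the function names and the thresholds are hypothetical and chosen only for illustration; they are not part of Lustre or procps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* One data row of `vmstat 1` output, fields in vmstat column order. */
struct vmstat_row {
    long r, b, swpd, free, buff, cache, si, so, bi, bo, in, cs, us, sy, id, wa, st;
};

/* Parse one vmstat data row; returns true only if all 17 columns were read.
 * Header lines ("procs ...", " r  b ...") fail because they start with text. */
bool parse_vmstat_row(const char *line, struct vmstat_row *row)
{
    assert(line != NULL && row != NULL);
    int n = sscanf(line,
                   "%ld %ld %ld %ld %ld %ld %ld %ld %ld %ld %ld %ld %ld %ld %ld %ld %ld",
                   &row->r, &row->b, &row->swpd, &row->free, &row->buff, &row->cache,
                   &row->si, &row->so, &row->bi, &row->bo, &row->in, &row->cs,
                   &row->us, &row->sy, &row->id, &row->wa, &row->st);
    return n == 17;
}

/* Hypothetical heuristic: at least min_blocked tasks in D state,
 * no blocks read in, and iowait at or above min_wa percent. */
bool looks_stalled(const struct vmstat_row *row, long min_blocked, long min_wa)
{
    return row->b >= min_blocked && row->bi == 0 && row->wa >= min_wa;
}
```

      With min_blocked = 100 and min_wa = 90, every row after the first in the capture above matches. The first row does not (bi = 752, wa = 10) because vmstat prints averages since boot on that line, so it should be skipped when looking for a stall.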
      

      It started with the following call trace:

      Feb 23 15:18:42 foss22 kernel: INFO: task kswapd0:101 blocked for more than 90 seconds.
      Feb 23 15:18:42 foss22 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      Feb 23 15:18:42 foss22 kernel: intel_powerclamp: No package C-state available
      Feb 23 15:18:42 foss22 kernel: kswapd0         D
      Feb 23 15:18:42 foss22 kernel: ffff881546baf200     0   101      2 0x00000000
      Feb 23 15:18:42 foss22 kernel: ffff88165e4b37d0 0000000000000046 ffff88165e433ec0 ffff88165e4b3fd8
      Feb 23 15:18:42 foss22 kernel: ffff88165e4b3fd8 ffff88165e4b3fd8 ffff88165e433ec0 ffff88165e4b3938
      Feb 23 15:18:42 foss22 kernel: ffff88165e4b3940 7fffffffffffffff ffff88165e433ec0 ffff881546baf200
      Feb 23 15:18:42 foss22 kernel: Call Trace:
      Feb 23 15:18:42 foss22 kernel: [<ffffffff8168d629>] schedule+0x29/0x70
      Feb 23 15:18:42 foss22 kernel: [<ffffffff8168b069>] schedule_timeout+0x239/0x2c0
      Feb 23 15:18:42 foss22 kernel: [<ffffffff8168da06>] wait_for_completion+0x116/0x170
      Feb 23 15:18:42 foss22 kernel: [<ffffffff810c54e0>] ? wake_up_state+0x20/0x20
      Feb 23 15:18:42 foss22 kernel: [<ffffffff810b08e8>] kthread_create_on_node+0xa8/0x140
      Feb 23 15:18:42 foss22 kernel: [<ffffffffa0c6e5d0>] ? qsd_reint_index+0x1700/0x1700 [lquota]
      Feb 23 15:18:42 foss22 kernel: [<ffffffffa0c6fde8>] ? qsd_start_reint_thread+0x778/0xd70 [lquota]
      Feb 23 15:18:42 foss22 kernel: [<ffffffffa0c6fe73>] qsd_start_reint_thread+0x803/0xd70 [lquota]
      Feb 23 15:18:42 foss22 kernel: [<ffffffff812635ee>] ? dqput+0x16e/0x1f0
      Feb 23 15:18:42 foss22 kernel: [<ffffffffa0c74691>] qsd_ready+0x231/0x3c0 [lquota]
      Feb 23 15:18:42 foss22 kernel: [<ffffffffa0c77722>] qsd_adjust+0xa2/0x900 [lquota]
      Feb 23 15:18:42 foss22 kernel: [<ffffffffa0c68d1a>] ? qsd_refresh_usage+0x6a/0x2b0 [lquota]
      Feb 23 15:18:42 foss22 kernel: [<ffffffffa0c78b34>] qsd_op_adjust+0x4d4/0x720 [lquota]
      Feb 23 15:18:42 foss22 kernel: [<ffffffffa0d85a00>] osd_object_delete+0x1f0/0x510 [osd_ldiskfs]
      Feb 23 15:18:42 foss22 kernel: [<ffffffffa0836f0d>] lu_object_free.isra.30+0x9d/0x1a0 [obdclass]
      Feb 23 15:18:42 foss22 kernel: [<ffffffffa0837b96>] lu_site_purge_objects+0x326/0x4a0 [obdclass]
      Feb 23 15:18:42 foss22 kernel: [<ffffffffa0838dd9>] lu_cache_shrink+0x259/0x2d0 [obdclass]
      Feb 23 15:18:42 foss22 kernel: [<ffffffff811947b3>] shrink_slab+0x163/0x330
      Feb 23 15:18:42 foss22 kernel: [<ffffffff811f5b77>] ? vmpressure+0x87/0x90
      Feb 23 15:18:42 foss22 kernel: [<ffffffff811985b1>] balance_pgdat+0x4b1/0x5e0
      Feb 23 15:18:42 foss22 kernel: [<ffffffff81198853>] kswapd+0x173/0x450
      Feb 23 15:18:42 foss22 kernel: [<ffffffff810b1b20>] ? wake_up_atomic_t+0x30/0x30
      Feb 23 15:18:42 foss22 kernel: [<ffffffff811986e0>] ? balance_pgdat+0x5e0/0x5e0
      Feb 23 15:18:42 foss22 kernel: [<ffffffff810b0a4f>] kthread+0xcf/0xe0
      Feb 23 15:18:42 foss22 kernel: [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
      Feb 23 15:18:42 foss22 kernel: [<ffffffff81698598>] ret_from_fork+0x58/0x90
      Feb 23 15:18:42 foss22 kernel: [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
      

      Then an OST I/O service thread hung as well:

      Feb 23 15:18:42 foss22 kernel: INFO: task ll_ost_io00_002:3904 blocked for more than 90 seconds.
      Feb 23 15:18:42 foss22 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      Feb 23 15:18:42 foss22 kernel: ll_ost_io00_002 D ffffffffa089d1e8     0  3904      2 0x00000080
      Feb 23 15:18:42 foss22 kernel: ffff88161b11b530 0000000000000046 ffff88161d214e70 ffff88161b11bfd8
      Feb 23 15:18:42 foss22 kernel: ffff88161b11bfd8 ffff88161b11bfd8 ffff88161d214e70 ffffffffa089d1e0
      Feb 23 15:18:42 foss22 kernel: ffffffffa089d1e4 ffff88161d214e70 00000000ffffffff ffffffffa089d1e8
      Feb 23 15:18:42 foss22 kernel: Call Trace:
      Feb 23 15:18:42 foss22 kernel: [<ffffffff8168e719>] schedule_preempt_disabled+0x29/0x70
      Feb 23 15:18:42 foss22 kernel: [<ffffffff8168c365>] __mutex_lock_slowpath+0xc5/0x1d0
      Feb 23 15:18:42 foss22 kernel: [<ffffffff8168b7bf>] mutex_lock+0x1f/0x2f
      Feb 23 15:18:42 foss22 kernel: [<ffffffffa0838bed>] lu_cache_shrink+0x6d/0x2d0 [obdclass]
      Feb 23 15:18:42 foss22 kernel: [<ffffffff811946f9>] shrink_slab+0xa9/0x330
      Feb 23 15:18:42 foss22 kernel: [<ffffffff811f5b11>] ? vmpressure+0x21/0x90
      Feb 23 15:18:42 foss22 kernel: [<ffffffff81197ab2>] do_try_to_free_pages+0x3c2/0x4e0
      Feb 23 15:18:42 foss22 kernel: [<ffffffff81197ccc>] try_to_free_pages+0xfc/0x180
      Feb 23 15:18:42 foss22 kernel: [<ffffffff81683898>] __alloc_pages_slowpath+0x458/0x725
      Feb 23 15:18:42 foss22 kernel: [<ffffffff8118b655>] __alloc_pages_nodemask+0x405/0x420
      Feb 23 15:18:42 foss22 kernel: [<ffffffff811cfa0a>] alloc_pages_current+0xaa/0x170
      Feb 23 15:18:42 foss22 kernel: [<ffffffff81180be7>] __page_cache_alloc+0x97/0xb0
      Feb 23 15:18:42 foss22 kernel: [<ffffffff81181905>] find_or_create_page+0x45/0xa0
      Feb 23 15:18:42 foss22 kernel: [<ffffffffa0da8ef7>] osd_bufs_get+0x3a7/0x870 [osd_ldiskfs]
      Feb 23 15:18:42 foss22 kernel: [<ffffffffa0eecbf8>] ofd_preprw+0x688/0x1220 [ofd]
      Feb 23 15:18:42 foss22 kernel: [<ffffffffa0ac8faf>] ? __req_capsule_get+0x15f/0x710 [ptlrpc]
      Feb 23 15:18:42 foss22 kernel: [<ffffffffa0b0cd91>] tgt_brw_read+0x9a1/0x1850 [ptlrpc]
      Feb 23 15:18:42 foss22 kernel: [<ffffffff811dda53>] ? __kmalloc+0x1f3/0x240
      Feb 23 15:18:42 foss22 kernel: [<ffffffffa0addfd0>] ? null_alloc_rs+0xa0/0x380 [ptlrpc]
      Feb 23 15:18:42 foss22 kernel: [<ffffffffa0ade109>] ? null_alloc_rs+0x1d9/0x380 [ptlrpc]
      Feb 23 15:18:42 foss22 kernel: [<ffffffffa0aa098f>] ? lustre_pack_reply_v2+0x14f/0x280 [ptlrpc]
      Feb 23 15:18:42 foss22 kernel: [<ffffffffa0aa0b2f>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc]
      Feb 23 15:18:42 foss22 kernel: [<ffffffffa0aa0cb1>] ? lustre_pack_reply+0x11/0x20 [ptlrpc]
      Feb 23 15:18:42 foss22 kernel: [<ffffffffa0b0ab3b>] tgt_request_handle+0x8fb/0x11f0 [ptlrpc]
      Feb 23 15:18:42 foss22 kernel: [<ffffffffa0aad91b>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
      Feb 23 15:18:42 foss22 kernel: [<ffffffffa06e2668>] ? lc_watchdog_touch+0x68/0x180 [libcfs]
      Feb 23 15:18:42 foss22 kernel: [<ffffffffa0aab1f8>] ? ptlrpc_wait_event+0x98/0x330 [ptlrpc]
      Feb 23 15:18:42 foss22 kernel: [<ffffffffa0ab1240>] ptlrpc_main+0xc00/0x1f70 [ptlrpc]
      Feb 23 15:18:42 foss22 kernel: [<ffffffff81029569>] ? __switch_to+0xd9/0x4c0
      Feb 23 15:18:42 foss22 kernel: [<ffffffffa0ab0640>] ? ptlrpc_register_service+0x1070/0x1070 [ptlrpc]
      Feb 23 15:18:42 foss22 kernel: [<ffffffff810b0a4f>] kthread+0xcf/0xe0
      Feb 23 15:18:42 foss22 kernel: [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
      Feb 23 15:18:42 foss22 kernel: [<ffffffff81698598>] ret_from_fork+0x58/0x90
      Feb 23 15:18:42 foss22 kernel: [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
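
      Read together, the two traces suggest a lock cycle. kswapd entered lu_cache_shrink() from shrink_slab(), purged objects, and through osd_object_delete() -> qsd_op_adjust() -> qsd_start_reint_thread() ended up in kthread_create_on_node(), waiting for a new quota reintegration thread to be created. Meanwhile ll_ost_io00_002, itself in direct reclaim, is stuck on a mutex in lu_cache_shrink(). The userspace C model below assumes (a) that this mutex is held by kswapd across the purge, and (b) that creating the new thread needs memory whose reclaim re-enters the shrinker; neither the lock owner nor a kthreadd trace appears in this report. guard, thread_creator() and shrinker_spawn_deadlocks() are hypothetical names, the timed lock only exists so the model terminates, and drop_guard_before_spawn is one hypothetical way to break the cycle, not necessarily how the actual fix works.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Stands in for the mutex lu_cache_shrink() takes (assumed held across the purge). */
static pthread_mutex_t guard = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical model of the thread creator: to create the new thread it must
 * allocate memory, and under memory pressure that allocation re-enters the
 * shrinker, which needs `guard`. A 200 ms timed lock replaces "blocked forever". */
static void *thread_creator(void *arg)
{
    bool *got_guard = arg;
    struct timespec deadline;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += 200L * 1000 * 1000;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    int rc = pthread_mutex_timedlock(&guard, &deadline);
    *got_guard = (rc == 0);
    if (rc == 0)
        pthread_mutex_unlock(&guard);
    return NULL;
}

/* Model of the kswapd path. Returns true if the creator could not get the
 * guard, i.e. the real, untimed code would deadlock. */
bool shrinker_spawn_deadlocks(bool drop_guard_before_spawn)
{
    pthread_t creator;
    bool got_guard = false;

    pthread_mutex_lock(&guard);                 /* lu_cache_shrink() */
    if (drop_guard_before_spawn)
        pthread_mutex_unlock(&guard);           /* hypothetical mitigation */

    /* qsd_start_reint_thread() -> kthread_create_on_node() -> wait_for_completion() */
    int rc = pthread_create(&creator, NULL, thread_creator, &got_guard);
    assert(rc == 0 && "pthread_create failed");
    (void)rc;
    pthread_join(creator, NULL);

    if (!drop_guard_before_spawn)
        pthread_mutex_unlock(&guard);
    return !got_guard;
}
```

      In this model, shrinker_spawn_deadlocks(false) returns true after the 200 ms timeout, because the waiter still holds the guard the creator needs; shrinker_spawn_deadlocks(true) returns false. The general point is that code running under the shrinker's lock should not wait on work that may itself need to reclaim memory.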
      

              People

              • Assignee: Alex Zhuravlev (bzzz)
              • Reporter: Alex Zhuravlev (bzzz)
              • Votes: 0
              • Watchers: 6
