Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-16941

MDT deadlock involving lquota_wb

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Unresolved
    • Icon: Critical Critical
    • None
    • Lustre 2.15.3
    • None
    • 3
    • 9223372036854775807

      Hello! Two days ago, we upgraded the Lustre servers of Sherlock's scratch (Fir) from 2.12.9 to 2.15.3. The clients were already running 2.15.x. The server upgrade went smoothly but I noticed one issue today: we got an alert that a Robinhood changelog readerĀ  (also running 2.15.3) was stuck, and indeed:

      Jul  4 11:39:40 fir-rbh01 kernel: INFO: task robinhood:8863 blocked for more than 120 seconds.
      Jul  4 11:39:40 fir-rbh01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      Jul  4 11:39:40 fir-rbh01 kernel: robinhood       D ffff9bacf6375280     0  8863      1 0x00000080
      Jul  4 11:39:40 fir-rbh01 kernel: Call Trace:
      Jul  4 11:39:40 fir-rbh01 kernel: [<ffffffffac7b8bf9>] schedule_preempt_disabled+0x29/0x70
      Jul  4 11:39:40 fir-rbh01 kernel: [<ffffffffac7b6ca7>] __mutex_lock_slowpath+0xc7/0x1e0
      Jul  4 11:39:40 fir-rbh01 kernel: [<ffffffffac7b602f>] mutex_lock+0x1f/0x33
      Jul  4 11:39:40 fir-rbh01 kernel: [<ffffffffac7ae6ed>] lookup_slow+0x33/0xab
      Jul  4 11:39:40 fir-rbh01 kernel: [<ffffffffac26bf8e>] path_lookupat+0x89e/0x8d0
      Jul  4 11:39:40 fir-rbh01 kernel: [<ffffffffac235c6b>] ? kmem_cache_alloc+0x19b/0x1f0
      Jul  4 11:39:40 fir-rbh01 kernel: [<ffffffffac26e16f>] ? getname_flags+0x4f/0x1a0
      Jul  4 11:39:40 fir-rbh01 kernel: [<ffffffffac26bfeb>] filename_lookup+0x2b/0xd0
      Jul  4 11:39:40 fir-rbh01 kernel: [<ffffffffac26f337>] user_path_at_empty+0x67/0xc0
      Jul  4 11:39:40 fir-rbh01 kernel: [<ffffffffac25e493>] ? fput+0x13/0x20
      Jul  4 11:39:40 fir-rbh01 kernel: [<ffffffffac26f3a1>] user_path_at+0x11/0x20
      Jul  4 11:39:40 fir-rbh01 kernel: [<ffffffffac2618a3>] vfs_fstatat+0x63/0xd0
      Jul  4 11:39:40 fir-rbh01 kernel: [<ffffffffac261d01>] SYSC_newlstat+0x31/0x70
      Jul  4 11:39:40 fir-rbh01 kernel: [<ffffffffac1468f6>] ? __audit_syscall_exit+0x1f6/0x2b0
      Jul  4 11:39:40 fir-rbh01 kernel: [<ffffffffac2621ce>] SyS_newlstat+0xe/0x20
      Jul  4 11:39:40 fir-rbh01 kernel: [<ffffffffac7c539a>] system_call_fastpath+0x25/0x2a
      Jul  4 11:39:40 fir-rbh01 kernel: INFO: task robinhood:8864 blocked for more than 120 seconds.
      

      I tried to restart the client and Robinhood multiple times without success, same issue. When I look at the MDT (fir-MDT0001), which is currently still running, I can notice a constant CPU load from this process:

         PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                                         
       12875 root      20   0       0      0      0 S  13.6  0.0  39:41.42 lquota_wb_fir-M   
      

      Looking further with crash, there seems to be two lquota_wb threads (12871 and 12875):

      crash> bt 12871
      PID: 12871  TASK: ffff8ccd6c9b0000  CPU: 6   COMMAND: "lquota_wb_fir-M"
       #0 [ffff8ccd6ca93cf0] __schedule at ffffffffb07b78d8
       #1 [ffff8ccd6ca93d58] schedule at ffffffffb07b7ca9
       #2 [ffff8ccd6ca93d68] schedule_timeout at ffffffffb07b5778
       #3 [ffff8ccd6ca93e10] qsd_upd_thread at ffffffffc175a665 [lquota]
       #4 [ffff8ccd6ca93ec8] kthread at ffffffffb00cb621
       #5 [ffff8ccd6ca93f50] ret_from_fork_nospec_begin at ffffffffb07c51dd
      crash> bt 12875
      PID: 12875  TASK: ffff8ccd6cba1080  CPU: 2   COMMAND: "lquota_wb_fir-M"
       #0 [ffff8ccd6cbabcf0] __schedule at ffffffffb07b78d8
       #1 [ffff8ccd6cbabd58] schedule at ffffffffb07b7ca9
       #2 [ffff8ccd6cbabd68] schedule_timeout at ffffffffb07b5778
       #3 [ffff8ccd6cbabe10] qsd_upd_thread at ffffffffc175a665 [lquota]
       #4 [ffff8ccd6cbabec8] kthread at ffffffffb00cb621
       #5 [ffff8ccd6cbabf50] ret_from_fork_nospec_begin at ffffffffb07c51dd
      

      At first glance, as this involves lquota_wb, this looks similar to LU-15283, but I can't find the same backtrace signature. And the issue should be fixed in 2.15.3.

      However, the following thread is suspect for me, as it involves lquota and seems stuck:

      crash> bt 32263
      PID: 32263  TASK: ffff8cdd78a21080  CPU: 5   COMMAND: "mdt01_011"
       #0 [ffff8ccd6b5a3640] __schedule at ffffffffb07b78d8
       #1 [ffff8ccd6b5a36a8] schedule at ffffffffb07b7ca9
       #2 [ffff8ccd6b5a36b8] schedule_timeout at ffffffffb07b5778
       #3 [ffff8ccd6b5a3760] schedule_timeout_interruptible at ffffffffb07b58fe
       #4 [ffff8ccd6b5a3770] qsd_acquire at ffffffffc175e915 [lquota]
       #5 [ffff8ccd6b5a3810] qsd_op_begin0 at ffffffffc175f6bf [lquota]
       #6 [ffff8ccd6b5a38b0] qsd_op_begin at ffffffffc1760142 [lquota]
       #7 [ffff8ccd6b5a38f8] osd_declare_qid at ffffffffc17ffcb3 [osd_ldiskfs]
       #8 [ffff8ccd6b5a3950] osd_declare_attr_qid at ffffffffc17bbdfc [osd_ldiskfs]
       #9 [ffff8ccd6b5a39a8] osd_declare_attr_set at ffffffffc17bffbe [osd_ldiskfs]
      #10 [ffff8ccd6b5a39f8] lod_sub_declare_attr_set at ffffffffc1a7752c [lod]
      #11 [ffff8ccd6b5a3a48] lod_declare_attr_set at ffffffffc1a5854b [lod]
      #12 [ffff8ccd6b5a3ac8] mdd_attr_set at ffffffffc1d1ab32 [mdd]
      #13 [ffff8ccd6b5a3b48] mdt_attr_set at ffffffffc193ac36 [mdt]
      #14 [ffff8ccd6b5a3b98] mdt_reint_setattr at ffffffffc193bd3a [mdt]
      #15 [ffff8ccd6b5a3c10] mdt_reint_rec at ffffffffc193e69a [mdt]
      #16 [ffff8ccd6b5a3c38] mdt_reint_internal at ffffffffc1913f3c [mdt]
      #17 [ffff8ccd6b5a3c78] mdt_reint at ffffffffc191e807 [mdt]
      #18 [ffff8ccd6b5a3ca8] tgt_request_handle at ffffffffc13b725f [ptlrpc]
      #19 [ffff8ccd6b5a3d38] ptlrpc_server_handle_request at ffffffffc1360aa3 [ptlrpc]
      #20 [ffff8ccd6b5a3df0] ptlrpc_main at ffffffffc1362734 [ptlrpc]
      #21 [ffff8ccd6b5a3ec8] kthread at ffffffffb00cb621
      #22 [ffff8ccd6b5a3f50] ret_from_fork_nospec_begin at ffffffffb07c51dd
      

      For now, I am attaching a live "foreach bt" taken from this MDS as fir-md1-s2_fir-MDT0001_foreach_bt_lquota_wb_20230704.txt. Let me know if you have any idea on how to fix this. Thanks much.

            wc-triage WC Triage
            sthiell Stephane Thiell
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Created:
              Updated: