  Lustre / LU-10649

yield cpu after call to ll_agl_trigger

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.12.0

    Description

      The statahead and agl threads loop over all entries in the
      directory without yielding the CPU. If the directory contains
      enough entries, these threads may trigger soft lockups. The fix
      is to add a call to cond_resched() after each call to
      ll_agl_trigger(), which gets the glimpse lock for a file.

      Attachments

        Activity

          [LU-10649] yield cpu after call to ll_agl_trigger
          pjones Peter Jones added a comment -

          Landed for 2.12


          Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31240/
          Subject: LU-10649 llite: yield cpu after call to ll_agl_trigger
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 031001f0d438cf4c6b05331a5ab368c45b1b2e90


          Here's the original description of the problem:

          While doing a Robinhood scan on a client node, the client frequently reports soft lockups.

          Message from syslogd@x at May 25 13:29:33 ...
           kernel:[ 2292.633811] BUG: soft lockup - CPU#0 stuck for 23s! [ll_agl_22918:26698]
          
          Message from syslogd@x at May 25 13:32:10 ...
           kernel:[ 2448.708462] BUG: soft lockup - CPU#19 stuck for 22s! [ll_agl_22918:26698]
          
          Message from syslogd@x at May 25 13:32:57 ...
           kernel:[ 2496.528758] BUG: soft lockup - CPU#3 stuck for 23s! [ll_agl_22918:26698]
          
          Message from syslogd@x at May 25 13:33:25 ...
           kernel:[ 2524.514882] BUG: soft lockup - CPU#3 stuck for 23s! [ll_agl_22918:26698]
          [skip]
          Message from syslogd@x at May 25 13:40:53 ...
           kernel:[ 2972.344727] BUG: soft lockup - CPU#7 stuck for 23s! [ll_agl_22918:26698]
          

          Client was configured to panic on soft lockup and a couple of dumps were captured on different days. My analysis of the dumps:

          The soft lockups in the 05/25 and 05/31 dumps are occurring while getting the glimpse locks needed by statahead to fetch file size.
          
          >  Time in runqueue
          > [0 00:00:02.407] [RU]  PID: 26697  TASK: ffff883aceb30080  CPU: 7   COMMAND: "ll_sa_22918"
          > [0 00:00:26.560] [RU]  PID: 26698  TASK: ffff883e959840c0  CPU: 16  COMMAND: "ll_agl_22918"
          
          
          > PID: 26698  TASK: ffff883e959840c0  CPU: 16  COMMAND: "ll_agl_22918"
          >  #0 [ffff883ffe603d60] machine_kexec at ffffffff8103a88c
          >  #1 [ffff883ffe603da0] __crash_kexec at ffffffff810d2b7a
          >  #2 [ffff883ffe603e60] panic at ffffffff8151050e
          >  #3 [ffff883ffe603ed8] watchdog_timer_fn at ffffffff810f8d03
          >  #4 [ffff883ffe603f00] __run_hrtimer at ffffffff8107b157
          >  #5 [ffff883ffe603f38] hrtimer_interrupt at ffffffff8107b962
          >  #6 [ffff883ffe603fa0] smp_apic_timer_interrupt at ffffffff81525aeb
          >  #7 [ffff883ffe603fb0] apic_timer_interrupt at ffffffff8152459d
          > --- <IRQ stack> ---
          >  #8 [ffff883acb56bab8] apic_timer_interrupt at ffffffff8152459d
          >     [exception RIP: keys_fill+167]
          >     RIP: ffffffffa08f5337  RSP: ffff883acb56bb60  RFLAGS: 00000203
          >     RAX: ffff883365976588  RBX: ffffffff8118b516  RCX: ffff88003604f340
          >     RDX: 0000000000000002  RSI: 0000000000000000  RDI: ffff883acb56bfd8
          >     RBP: ffff883db9e09d10   R8: 0000000000000000   R9: ffff883365976588
          >     R10: 0000000000000000  R11: 0000000000000001  R12: ffff883acb5965d8
          >     R13: ffff883ec1d11bc0  R14: ffff883ffe0024c0  R15: ffffffff81188b2e
          >     ORIG_RAX: ffffffffffffff10  CS: 0010  SS: 0018
          >  #9 [ffff883acb56bb78] lu_context_init at ffffffffa08f848f [obdclass]
          > #10 [ffff883acb56bb90] lu_env_init at ffffffffa08f8606 [obdclass]
          > #11 [ffff883acb56bba0] cl_env_new at ffffffffa0900f2e [obdclass]
          > #12 [ffff883acb56bbd8] cl_env_get at ffffffffa0901fac [obdclass]
          > #13 [ffff883acb56bc10] lov_sub_get at ffffffffa0d82475 [lov]
          > #14 [ffff883acb56bc50] lov_lock_delete at ffffffffa0d84b42 [lov]
          > #15 [ffff883acb56bc90] lov_lock_unuse at ffffffffa0d855e3 [lov]
          > #16 [ffff883acb56bd18] lov_lock_init_raid0 at ffffffffa0d85d92 [lov]
          > #17 [ffff883acb56bd38] cl_lock_weigh at ffffffffa0907d37 [obdclass]
          > #18 [ffff883acb56bd70] cl_lock_request at ffffffffa0907e80 [obdclass]
          > #19 [ffff883acb56bda0] cl_inode2ccc at ffffffffa0e4d5c7 [lustre]
          > #20 [ffff883acb56bdd8] cl_glimpse_size0 at ffffffffa0e4d927 [lustre]
          > #21 [ffff883acb56be10] ll_agl_trigger at ffffffffa0e48ca5 [lustre]
          > #22 [ffff883acb56be48] ll_agl_thread at ffffffffa0e4918d [lustre]
          > #23 [ffff883acb56bed8] kthread at ffffffff81077874
          > #24 [ffff883acb56bf50] ret_from_fork at ffffffff81523898
          
          I think the lockup occurs because the list of inodes that the ll_agl_thread must process is too long to complete within the soft lockup timeout.
          
          >  910 static int ll_agl_thread(void *arg)
          ...
          >  940         while (1) {
          >  941                 l_wait_event(thread->t_ctl_waitq,
          >  942                              !agl_list_empty(sai) ||
          >  943                              !thread_is_running(thread),
          >  944                              &lwi);
          >  945 
          >  946                 if (!thread_is_running(thread))
          >  947                         break;
          
          >  949                 spin_lock(&plli->lli_agl_lock);
          >  950                 /* The statahead thread maybe help to process AGL entries,
          >  951                  * so check whether list empty again. */
          >  952                 if (!agl_list_empty(sai)) {
          >  953                         clli = agl_first_entry(sai);
          >  954                         list_del_init(&clli->lli_agl_list);
          >  955                         spin_unlock(&plli->lli_agl_lock);
          >  956                         ll_agl_trigger(&clli->lli_vfs_inode, sai);
          >  957                         cond_resched();
          >  958                 } else {
          >  959                         spin_unlock(&plli->lli_agl_lock);
          >  960                 }
          >  961         }
          
          
          The proposed fix is to add a cond_resched() to the loop after the call to ll_agl_trigger(). ll_statahead_thread() has a similar loop that has the potential to execute too long before yielding the CPU. The soft lockups are likely to occur when doing statahead on directories with large numbers of files. In the case of the 05/25 dump, the statahead thread had iterated over more than 350,000 inodes when the lockup occurred.
          
          Data from the 05/25 dump:
          
          > PID: 26697  TASK: ffff883aceb30080  CPU: 7   COMMAND: "ll_sa_22918"
          >  #0 [ffff883ffe4e5e58] crash_nmi_callback at ffffffff8102fabe
          >  #1 [ffff883ffe4e5e60] nmi_handle at ffffffff8151cb0c
          >  #2 [ffff883ffe4e5eb8] do_nmi at ffffffff8151cd48
          >  #3 [ffff883ffe4e5ef0] end_repeat_nmi at ffffffff8151c0df
          >     [exception RIP: ldlm_prepare_lru_list+426]
          >     RIP: ffffffffa0b9876a  RSP: ffff883acb5699a0  RFLAGS: 00000206
          >     RAX: ffff883721e27b90  RBX: ffff883dcf304448  RCX: 0000000000002710
          >     RDX: ffff8839e7b83b80  RSI: ffff883a0a9a6150  RDI: ffff8839f279e940
          >     RBP: 0000000000000040   R8: 0000000000000040   R9: 0000000000000040
          >     R10: 0000000000000065  R11: 0000000000000001  R12: 0000400000000000
          >     R13: 0000000200000000  R14: ffff8839e7b83b80  R15: ffff883acaee12c0
          >     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
          > --- <NMI exception stack> ---
          >  #4 [ffff883acb5699a0] ldlm_prepare_lru_list at ffffffffa0b9876a [ptlrpc]
          >  #5 [ffff883acb569a10] ldlm_cancel_lru_local at ffffffffa0b9c785 [ptlrpc]
          >  #6 [ffff883acb569a28] ldlm_prep_elc_req at ffffffffa0b9cb48 [ptlrpc]
          >  #7 [ffff883acb569a88] ldlm_prep_enqueue_req at ffffffffa0b9cc04 [ptlrpc]
          >  #8 [ffff883acb569a98] osc_enqueue_base at ffffffffa0ee8bd1 [osc]
          >  #9 [ffff883acb569b20] osc_ast_data_get at ffffffffa0ef2097 [osc]
          > #10 [ffff883acb569bc8] cl_lock_enqueue at ffffffffa0907953 [obdclass]
          > #11 [ffff883acb569bf8] lov_lock_enqueue at ffffffffa0d84ecc [lov]
          > #12 [ffff883acb569c40] cl_lock_enqueue at ffffffffa0907953 [obdclass]
          > #13 [ffff883acb569c70] cl_lock_request at ffffffffa0907ea2 [obdclass]
          > #14 [ffff883acb569ca0] cl_inode2ccc at ffffffffa0e4d5c7 [lustre]
          > #15 [ffff883acb569cd8] cl_glimpse_size0 at ffffffffa0e4d927 [lustre]
          > #16 [ffff883acb569d10] ll_agl_trigger at ffffffffa0e48ca5 [lustre]
          > #17 [ffff883acb569d48] vvp_page_is_under_lock at ffffffffa0e498f4 [lustre]
          > #18 [ffff883acb569ed8] kthread at ffffffff81077874
          > #19 [ffff883acb569f50] ret_from_fork at ffffffff81523898
          
          > crash> files 22918
          >  31 ffff883e2ff25c80 ffff883d6c00ecc0 ffff883acd74b148 DIR  /global/cscratch1/.lustre/fid/[0x2000268d0:0x1:0x0]
          >  32 ffff881e131b8080 ffff883d283b1b40 ffff8819492da688 DIR  /global/cscratch1/.lustre/fid/[0x20008da94:0xc9ad:0x0]
          >  34 ffff881e131b8180 ffff881e4e7190c0 ffff881c1348b688 DIR  /global/cscratch1/.lustre/fid/[0x20008da94:0xca4d:0x0]
          
          > crash> dentry ffff883d6c00ecc0 | grep d_inode
          >   d_inode = 0xffff883acd74b148,
          
          d_inode - 264 == ll_inode_info
          
          > crash> ll_inode_info ffff883acd74b040 | grep sai
          >       lli_sai = 0xffff883cc22fc800, 
          > crash> ll_statahead_info 0xffff883cc22fc800
          >   sai_dentry = 0xffff883d6c00ecc0, 
          >   sai_refcount = {
          >     counter = 2
          >   }, 
          >   sai_max = 32, 
          >   sai_sent = 350669, 
          >   sai_replied = 350669, 
          >   sai_index = 350670,   <-- incremented each time through statahead outer loop
          >   sai_index_wait = 337263, 
          >   sai_hit = 350637, 
          >   sai_miss = 0, 
          >   sai_consecutive_miss = 0, 
          >   sai_miss_hidden = 0, 
          >   sai_skip_hidden = 0, 
          >   sai_ls_all = 0, 
          >   sai_agl_valid = 1, 
          >   sai_in_readpage = 0,
          
          amk Ann Koehler (Inactive) added a comment

          Chris Horn (hornc@cray.com) uploaded a new patch: https://review.whamcloud.com/31240
          Subject: LU-10649 llite: yield cpu after call to ll_agl_trigger
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: cf33e2721bb4196be7318f177018553b1c048527


          People

            Assignee: Chris Horn (hornc)
            Reporter: Chris Horn (hornc)
            Votes: 0
            Watchers: 6
