LU-11964: Heavy load and soft lockups on MDS with DOM

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.13.0, Lustre 2.12.1
    • Affects Version/s: Lustre 2.12.0
    • Component/s: None
    • Environment: CentOS 7.6, Linux Kernel 3.10.0-957.1.3.el7_lustre.x86_64, Lustre 2.12 clients (Sherlock cluster)
    • Severity: 2

    Description

      On Fir, our new scratch filesystem for Sherlock running 2.12, the filesystem hung today and then operations resumed. Fir uses DNE, PFL, and DOM. The MDS hosting mdt[0,2] seems to have been under heavy load. For example:

      [481509.742936] Call Trace:
      [481509.745520]  [<ffffffffc0f73303>] ldlm_process_inodebits_lock+0x93/0x3d0 [ptlrpc]
      [481509.753123]  [<ffffffffc0f61fc0>] ? ldlm_export_lock_object+0x10/0x10 [ptlrpc]
      [481509.760464]  [<ffffffffc0f73270>] ? ldlm_inodebits_compat_queue+0x440/0x440 [ptlrpc]
      [481509.768321]  [<ffffffffc0f4160e>] ldlm_reprocess_queue+0x1be/0x3f0 [ptlrpc]
      [481509.775403]  [<ffffffffc0f426cb>] __ldlm_reprocess_all+0x10b/0x380 [ptlrpc]
      [481509.782482]  [<ffffffffc0f434a2>] ldlm_cancel_lock_for_export.isra.26+0x1c2/0x390 [ptlrpc]
      [481509.790863]  [<ffffffffc0f43851>] ldlm_export_cancel_blocked_locks+0x121/0x200 [ptlrpc]
      [481509.798987]  [<ffffffffc0f6c6e2>] ldlm_bl_thread_main+0x112/0x700 [ptlrpc]
      [481509.805950]  [<ffffffff9d6d67b0>] ? wake_up_state+0x20/0x20
      [481509.811650]  [<ffffffffc0f6c5d0>] ? ldlm_handle_bl_callback+0x530/0x530 [ptlrpc]
      [481509.819130]  [<ffffffff9d6c1c31>] kthread+0xd1/0xe0
      [481509.824095]  [<ffffffff9d6c1b60>] ? insert_kthread_work+0x40/0x40
      [481509.830277]  [<ffffffff9dd74c24>] ret_from_fork_nospec_begin+0xe/0x21
      [481509.836799]  [<ffffffff9d6c1b60>] ? insert_kthread_work+0x40/0x40
      [481509.842979] Code: 74 0e 4c 89 f2 48 89 de 4c 89 ff e8 d3 a2 fc ff 49 8b 87 10 02 00 00 4c 8d a8 f0 fd ff ff 4d 39 fd 74 2b 49 83 bd a8 00 00 00 00 <74> 0e 4c 89 f2 48 89 de 4c 89 ef e8 a8 a2 fc ff 4d 8b ad 10 02 
      [481516.050304] INFO: rcu_sched self-detected stall on CPU
      [481516.051302] INFO: rcu_sched self-detected stall on CPU
      [481516.051303]  {
      [481516.051303]  15
      [481516.051304] } 
      [481516.051314]  (t=60001 jiffies g=115591743 c=115591742 q=6728165)
      [481516.051315] Task dump for CPU 14:
      [481516.051316] ldlm_bl_01      R
      [481516.051316]   running task    
      [481516.051318]     0 54053      2 0x00000088
      [481516.051318] Call Trace:
      [481516.051362]  [<ffffffffc0f43851>] ? ldlm_export_cancel_blocked_locks+0x121/0x200 [ptlrpc]
      [481516.051394]  [<ffffffffc0f6c6e2>] ? ldlm_bl_thread_main+0x112/0x700 [ptlrpc]
      [481516.051397]  [<ffffffff9d6d67b0>] ? wake_up_state+0x20/0x20
      [481516.051426]  [<ffffffffc0f6c5d0>] ? ldlm_handle_bl_callback+0x530/0x530 [ptlrpc]
      [481516.051427]  [<ffffffff9d6c1c31>] ? kthread+0xd1/0xe0
      [481516.051429]  [<ffffffff9d6c1b60>] ? insert_kthread_work+0x40/0x40
      [481516.051431]  [<ffffffff9dd74c24>] ? ret_from_fork_nospec_begin+0xe/0x21
      [481516.051432]  [<ffffffff9d6c1b60>] ? insert_kthread_work+0x40/0x40
      [481516.051432] Task dump for CPU 15:
      

      The filesystem is still running, but some threads are at 100% CPU on the MDS:

         PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                                      
       54053 root      20   0       0      0      0 R 100.0  0.0   2:02.22 ldlm_bl_01                                                                                                                                   
       55962 root      20   0       0      0      0 R 100.0  0.0   2:01.79 ldlm_bl_03                                                                                                                                   
       56043 root      20   0       0      0      0 R 100.0  0.0   3:20.41 ldlm_bl_04                                                                                                                                   
       63820 root      20   0       0      0      0 R 100.0  0.0  10:13.65 mdt03_033  
      
      [root@fir-md1-s1 ~]# lctl get_param ldlm.*.*mdt*.*
      ldlm.namespaces.mdt-fir-MDT0000_UUID.contended_locks=32
      ldlm.namespaces.mdt-fir-MDT0000_UUID.contention_seconds=2
      ldlm.namespaces.mdt-fir-MDT0000_UUID.ctime_age_limit=10
      ldlm.namespaces.mdt-fir-MDT0000_UUID.dirty_age_limit=10
      ldlm.namespaces.mdt-fir-MDT0000_UUID.early_lock_cancel=0
      ldlm.namespaces.mdt-fir-MDT0000_UUID.lock_count=146645
      ldlm.namespaces.mdt-fir-MDT0000_UUID.lock_timeouts=8
      ldlm.namespaces.mdt-fir-MDT0000_UUID.lock_unused_count=0
      ldlm.namespaces.mdt-fir-MDT0000_UUID.lru_max_age=3900000
      ldlm.namespaces.mdt-fir-MDT0000_UUID.lru_size=4800
      ldlm.namespaces.mdt-fir-MDT0000_UUID.max_nolock_bytes=0
      ldlm.namespaces.mdt-fir-MDT0000_UUID.max_parallel_ast=1024
      ldlm.namespaces.mdt-fir-MDT0000_UUID.resource_count=62361
      ldlm.namespaces.mdt-fir-MDT0002_UUID.contended_locks=32
      ldlm.namespaces.mdt-fir-MDT0002_UUID.contention_seconds=2
      ldlm.namespaces.mdt-fir-MDT0002_UUID.ctime_age_limit=10
      ldlm.namespaces.mdt-fir-MDT0002_UUID.dirty_age_limit=10
      ldlm.namespaces.mdt-fir-MDT0002_UUID.early_lock_cancel=0
      ldlm.namespaces.mdt-fir-MDT0002_UUID.lock_count=6472231
      ldlm.namespaces.mdt-fir-MDT0002_UUID.lock_timeouts=166760
      ldlm.namespaces.mdt-fir-MDT0002_UUID.lock_unused_count=0
      ldlm.namespaces.mdt-fir-MDT0002_UUID.lru_max_age=3900000
      ldlm.namespaces.mdt-fir-MDT0002_UUID.lru_size=4800
      ldlm.namespaces.mdt-fir-MDT0002_UUID.max_nolock_bytes=0
      ldlm.namespaces.mdt-fir-MDT0002_UUID.max_parallel_ast=1024
      ldlm.namespaces.mdt-fir-MDT0002_UUID.resource_count=359599
      

      The MDS resumed operations by itself; the last log messages were:

      [482208.719700] LustreError: 63820:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1549999197, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8995f523f740/0x1a4b7ac741771fee lrc: 3/0,1 mode: --/PW res: [0x2c00033c7:0x2b6e:0x0].0x0 bits 0x40/0x0 rrc: 39993 type: IBT flags: 0x40010080000000 nid: local remote: 0x0 expref: -99 pid: 63820 timeout: 0 lvb_type: 0
      [482361.725697] LNet: Service thread pid 63820 completed after 1225.53s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
      [482666.099950] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
      

      I'm attaching the full syslog output.

      Now everything looks good, even though this MDS is still quite loaded in terms of CPU.

      Thanks,
      Stephane

      Attachments

        1. fir-md1-s1.syslog-20190212.log.gz
          303 kB
        2. fir-md1-s1-20190213.log
          82 kB
        3. fir-md1-s1-dk.log
          1.27 MB
        4. fir-md1-s2-dk.log
          7.15 MB
        5. lustre-log.1550078704.21794
          39 kB
        6. sh-106-60.dk.log.gz
          4.11 MB
        7. sherlocks.txt
          17 kB


          Activity

            gerrit Gerrit Updater added a comment - - edited

            (Incorrect ticket, comment removed...)


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34504/
            Subject: LU-11964 mdc: prevent glimpse lock count grow
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: 82dad29511b32c844c849ac4ec3ec0214077f125

            gerrit Gerrit Updater added a comment -

            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34504
            Subject: LU-11964 mdc: prevent glimpse lock count grow
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 6422dc451867239cb97a6ab864498077220dcbbb

            sthiell Stephane Thiell added a comment -

            Whoops! Great!

            pjones Peter Jones added a comment -

            sthiell that's what the LTS12 label means


            sthiell Stephane Thiell added a comment -

            I noticed that this patch is not in b2_12 yet, but it should be added.

            pjones Peter Jones added a comment -

            Landed for 2.13


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34261/
            Subject: LU-11964 mdc: prevent glimpse lock count grow
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: b915221b6d0f3457fd9dd202a9d14c5f8385bf47

            sthiell Stephane Thiell added a comment -

            Hi Peter, yes, definitely very good news!

            I have applied the following patch on top of 2.12.0 (and only on the clients); this should be https://review.whamcloud.com/#/c/34261/ Patch Set 3:

            commit 7b8a3b32f4dc35dcb81ff1c9fd4da3e7c4184d26
            Author: Mikhail Pershin <mpershin@whamcloud.com>
            Date:   Fri Feb 15 00:51:00 2019 +0300
            
                LU-11964 mdc: prevent glimpse lock count grow
                
                DOM locks matching tries to ignore locks with
                LDLM_FL_KMS_IGNORE flag during ldlm_lock_match() but
                checks that after ldlm_lock_match() call. Therefore if
                there is any lock with such flag in queue then all other
                locks after it are ignored and new lock is created causing
                big amount of locks on single resource in some access
                patterns.
                Patch move that check inside lock_matches() function, it is
                done by negative check for that flag, i.e. if flag is
                set in lock_matches() parameters then locks with such flag
                are not matched. Corresponding test was added in sanity-dom.sh
                
                Test-Parameters: testlist=sanity-dom
                Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
                Change-Id: Ic45ca10f0e603e79a3a00e4fde13a5fae15ea5fc
            

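            For illustration, here is a minimal standalone C sketch of the behaviour the commit message above describes. Every name in it (toy_lock, match_then_check, match_skip_flagged, the FL_KMS_IGNORE value) is a simplified, hypothetical stand-in rather than the real ldlm code; the only point is where the "ignore this lock" flag is tested: after the match returns (old behaviour) versus inside the match loop (fixed behaviour).

            #include <stdio.h>

            #define FL_KMS_IGNORE 0x1u  /* stand-in for LDLM_FL_KMS_IGNORE */

            struct toy_lock {
                unsigned int flags;     /* per-lock flags */
                int mode;               /* lock mode, simplified to an int */
                struct toy_lock *next;  /* next lock on the resource queue */
            };

            /* Old behaviour: return the first mode-compatible lock and leave the
             * flag check to the caller.  A flagged lock at the head of the queue
             * gets returned, the caller rejects it, and every reusable lock
             * behind it is never considered, so a new lock gets enqueued. */
            static struct toy_lock *match_then_check(struct toy_lock *queue, int mode)
            {
                for (struct toy_lock *lck = queue; lck != NULL; lck = lck->next)
                    if (lck->mode == mode)
                        return lck;
                return NULL;
            }

            /* Fixed behaviour: skip flagged locks inside the match loop itself,
             * so they no longer shadow reusable locks behind them. */
            static struct toy_lock *match_skip_flagged(struct toy_lock *queue, int mode,
                                                       unsigned int skip_flags)
            {
                for (struct toy_lock *lck = queue; lck != NULL; lck = lck->next) {
                    if (lck->flags & skip_flags)
                        continue;       /* ignore, keep scanning the queue */
                    if (lck->mode == mode)
                        return lck;
                }
                return NULL;
            }

            int main(void)
            {
                /* Queue: one flagged lock first, then a perfectly reusable one. */
                struct toy_lock reusable = { .flags = 0, .mode = 1, .next = NULL };
                struct toy_lock flagged  = { .flags = FL_KMS_IGNORE, .mode = 1,
                                             .next = &reusable };

                struct toy_lock *old = match_then_check(&flagged, 1);
                if (old != NULL && (old->flags & FL_KMS_IGNORE))
                    old = NULL;         /* caller-side check rejects the flagged lock */

                struct toy_lock *fix = match_skip_flagged(&flagged, 1, FL_KMS_IGNORE);

                printf("old path reuses an existing lock:   %s\n",
                       old ? "yes" : "no (a new lock would be enqueued)");
                printf("fixed path reuses an existing lock: %s\n",
                       fix ? "yes" : "no");
                return 0;
            }

            Built and run, the old path finds no reusable lock (so a new one would be enqueued, which is how the lock count on a single resource grows), while the fixed path reuses the existing lock.
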
            This is the only patch we use right now. We have also disabled the idle connections feature (see LU-11976). I think the combination of this patch and disabling idle connections has made 2.12 stable for us.

            Thanks!

            pjones Peter Jones added a comment -

            That is good news, Stephane. Is the LU-11964 patch the only one applied on top of vanilla 2.12.0, or is there anything else?


            People

              Assignee: tappro Mikhail Pershin
              Reporter: sthiell Stephane Thiell
              Votes: 0
              Watchers: 8
