Lustre / LU-11964

Heavy load and soft lockups on MDS with DOM


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.13.0, Lustre 2.12.1
    • Affects Version/s: Lustre 2.12.0
    • Labels: None
    • Environment: CentOS 7.6, Linux Kernel 3.10.0-957.1.3.el7_lustre.x86_64, Lustre 2.12 clients (Sherlock cluster)
    • Severity: 2

    Description

      On Fir, our new scratch filesystem for Sherlock running Lustre 2.12, the filesystem hung today and then operations resumed. Fir is using DNE, PFL and DOM. The MDS hosting mdt[0,2] seems to have been under heavy load. For example:

      [481509.742936] Call Trace:
      [481509.745520]  [<ffffffffc0f73303>] ldlm_process_inodebits_lock+0x93/0x3d0 [ptlrpc]
      [481509.753123]  [<ffffffffc0f61fc0>] ? ldlm_export_lock_object+0x10/0x10 [ptlrpc]
      [481509.760464]  [<ffffffffc0f73270>] ? ldlm_inodebits_compat_queue+0x440/0x440 [ptlrpc]
      [481509.768321]  [<ffffffffc0f4160e>] ldlm_reprocess_queue+0x1be/0x3f0 [ptlrpc]
      [481509.775403]  [<ffffffffc0f426cb>] __ldlm_reprocess_all+0x10b/0x380 [ptlrpc]
      [481509.782482]  [<ffffffffc0f434a2>] ldlm_cancel_lock_for_export.isra.26+0x1c2/0x390 [ptlrpc]
      [481509.790863]  [<ffffffffc0f43851>] ldlm_export_cancel_blocked_locks+0x121/0x200 [ptlrpc]
      [481509.798987]  [<ffffffffc0f6c6e2>] ldlm_bl_thread_main+0x112/0x700 [ptlrpc]
      [481509.805950]  [<ffffffff9d6d67b0>] ? wake_up_state+0x20/0x20
      [481509.811650]  [<ffffffffc0f6c5d0>] ? ldlm_handle_bl_callback+0x530/0x530 [ptlrpc]
      [481509.819130]  [<ffffffff9d6c1c31>] kthread+0xd1/0xe0
      [481509.824095]  [<ffffffff9d6c1b60>] ? insert_kthread_work+0x40/0x40
      [481509.830277]  [<ffffffff9dd74c24>] ret_from_fork_nospec_begin+0xe/0x21
      [481509.836799]  [<ffffffff9d6c1b60>] ? insert_kthread_work+0x40/0x40
      [481509.842979] Code: 74 0e 4c 89 f2 48 89 de 4c 89 ff e8 d3 a2 fc ff 49 8b 87 10 02 00 00 4c 8d a8 f0 fd ff ff 4d 39 fd 74 2b 49 83 bd a8 00 00 00 00 <74> 0e 4c 89 f2 48 89 de 4c 89 ef e8 a8 a2 fc ff 4d 8b ad 10 02 
      [481516.050304] INFO: rcu_sched self-detected stall on CPU
      [481516.051302] INFO: rcu_sched self-detected stall on CPU { 15 }  (t=60001 jiffies g=115591743 c=115591742 q=6728165)
      [481516.051315] Task dump for CPU 14:
      [481516.051316] ldlm_bl_01      R
      [481516.051316]   running task    
      [481516.051318]     0 54053      2 0x00000088
      [481516.051318] Call Trace:
      [481516.051362]  [<ffffffffc0f43851>] ? ldlm_export_cancel_blocked_locks+0x121/0x200 [ptlrpc]
      [481516.051394]  [<ffffffffc0f6c6e2>] ? ldlm_bl_thread_main+0x112/0x700 [ptlrpc]
      [481516.051397]  [<ffffffff9d6d67b0>] ? wake_up_state+0x20/0x20
      [481516.051426]  [<ffffffffc0f6c5d0>] ? ldlm_handle_bl_callback+0x530/0x530 [ptlrpc]
      [481516.051427]  [<ffffffff9d6c1c31>] ? kthread+0xd1/0xe0
      [481516.051429]  [<ffffffff9d6c1b60>] ? insert_kthread_work+0x40/0x40
      [481516.051431]  [<ffffffff9dd74c24>] ? ret_from_fork_nospec_begin+0xe/0x21
      [481516.051432]  [<ffffffff9d6c1b60>] ? insert_kthread_work+0x40/0x40
      [481516.051432] Task dump for CPU 15:
      

      The filesystem is still running, but some threads are at 100% CPU on the MDS:

         PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
       54053 root      20   0       0      0      0 R 100.0  0.0   2:02.22 ldlm_bl_01
       55962 root      20   0       0      0      0 R 100.0  0.0   2:01.79 ldlm_bl_03
       56043 root      20   0       0      0      0 R 100.0  0.0   3:20.41 ldlm_bl_04
       63820 root      20   0       0      0      0 R 100.0  0.0  10:13.65 mdt03_033
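
      As an aside (not part of the original report), a quick way to spot spinning kernel threads like these is per-thread `ps` output sorted by CPU usage; this is a hedged sketch assuming procps `ps`:

      ```shell
      # Sketch (not from the ticket): list the busiest threads system-wide.
      # On an affected MDS, spinning ldlm_bl_* / mdt* kthreads would sort
      # to the top. -L shows threads; --sort=-pcpu sorts by CPU descending.
      ps -eLo pid,tid,pcpu,comm --sort=-pcpu | head -n 10
      ```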
      
      [root@fir-md1-s1 ~]# lctl get_param ldlm.*.*mdt*.*
      ldlm.namespaces.mdt-fir-MDT0000_UUID.contended_locks=32
      ldlm.namespaces.mdt-fir-MDT0000_UUID.contention_seconds=2
      ldlm.namespaces.mdt-fir-MDT0000_UUID.ctime_age_limit=10
      ldlm.namespaces.mdt-fir-MDT0000_UUID.dirty_age_limit=10
      ldlm.namespaces.mdt-fir-MDT0000_UUID.early_lock_cancel=0
      ldlm.namespaces.mdt-fir-MDT0000_UUID.lock_count=146645
      ldlm.namespaces.mdt-fir-MDT0000_UUID.lock_timeouts=8
      ldlm.namespaces.mdt-fir-MDT0000_UUID.lock_unused_count=0
      ldlm.namespaces.mdt-fir-MDT0000_UUID.lru_max_age=3900000
      ldlm.namespaces.mdt-fir-MDT0000_UUID.lru_size=4800
      ldlm.namespaces.mdt-fir-MDT0000_UUID.max_nolock_bytes=0
      ldlm.namespaces.mdt-fir-MDT0000_UUID.max_parallel_ast=1024
      ldlm.namespaces.mdt-fir-MDT0000_UUID.resource_count=62361
      ldlm.namespaces.mdt-fir-MDT0002_UUID.contended_locks=32
      ldlm.namespaces.mdt-fir-MDT0002_UUID.contention_seconds=2
      ldlm.namespaces.mdt-fir-MDT0002_UUID.ctime_age_limit=10
      ldlm.namespaces.mdt-fir-MDT0002_UUID.dirty_age_limit=10
      ldlm.namespaces.mdt-fir-MDT0002_UUID.early_lock_cancel=0
      ldlm.namespaces.mdt-fir-MDT0002_UUID.lock_count=6472231
      ldlm.namespaces.mdt-fir-MDT0002_UUID.lock_timeouts=166760
      ldlm.namespaces.mdt-fir-MDT0002_UUID.lock_unused_count=0
      ldlm.namespaces.mdt-fir-MDT0002_UUID.lru_max_age=3900000
      ldlm.namespaces.mdt-fir-MDT0002_UUID.lru_size=4800
      ldlm.namespaces.mdt-fir-MDT0002_UUID.max_nolock_bytes=0
      ldlm.namespaces.mdt-fir-MDT0002_UUID.max_parallel_ast=1024
      ldlm.namespaces.mdt-fir-MDT0002_UUID.resource_count=359599
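
      The imbalance between the two namespaces (6.4M locks on MDT0002 vs 146k on MDT0000) is easier to see if the lock_count lines are reduced to name/value pairs. A hedged sketch, fed here with the two values captured above; on a live MDS the same filter could be fed from `lctl get_param ldlm.namespaces.mdt-*.lock_count`:

      ```shell
      # Sketch: extract per-namespace lock_count from lctl get_param output.
      # The two values below are the ones captured in this ticket.
      printf '%s\n' \
        'ldlm.namespaces.mdt-fir-MDT0000_UUID.lock_count=146645' \
        'ldlm.namespaces.mdt-fir-MDT0002_UUID.lock_count=6472231' |
      awk -F'[.=]' '{ printf "%s %s\n", $3, $5 }'
      # mdt-fir-MDT0000_UUID 146645
      # mdt-fir-MDT0002_UUID 6472231
      ```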
      

      The MDS resumed operations by itself; the last logs were:

      [482208.719700] LustreError: 63820:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1549999197, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8995f523f740/0x1a4b7ac741771fee lrc: 3/0,1 mode: --/PW res: [0x2c00033c7:0x2b6e:0x0].0x0 bits 0x40/0x0 rrc: 39993 type: IBT flags: 0x40010080000000 nid: local remote: 0x0 expref: -99 pid: 63820 timeout: 0 lvb_type: 0
      [482361.725697] LNet: Service thread pid 63820 completed after 1225.53s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
      [482666.099950] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
      

      I'm attaching the full syslog output.

      Now everything looks good even though this MDS is still quite loaded in terms of CPU.

      Thanks,
      Stephane

      Attachments

        1. sherlocks.txt
          17 kB
        2. sh-106-60.dk.log.gz
          4.11 MB
        3. lustre-log.1550078704.21794
          39 kB
        4. fir-md1-s2-dk.log
          7.15 MB
        5. fir-md1-s1-dk.log
          1.27 MB
        6. fir-md1-s1-20190213.log
          82 kB
        7. fir-md1-s1.syslog-20190212.log.gz
          303 kB


            People

              Assignee: Mikhail Pershin (tappro)
              Reporter: Stephane Thiell (sthiell)
              Votes: 0
              Watchers: 8
