LU-10697: MDT locking issues after failing over OSTs from hung OSS


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Affects Version/s: Lustre 2.10.3
    • Environment: 3.10.0-693.2.2.el7_lustre.pl1.x86_64
    • Severity: 3

    Description

      We are seeing more server instability with 2.10.3.

      The problem started with a hung OSS, `oak-io2-s2`, this morning. It is not clear what happened, and it would require some analysis of the crash dump. The first stack trace in the log is the following:

      [1369496.805186] INFO: task systemd:1 blocked for more than 120 seconds.
      [1369496.826365] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [1369496.852691] systemd         D ffffffff81a8ae08     0     1      0 0x00000000
      [1369496.876463]  ffff88015368fb50 0000000000000082 ffff880153698000 ffff88015368ffd8
      [1369496.901403]  ffff88015368ffd8 ffff88015368ffd8 ffff880153698000 ffffffff81a8ae00
      [1369496.926336]  ffffffff81a8ae04 ffff880153698000 00000000ffffffff ffffffff81a8ae08
      [1369496.951268] Call Trace:
      [1369496.959866]  [<ffffffff816aa409>] schedule_preempt_disabled+0x29/0x70
      [1369496.981629]  [<ffffffff816a8337>] __mutex_lock_slowpath+0xc7/0x1d0
      [1369497.002523]  [<ffffffff816a774f>] mutex_lock+0x1f/0x2f
      [1369497.019998]  [<ffffffff8127f9a2>] sysfs_permission+0x32/0x60
      [1369497.039175]  [<ffffffff8120c31e>] __inode_permission+0x6e/0xc0
      [1369497.058921]  [<ffffffff8120c388>] inode_permission+0x18/0x50
      [1369497.078096]  [<ffffffff8120e44e>] link_path_walk+0x27e/0x8b0
      [1369497.097273]  [<ffffffff8120ebdb>] path_lookupat+0x6b/0x7b0
      [1369497.115877]  [<ffffffff811df555>] ? kmem_cache_alloc+0x35/0x1e0
      [1369497.135925]  [<ffffffff81211d2f>] ? getname_flags+0x4f/0x1a0
      [1369497.155099]  [<ffffffff8120f34b>] filename_lookup+0x2b/0xc0
      [1369497.173988]  [<ffffffff81212ec7>] user_path_at_empty+0x67/0xc0
      [1369497.193750]  [<ffffffff8124b919>] ? ep_scan_ready_list.isra.7+0x1b9/0x1f0
      [1369497.216641]  [<ffffffff81212f31>] user_path_at+0x11/0x20
      [1369497.234673]  [<ffffffff81206473>] vfs_fstatat+0x63/0xc0
      [1369497.252433]  [<ffffffff81206a94>] SYSC_newfstatat+0x24/0x60
      [1369497.271340]  [<ffffffff810eaaba>] ? get_monotonic_boottime+0x4a/0x100
      [1369497.293088]  [<ffffffff810aec01>] ? posix_get_boottime+0x11/0x20
      [1369497.313423]  [<ffffffff810b0271>] ? SyS_clock_gettime+0x81/0xc0
      [1369497.333456]  [<ffffffff81206cde>] SyS_newfstatat+0xe/0x10
      [1369497.351774]  [<ffffffff816b5009>] system_call_fastpath+0x16/0x1b
      
      
      

      Because the OSTs were unresponsive, we took a crash dump at 11:10.
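
      For reference, this kind of dump is normally captured via kdump and inspected afterwards with the crash utility; a minimal sketch, assuming kdump is configured (the vmcore location and debuginfo path below are illustrative):

      # On the hung node, force a panic so kdump writes a vmcore (requires sysrq enabled):
      echo c > /proc/sysrq-trigger
      # Later, open the dump against the matching debuginfo kernel:
      crash /usr/lib/debug/lib/modules/3.10.0-693.2.2.el7_lustre.pl1.x86_64/vmlinux \
            /var/crash/<host>-<timestamp>/vmcore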

      Failing over the OSTs to the partner OSS seemed to work at first.
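
      For reference, the failover itself boils down to mounting the hung node's OSTs on the partner OSS once the hung node is fenced (whether done by HA or by hand); a minimal sketch of the manual form, with illustrative device and mount point names:

      # On the partner OSS, after the hung node has been fenced/powered off,
      # mount each OST it was serving (names below are illustrative):
      mount -t lustre /dev/mapper/oak-OST003f /mnt/lustre/oak-OST003f
      mount -t lustre /dev/mapper/oak-OST0041 /mnt/lustre/oak-OST0041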

      Then users reported that some operations on directories were stuck, so I took a look at the MDS and found warnings and stack traces (see the attached file `oak-md1-s2.kernel.log` for the full logs). This is just the beginning:

      Feb 21 11:20:54 oak-md1-s2 kernel: LustreError: 102992:0:(osp_precreate.c:903:osp_precreate_cleanup_orphans()) oak-OST003f-osc-MDT0000: cannot cleanup orphans: rc = -11
      Feb 21 11:20:54 oak-md1-s2 kernel: LustreError: 102992:0:(osp_precreate.c:903:osp_precreate_cleanup_orphans()) Skipped 21 previous similar messages
      Feb 21 11:28:39 oak-md1-s2 kernel: Lustre: oak-OST0051-osc-MDT0000: Connection restored to 10.0.2.105@o2ib5 (at 10.0.2.105@o2ib5)
      Feb 21 11:28:39 oak-md1-s2 kernel: Lustre: Skipped 946 previous similar messages
      Feb 21 11:28:47 oak-md1-s2 kernel: LustreError: 102692:0:(client.c:3007:ptlrpc_replay_interpret()) @@@ status -2, old was 0  req@ffff88020ad62a00 x1592931638579664/t4295689592(4295689592) o6->oak-OST0053-osc-MDT0000@10.0.2.105@o2ib5:28/4 lens 664/400 e 5 to 0 dl 1519241348 ref 2 fl Interpret:R/4/0 rc -2/-2
      Feb 21 11:31:33 oak-md1-s2 kernel: INFO: task mdt01_002:102837 blocked for more than 120 seconds.
      Feb 21 11:31:33 oak-md1-s2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      Feb 21 11:31:33 oak-md1-s2 kernel: mdt01_002       D ffffffff00000000     0 102837      2 0x00000080
      Feb 21 11:31:33 oak-md1-s2 kernel: ffff88102428f558 0000000000000046 ffff88027975cf10 ffff88102428ffd8
      Feb 21 11:31:33 oak-md1-s2 kernel: ffff88102428ffd8 ffff88102428ffd8 ffff88027975cf10 ffff88027975cf10
      Feb 21 11:31:33 oak-md1-s2 kernel: ffff88036428f248 ffff88036428f240 fffffffe00000001 ffffffff00000000
      Feb 21 11:31:33 oak-md1-s2 kernel: Call Trace:
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffff816a94e9>] schedule+0x29/0x70
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffff816aadd5>] rwsem_down_write_failed+0x225/0x3a0
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffff81332047>] call_rwsem_down_write_failed+0x17/0x30
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffff816a87cd>] down_write+0x2d/0x3d
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffffc128f7f4>] lod_qos_prep_create+0xaa4/0x17f0 [lod]
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffffc1017200>] ? qsd_op_begin+0xb0/0x4d0 [lquota]
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffffc1094d10>] ? osd_declare_qid+0x1f0/0x480 [osd_ldiskfs]
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffffc1290ab8>] lod_prepare_create+0x298/0x3f0 [lod]
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffffc105807e>] ? osd_idc_find_and_init+0x7e/0x100 [osd_ldiskfs]
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffffc128563e>] lod_declare_striped_create+0x1ee/0x970 [lod]
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffffc1287b54>] lod_declare_create+0x1e4/0x540 [lod]
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffffc12f3a1f>] mdd_declare_create_object_internal+0xdf/0x2f0 [mdd]
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffffc12e4b63>] mdd_declare_create+0x53/0xe20 [mdd]
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffffc12e8b69>] mdd_create+0x7d9/0x1320 [mdd]
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffffc11ba9bc>] mdt_reint_open+0x218c/0x31a0 [mdt]
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffffc09ef4ce>] ? upcall_cache_get_entry+0x20e/0x8f0 [obdclass]
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffffc119faa3>] ? ucred_set_jobid+0x53/0x70 [mdt]
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffffc11af8a0>] mdt_reint_rec+0x80/0x210 [mdt]
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffffc119130b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffffc1191832>] mdt_intent_reint+0x162/0x430 [mdt]
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffffc119c59e>] mdt_intent_policy+0x43e/0xc70 [mdt]
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffffc0bbc672>] ? ldlm_resource_get+0x5e2/0xa30 [ptlrpc]
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffffc0bb5277>] ldlm_lock_enqueue+0x387/0x970 [ptlrpc]
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffffc0bde903>] ldlm_handle_enqueue0+0x9c3/0x1680 [ptlrpc]
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffffc0c06ae0>] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc]
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffffc0c63ea2>] tgt_enqueue+0x62/0x210 [ptlrpc]
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffffc0c67da5>] tgt_request_handle+0x925/0x1370 [ptlrpc]
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffffc0c10b16>] ptlrpc_server_handle_request+0x236/0xa90 [ptlrpc]
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffffc0c0d148>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffff810c4822>] ? default_wake_function+0x12/0x20
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffff810ba588>] ? __wake_up_common+0x58/0x90
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffffc0c14252>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffffc0c137c0>] ? ptlrpc_register_service+0xe30/0xe30 [ptlrpc]
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffff810b098f>] kthread+0xcf/0xe0
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffff816b4f58>] ret_from_fork+0x58/0x90
      Feb 21 11:31:33 oak-md1-s2 kernel: [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
      Feb 21 11:31:33 oak-md1-s2 kernel: INFO: task mdt00_004:103144 blocked for more than 120 seconds.
      
      
      

      Then, slightly later:

      Feb 21 11:35:25 oak-md1-s2 kernel: LustreError: 103142:0:(ldlm_request.c:130:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1519241425, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-oak-MDT0000_UUID lock: ffff8801fa7b4200/0xec47b06a6b3b9c78 lrc: 3/0,1 mode: --/CW res: [0x20000f271:0x2562:0x0].0x0 bits 0x2 rrc: 11 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 103142 timeout: 0 lvb_type: 0
      Feb 21 11:35:25 oak-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1519241725.103142
      Feb 21 11:35:27 oak-md1-s2 kernel: LustreError: 102833:0:(ldlm_request.c:130:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1519241427, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-oak-MDT0000_UUID lock: ffff880afe56c400/0xec47b06a6b3d396a lrc: 3/0,1 mode: --/CW res: [0x20000f271:0x2562:0x0].0x0 bits 0x2 rrc: 11 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102833 timeout: 0 lvb_type: 0
      Feb 21 11:35:42 oak-md1-s2 kernel: LustreError: 220676:0:(ldlm_request.c:130:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1519241442, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-oak-MDT0000_UUID lock: ffff880e0908aa00/0xec47b06a6b44312f lrc: 3/1,0 mode: --/PR res: [0x20000db01:0xda33:0x0].0x0 bits 0x13 rrc: 4 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 220676 timeout: 0 lvb_type: 0
      Feb 21 11:37:04 oak-md1-s2 kernel: LNet: Service thread pid 102832 completed after 399.98s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
      Feb 21 11:37:04 oak-md1-s2 kernel: LNet: Skipped 3 previous similar messages
      Feb 21 11:37:06 oak-md1-s2 kernel: LustreError: 220680:0:(ldlm_request.c:130:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1519241526, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-oak-MDT0000_UUID lock: ffff880248745a00/0xec47b06a6b61994d lrc: 3/1,0 mode: --/PR res: [0x20000c387:0x1420f:0x0].0x0 bits 0x13 rrc: 6 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 220680 timeout: 0 lvb_type: 0
      Feb 21 11:38:44 oak-md1-s2 kernel: LNet: Service thread pid 103154 completed after 499.94s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
      Feb 21 11:38:57 oak-md1-s2 kernel: LNet: Service thread pid 103147 was inactive for 412.60s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
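
      The binary debug dumps referenced above (/tmp/lustre-log.*) can be converted to text for inspection; a minimal sketch (the output file name is mine):

      # Decode a binary Lustre debug dump into readable text:
      lctl debug_file /tmp/lustre-log.1519241725.103142 /tmp/lustre-log.1519241725.103142.txt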
      
      
      

      We decided to do a stop/start of the MDT at 12:04:

      Feb 21 12:04:28 oak-md1-s2 kernel: Lustre: Failing over oak-MDT0000
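
      A stop/start of the MDT is simply an unmount followed by a remount; a minimal sketch, with illustrative device and mount point names:

      # Stop the MDT (this triggers the "Failing over" message above), then start it again:
      umount /mnt/lustre/oak-MDT0000
      mount -t lustre /dev/mapper/oak-MDT0000 /mnt/lustre/oak-MDT0000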
      
      
      

      The full logs show many granted locks.
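
      The granted lock counts can also be checked directly on the MDS instead of grepping the logs; a minimal sketch using the standard ldlm parameters:

      # Per-namespace lock counts and granted locks on the MDS:
      lctl get_param ldlm.namespaces.*.lock_count
      lctl get_param ldlm.namespaces.*.pool.granted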

      Recovery completed at 12:21 with 1 client evicted:

      Feb 21 12:21:46 oak-md1-s2 kernel: Lustre: oak-MDT0000: Recovery over after 14:38, of 1263 clients 1262 recovered and 1 was evicted.
      
      
      

      This fixed the directory access issues.

      This is the third time I have seen this issue (after OST failover) since the upgrade to 2.10.3. I never saw it with 2.10.1+patches or 2.10.2 (without patches). Here, all servers are running 2.10.3; the MDS is running 2.10.3 plus the first patch from LU-10680.

      Note: most of our clients are still running 2.10.0 (~900) and 2.10.1 (~300), but we haven't seen any Lustre issues on the clients so far.

      Thanks!

People

    Assignee: bfaccini Bruno Faccini (Inactive)
    Reporter: sthiell Stephane Thiell
    Votes: 1
    Watchers: 6