Lustre / LU-5503

MDS (2.4.2) is getting "Service thread ... inactive" warnings and the file system times out


Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.4.2
    • Environment:
      Server: CentOS 6.3/Lustre 2.4.2
      Client: CentOS 6.3/Lustre 2.5.2
    • 3
    • 15353

    Description

      We are seeing an increasing number of MDS "service thread inactive" warnings like the one below. Whenever this happens, clients get evicted and the file system hangs.

      LNet: Service thread pid 3766 was inactive for 334.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      Pid: 3766, comm: mdt_rdpg00_003

      Call Trace:
      [<ffffffff81133281>] ? shrink_inactive_list+0x191/0x830
      [<ffffffff81133017>] ? shrink_active_list+0x297/0x370
      [<ffffffff81133cce>] ? shrink_mem_cgroup_zone+0x3ae/0x610
      [<ffffffff8117289d>] ? mem_cgroup_iter+0xfd/0x280
      [<ffffffff81133f93>] ? shrink_zone+0x63/0xb0
      [<ffffffff81134b99>] ? zone_reclaim+0x349/0x400
      [<ffffffff8112a6ac>] ? get_page_from_freelist+0x69c/0x830
      [<ffffffff811b5a07>] ? __find_get_block+0x97/0xe0
      [<ffffffff8112bc43>] ? __alloc_pages_nodemask+0x113/0x8d0
      [<ffffffffa0b2d46e>] ? ldiskfs_ind_get_blocks+0x35e/0x610 [ldiskfs]
      [<ffffffff81170b10>] ? mem_cgroup_get_reclaim_stat_from_page+0x20/0x70
      [<ffffffff8112f18d>] ? update_page_reclaim_stat+0x2d/0x70
      [<ffffffff81166dc2>] ? kmem_getpages+0x62/0x170
      [<ffffffff8116742f>] ? cache_grow+0x2cf/0x320
      [<ffffffff8116774a>] ? ____cache_alloc_node+0x8a/0x160
      [<ffffffff811686db>] ? kmem_cache_alloc+0x11b/0x190
      [<ffffffffa0b526b0>] ? ldiskfs_alloc_inode+0x20/0x150 [ldiskfs]
      [<ffffffff8119d257>] ? alloc_inode+0x27/0xa0
      [<ffffffff8119e168>] ? iget_locked+0x78/0x170
      [<ffffffffa0b2f6e7>] ? ldiskfs_iget+0x37/0x800 [ldiskfs]
      [<ffffffffa0791ad2>] ? fld_server_lookup+0x72/0x3d0 [fld]
      [<ffffffff8119cbbe>] ? generic_detach_inode+0x18e/0x1f0
      [<ffffffffa0b9e4ae>] ? osd_iget+0x2e/0x2c0 [osd_ldiskfs]
      [<ffffffffa0ba9546>] ? osd_ea_fid_get+0x176/0x2c0 [osd_ldiskfs]
      [<ffffffffa0b9ff8a>] ? osd_remote_fid+0x9a/0x280 [osd_ldiskfs]
      [<ffffffffa0bb00f5>] ? osd_it_ea_rec+0xb45/0x1470 [osd_ldiskfs]
      [<ffffffffa0b1f3a5>] ? call_filldir+0xb5/0x150 [ldiskfs]
      [<ffffffffa0bad170>] ? osd_ldiskfs_filldir+0x0/0x480 [osd_ldiskfs]
      [<ffffffffa0b1fd09>] ? ldiskfs_readdir+0x5a9/0x730 [ldiskfs]
      [<ffffffffa0bad170>] ? osd_ldiskfs_filldir+0x0/0x480 [osd_ldiskfs]
      [<ffffffffa0b5f1fd>] ? htree_unlock+0x3d/0x2c6 [ldiskfs]
      [<ffffffffa0d70341>] ? lod_it_rec+0x21/0x90 [lod]
      [<ffffffffa0a241dc>] ? mdd_dir_page_build+0xfc/0x210 [mdd]
      [<ffffffffa0435b22>] ? dt_index_walk+0x162/0x3d0 [obdclass]
      [<ffffffff815101e6>] ? down_read+0x16/0x30
      [<ffffffffa0a240e0>] ? mdd_dir_page_build+0x0/0x210 [mdd]
      [<ffffffffa0a25e2b>] ? mdd_readpage+0x38b/0x5a0 [mdd]
      [<ffffffffa0cb727f>] ? mdt_readpage+0x47f/0x960 [mdt]
      [<ffffffffa0ca7a97>] ? mdt_handle_common+0x647/0x16d0 [mdt]
      [<ffffffffa0ce1685>] ? mds_readpage_handle+0x15/0x20 [mdt]
      [<ffffffffa05cf3a8>] ? ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      [<ffffffffa02e95de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      [<ffffffffa02fad9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      [<ffffffffa05c6709>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
      [<ffffffff81055ad3>] ? __wake_up+0x53/0x70
      [<ffffffffa05d073e>] ? ptlrpc_main+0xace/0x1700 [ptlrpc]
      [<ffffffffa05cfc70>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
      [<ffffffffa05cfc70>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      [<ffffffffa05cfc70>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      [<ffffffff8100c0c0>] ? child_rip+0x0/0x20

      LustreError: dumping log to /tmp/lustre-log.1408400803.3766
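The trace shows the readdir service thread (mdt_rdpg00_003) stuck in direct memory reclaim — zone_reclaim() under __alloc_pages_nodemask() — while allocating an inode, which is what trips the watchdog. A triage sketch (assumptions not stated in this ticket: root access on the MDS, Lustre utilities installed, NUMA server hardware; disabling zone reclaim is a common general tuning, not a fix prescribed here):

```shell
# 1. Decode the binary Lustre debug dump referenced above into readable text.
lctl debug_file /tmp/lustre-log.1408400803.3766 /tmp/lustre-log.1408400803.txt

# 2. Check whether NUMA zone reclaim is enabled; a nonzero value means page
#    allocations can fall into the slow zone_reclaim() path seen in the trace.
cat /proc/sys/vm/zone_reclaim_mode

# 3. Common mitigation on NUMA servers (assumption, verify for your workload):
#    disable zone reclaim so allocations fall back to other nodes instead.
sysctl -w vm.zone_reclaim_mode=0
echo 'vm.zone_reclaim_mode = 0' >> /etc/sysctl.conf   # persist across reboots
```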


            People

              Assignee: Zhenyu Xu (bobijam)
              Reporter: Haisong Cai (haisong, inactive)
              Votes: 0
              Watchers: 5
