  Lustre / LU-2854

hanging ll_ost_io threads and refused connections


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version: Lustre 1.8.9
    • Affects Version: Lustre 1.8.8
    • Severity: 3
    • 6915

    Description

      We are seeing hanging ll_ost_io threads, stuck somewhere in the path shown below, and new connections are refused until the stuck threads finish. The only way we could recover was to fail over the OSS. We captured backtraces; please take a look and see where the problem is.

      Feb 20 09:22:46 oss01 kernel: Call Trace:
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff80062ff2>] thread_return+0x62/0xfe
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff80046c6e>] try_to_wake_up+0x472/0x484
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff800649e1>] __down+0x99/0xd8
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff8028b1d5>] __down_trylock+0x44/0x4e
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff800646c9>] __down_failed+0x35/0x3a
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff887fe406>] .text.lock.ldlm_pool+0x37/0x71 [ptlrpc]
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff80064614>] __down_read+0x12/0x92
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff8003f279>] shrink_slab+0xd0/0x153
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff800ce4ce>] zone_reclaim+0x235/0x2cd
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff8000985a>] __d_lookup+0xb0/0xff
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff8000a939>] get_page_from_freelist+0xbf/0x442
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff88b134d2>] filter_fid2dentry+0x512/0x740 [obdfilter]
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff8000f46f>] __alloc_pages+0x78/0x308
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff888a33ad>] kiblnd_launch_tx+0x16d/0x9d0 [ko2iblnd]
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff80025e20>] find_or_create_page+0x32/0x72
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff88b27445>] filter_get_page+0x35/0x70 [obdfilter]
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff88b27a81>] filter_preprw_read+0x601/0xd30 [obdfilter]
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff886ff873>] lnet_send+0x9a3/0x9d0 [lnet]
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff886fdaf7>] lnet_prep_send+0x67/0xb0 [lnet]
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff88b29f0c>] filter_preprw+0x1d5c/0x1dc0 [obdfilter]
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff88816c3a>] lustre_pack_reply_flags+0x86a/0x950 [ptlrpc]
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff8880eaa8>] ptlrpc_send_reply+0x5e8/0x600 [ptlrpc]
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff88acfac3>] ost_brw_read+0xb33/0x1a70 [ost]
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff88812ed5>] lustre_msg_get_opc+0x35/0xf0 [ptlrpc]
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff8008e7f9>] default_wake_function+0x0/0xe
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff88813088>] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc]
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff88ad8363>] ost_handle+0x2e73/0x55b0 [ost]
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff80154b81>] __next_cpu+0x19/0x28
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff800778c5>] smp_send_reschedule+0x4e/0x53
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff888226b9>] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff88822e15>] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc]
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff8008cc1e>] __wake_up_common+0x3e/0x68
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff88823da6>] ptlrpc_main+0xf66/0x1120 [ptlrpc]
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff8005dfb1>] child_rip+0xa/0x11
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff88822e40>] ptlrpc_main+0x0/0x1120 [ptlrpc]
      Feb 20 09:22:46 oss01 kernel:  [<ffffffff8005dfa7>] child_rip+0x0/0x11
      Feb 20 09:22:46 oss01 kernel: 
      Feb 20 09:22:46 oss01 kernel: LustreError: dumping log to /tmp/lustre-log.1361319766.23768
      Feb 20 09:35:48 oss01 kernel: Lustre: 13106:0:(service.c:808:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-383), not sending early reply
      Feb 20 09:35:48 oss01 kernel:   req@ffff8105cc619800 x1420503574224933/t0 o3->28190515-07cf-b016-422b-1b05f42dd534@NET_0x50000c0a803ae_UUID:0/0 lens 448/400 e 4 to 0 dl 1361320553 ref 2 fl Interpret:/0/0 rc 0/0
      Feb 20 09:35:48 oss01 kernel: Lustre: 13106:0:(service.c:808:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message
      Feb 20 09:35:49 oss01 kernel: Lustre: 8237:0:(service.c:808:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-383), not sending early reply
      Feb 20 09:35:49 oss01 kernel:   req@ffff81061e650400 x1420503605573149/t0 o3->554c2172-ea07-8255-881e-847f196c5522@NET_0x50000c0a80399_UUID:0/0 lens 448/400 e 4 to 0 dl 1361320554 ref 2 fl Interpret:/0/0 rc 0/0
      Feb 20 09:38:06 oss01 kernel: Lustre: 10022:0:(ldlm_lib.c:574:target_handle_reconnect()) lustre-OST0000: 28190515-07cf-b016-422b-1b05f42dd534 reconnecting
      Feb 20 09:38:06 oss01 kernel: Lustre: 10022:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 65 previous similar messages
      Feb 20 09:38:06 oss01 kernel: Lustre: 10022:0:(ldlm_lib.c:874:target_handle_connect()) lustre-OST0000: refuse reconnection from 28190515-07cf-b016-422b-1b05f42dd534@192.168.3.174@o2ib to 0xffff8102d408d600; still busy with 1 active RPCs
      Feb 20 09:38:06 oss01 kernel: Lustre: 10022:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 63 previous similar messages
      Feb 20 09:38:06 oss01 kernel: LustreError: 10022:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16)  req@ffff810181ffd000 x1420503574335450/t0 o8->28190515-07cf-b016-422b-1b05f42dd534@NET_0x50000c0a803ae_UUID:0/0 lens 368/264 e 0 to 0 dl 1361320786 ref 1 fl Interpret:/0/0 rc -16/0
      Feb 20 09:38:06 oss01 kernel: LustreError: 10022:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 63 previous similar messages
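
      Reading the trace from the bottom up, the ll_ost_io thread appears to be preparing a bulk read (ost_brw_read -> filter_preprw_read -> filter_get_page); the page allocation hits memory pressure and enters direct reclaim (zone_reclaim -> shrink_slab), and reclaim then calls into the ldlm pool shrinker, where the thread blocks on a semaphore (.text.lock.ldlm_pool -> __down). The sketch below is a minimal user-space analogy of that stall pattern, assuming the shrinker's lock can be held elsewhere for a long time; it is not Lustre or kernel code, and every name in it is hypothetical.

      /*
       * Minimal user-space analogy of the stall seen in the trace above.
       * Build with: cc -pthread stall_analogy.c
       */
      #include <pthread.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <unistd.h>

      static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

      /* Stand-in for the pool shrinker called from direct reclaim. */
      static void shrink_pool(void)
      {
          pthread_mutex_lock(&pool_lock);   /* blocks while someone else holds it */
          /* ... reclaim a few objects ... */
          pthread_mutex_unlock(&pool_lock);
      }

      /* Stand-in for a page allocation made under memory pressure:
       * the allocator first tries to reclaim, then allocates. */
      static void *alloc_page_under_pressure(void)
      {
          shrink_pool();
          return malloc(4096);
      }

      /* Stand-in for an I/O service thread preparing a bulk read. */
      static void *io_thread(void *arg)
      {
          void *page = alloc_page_under_pressure();   /* parks here */
          free(page);
          return NULL;
      }

      /* Stand-in for whatever holds the pool lock across a slow operation. */
      static void *slow_holder(void *arg)
      {
          pthread_mutex_lock(&pool_lock);
          sleep(60);
          pthread_mutex_unlock(&pool_lock);
          return NULL;
      }

      int main(void)
      {
          pthread_t holder, io;

          pthread_create(&holder, NULL, slow_holder, NULL);
          sleep(1);                                   /* let the holder take the lock first */
          pthread_create(&io, NULL, io_thread, NULL);

          /* The io thread cannot finish its "read" until the holder releases
           * the lock; every thread that allocates this way piles up the same way. */
          pthread_join(io, NULL);
          pthread_join(holder, NULL);
          printf("io thread completed only after the lock was dropped\n");
          return 0;
      }

      While an I/O thread is parked like this, its bulk read RPC never completes, which matches the later messages: target_handle_connect() keeps refusing the client's reconnect with "still busy with 1 active RPCs" until the stuck request finishes or the OSS is failed over.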
      

      Attachments

        1. oss01_messages (934 kB)
        2. oss02_messages (999 kB)
        3. oss03_messages (812 kB)
        4. oss04_messages (931 kB)


          People

            Assignee: Hongchao Zhang (hongchao.zhang)
            Reporter: Shuichi Ihara (ihara) (Inactive)
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved: