Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-2429

easy to find bad client

    XMLWordPrintable

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Minor
    • None
    • Lustre 1.8.x (1.8.0 - 1.8.5)
    • None
    • lustre 1.8.8 RHEL5
    • 3
    • 5754

    Description

      we have a network problem at the customer site, the clients are still running, but network is unstable. In that situation, sometimes Lustre servers refuses new connections due to still waiting some active RPC finish.

      e.g.)
      Nov 6 10:51:00 oss212 kernel: Lustre: 21280:0:(ldlm_lib.c:874:target_handle_connect()) LARGE01-OST004c: refuse reconnection from 6279e611-9d6b-3d6a-bab4-e76cf925282f@560@gni to 0xffff81043d807a00; still busy with 1 active RPCs
      Nov 6 10:51:16 oss212 kernel: LustreError: 21337:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (107) req@ffff8106a3c46400 x1415646605273905/t0 o400><?>@<?>:0/0 lens 192/0 e 0 to 0 dl 1352166761 ref 1 fl Interpret:H/0/0 rc -107/0

      Some cases, we can find bad client and reboot them or evict servers and reconnect, then situation can be back.

      Howerver, most of cases, it's hard to find bad client, and keeping the error messages. If we can find bad client, new clients can't reconnect until all clients reboot. this is not good idea..

      Any good idea to easy find bad client when the above logs happen?

      Attachments

        1. 20121210_t2s007037_sysrq_t.log.tgz
          150 kB
          Shuichi Ihara
        2. 20121210_t2s007037.log
          241 kB
          Shuichi Ihara

        Issue Links

          Activity

            People

              bfaccini Bruno Faccini (Inactive)
              ihara Shuichi Ihara (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: