Lustre / LU-1215

MDT cannot connect

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Minor
    • None
    • None
    • None
    • 2 MDS, 4 OSS, 6 OSTs, 2 clients, IB
    • 3
    • 10701

    Description

      The MDT cannot connect when I copy 500G of files. On the MDS where the MDT is mounted, dmesg shows:
      CfsError: dumping log to /tmp/cfs-log.1331138951.11082
      Cfs: Service thread pid 20629 was inactive for 600.00s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
      CfsError: dumping log to /tmp/cfs-log.1331138952.20629
      Cfs: Service thread pid 20630 was inactive for 600.00s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
      CfsError: dumping log to /tmp/cfs-log.1331138952.20630
      CfsError: dumping log to /tmp/cfs-log.1331138952.20631
      Cfs: Service thread pid 8373 was inactive for 600.00s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
      Cfs: Skipped 1 previous similar message
      CfsError: dumping log to /tmp/cfs-log.1331138952.8373
      Cfs: 11112:0:(ldlm_lib.c:766:target_handle_connect()) saictfs-MDT0000: exp ffff810215710200 already connecting
      Cfs: 11112:0:(ldlm_lib.c:766:target_handle_connect()) Skipped 5 previous similar messages
      Cfs: 6752:0:(service.c:803:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-207), not sending early reply
      req@ffff81014cf88400 x1390039116158766/t0 o401->LOV_OSC_UUID@192.168.1.105@tcp:0/0 lens 2944/0 e 5 to 0 dl 1331139163 ref 2 fl Interpret:/0/0 rc 0/0
      Cfs: 20628:0:(service.c:803:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-207), not sending early reply
      req@ffff810210775450 x1390039054544125/t0 o401->LOV_OSC_UUID@192.168.1.102@tcp:0/0 lens 2240/0 e 5 to 0 dl 1331139164 ref 2 fl Interpret:/0/0 rc 0/0
      Cfs: 20628:0:(service.c:803:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages
      CfsError: 11128:0:(ldlm_lib.c:1848:target_send_reply_msg()) @@@ processing error (-114) req@ffff8100bae54800 x1395800118657118/t0 o38-><?>@<?>:0/0 lens 368/264 e 0 to 0 dl 1331139359 ref 1 fl Interpret:/0/0 rc -114/0
      CfsError: 11128:0:(ldlm_lib.c:1848:target_send_reply_msg()) Skipped 10 previous similar messages
      Cfs: Service thread pid 11098 was inactive for 1200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      Cfs: 0:0:(linux-debug.c:264:libcfs_debug_dumpstack()) showing stack for process 11098
      ll_mdt_01 D ffff810001015000 0 11098 1 11099 11097 (L-TLB)
      ffff810201a31820 0000000000000046 ffff81009fe6cad1 ffff810201a31880
      ffff810201362158 000000000000000a ffff81021997d7a0 ffff810107f5a080
      000a2eade35d9a19 000000000000ec36 ffff81021997d988 0000000207f14e40
      Call Trace:
      [<ffffffff8803209c>] :jbd:start_this_handle+0x329/0x3ed
      [<ffffffff8009f6d0>] autoremove_wake_function+0x0/0x2e
      [<ffffffff88032233>] :jbd:journal_start+0xd3/0x107
      [<ffffffff88b77a5f>] :fsfilt_ldiskfs:fsfilt_ldiskfs_start+0x55f/0x630
      [<ffffffff8000d3a4>] dput+0x2c/0x113
      [<ffffffff88b285d3>] :mds:mds_client_add+0x6b3/0xe00
      [<ffffffff887ea1b0>] :obdclass:class_handle2object+0xe0/0x170
      [<ffffffff88b0377c>] :mds:mds_connect+0x45c/0x7f0
      [<ffffffff88b0ac40>] :mds:mds_handle+0x0/0x4d60
      [<ffffffff88b0ac40>] :mds:mds_handle+0x0/0x4d60
      [<ffffffff88867863>] :ptlrpc:target_handle_connect+0x21d3/0x2dd0
      [<ffffffff8889dcc4>] :ptlrpc:cfs_msg_set_timeout+0x34/0x110
      [<ffffffff88892ce8>] :ptlrpc:ptlrpc_send_reply+0x5c8/0x5e0
      [<ffffffff88b0b95e>] :mds:mds_handle+0xd1e/0x4d60
      [<ffffffff8874b3d5>] :lnet:lnet_match_blocked_msg+0x385/0x3a0
      [<ffffffff8014dded>] __next_cpu+0x19/0x28
      [<ffffffff888a444e>] :ptlrpc:ptlrpc_server_handle_request+0xa8e/0x1130
      [<ffffffff80047080>] try_to_wake_up+0x472/0x484
      [<ffffffff80062fc8>] thread_return+0x62/0xfe
      [<ffffffff8008a2a7>] __wake_up_common+0x3e/0x68
      [<ffffffff888a7ea8>] :ptlrpc:ptlrpc_main+0x1258/0x1420
      [<ffffffff8008be7d>] default_wake_function+0x0/0xe
      [<ffffffff800b65bf>] audit_syscall_exit+0x336/0x362
      [<ffffffff8005dfb1>] child_rip+0xa/0x11
      [<ffffffff888a6c50>] :ptlrpc:ptlrpc_main+0x0/0x1420
      [<ffffffff8005dfa7>] child_rip+0x0/0x11

      CfsError: dumping log to /tmp/cfs-log.1331139492.11098
      Cfs: 11121:0:(ldlm_lib.c:766:target_handle_connect()) saictfs-MDT0000: exp ffff810215710200 already connecting
      Cfs: 11121:0:(ldlm_lib.c:766:target_handle_connect()) Skipped 6 previous similar messages
      CfsError: 11106:0:(ldlm_lib.c:1848:target_send_reply_msg()) @@@ processing error (-114) req@ffff810219608800 x1395800118657158/t0 o38-><?>@<?>:0/0 lens 368/264 e 0 to 0 dl 1331139962 ref 1 fl Interpret:/0/0 rc -114/0
      CfsError: 11106:0:(ldlm_lib.c:1848:target_send_reply_msg()) Skipped 7 previous similar messages
      Cfs: 11119:0:(ldlm_lib.c:766:target_handle_connect()) saictfs-MDT0000: exp ffff810215710200 already connecting
      Cfs: 11119:0:(ldlm_lib.c:766:target_handle_connect()) Skipped 12 previous similar messages
      CfsError: 11097:0:(ldlm_lib.c:1848:target_send_reply_msg()) @@@ processing error (-114) req@ffff81021aa41c00 x1395800118657210/t0 o38-><?>@<?>:0/0 lens 368/264 e 0 to 0 dl 1331140575 ref 1 fl Interpret:/0/0 rc -114/0
      CfsError: 11097:0:(ldlm_lib.c:1848:target_send_reply_msg()) Skipped 13 previous similar messages
      Cfs: 11121:0:(ldlm_lib.c:766:target_handle_connect()) saictfs-MDT0000: exp ffff810215710200 already connecting
      Cfs: 11121:0:(ldlm_lib.c:766:target_handle_connect()) Skipped 14 previous similar messages
      CfsError: 10803:0:(lib-move.c:2613:LNetGet()) error sending GET to 12345-192.168.1.108@tcp: -113
      CfsError: 11116:0:(ldlm_lib.c:1848:target_send_reply_msg()) @@@ processing error (-114) req@ffff810219608800 x1395800118657283/t0 o38-><?>@<?>:0/0 lens 368/264 e 0 to 0 dl 1331141206 ref 1 fl Interpret:/0/0 rc -114/0
      CfsError: 11116:0:(ldlm_lib.c:1848:target_send_reply_msg()) Skipped 14 previous similar messages
      I do not know why this happens. Can anybody help? Thanks!
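For anyone triaging a log like the one above, here is a minimal sketch that summarizes it. The `summarize` helper and its regular expressions are hypothetical and are not part of Lustre. It rests on three assumptions. First, the error numbers are Linux errno values, where 114 is EALREADY and 113 is EHOSTUNREACH. Second, the first number in each dump file name is a Unix epoch timestamp. Third, the second number in the file name is the PID of the thread, which matches the watchdog lines printed next to the dumps (for example pid 20629 and cfs-log.1331138952.20629).

```python
import re
from datetime import datetime, timezone

# Linux errno names for the two codes in the log, hardcoded so the result
# does not depend on the platform running this script.
LINUX_ERRNO = {113: "EHOSTUNREACH", 114: "EALREADY"}

WATCHDOG_RE = re.compile(r"Service thread pid (\d+) was inactive for ([\d.]+)s")
DUMP_RE = re.compile(r"dumping log to \S+\.(\d+)\.(\d+)\s*$")
RC_RE = re.compile(r"\brc (-?\d+)/")
LNET_RE = re.compile(r"error sending GET to \S+: (-?\d+)")


def summarize(lines):
    """Hypothetical helper: collect hung threads, dump times and error codes."""
    hung = {}      # pid -> longest reported inactivity in seconds
    dumps = []     # (pid, UTC time of the debug-log dump)
    codes = set()  # positive error numbers from rc fields and LNet errors
    for line in lines:
        m = WATCHDOG_RE.search(line)
        if m:
            pid, secs = int(m.group(1)), float(m.group(2))
            hung[pid] = max(secs, hung.get(pid, 0.0))
        m = DUMP_RE.search(line)
        if m:
            # Assumption: the first number in the file name is epoch seconds.
            when = datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc)
            dumps.append((int(m.group(2)), when))
        for rx in (RC_RE, LNET_RE):
            m = rx.search(line)
            if m and int(m.group(1)) != 0:
                codes.add(abs(int(m.group(1))))
    errors = [(c, LINUX_ERRNO.get(c, "unknown")) for c in sorted(codes)]
    return hung, dumps, errors


# Three lines copied from the report above.
sample = [
    "Cfs: Service thread pid 11098 was inactive for 1200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:",
    "CfsError: dumping log to /tmp/cfs-log.1331139492.11098",
    "CfsError: 10803:0:(lib-move.c:2613:LNetGet()) error sending GET to 12345-192.168.1.108@tcp: -113",
]

hung, dumps, errors = summarize(sample)
for pid, secs in hung.items():
    print(f"pid {pid} inactive for {secs:.0f}s")
for pid, when in dumps:
    print(f"pid {pid} log dumped at {when.isoformat()}")
for code, name in errors:
    print(f"error {code}: {name}")
```

To run it on the full output, save dmesg to a text file and pass the open file object to `summarize`. Read this way, the repeated `rc -114/0` replies suggest that reconnect attempts are being refused because an earlier connect from the same export is still in progress. That earlier connect appears to be the one handled by thread 11098, which is stuck in `jbd:start_this_handle`.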

          People

            wc-triage WC Triage
            zhen zhen