Lustre / LU-316

system hang when running lustre-rsync-test test_5b

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.1.0
    • Affects Version/s: Lustre 2.1.0
    • Labels: None
    • Environment: lustre-master/rhel6-x86_64/#114
    • Severity: 3
    • Rank (Obsolete): 5028

    Description

      The system hangs when running lustre-rsync-test test_5b. Console output:

      Lustre: DEBUG MARKER: == lustre-rsync-test test 5b: Kill / restart lustre_rsync == 19:47:48 (1305254868)
      INFO: task lustre_rsync:8898 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      lustre_rsync D ffff88033fc24700 0 8898 1 0x00000080
      ffff8802fd019c40 0000000000000082 0000000000000000 ffff880032e769f0
      ffff8802fd019be0 ffffffff8105fe65 ffff8802fd019bc0 0000000100115300
      ffff88030a7bc678 ffff8802fd019fd8 0000000000010518 ffff88030a7bc678
      Call Trace:
      [<ffffffff8105fe65>] ? task_new_fair+0xb5/0x100
      [<ffffffff814c9cf5>] schedule_timeout+0x225/0x2f0
      [<ffffffffa02ec6de>] ? cfs_waitq_del+0xe/0x10 [libcfs]
      [<ffffffff814c9963>] wait_for_common+0x123/0x180
      [<ffffffff8105c540>] ? default_wake_function+0x0/0x20
      [<ffffffff814c9a7d>] wait_for_completion+0x1d/0x20
      [<ffffffffa03f9ad5>] llog_process_flags+0x115/0x680 [obdclass]
      [<ffffffffa0552897>] ? llog_client_read_header+0x187/0x640 [ptlrpc]
      [<ffffffffa03fce98>] llog_cat_process_flags+0x188/0x2d0 [obdclass]
      [<ffffffffa03fbf8f>] ? llog_init_handle+0x17f/0xa70 [obdclass]
      [<ffffffffa0709170>] ? changelog_show_cb+0x0/0x310 [mdc]
      [<ffffffffa0712aae>] mdc_changelog_send_thread+0x4ce/0xb90 [mdc]
      [<ffffffff81068bc4>] ? __mmdrop+0x44/0x60
      [<ffffffff81059e2c>] ? finish_task_switch+0xac/0xd0
      [<ffffffff810141ca>] child_rip+0xa/0x20
      [<ffffffffa07125e0>] ? mdc_changelog_send_thread+0x0/0xb90 [mdc]
      [<ffffffff810141c0>] ? child_rip+0x0/0x20
      INFO: task lustre_rsync:8899 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      lustre_rsync D ffff88033fc24900 0 8899 8898 0x00000080
      ffff88030e1c9c10 0000000000000082 0000000000000000 00000000000022c4
      ffff88030e1c9b80 ffffffff810348fe ffff88030e1c9b90 000000010011535c
      ffff8802fcd9f068 ffff88030e1c9fd8 0000000000010518 ffff8802fcd9f068
      Call Trace:
      [<ffffffff810348fe>] ? physflat_send_IPI_mask+0xe/0x10
      [<ffffffff814c9cf5>] schedule_timeout+0x225/0x2f0
      [<ffffffffa0552897>] ? llog_client_read_header+0x187/0x640 [ptlrpc]
      [<ffffffff814c9963>] wait_for_common+0x123/0x180
      [<ffffffff8105c540>] ? default_wake_function+0x0/0x20
      [<ffffffff814c9a7d>] wait_for_completion+0x1d/0x20
      [<ffffffffa03f9ad5>] llog_process_flags+0x115/0x680 [obdclass]
      [<ffffffff81096d4f>] ? up+0x2f/0x50
      [<ffffffffa03fdd59>] llog_cat_process_cb+0x329/0x400 [obdclass]
      [<ffffffffa03fb653>] llog_process_thread+0x9a3/0xe70 [obdclass]
      [<ffffffff8111f059>] ? free_pages+0x49/0x50
      [<ffffffff810141ca>] child_rip+0xa/0x20
      [<ffffffffa0709170>] ? changelog_show_cb+0x0/0x310 [mdc]
      [<ffffffffa03facb0>] ? llog_process_thread+0x0/0xe70 [obdclass]
      [<ffffffff810141c0>] ? child_rip+0x0/0x20
      LustreError: 11-0: an error occurred while communicating with 192.168.4.128@o2ib. The obd_ping operation failed with -107
      LustreError: 166-1: MGC192.168.4.128@o2ib: Connection to service MGS via nid 192.168.4.128@o2ib was lost; in progress operations using this service will fail.
      Lustre: lustre-OST0005-osc-ffff880318f86000: Connection to service lustre-OST0005 via nid 192.168.4.129@o2ib was lost; in progress operations using this service will wait for recovery to complete.
      LustreError: 167-0: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
      Lustre: lustre-MDT0000-mdc-ffff880318f86000: Connection restored to service lustre-MDT0000 using nid 192.168.4.128@o2ib.
      Lustre: 2501:0:(import.c:885:ptlrpc_connect_interpret()) MGS@192.168.4.128@o2ib changed server handle from 0x2af0a31218ffe297 to 0x2af0a31219beaacb
      Lustre: MGC192.168.4.128@o2ib: Reactivating import
      Lustre: MGC192.168.4.128@o2ib: Connection restored to service MGS using nid 192.168.4.128@o2ib.
      Lustre: Skipped 1 previous similar message
      INFO: task lustre_rsync:8898 blocked for more than 120 seconds.
      LustreError: 11-0: an error occurred while communicating with 192.168.4.128@o2ib. The obd_ping operation failed with -107
      LustreError: Skipped 2 previous similar messages
      LustreError: 166-1: MGC192.168.4.128@o2ib: Connection to service MGS via nid 192.168.4.128@o2ib was lost; in progress operations using this service will fail.
      Lustre: lustre-OST0000-osc-ffff880318f86000: Connection to service lustre-OST0000 via nid 192.168.4.129@o2ib was lost; in progress operations using this service will wait for recovery to complete.
      Lustre: Skipped 1 previous similar message
      Lustre: 2501:0:(import.c:885:ptlrpc_connect_interpret()) MGS@192.168.4.128@o2ib changed server handle from 0x2af0a31219beaacb to 0x2af0a31219beaad9
      LustreError: 167-0: This client was evicted by lustre-OST0000; in progress operations using this service will fail.
      LustreError: Skipped 1 previous similar message
      Lustre: lustre-OST0002-osc-ffff880318f86000: Connection restored to service lustre-OST0002 using nid 192.168.4.129@o2ib.
      LustreError: 9082:0:(client.c:1057:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@ffff8802ff556400 x1368658213493291/t0(0) o-1->MGS@192.168.4.128@o2ib:26/25 lens 296/352 e 0 to 0 dl 0 ref 2 fl Rpc:/ffffffff/ffffffff rc 0/-1
      Lustre: MGC192.168.4.128@o2ib: Reactivating import
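
      For triage, the hung-task reports in a console log like the one above can be summarized with a short script. This is a minimal sketch and not part of the original report: the function name `summarize_hung_tasks` and both regular expressions are assumptions derived only from the line formats visible in this log, and other kernel versions may print traces differently.

```python
import re

# Matches e.g. "INFO: task lustre_rsync:8898 blocked for more than 120 seconds."
TASK_RE = re.compile(
    r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds"
)

# Matches e.g. "[<ffffffffa03f9ad5>] llog_process_flags+0x115/0x680 [obdclass]".
# An optional "? " before the symbol marks a frame the unwinder found on the
# stack but could not confirm, so it is treated as unreliable here.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(\w+)\])?"
)


def summarize_hung_tasks(log_text):
    """Return one dict per hung-task report found in log_text.

    Each dict holds the task name, pid, timeout in seconds, and the list of
    (symbol, module) pairs for reliable frames that belong to a loadable
    module. Frames from the core kernel (no "[module]" suffix) are skipped.
    """
    reports = []
    current = None
    for line in log_text.splitlines():
        task = TASK_RE.search(line)
        if task:
            current = {
                "comm": task.group(1),
                "pid": int(task.group(2)),
                "timeout": int(task.group(3)),
                "frames": [],
            }
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None:
            unreliable, symbol, module = frame.groups()
            if not unreliable and module:
                current["frames"].append((symbol, module))
    return reports
```

      Applied to the log above, the reliable module frames for both lustre_rsync threads (8898 and 8899) include llog_process_flags [obdclass], called just before wait_for_completion, so both threads appear to be waiting on an llog processing thread to finish.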

          People

            Assignee: Oleg Drokin (green)
            Reporter: Sarah Liu (sarah)