Details
Bug
Resolution: Fixed
Minor
Lustre 2.1.0
None
lustre-master/rhel6-x86_64/#114
3
5028
Description
The system hangs when running lustre-rsync-test test_5b. The console log shows:
Lustre: DEBUG MARKER: == lustre-rsync-test test 5b: Kill / restart lustre_rsync == 19:47:48 (1305254868)
INFO: task lustre_rsync:8898 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
lustre_rsync D ffff88033fc24700 0 8898 1 0x00000080
ffff8802fd019c40 0000000000000082 0000000000000000 ffff880032e769f0
ffff8802fd019be0 ffffffff8105fe65 ffff8802fd019bc0 0000000100115300
ffff88030a7bc678 ffff8802fd019fd8 0000000000010518 ffff88030a7bc678
Call Trace:
[<ffffffff8105fe65>] ? task_new_fair+0xb5/0x100
[<ffffffff814c9cf5>] schedule_timeout+0x225/0x2f0
[<ffffffffa02ec6de>] ? cfs_waitq_del+0xe/0x10 [libcfs]
[<ffffffff814c9963>] wait_for_common+0x123/0x180
[<ffffffff8105c540>] ? default_wake_function+0x0/0x20
[<ffffffff814c9a7d>] wait_for_completion+0x1d/0x20
[<ffffffffa03f9ad5>] llog_process_flags+0x115/0x680 [obdclass]
[<ffffffffa0552897>] ? llog_client_read_header+0x187/0x640 [ptlrpc]
[<ffffffffa03fce98>] llog_cat_process_flags+0x188/0x2d0 [obdclass]
[<ffffffffa03fbf8f>] ? llog_init_handle+0x17f/0xa70 [obdclass]
[<ffffffffa0709170>] ? changelog_show_cb+0x0/0x310 [mdc]
[<ffffffffa0712aae>] mdc_changelog_send_thread+0x4ce/0xb90 [mdc]
[<ffffffff81068bc4>] ? __mmdrop+0x44/0x60
[<ffffffff81059e2c>] ? finish_task_switch+0xac/0xd0
[<ffffffff810141ca>] child_rip+0xa/0x20
[<ffffffffa07125e0>] ? mdc_changelog_send_thread+0x0/0xb90 [mdc]
[<ffffffff810141c0>] ? child_rip+0x0/0x20
INFO: task lustre_rsync:8899 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
lustre_rsync D ffff88033fc24900 0 8899 8898 0x00000080
ffff88030e1c9c10 0000000000000082 0000000000000000 00000000000022c4
ffff88030e1c9b80 ffffffff810348fe ffff88030e1c9b90 000000010011535c
ffff8802fcd9f068 ffff88030e1c9fd8 0000000000010518 ffff8802fcd9f068
Call Trace:
[<ffffffff810348fe>] ? physflat_send_IPI_mask+0xe/0x10
[<ffffffff814c9cf5>] schedule_timeout+0x225/0x2f0
[<ffffffffa0552897>] ? llog_client_read_header+0x187/0x640 [ptlrpc]
[<ffffffff814c9963>] wait_for_common+0x123/0x180
[<ffffffff8105c540>] ? default_wake_function+0x0/0x20
[<ffffffff814c9a7d>] wait_for_completion+0x1d/0x20
[<ffffffffa03f9ad5>] llog_process_flags+0x115/0x680 [obdclass]
[<ffffffff81096d4f>] ? up+0x2f/0x50
[<ffffffffa03fdd59>] llog_cat_process_cb+0x329/0x400 [obdclass]
[<ffffffffa03fb653>] llog_process_thread+0x9a3/0xe70 [obdclass]
[<ffffffff8111f059>] ? free_pages+0x49/0x50
[<ffffffff810141ca>] child_rip+0xa/0x20
[<ffffffffa0709170>] ? changelog_show_cb+0x0/0x310 [mdc]
[<ffffffffa03facb0>] ? llog_process_thread+0x0/0xe70 [obdclass]
[<ffffffff810141c0>] ? child_rip+0x0/0x20
LustreError: 11-0: an error occurred while communicating with 192.168.4.128@o2ib. The obd_ping operation failed with -107
LustreError: 166-1: MGC192.168.4.128@o2ib: Connection to service MGS via nid 192.168.4.128@o2ib was lost; in progress operations using this service will fail.
Lustre: lustre-OST0005-osc-ffff880318f86000: Connection to service lustre-OST0005 via nid 192.168.4.129@o2ib was lost; in progress operations using this service will wait for recovery to complete.
LustreError: 167-0: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
Lustre: lustre-MDT0000-mdc-ffff880318f86000: Connection restored to service lustre-MDT0000 using nid 192.168.4.128@o2ib.
Lustre: 2501:0:(import.c:885:ptlrpc_connect_interpret()) MGS@192.168.4.128@o2ib changed server handle from 0x2af0a31218ffe297 to 0x2af0a31219beaacb
Lustre: MGC192.168.4.128@o2ib: Reactivating import
Lustre: MGC192.168.4.128@o2ib: Connection restored to service MGS using nid 192.168.4.128@o2ib.
Lustre: Skipped 1 previous similar message
INFO: task lustre_rsync:8898 blocked for more than 120 seconds.
LustreError: 11-0: an error occurred while communicating with 192.168.4.128@o2ib. The obd_ping operation failed with -107
LustreError: Skipped 2 previous similar messages
LustreError: 166-1: MGC192.168.4.128@o2ib: Connection to service MGS via nid 192.168.4.128@o2ib was lost; in progress operations using this service will fail.
Lustre: lustre-OST0000-osc-ffff880318f86000: Connection to service lustre-OST0000 via nid 192.168.4.129@o2ib was lost; in progress operations using this service will wait for recovery to complete.
Lustre: Skipped 1 previous similar message
Lustre: 2501:0:(import.c:885:ptlrpc_connect_interpret()) MGS@192.168.4.128@o2ib changed server handle from 0x2af0a31219beaacb to 0x2af0a31219beaad9
LustreError: 167-0: This client was evicted by lustre-OST0000; in progress operations using this service will fail.
LustreError: Skipped 1 previous similar message
Lustre: lustre-OST0002-osc-ffff880318f86000: Connection restored to service lustre-OST0002 using nid 192.168.4.129@o2ib.
LustreError: 9082:0:(client.c:1057:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@ffff8802ff556400 x1368658213493291/t0(0) o-1->MGS@192.168.4.128@o2ib:26/25 lens 296/352 e 0 to 0 dl 0 ref 2 fl Rpc:/ffffffff/ffffffff rc 0/-1
Lustre: MGC192.168.4.128@o2ib: Reactivating import
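The hung-task reports in the log above can be summarized mechanically when comparing runs. The following is a minimal sketch, assuming the console log is available as plain text in the format shown; the function name `summarize_hung_tasks` and the two regular expressions are hypothetical helpers written for this ticket, not part of the Lustre test framework. It keeps only the first report per task and drops frames marked `?`, which the kernel prints for stack entries it considers unreliable.

```python
import re
from collections import OrderedDict

# Kernel hung-task header, e.g.
# "INFO: task lustre_rsync:8898 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) blocked for more than \d+ seconds\."
)

# One stack frame, e.g.
# "[<ffffffffa03f9ad5>] llog_process_flags+0x115/0x680 [obdclass]"
# An optional "?" before the symbol marks an unreliable frame.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)


def summarize_hung_tasks(log_text):
    """Map (comm, pid) to the reliable frames of that task's first trace."""
    tasks = OrderedDict()
    current = None      # task whose trace is being collected, if any
    in_trace = False    # True once "Call Trace:" has been seen for it

    for line in log_text.splitlines():
        header = HUNG_RE.search(line)
        if header:
            key = (header.group("comm"), int(header.group("pid")))
            if key in tasks:
                current = None          # repeat report: ignore it
            else:
                tasks[key] = []
                current = key
            in_trace = False
            continue

        if current is None:
            continue

        if line.strip() == "Call Trace:":
            in_trace = True
            continue

        if in_trace:
            frame = FRAME_RE.search(line)
            if frame is None:
                # First non-frame line ends this task's trace.
                current = None
                in_trace = False
            elif not frame.group("unreliable"):
                name = frame.group("sym")
                if frame.group("mod"):
                    name += " [" + frame.group("mod") + "]"
                tasks[current].append(name)

    return tasks
```

Applied to the log in this ticket, this would list both lustre_rsync tasks (8898 and 8899) with the same leading reliable frames: `schedule_timeout`, `wait_for_common`, `wait_for_completion`, then `llog_process_flags [obdclass]`. That matches the traces above, where both threads are blocked waiting on a completion inside `llog_process_flags`.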