[LU-1752] Test failure on test suite obdfilter-survey, subtest test_1c Created: 15/Aug/12 Updated: 23/Aug/12 Resolved: 23/Aug/12 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.3.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | Zhenyu Xu |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 6350 |
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com> This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/8220cf66-e4f5-11e1-9681-52540035b04c. The sub-test test_1c failed with the following error:
12:46:22:Lustre: DEBUG MARKER: == obdfilter-survey test 1c: Object Storage Targets survey, big batch == 12:46:20 (1344800780) 12:46:22:Lustre: DEBUG MARKER: lctl dl | grep obdfilter 12:46:23:Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep tcp | cut -f 1 -d '@' 12:46:23:Lustre: Echo OBD driver; http://www.lustre.org/ 12:47:02:Lustre: 5083:0:(filter_io_26.c:772:filter_commitrw_write()) lustre-OST0002: slow direct_io 32s 12:47:05:Lustre: 5419:0:(filter_io_26.c:772:filter_commitrw_write()) lustre-OST0002: slow direct_io 30s 12:47:06:Lustre: 5364:0:(filter_io_26.c:772:filter_commitrw_write()) lustre-OST0002: slow direct_io 31s 12:47:06:Lustre: 5364:0:(filter_io_26.c:772:filter_commitrw_write()) Skipped 9 previous similar messages 12:47:08:Lustre: 5086:0:(filter_io_26.c:772:filter_commitrw_write()) lustre-OST0001: slow direct_io 38s 12:47:08:Lustre: 5086:0:(filter_io_26.c:772:filter_commitrw_write()) Skipped 17 previous similar messages 12:47:09:Lustre: 5705:0:(filter_io_26.c:772:filter_commitrw_write()) lustre-OST0001: slow direct_io 34s 12:47:09:Lustre: 5705:0:(filter_io_26.c:772:filter_commitrw_write()) Skipped 38 previous similar messages 12:47:13:Lustre: 5706:0:(filter_io_26.c:772:filter_commitrw_write()) lustre-OST0001: slow direct_io 38s 12:47:13:Lustre: 5706:0:(filter_io_26.c:772:filter_commitrw_write()) Skipped 47 previous similar messages 12:47:22:Lustre: 5346:0:(filter_io_26.c:772:filter_commitrw_write()) lustre-OST0005: slow direct_io 36s 12:47:24:Lustre: 5346:0:(filter_io_26.c:772:filter_commitrw_write()) Skipped 155 previous similar messages 12:47:38:Lustre: 5220:0:(filter_io_26.c:772:filter_commitrw_write()) lustre-OST0004: slow direct_io 36s 12:47:39:Lustre: 5220:0:(filter_io_26.c:772:filter_commitrw_write()) Skipped 271 previous similar messages 12:48:11:Lustre: 5628:0:(filter_io_26.c:772:filter_commitrw_write()) lustre-OST0000: slow direct_io 33s 12:48:11:Lustre: 5628:0:(filter_io_26.c:772:filter_commitrw_write()) Skipped 492 previous similar messages 12:49:15:Lustre: 5901:0:(filter_io_26.c:772:filter_commitrw_write()) lustre-OST0005: slow direct_io 30s 12:49:16:Lustre: 5901:0:(filter_io_26.c:772:filter_commitrw_write()) Skipped 1089 previous similar messages 12:52:38:INFO: task jbd2/dm-0-8:29189 blocked for more than 120 seconds. 12:52:38:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 12:52:38:jbd2/dm-0-8 D 0000000000000000 0 29189 2 0x00000080 12:52:38: ffff880069127d20 0000000000000046 ffff880069127c90 ffffffff810920e6 12:52:38: ffff880069127cd0 ffffffff81039678 ffff8800ffffffff ffff880069938040 12:52:38: ffff8800699385f8 ffff880069127fd8 000000000000fb88 ffff8800699385f8 12:52:38:Call Trace: 12:52:38: [<ffffffff810920e6>] ? autoremove_wake_function+0x16/0x40 12:52:38: [<ffffffff81039678>] ? pvclock_clocksource_read+0x58/0xd0 12:52:38: [<ffffffff810923be>] ? prepare_to_wait+0x4e/0x80 12:52:38: [<ffffffffa03a493f>] jbd2_journal_commit_transaction+0x19f/0x1530 [jbd2] 12:52:38: [<ffffffff810096f0>] ? __switch_to+0xd0/0x320 12:52:38: [<ffffffff8107e00c>] ? lock_timer_base+0x3c/0x70 12:52:38: [<ffffffff810920d0>] ? autoremove_wake_function+0x0/0x40 12:52:38: [<ffffffffa03ab128>] kjournald2+0xb8/0x220 [jbd2] 12:52:38: [<ffffffff810920d0>] ? autoremove_wake_function+0x0/0x40 12:52:38: [<ffffffffa03ab070>] ? kjournald2+0x0/0x220 [jbd2] 12:52:38: [<ffffffff81091d66>] kthread+0x96/0xa0 12:52:38: [<ffffffff8100c14a>] child_rip+0xa/0x20 12:52:38: [<ffffffff81091cd0>] ? kthread+0x0/0xa0 12:52:38: [<ffffffff8100c140>] ? child_rip+0x0/0x20 |
| Comments |
| Comment by Keith Mannthey (Inactive) [ 15/Aug/12 ] |
|
Is there any more the the OST kernel messages in /var/log/messages? The direct IO the the DM devices seemed to just stop for several of the OSTs. Did the underlying IO fail? |
| Comment by Peter Jones [ 19/Aug/12 ] |
|
Bobijam Could you please look into this one in more detail? Thanks Peter |
| Comment by Zhenyu Xu [ 20/Aug/12 ] |
|
OSS stack trace of major blocked threads 14:51:15:jbd2/dm-0-8 D 0000000000000000 0 29189 2 0x00000080 14:51:15: ffff880069127d20 0000000000000046 ffff880069127c90 ffffffff810920e6 14:51:15: ffff880069127cd0 ffffffff81039678 ffff8800ffffffff ffff880069938040 14:51:15: ffff8800699385f8 ffff880069127fd8 000000000000fb88 ffff8800699385f8 14:51:15:Call Trace: 14:51:15: [<ffffffff810920e6>] ? autoremove_wake_function+0x16/0x40 14:51:15: [<ffffffff81039678>] ? pvclock_clocksource_read+0x58/0xd0 14:51:15: [<ffffffff810923be>] ? prepare_to_wait+0x4e/0x80 14:51:15: [<ffffffffa03a493f>] jbd2_journal_commit_transaction+0x19f/0x1530 [jbd2] 14:51:15: [<ffffffff810096f0>] ? __switch_to+0xd0/0x320 14:51:15: [<ffffffff8107e00c>] ? lock_timer_base+0x3c/0x70 14:51:16: [<ffffffff810920d0>] ? autoremove_wake_function+0x0/0x40 14:51:16: [<ffffffffa03ab128>] kjournald2+0xb8/0x220 [jbd2] 14:51:16: [<ffffffff810920d0>] ? autoremove_wake_function+0x0/0x40 14:51:16: [<ffffffffa03ab070>] ? kjournald2+0x0/0x220 [jbd2] 14:51:16: [<ffffffff81091d66>] kthread+0x96/0xa0 14:51:16: [<ffffffff8100c14a>] child_rip+0xa/0x20 14:51:16: [<ffffffff81091cd0>] ? kthread+0x0/0xa0 14:51:16: [<ffffffff8100c140>] ? child_rip+0x0/0x20 14:51:35:jbd2/dm-1-8 D 0000000000000000 0 29544 2 0x00000080 14:51:35: ffff8800679bbd20 0000000000000046 ffff88006af48040 0000000000000000 14:51:35: ffff8800145d0000 ffff880002216728 000000000000035f ffff88006ce54aa0 14:51:35: ffff88006ce55058 ffff8800679bbfd8 000000000000fb88 ffff88006ce55058 14:51:35:Call Trace: 14:51:35: [<ffffffff810923be>] ? prepare_to_wait+0x4e/0x80 14:51:35: [<ffffffffa03a493f>] jbd2_journal_commit_transaction+0x19f/0x1530 [jbd2] 14:51:35: [<ffffffff810096f0>] ? __switch_to+0xd0/0x320 14:51:35: [<ffffffff8107e00c>] ? lock_timer_base+0x3c/0x70 14:51:35: [<ffffffff810920d0>] ? autoremove_wake_function+0x0/0x40 14:51:35: [<ffffffffa03ab128>] kjournald2+0xb8/0x220 [jbd2] 14:51:35: [<ffffffff810920d0>] ? autoremove_wake_function+0x0/0x40 14:51:36: [<ffffffffa03ab070>] ? kjournald2+0x0/0x220 [jbd2] 14:51:36: [<ffffffff81091d66>] kthread+0x96/0xa0 14:51:36: [<ffffffff8100c14a>] child_rip+0xa/0x20 14:51:36: [<ffffffff81091cd0>] ? kthread+0x0/0xa0 14:51:36: [<ffffffff8100c140>] ? child_rip+0x0/0x20 14:51:37:jbd2/dm-2-8 D 0000000000000000 0 29878 2 0x00000080 14:51:37: ffff880062127d20 0000000000000046 ffff880062127c90 ffffffff810920e6 14:51:37: ffff880044ca1000 ffff880002216728 00000000000004e1 ffff88007d352080 14:51:38: ffff88007d352638 ffff880062127fd8 000000000000fb88 ffff88007d352638 14:51:38:Call Trace: 14:51:38: [<ffffffff810920e6>] ? autoremove_wake_function+0x16/0x40 14:51:38: [<ffffffff810923be>] ? prepare_to_wait+0x4e/0x80 14:51:38: [<ffffffffa03a493f>] jbd2_journal_commit_transaction+0x19f/0x1530 [jbd2] 14:51:38: [<ffffffff810096f0>] ? __switch_to+0xd0/0x320 14:51:38: [<ffffffff8107e00c>] ? lock_timer_base+0x3c/0x70 14:51:38: [<ffffffff810920d0>] ? autoremove_wake_function+0x0/0x40 14:51:38: [<ffffffffa03ab128>] kjournald2+0xb8/0x220 [jbd2] 14:51:38: [<ffffffff810920d0>] ? autoremove_wake_function+0x0/0x40 14:51:38: [<ffffffffa03ab070>] ? kjournald2+0x0/0x220 [jbd2] 14:51:38: [<ffffffff81091d66>] kthread+0x96/0xa0 14:51:38: [<ffffffff8100c14a>] child_rip+0xa/0x20 14:51:38: [<ffffffff81091cd0>] ? kthread+0x0/0xa0 14:51:38: [<ffffffff8100c140>] ? child_rip+0x0/0x20 14:51:41:jbd2/dm-3-8 D 0000000000000000 0 30247 2 0x00000080 14:51:41: ffff88006d207d20 0000000000000046 ffff880064824ae0 0000000000000000 14:51:41: ffff880014fa3c00 ffff880002216728 0000000000000318 ffff88001e651540 14:51:41: ffff88001e651af8 ffff88006d207fd8 000000000000fb88 ffff88001e651af8 14:51:41:Call Trace: 14:51:41: [<ffffffff810923be>] ? prepare_to_wait+0x4e/0x80 14:51:41: [<ffffffffa03a493f>] jbd2_journal_commit_transaction+0x19f/0x1530 [jbd2] 14:51:41: [<ffffffff810096f0>] ? __switch_to+0xd0/0x320 14:51:41: [<ffffffff8107e00c>] ? lock_timer_base+0x3c/0x70 14:51:41: [<ffffffff810920d0>] ? autoremove_wake_function+0x0/0x40 14:51:41: [<ffffffffa03ab128>] kjournald2+0xb8/0x220 [jbd2] 14:51:41: [<ffffffff810920d0>] ? autoremove_wake_function+0x0/0x40 14:51:41: [<ffffffffa03ab070>] ? kjournald2+0x0/0x220 [jbd2] 14:51:41: [<ffffffff81091d66>] kthread+0x96/0xa0 14:51:41: [<ffffffff8100c14a>] child_rip+0xa/0x20 14:51:41: [<ffffffff81091cd0>] ? kthread+0x0/0xa0 14:51:42: [<ffffffff8100c140>] ? child_rip+0x0/0x20 14:52:05:lctl D 0000000000000000 0 5080 5076 0x00000080 14:52:05: ffff880016eab638 0000000000000086 ffff880016eab6c8 ffffffffa0a239d8 14:52:05: ffff880079605540 ffff88007bf42680 ffff880016eab628 0000000000000000 14:52:05: ffff880079605af8 ffff880016eabfd8 000000000000fb88 ffff880079605af8 14:52:05:Call Trace: 14:52:05: [<ffffffffa0a239d8>] ? quota_chk_acq_common+0x88/0xb50 [lquota] 14:52:05: [<ffffffffa0a22a80>] ? quota_acquire_common+0x0/0xf0 [lquota] 14:52:05: [<ffffffff814ff2fe>] __mutex_lock_slowpath+0x13e/0x180 14:52:05: [<ffffffff814ff19b>] mutex_lock+0x2b/0x50 14:52:05: [<ffffffffa0b61f74>] filter_commitrw_write+0xe14/0x2eb8 [obdfilter] 14:52:05: [<ffffffffa0b56654>] ? filter_preprw_write+0x1064/0x22f0 [obdfilter] 14:52:05: [<ffffffffa04626c2>] ? cfs_hash_rw_lock+0x12/0x30 [libcfs] 14:52:05: [<ffffffffa05ddbbb>] ? cl_env_put+0x1cb/0x370 [obdclass] 14:52:05: [<ffffffffa0b5551d>] filter_commitrw+0x27d/0x2a0 [obdfilter] 14:52:05: [<ffffffffa0b586e0>] ? filter_preprw+0x80/0xa0 [obdfilter] 14:52:06: [<ffffffffa0b9d1da>] echo_client_brw_ioctl+0x82a/0x13e0 [obdecho] 14:52:06: [<ffffffff81162447>] ? cache_grow+0x217/0x320 14:52:07: [<ffffffffa05d86cb>] ? lu_context_init+0xab/0x260 [obdclass] 14:52:07: [<ffffffffa0ba56a0>] echo_client_iocontrol+0xbb0/0x2950 [obdecho] 14:52:07: [<ffffffff81114307>] ? unlock_page+0x27/0x30 14:52:07: [<ffffffffa044cbe0>] ? cfs_alloc+0x30/0x60 [libcfs] 14:52:07: [<ffffffffa0586d8a>] ? obd_ioctl_getdata+0x13a/0x1160 [obdclass] 14:52:07: [<ffffffffa059a51f>] class_handle_ioctl+0x12cf/0x1ea0 [obdclass] 14:52:07: [<ffffffff8113ff34>] ? handle_mm_fault+0x1e4/0x2b0 14:52:08: [<ffffffffa05862ab>] obd_class_ioctl+0x4b/0x190 [obdclass] 14:52:08: [<ffffffff8118dff2>] vfs_ioctl+0x22/0xa0 14:52:08: [<ffffffff81039678>] ? pvclock_clocksource_read+0x58/0xd0 14:52:08: [<ffffffff8118e194>] do_vfs_ioctl+0x84/0x580 14:52:08: [<ffffffff8118e711>] sys_ioctl+0x81/0xa0 14:52:09: [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b 14:52:09:lctl D 0000000000000000 0 5081 5076 0x00000080 14:52:09: ffff8800270ed638 0000000000000082 ffff8800270ed6c8 ffffffffa0a239d8 14:52:09: ffff88000002bb00 ffff88007bf42680 ffff8800270ed628 0000000000000000 14:52:10: ffff88004c5c7af8 ffff8800270edfd8 000000000000fb88 ffff88004c5c7af8 14:52:10:Call Trace: 14:52:10: [<ffffffffa0a239d8>] ? quota_chk_acq_common+0x88/0xb50 [lquota] 14:52:10: [<ffffffffa0a22a80>] ? quota_acquire_common+0x0/0xf0 [lquota] 14:52:10: [<ffffffff814ff2fe>] __mutex_lock_slowpath+0x13e/0x180 14:52:10: [<ffffffff814ff19b>] mutex_lock+0x2b/0x50 14:52:10: [<ffffffffa0b61f74>] filter_commitrw_write+0xe14/0x2eb8 [obdfilter] 14:52:11: [<ffffffffa0b56654>] ? filter_preprw_write+0x1064/0x22f0 [obdfilter] 14:52:11: [<ffffffffa04626c2>] ? cfs_hash_rw_lock+0x12/0x30 [libcfs] 14:52:11: [<ffffffffa05ddbbb>] ? cl_env_put+0x1cb/0x370 [obdclass] 14:52:11: [<ffffffffa0b5551d>] filter_commitrw+0x27d/0x2a0 [obdfilter] 14:52:11: [<ffffffffa0b586e0>] ? filter_preprw+0x80/0xa0 [obdfilter] 14:52:11: [<ffffffffa0b9d1da>] echo_client_brw_ioctl+0x82a/0x13e0 [obdecho] 14:52:11: [<ffffffffa05d86cb>] ? lu_context_init+0xab/0x260 [obdclass] 14:52:11: [<ffffffffa0ba56a0>] echo_client_iocontrol+0xbb0/0x2950 [obdecho] 14:52:11: [<ffffffff81134425>] ? shmem_getpage+0x225/0x9f0 14:52:11: [<ffffffffa044cbe0>] ? cfs_alloc+0x30/0x60 [libcfs] 14:52:12: [<ffffffffa0586d8a>] ? obd_ioctl_getdata+0x13a/0x1160 [obdclass] 14:52:12: [<ffffffffa059a51f>] class_handle_ioctl+0x12cf/0x1ea0 [obdclass] 14:52:12: [<ffffffff8113ff34>] ? handle_mm_fault+0x1e4/0x2b0 14:52:12: [<ffffffffa05862ab>] obd_class_ioctl+0x4b/0x190 [obdclass] 14:52:13: [<ffffffff8118dff2>] vfs_ioctl+0x22/0xa0 14:52:13: [<ffffffff81039678>] ? pvclock_clocksource_read+0x58/0xd0 14:52:13: [<ffffffff8118e194>] do_vfs_ioctl+0x84/0x580 14:52:13: [<ffffffff8118e711>] sys_ioctl+0x81/0xa0 14:52:13: [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b 14:52:16:lctl D 0000000000000000 0 5084 5075 0x00000080 14:52:16: ffff88002005d648 0000000000000082 ffff8800ffffffff 0000000000a22e38 14:52:17: ffffffffa0b7f700 000000000000004e ffff880037d35450 0000000400000000 14:52:17: ffff88005487fab8 ffff88002005dfd8 000000000000fb88 ffff88005487fab8 14:52:17:Call Trace: 14:52:17: [<ffffffff810923be>] ? prepare_to_wait+0x4e/0x80 14:52:17: [<ffffffffa03a3072>] start_this_handle+0x282/0x500 [jbd2] 14:52:17: [<ffffffff810920d0>] ? autoremove_wake_function+0x0/0x40 14:52:17: [<ffffffffa03a34f0>] jbd2_journal_start+0xd0/0x110 [jbd2] 14:52:17: [<ffffffffa03ffad8>] ldiskfs_journal_start_sb+0x58/0x90 [ldiskfs] 14:52:17: [<ffffffffa04c7088>] fsfilt_ldiskfs_start+0x78/0x630 [fsfilt_ldiskfs] 14:52:17: [<ffffffffa0b560cd>] filter_preprw_write+0xadd/0x22f0 [obdfilter] 14:52:17: [<ffffffffa05d797c>] ? lu_object_find_at+0x8c/0x450 [obdclass] 14:52:17: [<ffffffffa04626c2>] ? cfs_hash_rw_lock+0x12/0x30 [libcfs] 14:52:17: [<ffffffffa04617b8>] ? cfs_hash_multi_bd_lock+0x68/0xd0 [libcfs] 14:52:17: [<ffffffffa05ddbbb>] ? cl_env_put+0x1cb/0x370 [obdclass] 14:52:17: [<ffffffffa044cbae>] ? cfs_free+0xe/0x10 [libcfs] 14:52:17: [<ffffffffa0b3b8d7>] ? filter_unpackmd+0x6d7/0xa70 [obdfilter] 14:52:17: [<ffffffffa0b586e0>] filter_preprw+0x80/0xa0 [obdfilter] 14:52:18: [<ffffffffa0b9cf6a>] echo_client_brw_ioctl+0x5ba/0x13e0 [obdecho] 14:52:18: [<ffffffffa05d86cb>] ? lu_context_init+0xab/0x260 [obdclass] 14:52:18: [<ffffffffa0ba56a0>] echo_client_iocontrol+0xbb0/0x2950 [obdecho] 14:52:18: [<ffffffff81134425>] ? shmem_getpage+0x225/0x9f0 14:52:18: [<ffffffff81127061>] ? __alloc_pages_nodemask+0x111/0x940 14:52:18: [<ffffffffa044cbe0>] ? cfs_alloc+0x30/0x60 [libcfs] 14:52:18: [<ffffffffa0586d8a>] ? obd_ioctl_getdata+0x13a/0x1160 [obdclass] 14:52:18: [<ffffffffa059a51f>] class_handle_ioctl+0x12cf/0x1ea0 [obdclass] 14:52:18: [<ffffffff8113ff34>] ? handle_mm_fault+0x1e4/0x2b0 14:52:18: [<ffffffffa05862ab>] obd_class_ioctl+0x4b/0x190 [obdclass] 14:52:18: [<ffffffff8118dff2>] vfs_ioctl+0x22/0xa0 14:52:18: [<ffffffff81039678>] ? pvclock_clocksource_read+0x58/0xd0 14:52:18: [<ffffffff8118e194>] do_vfs_ioctl+0x84/0x580 14:52:18: [<ffffffff8118e711>] sys_ioctl+0x81/0xa0 14:52:18: [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b 14:52:20:lctl D 0000000000000000 0 5086 5074 0x00000080 14:52:20: ffff8800676a96e8 0000000000000086 ffff8800676a9688 ffffffffa03fd4df 14:52:20: ffff8800676a9708 ffffffffa03a2ed5 ffff880000000020 ffff8800676a9698 14:52:20: ffff88006030a638 ffff8800676a9fd8 000000000000fb88 ffff88006030a638 14:52:20:Call Trace: 14:52:21: [<ffffffffa03fd4df>] ? ldiskfs_statfs+0xef/0x1d0 [ldiskfs] 14:52:21: [<ffffffffa03a2ed5>] ? start_this_handle+0xe5/0x500 [jbd2] 14:52:21: [<ffffffff810923be>] ? prepare_to_wait+0x4e/0x80 14:52:21: [<ffffffffa03aad85>] jbd2_log_wait_commit+0xc5/0x140 [jbd2] 14:52:21: [<ffffffff810920d0>] ? autoremove_wake_function+0x0/0x40 14:52:21: [<ffffffffa04c848d>] fsfilt_ldiskfs_commit_wait+0x6d/0xf0 [fsfilt_ldiskfs] 14:52:21: [<ffffffffa0b5623d>] filter_preprw_write+0xc4d/0x22f0 [obdfilter] 14:52:21: [<ffffffffa05d797c>] ? lu_object_find_at+0x8c/0x450 [obdclass] 14:52:21: [<ffffffffa04626c2>] ? cfs_hash_rw_lock+0x12/0x30 [libcfs] 14:52:21: [<ffffffffa04617b8>] ? cfs_hash_multi_bd_lock+0x68/0xd0 [libcfs] 14:52:21: [<ffffffffa05ddbbb>] ? cl_env_put+0x1cb/0x370 [obdclass] 14:52:21: [<ffffffffa044cbae>] ? cfs_free+0xe/0x10 [libcfs] 14:52:21: [<ffffffffa0b3b8d7>] ? filter_unpackmd+0x6d7/0xa70 [obdfilter] 14:52:21: [<ffffffffa0b586e0>] filter_preprw+0x80/0xa0 [obdfilter] 14:52:21: [<ffffffffa0b9cf6a>] echo_client_brw_ioctl+0x5ba/0x13e0 [obdecho] 14:52:21: [<ffffffffa05d86cb>] ? lu_context_init+0xab/0x260 [obdclass] 14:52:22: [<ffffffffa0ba56a0>] echo_client_iocontrol+0xbb0/0x2950 [obdecho] 14:52:22: [<ffffffff81134425>] ? shmem_getpage+0x225/0x9f0 14:52:22: [<ffffffff8100bc0e>] ? apic_timer_interrupt+0xe/0x20 14:52:22: [<ffffffffa044cbe0>] ? cfs_alloc+0x30/0x60 [libcfs] 14:52:22: [<ffffffffa0586d8a>] ? obd_ioctl_getdata+0x13a/0x1160 [obdclass] 14:52:22: [<ffffffffa059a51f>] class_handle_ioctl+0x12cf/0x1ea0 [obdclass] 14:52:22: [<ffffffff8113ff34>] ? handle_mm_fault+0x1e4/0x2b0 14:52:22: [<ffffffffa05862ab>] obd_class_ioctl+0x4b/0x190 [obdclass] 14:52:22: [<ffffffff8118dff2>] vfs_ioctl+0x22/0xa0 14:52:22: [<ffffffff8118e194>] do_vfs_ioctl+0x84/0x580 14:52:22: [<ffffffff814fdca0>] ? thread_return+0x4e/0x76e 14:52:22: [<ffffffff8118e711>] sys_ioctl+0x81/0xa0 14:52:22: [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b |
| Comment by Zhenyu Xu [ 21/Aug/12 ] |
|
niu, All this failure happens on master, and all stack trace shows in OSS console log has this, while on orion_quota, orion_head_sync and b2_1 does not have this issue. 20:50:58:lctl D 0000000000000000 0 8218 8202 0x00000080 could you check whether master branch missed some patch? |
| Comment by Niu Yawei (Inactive) [ 22/Aug/12 ] |
|
I don't think it's quota related, and looks the stack trace isn't 100% accurate, that's why the two quota functions listed in the trace. What I see in the log is that many io threads is waiting on journal commit (and some are waiting on inode semaphore), not sure why the journal commit can't finish in time. |
| Comment by Zhenyu Xu [ 23/Aug/12 ] |
|
Need an I/O expert's help. |
| Comment by Jodi Levi (Inactive) [ 23/Aug/12 ] |
|
Duplicate of |