Details
-
Bug
-
Resolution: Duplicate
-
Major
-
None
-
Lustre 2.5.0
-
None
-
RHEL6 + master branch (two days ago)
-
3
-
10042
Description
while testing a own patch i hit a deadlock with journal
Lustre: DEBUG MARKER: == sanity test 156: Verification of tunables ============================== 21:09:39 (1377968979) Lustre: DEBUG MARKER: cancel_lru_locks osc start Lustre: DEBUG MARKER: cancel_lru_locks osc stop Lustre: DEBUG MARKER: cancel_lru_locks osc start Lustre: DEBUG MARKER: cancel_lru_locks osc stop INFO: task jbd2/loop0-8:11882 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. jbd2/loop0-8 D ffff8801225702b0 0 11882 2 0x00000080 ffff88010811bcf0 0000000000000046 0000000000000002 ffff8801225702b0 ffff88010811bc90 0000000000000046 ffff88010811bdd8 0000000000000282 ffff88010b0845f8 ffff88010811bfd8 000000000000fc88 ffff88010b0845f8 Call Trace: [<ffffffffa0041aed>] jbd2_journal_commit_transaction+0x19d/0x16e0 [jbd2] [<ffffffff81081d5c>] ? lock_timer_base+0x3c/0x70 [<ffffffff815323f1>] ? _spin_lock_irqsave+0x81/0xa0 [<ffffffff8108381b>] ? try_to_del_timer_sync+0x7b/0xe0 [<ffffffff81099690>] ? autoremove_wake_function+0x0/0x40 [<ffffffff8108390a>] ? del_timer_sync+0x8a/0xc0 [<ffffffff81083880>] ? del_timer_sync+0x0/0xc0 [<ffffffffa00481e1>] kjournald2+0xc1/0x220 [jbd2] [<ffffffff81099690>] ? autoremove_wake_function+0x0/0x40 [<ffffffffa0048120>] ? kjournald2+0x0/0x220 [jbd2] [<ffffffff81099346>] kthread+0x96/0xa0 [<ffffffff8100c1ca>] child_rip+0xa/0x20 [<ffffffff81531fc0>] ? _spin_unlock_irq+0x30/0x40 [<ffffffff8100bb10>] ? restore_args+0x0/0x30 [<ffffffff810992b0>] ? kthread+0x0/0xa0 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20 INFO: lockdep is turned off. INFO: task mdt01_003:12211 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. mdt01_003 D ffff88013fc26d00 0 12211 2 0x00000080 ffff88010aa11870 0000000000000046 0000000000000000 ffffffff810b262d ffff88010aa117f0 0000000000000286 ffff88002cbd6cd8 0000000000000286 ffff880106ac8cf8 ffff88010aa11fd8 000000000000fc88 ffff880106ac8cf8 Call Trace: [<ffffffff810b262d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffffa0f456be>] cfs_waitq_wait+0xe/0x10 [libcfs] [<ffffffffa04ef057>] lu_object_find_at+0xb7/0x370 [obdclass] [<ffffffffa014d7e2>] ? fld_server_lookup+0x72/0x440 [fld] [<ffffffff810642f0>] ? default_wake_function+0x0/0x20 [<ffffffffa0bdc6f4>] osd_object_find+0x34/0x270 [osd_ldiskfs] [<ffffffffa0be168e>] osd_index_ea_insert+0x4ae/0xdf0 [osd_ldiskfs] [<ffffffffa0be8f46>] ? osd_index_ea_delete+0x906/0xe60 [osd_ldiskfs] [<ffffffffa0b8a818>] ? __ldiskfs_journal_stop+0x68/0xa0 [ldiskfs] [<ffffffffa0449b84>] lod_index_insert+0x44/0xe0 [lod] [<ffffffffa04499d8>] ? lod_index_delete+0x38/0xd0 [lod] [<ffffffffa0b04d5e>] __mdd_orphan_add+0x33e/0x3b0 [mdd] [<ffffffffa0bdf43d>] ? osd_attr_get+0x13d/0x180 [osd_ldiskfs] [<ffffffffa0b0cb51>] mdd_finish_unlink+0xe1/0x380 [mdd] [<ffffffffa0b0f7c8>] mdd_unlink+0x9d8/0xe30 [mdd] [<ffffffffa0c7eb98>] mdo_unlink+0x18/0x50 [mdt] [<ffffffffa0c81f80>] mdt_reint_unlink+0x820/0x1010 [mdt] [<ffffffffa0c7e891>] mdt_reint_rec+0x41/0xe0 [mdt] [<ffffffffa0c66573>] mdt_reint_internal+0x4c3/0x780 [mdt] [<ffffffffa0c66874>] mdt_reint+0x44/0xe0 [mdt] [<ffffffffa0c693da>] mdt_handle_common+0x53a/0x1480 [mdt] [<ffffffffa0ca2b95>] mds_regular_handle+0x15/0x20 [mdt] [<ffffffffa072b6e1>] ptlrpc_server_handle_request+0x3b1/0xc80 [ptlrpc] [<ffffffffa0f4557e>] ? cfs_timer_arm+0xe/0x10 [libcfs] [<ffffffffa0f568af>] ? lc_watchdog_touch+0x6f/0x170 [libcfs] [<ffffffffa0722fe7>] ? ptlrpc_wait_event+0xb7/0x2c0 [ptlrpc] [<ffffffff810b262d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff81055d73>] ? __wake_up+0x53/0x70 [<ffffffffa072c9f2>] ptlrpc_main+0xa42/0x1670 [ptlrpc] [<ffffffffa072bfb0>] ? ptlrpc_main+0x0/0x1670 [ptlrpc] [<ffffffff81099346>] kthread+0x96/0xa0 [<ffffffff8100c1ca>] child_rip+0xa/0x20 [<ffffffff81531fc0>] ? _spin_unlock_irq+0x30/0x40 [<ffffffff8100bb10>] ? restore_args+0x0/0x30 [<ffffffff810992b0>] ? kthread+0x0/0xa0 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20 INFO: lockdep is turned off. INFO: task jbd2/loop0-8:11882 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. jbd2/loop0-8 D ffff8801225702b0 0 11882 2 0x00000080 ffff88010811bcf0 0000000000000046 0000000000000002 ffff8801225702b0 ffff88010811bc90 0000000000000046 ffff88010811bdd8 0000000000000282 ffff88010b0845f8 ffff88010811bfd8 000000000000fc88 ffff88010b0845f8 Call Trace: [<ffffffffa0041aed>] jbd2_journal_commit_transaction+0x19d/0x16e0 [jbd2] [<ffffffff81081d5c>] ? lock_timer_base+0x3c/0x70 [<ffffffff815323f1>] ? _spin_lock_irqsave+0x81/0xa0 [<ffffffff8108381b>] ? try_to_del_timer_sync+0x7b/0xe0 [<ffffffff81099690>] ? autoremove_wake_function+0x0/0x40 [<ffffffff8108390a>] ? del_timer_sync+0x8a/0xc0 [<ffffffff81083880>] ? del_timer_sync+0x0/0xc0 [<ffffffffa00481e1>] kjournald2+0xc1/0x220 [jbd2] [<ffffffff81099690>] ? autoremove_wake_function+0x0/0x40 [<ffffffffa0048120>] ? kjournald2+0x0/0x220 [jbd2] [<ffffffff81099346>] kthread+0x96/0xa0 [<ffffffff8100c1ca>] child_rip+0xa/0x20 [<ffffffff81531fc0>] ? _spin_unlock_irq+0x30/0x40 [<ffffffff8100bb10>] ? restore_args+0x0/0x30 [<ffffffff810992b0>] ? kthread+0x0/0xa0 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20 INFO: lockdep is turned off. INFO: task mdt01_003:12211 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. mdt01_003 D ffff88013fc26d00 0 12211 2 0x00000080 ffff88010aa11870 0000000000000046 0000000000000000 ffffffff810b262d ffff88010aa117f0 0000000000000286 ffff88002cbd6cd8 0000000000000286 ffff880106ac8cf8 ffff88010aa11fd8 000000000000fc88 ffff880106ac8cf8 Call Trace: [<ffffffff810b262d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffffa0f456be>] cfs_waitq_wait+0xe/0x10 [libcfs] [<ffffffffa04ef057>] lu_object_find_at+0xb7/0x370 [obdclass] [<ffffffffa014d7e2>] ? fld_server_lookup+0x72/0x440 [fld] [<ffffffff810642f0>] ? default_wake_function+0x0/0x20 [<ffffffffa0bdc6f4>] osd_object_find+0x34/0x270 [osd_ldiskfs] [<ffffffffa0be168e>] osd_index_ea_insert+0x4ae/0xdf0 [osd_ldiskfs] [<ffffffffa0be8f46>] ? osd_index_ea_delete+0x906/0xe60 [osd_ldiskfs] [<ffffffffa0b8a818>] ? __ldiskfs_journal_stop+0x68/0xa0 [ldiskfs] [<ffffffffa0449b84>] lod_index_insert+0x44/0xe0 [lod] [<ffffffffa04499d8>] ? lod_index_delete+0x38/0xd0 [lod] [<ffffffffa0b04d5e>] __mdd_orphan_add+0x33e/0x3b0 [mdd] [<ffffffffa0bdf43d>] ? osd_attr_get+0x13d/0x180 [osd_ldiskfs] [<ffffffffa0b0cb51>] mdd_finish_unlink+0xe1/0x380 [mdd] [<ffffffffa0b0f7c8>] mdd_unlink+0x9d8/0xe30 [mdd] [<ffffffffa0c7eb98>] mdo_unlink+0x18/0x50 [mdt] [<ffffffffa0c81f80>] mdt_reint_unlink+0x820/0x1010 [mdt] [<ffffffffa0c7e891>] mdt_reint_rec+0x41/0xe0 [mdt] [<ffffffffa0c66573>] mdt_reint_internal+0x4c3/0x780 [mdt] [<ffffffffa0c66874>] mdt_reint+0x44/0xe0 [mdt] [<ffffffffa0c693da>] mdt_handle_common+0x53a/0x1480 [mdt] [<ffffffffa0ca2b95>] mds_regular_handle+0x15/0x20 [mdt] [<ffffffffa072b6e1>] ptlrpc_server_handle_request+0x3b1/0xc80 [ptlrpc] [<ffffffffa0f4557e>] ? cfs_timer_arm+0xe/0x10 [libcfs] [<ffffffffa0f568af>] ? lc_watchdog_touch+0x6f/0x170 [libcfs] [<ffffffffa0722fe7>] ? ptlrpc_wait_event+0xb7/0x2c0 [ptlrpc] [<ffffffff810b262d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff81055d73>] ? __wake_up+0x53/0x70 [<ffffffffa072c9f2>] ptlrpc_main+0xa42/0x1670 [ptlrpc] [<ffffffffa072bfb0>] ? ptlrpc_main+0x0/0x1670 [ptlrpc] [<ffffffff81099346>] kthread+0x96/0xa0 [<ffffffff8100c1ca>] child_rip+0xa/0x20 [<ffffffff81531fc0>] ? _spin_unlock_irq+0x30/0x40 [<ffffffff8100bb10>] ? restore_args+0x0/0x30 [<ffffffff810992b0>] ? kthread+0x0/0xa0 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20 INFO: lockdep is turned off. ...
i have a crashdump - but it's triggered too later to get a lustre log.
Attachments
Issue Links
- is duplicated by
-
LU-2492 MDT thread stuck: mdd_object_find -> lu_object_find_at
- Resolved