[LU-1276] MDS threads all stuck in jbd2_journal_start - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Cannot Reproduce
Priority: Critical
Fix Version/s: None
Affects Version/s: Lustre 2.1.0
Labels:
None
Environment:
https://github.com/chaos/lustre/commits/2.1.1-llnl

Severity:
3
Rank (Obsolete):
8047

Description

The MDS on a classified production 2.1 lustre cluster got stuck today. The symptoms were high load (800+), but very little CPU usage.

Almost all of the lustre threads were stuck in jbd2_journal_start, while the jbd2/sda thread is stuck in
jbd2_journal_commit_transaction. There is zero I/O going to disk.

There's one thread that stands out as a suspect, as it's not in jbd2_journal_start but seems to be handling an unlink. Perhaps it got stuck waiting on a semaphore while holding an open transaction with jbd2. Its stack trace looks like this:

COMMAND: "mdt_152"
schedule
rwsem_down_failed_common
rwsem_down_read_failed
call_rwsem_down_read_failed
llog_cat_current_log.clone.0
llog_cat_add_rec
llog_obd_origin_add
llog_add
lov_llog_origin_add
llog_add
mds_llog_origin_add
llog_add
mds_llog_add_unlink
mds_log_op_unlink
mdd_unlink_log
mdd_object_kill
mdd_finish_unlink
mdd_unlink
cml_unlink
mdt_reint_unlink
mdt_reint_rec
mdt_reint_internal
mdt_reint
mdt_handle_common
mdt_regular_handle
ptlrpc_main
kernel_thread

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

bt-all.merged.txt
230 kB
28/Feb/14 6:53 PM
dmesg.txt
125 kB
28/Feb/14 6:53 PM

Activity

[LU-1276] MDS threads all stuck in jbd2_journal_start

Patrick Valentin (Inactive) added a comment - 28/Feb/14 6:41 PM - edited

One of the Bull customers (TGCC) had the same deadlock twice during the past six months: one thread is stuck in jbd2_journal_commit_transaction() and many other thread are stuck in jbd2_journal_start().

PID: 29225 TASK: ffff88107c3bb040 CPU: 15 COMMAND: "jbd2/dm-2-8"
 #0 [ffff88107a343c60] schedule at ffffffff81485765
 0000001 [ffff88107a343d28] jbd2_journal_commit_transaction at ffffffffa006a94f [jbd2]
 0000002 [ffff88107a343e68] kjournald2 at ffffffffa0070c08 [jbd2]
 0000003 [ffff88107a343ee8] kthread at ffffffff8107b5f6
 0000004 [ffff88107a343f48] kernel_thread at ffffffff8100412a

and most of the threads:

PID: 15585 TASK: ffff88063062a790 CPU: 0 COMMAND: "mdt_503"
PID: 15586 TASK: ffff88063062a040 CPU: 23 COMMAND: "mdt_504"
PID: 15587 TASK: ffff88020f3ad7d0 CPU: 30 COMMAND: "mdt_505"
PID: 29286 TASK: ffff88087505e790 CPU: 25 COMMAND: "mdt_01"
...
#0 [ffff881949c078f0] schedule at ffffffff81485765
0000001 [ffff881949c079b8] start_this_handle at ffffffffa006908a [jbd2]
0000002 [ffff881949c07a78] jbd2_journal_start at ffffffffa0069500 [jbd2]
0000003 [ffff881949c07ac8] ldiskfs_journal_start_sb at ffffffffa0451ca8 [ldiskfs]
0000004 [ffff881949c07ad8] osd_trans_start at ffffffffa0d4a324 [osd_ldiskfs]
0000005 [ffff881949c07b18] mdd_trans_start at ffffffffa0c4c4e3 [mdd]
0000006 [ffff881949c07b38] mdd_unlink at ffffffffa0c401eb [mdd]
0000007 [ffff881949c07bf8] cml_unlink at ffffffffa0d82e07 [cmm]
0000008 [ffff881949c07c38] mdt_reint_unlink at ffffffffa0cba0f4 [mdt]
0000009 [ffff881949c07cb8] mdt_reint_rec at ffffffffa0cb7cb1 [mdt]
0000010 [ffff881949c07cd8] mdt_reint_internal at ffffffffa0caeed4 [mdt]
0000011 [ffff881949c07d28] mdt_reint at ffffffffa0caf2b4 [mdt]
0000012 [ffff881949c07d48] mdt_handle_common at ffffffffa0ca3762 [mdt]
0000013 [ffff881949c07d98] mdt_regular_handle at ffffffffa0ca4655 [mdt]
0000014 [ffff881949c07da8] ptlrpc_main at ffffffffa071f4f6 [ptlrpc]
0000015 [ffff881949c07f48] kernel_thread at ffffffff8100412a

The are running lustre 2.1.6 which contains http://review.whamcloud.com/4743 from ~~LU-1648~~.

I attach two files containing the dmseg and the crash back trace of all threads.
could you reopen this ticket, as it was closed with "Cannot Reproduce".

Patrick Valentin (Inactive) added a comment - 28/Feb/14 6:41 PM - edited One of the Bull customers (TGCC) had the same deadlock twice during the past six months: one thread is stuck in jbd2_journal_commit_transaction() and many other thread are stuck in jbd2_journal_start(). PID: 29225 TASK: ffff88107c3bb040 CPU: 15 COMMAND: "jbd2/dm-2-8" #0 [ffff88107a343c60] schedule at ffffffff81485765 0000001 [ffff88107a343d28] jbd2_journal_commit_transaction at ffffffffa006a94f [jbd2] 0000002 [ffff88107a343e68] kjournald2 at ffffffffa0070c08 [jbd2] 0000003 [ffff88107a343ee8] kthread at ffffffff8107b5f6 0000004 [ffff88107a343f48] kernel_thread at ffffffff8100412a and most of the threads: PID: 15585 TASK: ffff88063062a790 CPU: 0 COMMAND: "mdt_503" PID: 15586 TASK: ffff88063062a040 CPU: 23 COMMAND: "mdt_504" PID: 15587 TASK: ffff88020f3ad7d0 CPU: 30 COMMAND: "mdt_505" PID: 29286 TASK: ffff88087505e790 CPU: 25 COMMAND: "mdt_01" ... #0 [ffff881949c078f0] schedule at ffffffff81485765 0000001 [ffff881949c079b8] start_this_handle at ffffffffa006908a [jbd2] 0000002 [ffff881949c07a78] jbd2_journal_start at ffffffffa0069500 [jbd2] 0000003 [ffff881949c07ac8] ldiskfs_journal_start_sb at ffffffffa0451ca8 [ldiskfs] 0000004 [ffff881949c07ad8] osd_trans_start at ffffffffa0d4a324 [osd_ldiskfs] 0000005 [ffff881949c07b18] mdd_trans_start at ffffffffa0c4c4e3 [mdd] 0000006 [ffff881949c07b38] mdd_unlink at ffffffffa0c401eb [mdd] 0000007 [ffff881949c07bf8] cml_unlink at ffffffffa0d82e07 [cmm] 0000008 [ffff881949c07c38] mdt_reint_unlink at ffffffffa0cba0f4 [mdt] 0000009 [ffff881949c07cb8] mdt_reint_rec at ffffffffa0cb7cb1 [mdt] 0000010 [ffff881949c07cd8] mdt_reint_internal at ffffffffa0caeed4 [mdt] 0000011 [ffff881949c07d28] mdt_reint at ffffffffa0caf2b4 [mdt] 0000012 [ffff881949c07d48] mdt_handle_common at ffffffffa0ca3762 [mdt] 0000013 [ffff881949c07d98] mdt_regular_handle at ffffffffa0ca4655 [mdt] 0000014 [ffff881949c07da8] ptlrpc_main at ffffffffa071f4f6 [ptlrpc] 0000015 [ffff881949c07f48] kernel_thread at ffffffff8100412a The are running lustre 2.1.6 which contains http://review.whamcloud.com/4743 from LU-1648 . I attach two files containing the dmseg and the crash back trace of all threads. could you reopen this ticket, as it was closed with "Cannot Reproduce".

Peter Jones added a comment - 30/Apr/13 10:52 PM

ok thanks Chris

Peter Jones added a comment - 30/Apr/13 10:52 PM ok thanks Chris

Christopher Morrone (Inactive) added a comment - 30/Apr/13 10:00 PM

It looks like the ~~LU-1648~~ fix landed on before 2.1.4 in change 4743. I think we can close this until we see it again.

Christopher Morrone (Inactive) added a comment - 30/Apr/13 10:00 PM It looks like the LU-1648 fix landed on before 2.1.4 in change 4743 . I think we can close this until we see it again.

Christopher Morrone (Inactive) added a comment - 27/Nov/12 9:57 PM

It sounds like the patch from ~~LU-1648~~ is needed on b2_1.

Christopher Morrone (Inactive) added a comment - 27/Nov/12 9:57 PM It sounds like the patch from LU-1648 is needed on b2_1.

Oleg Drokin added a comment - 20/Aug/12 5:54 PM

I guess the other candidate for this issue is ~~LU-1648~~, can you add a patch from it as well please?

Oleg Drokin added a comment - 20/Aug/12 5:54 PM I guess the other candidate for this issue is LU-1648 , can you add a patch from it as well please?

Peter Jones added a comment - 15/Aug/12 3:07 PM

Reopening ticket - not a duplicate of LU81

Peter Jones added a comment - 15/Aug/12 3:07 PM Reopening ticket - not a duplicate of LU81

Ned Bass (Inactive) added a comment - 14/Aug/12 5:37 PM

We had a repeat of this bug, even though we're running the ~~LU-81~~ patch. We have a crash dump I can upload if anyone wants to take a look.

There is one thread stuck in llog_cancel_rec()

PID: 5892   TASK: ffff880776d34040  CPU: 7   COMMAND: "ldlm_cn_57"
 #0 [ffff8807b50a94e0] schedule at ffffffff814ef052
 #1 [ffff8807b50a95a8] start_this_handle at ffffffffa0a9e072 [jbd2]
 #2 [ffff8807b50a9668] jbd2_journal_restart at ffffffffa0a9e3c1 [jbd2]
 #3 [ffff8807b50a96b8] ldiskfs_truncate_restart_trans at ffffffffa0ac28ba [ldiskfs]
 #4 [ffff8807b50a96e8] ldiskfs_clear_blocks at ffffffffa0ac7bfd [ldiskfs]
 #5 [ffff8807b50a9748] ldiskfs_free_data at ffffffffa0ac7de4 [ldiskfs]
 #6 [ffff8807b50a97a8] ldiskfs_free_branches at ffffffffa0ac8023 [ldiskfs]
 #7 [ffff8807b50a9808] ldiskfs_free_branches at ffffffffa0ac7f16 [ldiskfs]
 #8 [ffff8807b50a9868] ldiskfs_truncate at ffffffffa0ac8629 [ldiskfs]
 #9 [ffff8807b50a9988] ldiskfs_delete_inode at ffffffffa0ac99a0 [ldiskfs]
#10 [ffff8807b50a99a8] generic_delete_inode at ffffffff81192b6e
#11 [ffff8807b50a99d8] generic_drop_inode at ffffffff81192cc5
#12 [ffff8807b50a99f8] iput at ffffffff81191b12
#13 [ffff8807b50a9a18] mds_obd_destroy at ffffffffa0b90161 [mds]
#14 [ffff8807b50a9b58] llog_lvfs_destroy at ffffffffa05afd62 [obdclass]
#15 [ffff8807b50a9c28] llog_cancel_rec at ffffffffa05a7acf [obdclass]
#16 [ffff8807b50a9c58] llog_cat_cancel_records at ffffffffa05abfc1 [obdclass]
#17 [ffff8807b50a9cb8] llog_origin_handle_cancel at ffffffffa072cef6 [ptlrpc]
#18 [ffff8807b50a9db8] ldlm_cancel_handler at ffffffffa06fd2cf [ptlrpc]
#19 [ffff8807b50a9df8] ptlrpc_main at ffffffffa0724ad1 [ptlrpc]
#20 [ffff8807b50a9f48] kernel_thread at ffffffff8100c14a

One thread in llog_cat_add_rec():

PID: 5119   TASK: ffff880831f10080  CPU: 9   COMMAND: "mdt_78"
 #0 [ffff8806e6e7b568] schedule at ffffffff814ef052
 #1 [ffff8806e6e7b630] rwsem_down_failed_common at ffffffff814f1605
 #2 [ffff8806e6e7b690] rwsem_down_read_failed at ffffffff814f1796
 #3 [ffff8806e6e7b6d0] call_rwsem_down_read_failed at ffffffff81278d04
 #4 [ffff8806e6e7b738] llog_cat_current_log.clone.0 at ffffffffa05aa715 [obdclass]
 #5 [ffff8806e6e7b7d8] llog_cat_add_rec at ffffffffa05ac18a [obdclass]
 #6 [ffff8806e6e7b828] llog_obd_origin_add at ffffffffa05b09a6 [obdclass]
 #7 [ffff8806e6e7b858] llog_add at ffffffffa05b0b01 [obdclass]
 #8 [ffff8806e6e7b898] lov_llog_origin_add at ffffffffa091d0c4 [lov]
 #9 [ffff8806e6e7b918] llog_add at ffffffffa05b0b01 [obdclass]
#10 [ffff8806e6e7b958] mds_llog_origin_add at ffffffffa0b915d3 [mds]
#11 [ffff8806e6e7b9a8] llog_add at ffffffffa05b0b01 [obdclass]
#12 [ffff8806e6e7b9e8] mds_llog_add_unlink at ffffffffa0b91a95 [mds]
#13 [ffff8806e6e7ba38] mds_log_op_unlink at ffffffffa0b91f84 [mds]
#14 [ffff8806e6e7ba98] mdd_unlink_log at ffffffffa0bc0e4e [mdd]
#15 [ffff8806e6e7bac8] mdd_object_kill at ffffffffa0bb8242 [mdd]
#16 [ffff8806e6e7bb08] mdd_finish_unlink at ffffffffa0bcb5f6 [mdd]
#17 [ffff8806e6e7bb48] mdd_unlink at ffffffffa0bd010a [mdd]
#18 [ffff8806e6e7bc08] cml_unlink at ffffffffa0ce4e78 [cmm]
#19 [ffff8806e6e7bc58] mdt_reint_unlink at ffffffffa0c3aded [mdt]
#20 [ffff8806e6e7bce8] mdt_reint_rec at ffffffffa0c392a0 [mdt]
#21 [ffff8806e6e7bd18] mdt_reint_internal at ffffffffa0c34098 [mdt]
#22 [ffff8806e6e7bd68] mdt_reint at ffffffffa0c34354 [mdt]
#23 [ffff8806e6e7bd98] mdt_handle_common at ffffffffa0c287ad [mdt]
#24 [ffff8806e6e7bde8] mdt_regular_handle at ffffffffa0c29405 [mdt]
#25 [ffff8806e6e7bdf8] ptlrpc_main at ffffffffa0724ad1 [ptlrpc]
#26 [ffff8806e6e7bf48] kernel_thread at ffffffff8100c14a

And 115 threads stuck in jbd2_journal_start(), i.e.

PID: 4075   TASK: ffff880231f93540  CPU: 15  COMMAND: "mdt_01"
 #0 [ffff8801018cf6a0] schedule at ffffffff814ef052
 #1 [ffff8801018cf768] start_this_handle at ffffffffa0a9e072 [jbd2]
 #2 [ffff8801018cf828] jbd2_journal_start at ffffffffa0a9e4f0 [jbd2]
 #3 [ffff8801018cf878] ldiskfs_journal_start_sb at ffffffffa0aec098 [ldiskfs]
 #4 [ffff8801018cf888] osd_trans_start at ffffffffa0cb5e0c [osd_ldiskfs]
 #5 [ffff8801018cf8c8] mdd_trans_start at ffffffffa0bd9753 [mdd]
 #6 [ffff8801018cf8e8] mdd_create at ffffffffa0bd086b [mdd]
 #7 [ffff8801018cfa28] cml_create at ffffffffa0ce54d8 [cmm]
 #8 [ffff8801018cfa78] mdt_reint_open at ffffffffa0c4c924 [mdt]
 #9 [ffff8801018cfb68] mdt_reint_rec at ffffffffa0c392a0 [mdt]
#10 [ffff8801018cfb98] mdt_reint_internal at ffffffffa0c34098 [mdt]
#11 [ffff8801018cfbe8] mdt_intent_reint at ffffffffa0c34555 [mdt]
#12 [ffff8801018cfc38] mdt_intent_policy at ffffffffa0c2fe29 [mdt]
#13 [ffff8801018cfc88] ldlm_lock_enqueue at ffffffffa06dfb42 [ptlrpc]
#14 [ffff8801018cfcf8] ldlm_handle_enqueue0 at ffffffffa06fe906 [ptlrpc]
#15 [ffff8801018cfd68] mdt_enqueue at ffffffffa0c2fa9a [mdt]
#16 [ffff8801018cfd98] mdt_handle_common at ffffffffa0c287ad [mdt]
#17 [ffff8801018cfde8] mdt_regular_handle at ffffffffa0c29405 [mdt]
#18 [ffff8801018cfdf8] ptlrpc_main at ffffffffa0724ad1 [ptlrpc]
#19 [ffff8801018cff48] kernel_thread at ffffffff8100c14a

Ned Bass (Inactive) added a comment - 14/Aug/12 5:37 PM We had a repeat of this bug, even though we're running the LU-81 patch. We have a crash dump I can upload if anyone wants to take a look. There is one thread stuck in llog_cancel_rec() PID: 5892 TASK: ffff880776d34040 CPU: 7 COMMAND: "ldlm_cn_57" #0 [ffff8807b50a94e0] schedule at ffffffff814ef052 #1 [ffff8807b50a95a8] start_this_handle at ffffffffa0a9e072 [jbd2] #2 [ffff8807b50a9668] jbd2_journal_restart at ffffffffa0a9e3c1 [jbd2] #3 [ffff8807b50a96b8] ldiskfs_truncate_restart_trans at ffffffffa0ac28ba [ldiskfs] #4 [ffff8807b50a96e8] ldiskfs_clear_blocks at ffffffffa0ac7bfd [ldiskfs] #5 [ffff8807b50a9748] ldiskfs_free_data at ffffffffa0ac7de4 [ldiskfs] #6 [ffff8807b50a97a8] ldiskfs_free_branches at ffffffffa0ac8023 [ldiskfs] #7 [ffff8807b50a9808] ldiskfs_free_branches at ffffffffa0ac7f16 [ldiskfs] #8 [ffff8807b50a9868] ldiskfs_truncate at ffffffffa0ac8629 [ldiskfs] #9 [ffff8807b50a9988] ldiskfs_delete_inode at ffffffffa0ac99a0 [ldiskfs] #10 [ffff8807b50a99a8] generic_delete_inode at ffffffff81192b6e #11 [ffff8807b50a99d8] generic_drop_inode at ffffffff81192cc5 #12 [ffff8807b50a99f8] iput at ffffffff81191b12 #13 [ffff8807b50a9a18] mds_obd_destroy at ffffffffa0b90161 [mds] #14 [ffff8807b50a9b58] llog_lvfs_destroy at ffffffffa05afd62 [obdclass] #15 [ffff8807b50a9c28] llog_cancel_rec at ffffffffa05a7acf [obdclass] #16 [ffff8807b50a9c58] llog_cat_cancel_records at ffffffffa05abfc1 [obdclass] #17 [ffff8807b50a9cb8] llog_origin_handle_cancel at ffffffffa072cef6 [ptlrpc] #18 [ffff8807b50a9db8] ldlm_cancel_handler at ffffffffa06fd2cf [ptlrpc] #19 [ffff8807b50a9df8] ptlrpc_main at ffffffffa0724ad1 [ptlrpc] #20 [ffff8807b50a9f48] kernel_thread at ffffffff8100c14a One thread in llog_cat_add_rec(): PID: 5119 TASK: ffff880831f10080 CPU: 9 COMMAND: "mdt_78" #0 [ffff8806e6e7b568] schedule at ffffffff814ef052 #1 [ffff8806e6e7b630] rwsem_down_failed_common at ffffffff814f1605 #2 [ffff8806e6e7b690] rwsem_down_read_failed at ffffffff814f1796 #3 [ffff8806e6e7b6d0] call_rwsem_down_read_failed at ffffffff81278d04 #4 [ffff8806e6e7b738] llog_cat_current_log.clone.0 at ffffffffa05aa715 [obdclass] #5 [ffff8806e6e7b7d8] llog_cat_add_rec at ffffffffa05ac18a [obdclass] #6 [ffff8806e6e7b828] llog_obd_origin_add at ffffffffa05b09a6 [obdclass] #7 [ffff8806e6e7b858] llog_add at ffffffffa05b0b01 [obdclass] #8 [ffff8806e6e7b898] lov_llog_origin_add at ffffffffa091d0c4 [lov] #9 [ffff8806e6e7b918] llog_add at ffffffffa05b0b01 [obdclass] #10 [ffff8806e6e7b958] mds_llog_origin_add at ffffffffa0b915d3 [mds] #11 [ffff8806e6e7b9a8] llog_add at ffffffffa05b0b01 [obdclass] #12 [ffff8806e6e7b9e8] mds_llog_add_unlink at ffffffffa0b91a95 [mds] #13 [ffff8806e6e7ba38] mds_log_op_unlink at ffffffffa0b91f84 [mds] #14 [ffff8806e6e7ba98] mdd_unlink_log at ffffffffa0bc0e4e [mdd] #15 [ffff8806e6e7bac8] mdd_object_kill at ffffffffa0bb8242 [mdd] #16 [ffff8806e6e7bb08] mdd_finish_unlink at ffffffffa0bcb5f6 [mdd] #17 [ffff8806e6e7bb48] mdd_unlink at ffffffffa0bd010a [mdd] #18 [ffff8806e6e7bc08] cml_unlink at ffffffffa0ce4e78 [cmm] #19 [ffff8806e6e7bc58] mdt_reint_unlink at ffffffffa0c3aded [mdt] #20 [ffff8806e6e7bce8] mdt_reint_rec at ffffffffa0c392a0 [mdt] #21 [ffff8806e6e7bd18] mdt_reint_internal at ffffffffa0c34098 [mdt] #22 [ffff8806e6e7bd68] mdt_reint at ffffffffa0c34354 [mdt] #23 [ffff8806e6e7bd98] mdt_handle_common at ffffffffa0c287ad [mdt] #24 [ffff8806e6e7bde8] mdt_regular_handle at ffffffffa0c29405 [mdt] #25 [ffff8806e6e7bdf8] ptlrpc_main at ffffffffa0724ad1 [ptlrpc] #26 [ffff8806e6e7bf48] kernel_thread at ffffffff8100c14a And 115 threads stuck in jbd2_journal_start(), i.e. PID: 4075 TASK: ffff880231f93540 CPU: 15 COMMAND: "mdt_01" #0 [ffff8801018cf6a0] schedule at ffffffff814ef052 #1 [ffff8801018cf768] start_this_handle at ffffffffa0a9e072 [jbd2] #2 [ffff8801018cf828] jbd2_journal_start at ffffffffa0a9e4f0 [jbd2] #3 [ffff8801018cf878] ldiskfs_journal_start_sb at ffffffffa0aec098 [ldiskfs] #4 [ffff8801018cf888] osd_trans_start at ffffffffa0cb5e0c [osd_ldiskfs] #5 [ffff8801018cf8c8] mdd_trans_start at ffffffffa0bd9753 [mdd] #6 [ffff8801018cf8e8] mdd_create at ffffffffa0bd086b [mdd] #7 [ffff8801018cfa28] cml_create at ffffffffa0ce54d8 [cmm] #8 [ffff8801018cfa78] mdt_reint_open at ffffffffa0c4c924 [mdt] #9 [ffff8801018cfb68] mdt_reint_rec at ffffffffa0c392a0 [mdt] #10 [ffff8801018cfb98] mdt_reint_internal at ffffffffa0c34098 [mdt] #11 [ffff8801018cfbe8] mdt_intent_reint at ffffffffa0c34555 [mdt] #12 [ffff8801018cfc38] mdt_intent_policy at ffffffffa0c2fe29 [mdt] #13 [ffff8801018cfc88] ldlm_lock_enqueue at ffffffffa06dfb42 [ptlrpc] #14 [ffff8801018cfcf8] ldlm_handle_enqueue0 at ffffffffa06fe906 [ptlrpc] #15 [ffff8801018cfd68] mdt_enqueue at ffffffffa0c2fa9a [mdt] #16 [ffff8801018cfd98] mdt_handle_common at ffffffffa0c287ad [mdt] #17 [ffff8801018cfde8] mdt_regular_handle at ffffffffa0c29405 [mdt] #18 [ffff8801018cfdf8] ptlrpc_main at ffffffffa0724ad1 [ptlrpc] #19 [ffff8801018cff48] kernel_thread at ffffffff8100c14a

Peter Jones added a comment - 16/Apr/12 8:21 AM

Believed to be a duplicate of ~~LU-81~~

Peter Jones added a comment - 16/Apr/12 8:21 AM Believed to be a duplicate of LU-81

Christopher Morrone (Inactive) added a comment - 10/Apr/12 6:25 PM

Ok, I have pulled the fix from ~~LU-81~~ into the 2.1.1-llnl branch.

Christopher Morrone (Inactive) added a comment - 10/Apr/12 6:25 PM Ok, I have pulled the fix from LU-81 into the 2.1.1-llnl branch.

Oleg Drokin added a comment - 10/Apr/12 5:18 PM

Yes, that does match the ~~LU-81~~ deadlock description.

Oleg Drokin added a comment - 10/Apr/12 5:18 PM Yes, that does match the LU-81 deadlock description.

People

Assignee:: Oleg Drokin

Reporter:: Ned Bass (Inactive)

Votes:: 0 Vote for this issue

Watchers:: 5 Start watching this issue

Dates

Created:: 30/Mar/12 8:56 PM

Updated:: 05/May/16 2:56 AM

Resolved:: 30/Apr/13 10:52 PM