[LU-81] Some JBD2 journaling deadlock at BULL Created: 09/Feb/11  Updated: 24/Nov/17  Resolved: 29/Mar/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.0.0
Fix Version/s: Lustre 2.2.0, Lustre 2.1.2

Type: Bug Priority: Critical
Reporter: Oleg Drokin Assignee: Niu Yawei (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Attachments: Text File hang_changelog_foreach_bt.txt    
Issue Links:
Related
Severity: 2
Bugzilla ID: 24438
Rank (Obsolete): 4793

 Description   

Bull reports in Bugzilla that there are some possible deadlock issues on the MDS with jbd2 (or just runaway transactions?):

At CEA, they have encountered several occurrences of the same scenario in which all Lustre activity is
hung. Each time they live-debug the problem, they end up on the MDS node, where all Lustre
operations appear to be frozen.

As a consequence, the MDS has to be rebooted and the Lustre layer restarted on it with recovery.

The MDS threads which appear to be strongly involved in the frozen situation have the following
stack traces, taken from one of the forced crash-dumps:
==================================

About 234 tasks share the following stack:

PID 5250 mdt_rdpg_143
schedule()
start_this_handle()
jbd2_journal_start()
ldiskfs_journal_start_sb()
osd_trans_start()
mdd_trans_start()
cml_close()

One is with:

Pid: 4990 mdt_395
schedule()
jbd2_log_wait_commit()
jbd2_journal_stop()
__ldiskfs_journal_stop()
osd_trans_stop()
mdd_trans_stop()
mdd_attr_set()
cml_attr_set()

And another with:

Pid: 4534 "jbd2/sdd-8"
schedule()
jbd2_journal_commit_transaction()
kjournald2()
kthread()
kernel_thread()

==================================

Analyzing the crash dump shows that the task hung in jbd2_journal_commit_transaction() has been in this
state for a very long time.

This problem looks like bug 16667, but unfortunately that fix is not applicable 'as is', as it dates back
to 1.6. Here it seems there is a race or deadlock in the Lustre/JBD2 layers.
As a workaround the customer deactivated the ChangeLog feature, and since then the problem has never
reoccurred. Unfortunately, ChangeLogs are required by HSM, so this workaround cannot last...

Can you see the reason for this deadlock?

I must point out that this bug is critical, as it blocks normal cluster operation (i.e. with HSM).



 Comments   
Comment by Alex Zhuravlev [ 15/Feb/11 ]

Is there a possibility to reproduce the issue and grab a crash image, to have access to the stacks with the offsets? Or perhaps the customer saved the crash?

Also, was a changelog consumer (the HSM userspace agent) running? It's important to understand whether the MDS was just generating records or whether records were being cancelled as well.

Comment by Peter Jones [ 16/Feb/11 ]

I have added Bull to this ticket in the hope that someone there can answer Alex's question and help move this issue forward

Comment by Sebastien Buisson (Inactive) [ 16/Feb/11 ]

Hi Alex, Peter,

Thanks for opening this Jira ticket.

I think CEA saved the crash dumps, but as the cluster is classified, it is not possible to get them out. So please tell us precisely what you need, and I will have our on-site support team send it ("foreach bt"? "bt -a"? ...).

I do not know if HSM userspace agent was running, I will forward this question.

Cheers,
Sebastien.

Comment by Alex Zhuravlev [ 16/Feb/11 ]

Hello Sebastien,

I think the very first info we need is detailed stacks for all the processes.

Comment by Peter Jones [ 15/Mar/11 ]

Update from Bull is that the onsite support team are working on getting this information

Comment by Cory Spitz [ 15/Mar/11 ]

Given the information presented here, I was reminded of Lustre Bug 21406. Perhaps that ticket could be inspected to see if the conditions are similar. Further, implementing the workaround from attachment 28496 (https://bugzilla.lustre.org/attachment.cgi?id=28496), which did not land to 2.x, may be a useful experiment if the problem can be easily reproduced. However, I also remember that 21406 was associated with OST threads, not MDT threads, so perhaps it doesn't apply.

Comment by Peter Jones [ 15/Mar/11 ]

Alex

Cray observed on the 2.1 call that this seems somewhat similar to bz 21760. Does this seem plausible from the evidence available?

Thanks

Peter

Comment by Peter Jones [ 15/Mar/11 ]

Johann

You were involved in 21760. Are you able to comment on this theory? If so, what evidence should the on-site Bull support staff look for to prove\disprove this theory? Is there a workaround\fix that could be tried out to see if it prevents this problem?

Thanks

Peter

Comment by Cory Spitz [ 15/Mar/11 ]

Oops, did I say 21760? I meant 21406, but I also missed that this issue was MDT related. See my earlier (edited) comment. Sorry if I caused any misdirection.

Comment by Peter Jones [ 15/Mar/11 ]

Heh. Actually, my notes said 21706, so I guess I made the wrong transposition. Of course, Johann was still involved in that one too, so my question still stands with the corrected id...

Comment by Johann Lombardi (Inactive) [ 16/Mar/11 ]

I don't think that bugzilla ticket 21706 is related to this issue.
As Cory noted, 21706 is a problem with OSS read cache whereas this bug is a MDS deadlock.

That said, i have noticed that the jbd2-commit-timer-no-jiffies-rounding.diff patch
is missing from the RHEL6 patch series. The round_jiffies() used to cause significant
delays in transaction commit, see bugzilla ticket 19321. I am not totally sure this
can fix this bug, but it is worth adding the missing patch and trying again IMO.

HTH

Comment by Peter Jones [ 24/Mar/11 ]

I think that the patch Johann mentions is http://review.whamcloud.com/#change,358

Comment by Peter Jones [ 04/Apr/11 ]

Any word back from CEA as to whether this issue still manifests itself with the missing patch applied?

Comment by Sebastien Buisson (Inactive) [ 04/Apr/11 ]

Hi Peter,

Still no news from CEA on this. At least we will have more information on Thursday.

Cheers,
Sebastien.

Comment by Sebastien Buisson (Inactive) [ 06/Apr/11 ]

Hi,

Bad news from CEA. They reactivated Changelogs yesterday evening, and this bug appeared this afternoon.
They are currently running with a patched kernel including the patch lustre-jbd2-commit-timer-no-jiffies-rounding-rhel6.patch.

Any 'new' ideas on how to tackle this issue?

Sebastien.

Comment by Johann Lombardi (Inactive) [ 06/Apr/11 ]

Not without looking at the crash dump.
Have you tried to reproduce the problem on one of your internal cluster?

Comment by Sebastien Buisson (Inactive) [ 06/Apr/11 ]

Hi Johann,

> Not without looking at the crash dump.
Unfortunately the crash dump cannot be taken out of CEA. What crash commands would you like Bruno to launch?

> Have you tried to reproduce the problem on one of your internal cluster?
If Changelogs are activated by default, then yes. Indeed, CEA hits this bug as soon as they activate the Changelogs. But internally we have never seen it.

Cheers,
Sebastien.

Comment by Johann Lombardi (Inactive) [ 06/Apr/11 ]

Hi Sebastien,

No, changelogs are not activated by default; you need to register a changelog user to enable them.
As for the crash dump, we need to look at the jbd structures to understand why everyone thinks
that the transaction is committing while the jbd commit thread is sitting idle.

Comment by Alex Zhuravlev [ 06/Apr/11 ]

> Unfortunately the crash dump cannot be taken out of CEA. What crash commands would you like Bruno to launch?

list of all the threads with backtraces would be a good start.

Comment by Peter Jones [ 07/Apr/11 ]

update from Bull "problem reoccurred yesterday, after less than 24 hours with ChangeLogs activated.
CEA will be able to send 'foreach bt' from live debug (no dump available)"

Comment by Peter Jones [ 21/Apr/11 ]

As per Bull, CEA do not expect to be able to gather this data until the end of May.

Comment by Sebastien Buisson (Inactive) [ 01/Jun/11 ]

Hi,

Here is the long-awaited 'foreach bt' (in fact the Alt+SysRq+T console output, taken live during one occurrence of the problem).

Cheers,
Sebastien.

Comment by Alex Zhuravlev [ 22/Jun/11 ]

PID: 26299 TASK: ffff88047d851620 CPU: 28 COMMAND: "llog_process_th"
#0 [ffff880998a65900] schedule at ffffffff81452851
#1 [ffff880998a659c8] start_this_handle at ffffffffa08ec0d7
#2 [ffff880998a65a88] jbd2_journal_start at ffffffffa08ec520
#3 [ffff880998a65ad8] ldiskfs_journal_start_sb at ffffffffa0936fb8
#4 [ffff880998a65ae8] fsfilt_ldiskfs_write_record at ffffffffa098a0fc
#5 [ffff880998a65b68] llog_lvfs_write_blob at ffffffffa050917c
#6 [ffff880998a65c18] llog_lvfs_write_rec at ffffffffa050a722
#7 [ffff880998a65cf8] llog_cancel_rec at ffffffffa05010a4
#8 [ffff880998a65d58] llog_cat_cancel_records at ffffffffa0505de2
#9 [ffff880998a65de8] llog_changelog_cancel_cb at ffffffffa099ec12
#10 [ffff880998a65e68] llog_process_thread at ffffffffa0503573
#11 [ffff880998a65f48] kernel_thread at ffffffff8100d1aa

PID: 22091 TASK: ffff8808695bad90 CPU: 22 COMMAND: "mdt_attr_101"
#0 [ffff8808695c1358] schedule at ffffffff81452851
#1 [ffff8808695c1420] rwsem_down_failed_common at ffffffff81454cb5
#2 [ffff8808695c1470] rwsem_down_read_failed at ffffffff81454e46
#3 [ffff8808695c14b0] call_rwsem_down_read_failed at ffffffff81248024
#4 [ffff8808695c1518] llog_cat_current_log.clone.0 at ffffffffa05068a5
#5 [ffff8808695c15f8] llog_cat_add_rec at ffffffffa050785a
#6 [ffff8808695c1678] llog_obd_origin_add at ffffffffa050e196
#7 [ffff8808695c16d8] llog_add at ffffffffa050e371
#8 [ffff8808695c1748] mdd_changelog_llog_write at ffffffffa09dc905
#9 [ffff8808695c17c8] mdd_changelog_data_store at ffffffffa09b3c1a
#10 [ffff8808695c1858] mdd_attr_set at ffffffffa09bc0a2
#11 [ffff8808695c1968] cml_attr_set at ffffffffa0a9775f
#12 [ffff8808695c19c8] mdt_attr_set at ffffffffa0a256e4
#13 [ffff8808695c1a58] mdt_reint_setattr at ffffffffa0a25e36
#14 [ffff8808695c1ae8] mdt_reint_rec at ffffffffa0a2167f
#15 [ffff8808695c1b38] mdt_reint_internal at ffffffffa0a18a34
#16 [ffff8808695c1bc8] mdt_reint at ffffffffa0a18d9c
#17 [ffff8808695c1c18] mdt_handle_common at ffffffffa0a0da45
#18 [ffff8808695c1c98] mdt_regular_handle at ffffffffa0a0ea55
#19 [ffff8808695c1ca8] ptlrpc_server_handle_request at ffffffffa0648b11
#20 [ffff8808695c1de8] ptlrpc_main at ffffffffa0649f0a
#21 [ffff8808695c1f48] kernel_thread at ffffffff8100d1aa

This seems to be a known ordering issue between journal_start() and the catalog semaphore.
I'm looking for the bug in Bugzilla...
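The two stacks above form a classic AB-BA inversion: the cancel path takes the catalog semaphore and then tries to join a journal transaction, while the add path holds a journal handle and then blocks on the catalog semaphore. A minimal sketch (hypothetical names, not Lustre code) of how such an ordering conflict can be detected from the per-thread acquisition orders:

```python
# Minimal lock-order model of the inversion seen in the two stacks above
# (hypothetical names, not Lustre code). The cancel path takes the catalog
# semaphore, then opens a journal handle; the add path opens a journal
# handle, then takes the catalog semaphore.

def find_order_conflict(threads):
    """Return a pair of resources acquired in opposite orders, or None."""
    seen = set()
    for order in threads:
        for i, a in enumerate(order):
            for b in order[i + 1:]:
                if (b, a) in seen:
                    # Some other path acquired b before a: AB-BA cycle.
                    return (b, a)
                seen.add((a, b))
    return None

cancel_path = ["CATALOG_SEM", "JOURNAL_HANDLE"]   # llog_cat_cancel_records()
add_path = ["JOURNAL_HANDLE", "CATALOG_SEM"]      # llog_cat_add_rec()

conflict = find_order_conflict([cancel_path, add_path])
```

Here the checker reports that CATALOG_SEM and JOURNAL_HANDLE are taken in opposite orders by the two paths, which is exactly the cycle that leaves the jbd2 commit thread unable to close the transaction.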

Comment by Sebastien Buisson (Inactive) [ 12/Jul/11 ]

Hi,

Any news about this?

TIA,
Sebastien.

Comment by Alex Zhuravlev [ 18/Jul/11 ]

Hello Sebastien,

The fix I was thinking of is the work being done for the Sequoia project. We can't land it on master due
to the many related changes in the code; someone else will probably need to work on a different workaround for the master branch.

In general, cancelling code should follow the rule: start the transaction first, then do the locking in llog.
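The rule above can be sketched with ordinary locks: if every path acquires the "transaction" before the llog catalog lock, no cycle can form. A toy model (hypothetical names; plain locks standing in for the jbd2 handle and the catalog semaphore, though real jbd2 handles admit many concurrent starters):

```python
import threading

# Toy model of the "start transaction first, then lock the llog" rule.
journal = threading.Lock()   # stand-in for jbd2_journal_start()
catalog = threading.Lock()   # stand-in for the llog catalog semaphore

completed = []

def llog_op(name):
    # Every path follows the same order: transaction first, catalog second,
    # so no thread can hold the catalog while waiting on the journal.
    with journal:
        with catalog:
            completed.append(name)

threads = [threading.Thread(target=llog_op, args=(n,))
           for n in ("changelog_add", "changelog_cancel")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With a consistent order both operations always run to completion; the hang reported here needs the two opposite orders seen in the crash stacks.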

Comment by Alexandre Louvet [ 02/Aug/11 ]

Hi,

Just wanted to report that the hit frequency of this problem has increased recently. We are now at about 2 or 3 hangs a day. Is there anything we can provide to help?

Alex.

Comment by Diego Moreno (Inactive) [ 04/Aug/11 ]

Hi,

As the priority of this issue is rising, just another question: do you think we can deploy any kind of workaround (other than just "deactivate changelog", of course)?

Thanks,

Comment by Peter Jones [ 04/Aug/11 ]

Niu

Could you please look into a workaround\fix for this issue that will work with the existing master code?

Thanks

Peter

Comment by Patrick Valentin (Inactive) [ 05/Aug/11 ]

Hi,
I have prepared a compressed tarball containing the kernel core dump and the kernel image, but the file is about 460 MB and I just saw that attachments are limited to 10 MB. Do you have an FTP server on which I can put this file, or is there another way to provide large files?

TIA
Cheers
Patrick

Comment by Peter Jones [ 06/Aug/11 ]

Patrick

I have sent you information on this

Peter

Comment by Patrick Valentin (Inactive) [ 09/Aug/11 ]

The tarball containing the kernel core dump and kernel image is available on whamcloud ftp server.
The first transfer was aborted, and the corresponding "dump.tar.gz" file should be removed.
The name of the correct file is "20110809_1021_dump.tar.gz". Its size is 468396374 bytes, and the "sum" command gives:

  # sum 20110809_1021_dump.tar.gz
  65153 457419

It contains:

  • the kernel core dump: vmcore
  • the kernel image: vmlinux_2.6.32-71.14.1.el6.Bull.23

It must be analysed on a 2.6.32 kernel using the corresponding crash command (5.0.0).

In case of trouble, crash 5.1.7 ("crash_5.1.7.tar.gz") is also available in the tarball. To use it, you have to set the following variables:
export PATH=/your/home/local/crash/5.1.7/bin:${PATH}
export CRASH_EXTENSIONS=/your/home/local/crash/5.1.7/usr/lib64/extensions

Let me know if you need additional information

Regards,
Patrick

Comment by Niu Yawei (Inactive) [ 17/Aug/11 ]

Hi, Alex/Johann

Given that it's difficult to port the Orion llog changes onto master, I think we could introduce a simple temporary workaround for master: add an rwlock to each mdd to protect the changelog. Each changelog add takes the read lock, and changelog cancel has to hold the write lock. Since changelog cancelling only happens when the user issues the changelog clear command, I think the performance impact will be acceptable.

Considering it's just a temporary workaround, I want to minimize the code changes as much as possible. Another advantage of this approach is that it doesn't affect llog users other than the changelog.

If this workaround sounds OK to you, I'll make the patch soon. Thanks
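The proposed workaround maps onto a standard reader-writer pattern: many concurrent changelog adds share the lock, while a changelog clear excludes them all. A minimal sketch (hypothetical class, not the actual mdd code) of such a per-mdd rwlock:

```python
import threading

class ChangelogRWLock:
    """Sketch of the proposed per-mdd rwlock (hypothetical, not mdd code):
    changelog adds share the read side; changelog clear takes the write side."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def read_acquire(self):
        # Changelog add: many threads may hold this concurrently.
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def read_release(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def write_acquire(self):
        # Changelog clear: waits until no adds are in flight, then excludes them.
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def write_release(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

Since the cancel path then runs with all adds excluded, the add path can no longer be blocked on the catalog while holding a journal handle, which is the cheap way to break the inversion without restructuring the llog code.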

Comment by Diego Moreno (Inactive) [ 17/Aug/11 ]

Hi Niu,

From my point of view, I think this is what we are looking for: just a workaround based on a simple lock, with a moderate impact on performance.

Comment by Johann Lombardi (Inactive) [ 17/Aug/11 ]

Could we start a transaction earlier, as done in bugzilla 18030?
If it's too complicated/intrusive, then I'm fine with the brute-force locking.

Comment by Niu Yawei (Inactive) [ 18/Aug/11 ]

OK, I made a patch which starts the transaction before taking the catalog lock in llog_cat_cancel_records(). Thanks.

http://review.whamcloud.com/1260

Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-master » i686,server,el6,inkernel #397
LU-81 deadlock of changelog adding vs. changelog cancelling (Revision 4ecb94432df9f0f0866538f64b27f006562eae13)

Result = SUCCESS
Oleg Drokin : 4ecb94432df9f0f0866538f64b27f006562eae13
Files :

  • lustre/mdd/mdd_device.c
  • lustre/mds/mds_log.c
Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-master » x86_64,server,el5,ofa #397
LU-81 deadlock of changelog adding vs. changelog cancelling (Revision 4ecb94432df9f0f0866538f64b27f006562eae13)

Result = FAILURE
Oleg Drokin : 4ecb94432df9f0f0866538f64b27f006562eae13
Files :

  • lustre/mdd/mdd_device.c
  • lustre/mds/mds_log.c
Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-master » x86_64,client,el5,ofa #397
LU-81 deadlock of changelog adding vs. changelog cancelling (Revision 4ecb94432df9f0f0866538f64b27f006562eae13)

Result = SUCCESS
Oleg Drokin : 4ecb94432df9f0f0866538f64b27f006562eae13
Files :

  • lustre/mds/mds_log.c
  • lustre/mdd/mdd_device.c
Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #397
LU-81 deadlock of changelog adding vs. changelog cancelling (Revision 4ecb94432df9f0f0866538f64b27f006562eae13)

Result = SUCCESS
Oleg Drokin : 4ecb94432df9f0f0866538f64b27f006562eae13
Files :

  • lustre/mdd/mdd_device.c
  • lustre/mds/mds_log.c
Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-master » x86_64,client,sles11,inkernel #397
LU-81 deadlock of changelog adding vs. changelog cancelling (Revision 4ecb94432df9f0f0866538f64b27f006562eae13)

Result = SUCCESS
Oleg Drokin : 4ecb94432df9f0f0866538f64b27f006562eae13
Files :

  • lustre/mdd/mdd_device.c
  • lustre/mds/mds_log.c
Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-master » i686,client,el6,inkernel #397
LU-81 deadlock of changelog adding vs. changelog cancelling (Revision 4ecb94432df9f0f0866538f64b27f006562eae13)

Result = SUCCESS
Oleg Drokin : 4ecb94432df9f0f0866538f64b27f006562eae13
Files :

  • lustre/mds/mds_log.c
  • lustre/mdd/mdd_device.c
Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #397
LU-81 deadlock of changelog adding vs. changelog cancelling (Revision 4ecb94432df9f0f0866538f64b27f006562eae13)

Result = SUCCESS
Oleg Drokin : 4ecb94432df9f0f0866538f64b27f006562eae13
Files :

  • lustre/mdd/mdd_device.c
  • lustre/mds/mds_log.c
Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #397
LU-81 deadlock of changelog adding vs. changelog cancelling (Revision 4ecb94432df9f0f0866538f64b27f006562eae13)

Result = SUCCESS
Oleg Drokin : 4ecb94432df9f0f0866538f64b27f006562eae13
Files :

  • lustre/mds/mds_log.c
  • lustre/mdd/mdd_device.c
Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-master » i686,server,el5,inkernel #397
LU-81 deadlock of changelog adding vs. changelog cancelling (Revision 4ecb94432df9f0f0866538f64b27f006562eae13)

Result = SUCCESS
Oleg Drokin : 4ecb94432df9f0f0866538f64b27f006562eae13
Files :

  • lustre/mds/mds_log.c
  • lustre/mdd/mdd_device.c
Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-master » i686,server,el5,ofa #397
LU-81 deadlock of changelog adding vs. changelog cancelling (Revision 4ecb94432df9f0f0866538f64b27f006562eae13)

Result = SUCCESS
Oleg Drokin : 4ecb94432df9f0f0866538f64b27f006562eae13
Files :

  • lustre/mdd/mdd_device.c
  • lustre/mds/mds_log.c
Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #397
LU-81 deadlock of changelog adding vs. changelog cancelling (Revision 4ecb94432df9f0f0866538f64b27f006562eae13)

Result = SUCCESS
Oleg Drokin : 4ecb94432df9f0f0866538f64b27f006562eae13
Files :

  • lustre/mds/mds_log.c
  • lustre/mdd/mdd_device.c
Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-master » i686,client,el5,inkernel #397
LU-81 deadlock of changelog adding vs. changelog cancelling (Revision 4ecb94432df9f0f0866538f64b27f006562eae13)

Result = SUCCESS
Oleg Drokin : 4ecb94432df9f0f0866538f64b27f006562eae13
Files :

  • lustre/mds/mds_log.c
  • lustre/mdd/mdd_device.c
Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #397
LU-81 deadlock of changelog adding vs. changelog cancelling (Revision 4ecb94432df9f0f0866538f64b27f006562eae13)

Result = SUCCESS
Oleg Drokin : 4ecb94432df9f0f0866538f64b27f006562eae13
Files :

  • lustre/mds/mds_log.c
  • lustre/mdd/mdd_device.c
Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-master » i686,client,el5,ofa #397
LU-81 deadlock of changelog adding vs. changelog cancelling (Revision 4ecb94432df9f0f0866538f64b27f006562eae13)

Result = SUCCESS
Oleg Drokin : 4ecb94432df9f0f0866538f64b27f006562eae13
Files :

  • lustre/mdd/mdd_device.c
  • lustre/mds/mds_log.c
Comment by Peter Jones [ 04/Jan/12 ]

Landed for 2.2

Comment by Peter Jones [ 05/Jan/12 ]

Bull report that this has reoccurred at CEA.

Comment by Bruno Faccini (Inactive) [ 05/Jan/12 ]

So it seems that the workaround (the "patch which starts the transaction before taking the catalog lock in
llog_cat_cancel_records()") described above is not sufficient, and we may need a patch to implement
"the brute-force locking", the alternative solution already described in this ticket.

What do you think?

Comment by Niu Yawei (Inactive) [ 05/Jan/12 ]

Could you provide the stack trace? If we don't know the exact reason, I'm afraid the brute-force lock won't resolve the problem either.

Comment by Bruno Faccini (Inactive) [ 10/Jan/12 ]

This time, the thread hung for a long time in jbd2_journal_commit_transaction() is named "jbd2/dm-0-8", but still with the same stack:
==========================
Pid: 23067 "jbd2/dm-0-8"
schedule()
jbd2_journal_commit_transaction()
kjournald2()
kthread()
kernel_thread()
==========================

Then there are a bunch of other Lustre threads (ll_<...>, mdt_[rdpg_]<id>, ...) stuck with the same stack endings:
====================================================================================================================================
schedule()
start_this_handle()
jbd2_journal_start()
ldiskfs_journal_start_sb()

......

====================================================================================================================================

Comment by Niu Yawei (Inactive) [ 10/Jan/12 ]

Hi Bruno, is the exact full stack trace available?

Comment by Peter Jones [ 29/Mar/12 ]

Landed for 2.2. Bull advised separately that this issue no longer exists with the fix

Comment by Build Master (Inactive) [ 08/Apr/12 ]

Integrated in lustre-b2_1 » x86_64,client,sles11,inkernel #41
LU-81 deadlock of changelog adding vs. changelog cancelling (Revision d68d301d065296d2769ea2274bff75b21a98f9b6)

Result = SUCCESS
Oleg Drokin : d68d301d065296d2769ea2274bff75b21a98f9b6
Files :

  • lustre/mdd/mdd_device.c
  • lustre/mds/mds_log.c
Comment by Build Master (Inactive) [ 08/Apr/12 ]

Integrated in lustre-b2_1 » i686,client,el6,inkernel #41
LU-81 deadlock of changelog adding vs. changelog cancelling (Revision d68d301d065296d2769ea2274bff75b21a98f9b6)

Result = SUCCESS
Oleg Drokin : d68d301d065296d2769ea2274bff75b21a98f9b6
Files :

  • lustre/mdd/mdd_device.c
  • lustre/mds/mds_log.c
Comment by Build Master (Inactive) [ 08/Apr/12 ]

Integrated in lustre-b2_1 » x86_64,server,el6,inkernel #41
LU-81 deadlock of changelog adding vs. changelog cancelling (Revision d68d301d065296d2769ea2274bff75b21a98f9b6)

Result = SUCCESS
Oleg Drokin : d68d301d065296d2769ea2274bff75b21a98f9b6
Files :

  • lustre/mdd/mdd_device.c
  • lustre/mds/mds_log.c
Comment by Build Master (Inactive) [ 08/Apr/12 ]

Integrated in lustre-b2_1 » i686,client,el5,ofa #41
LU-81 deadlock of changelog adding vs. changelog cancelling (Revision d68d301d065296d2769ea2274bff75b21a98f9b6)

Result = SUCCESS
Oleg Drokin : d68d301d065296d2769ea2274bff75b21a98f9b6
Files :

  • lustre/mdd/mdd_device.c
  • lustre/mds/mds_log.c
Comment by Build Master (Inactive) [ 08/Apr/12 ]

Integrated in lustre-b2_1 » x86_64,server,el5,ofa #41
LU-81 deadlock of changelog adding vs. changelog cancelling (Revision d68d301d065296d2769ea2274bff75b21a98f9b6)

Result = SUCCESS
Oleg Drokin : d68d301d065296d2769ea2274bff75b21a98f9b6
Files :

  • lustre/mdd/mdd_device.c
  • lustre/mds/mds_log.c
Comment by Build Master (Inactive) [ 08/Apr/12 ]

Integrated in lustre-b2_1 » x86_64,client,el6,inkernel #41
LU-81 deadlock of changelog adding vs. changelog cancelling (Revision d68d301d065296d2769ea2274bff75b21a98f9b6)

Result = SUCCESS
Oleg Drokin : d68d301d065296d2769ea2274bff75b21a98f9b6
Files :

  • lustre/mds/mds_log.c
  • lustre/mdd/mdd_device.c
Comment by Build Master (Inactive) [ 08/Apr/12 ]

Integrated in lustre-b2_1 » i686,server,el6,inkernel #41
LU-81 deadlock of changelog adding vs. changelog cancelling (Revision d68d301d065296d2769ea2274bff75b21a98f9b6)

Result = SUCCESS
Oleg Drokin : d68d301d065296d2769ea2274bff75b21a98f9b6
Files :

  • lustre/mds/mds_log.c
  • lustre/mdd/mdd_device.c
Comment by Build Master (Inactive) [ 08/Apr/12 ]

Integrated in lustre-b2_1 » x86_64,client,el5,inkernel #41
LU-81 deadlock of changelog adding vs. changelog cancelling (Revision d68d301d065296d2769ea2274bff75b21a98f9b6)

Result = SUCCESS
Oleg Drokin : d68d301d065296d2769ea2274bff75b21a98f9b6
Files :

  • lustre/mdd/mdd_device.c
  • lustre/mds/mds_log.c
Comment by Build Master (Inactive) [ 08/Apr/12 ]

Integrated in lustre-b2_1 » i686,server,el5,inkernel #41
LU-81 deadlock of changelog adding vs. changelog cancelling (Revision d68d301d065296d2769ea2274bff75b21a98f9b6)

Result = SUCCESS
Oleg Drokin : d68d301d065296d2769ea2274bff75b21a98f9b6
Files :

  • lustre/mds/mds_log.c
  • lustre/mdd/mdd_device.c
Comment by Build Master (Inactive) [ 08/Apr/12 ]

Integrated in lustre-b2_1 » x86_64,server,el5,inkernel #41
LU-81 deadlock of changelog adding vs. changelog cancelling (Revision d68d301d065296d2769ea2274bff75b21a98f9b6)

Result = SUCCESS
Oleg Drokin : d68d301d065296d2769ea2274bff75b21a98f9b6
Files :

  • lustre/mds/mds_log.c
  • lustre/mdd/mdd_device.c
Comment by Build Master (Inactive) [ 08/Apr/12 ]

Integrated in lustre-b2_1 » i686,server,el5,ofa #41
LU-81 deadlock of changelog adding vs. changelog cancelling (Revision d68d301d065296d2769ea2274bff75b21a98f9b6)

Result = SUCCESS
Oleg Drokin : d68d301d065296d2769ea2274bff75b21a98f9b6
Files :

  • lustre/mds/mds_log.c
  • lustre/mdd/mdd_device.c
Comment by Build Master (Inactive) [ 08/Apr/12 ]

Integrated in lustre-b2_1 » x86_64,client,el5,ofa #41
LU-81 deadlock of changelog adding vs. changelog cancelling (Revision d68d301d065296d2769ea2274bff75b21a98f9b6)

Result = SUCCESS
Oleg Drokin : d68d301d065296d2769ea2274bff75b21a98f9b6
Files :

  • lustre/mds/mds_log.c
  • lustre/mdd/mdd_device.c
Comment by Build Master (Inactive) [ 08/Apr/12 ]

Integrated in lustre-b2_1 » i686,client,el5,inkernel #41
LU-81 deadlock of changelog adding vs. changelog cancelling (Revision d68d301d065296d2769ea2274bff75b21a98f9b6)

Result = SUCCESS
Oleg Drokin : d68d301d065296d2769ea2274bff75b21a98f9b6
Files :

  • lustre/mds/mds_log.c
  • lustre/mdd/mdd_device.c
Comment by Nathan Rutman [ 30/Jul/12 ]

> So seems that the work-around ("patch which start transaction before catlog locking in
> llog_cat_cancel_records()") described in JIRA LU-81 is not sufficient and we may need a patch to implement
> "the brute-force locking", the alternate solution already described in LU-81 ....
>
> What do you think ???

I believe there is a general problem here that is not resolved by simply increasing the journal credits, which really just serves to mask the problem in some cases. We're looking at a case now where cancelling lots of unlink records results in a similar lock inversion caused by the journal restart in the llog updates. The code really needs to be changed so that the lock inversion can't happen.

Comment by Bruno Faccini (Inactive) [ 01/Aug/12 ]

I understand there is a strong suspicion that we don't have a definitive fix for this quite infrequent problem/deadlock... And BTW, I just got a new occurrence of this same scenario, but on an OSS this time, running Lustre 2.1.1 and kernel version 2.6.32-131.12.1, which contains the JBD2 patch jbd2-commit-timer-no-jiffies-rounding.diff ...

The involved hung threads' stacks look about the same:
=======================================================
PID: 19269 TASK: ffff880470699340 CPU: 1 COMMAND: "ll_ost_io_249"
#0 [ffff88047069f520] schedule at ffffffff8147bdd9
#1 [ffff88047069f5e8] jbd2_log_wait_commit at ffffffffa00867a5
#2 [ffff88047069f678] fsfilt_ldiskfs_commit_wait at ffffffffa07bf25e
#3 [ffff88047069f6c8] filter_commitrw_write at ffffffffa0a794c9
#4 [ffff88047069f908] filter_commitrw at ffffffffa0a6b33d
#5 [ffff88047069f9c8] obd_commitrw at ffffffffa069df5a
#6 [ffff88047069fa48] ost_brw_write at ffffffffa06a7922
#7 [ffff88047069fbf8] ost_handle at ffffffffa06abcd5
#8 [ffff88047069fd68] ptlrpc_main at ffffffffa07103e9
#9 [ffff88047069ff48] kernel_thread at ffffffff810041aa

PID: 15704 TASK: ffff88062c52c0c0 CPU: 4 COMMAND: "jbd2/dm-5-8"
#0 [ffff8804c1467c50] schedule at ffffffff8147bdd9
#1 [ffff8804c1467d18] jbd2_journal_commit_transaction at ffffffffa0080970
#2 [ffff8804c1467e68] kjournald2 at ffffffffa0086b48
#3 [ffff8804c1467ee8] kthread at ffffffff8107ad36
#4 [ffff8804c1467f48] kernel_thread at ffffffff810041aa

and many others like this one:

PID: 15892 TASK: ffff88062c73f4c0 CPU: 4 COMMAND: "ll_ost_io_36"
#0 [ffff8804bd81b430] schedule at ffffffff8147bdd9
#1 [ffff8804bd81b4f8] start_this_handle at ffffffffa007f092
#2 [ffff8804bd81b5b8] jbd2_journal_start at ffffffffa007f510
#3 [ffff8804bd81b608] ldiskfs_journal_start_sb at ffffffffa0a13758
#4 [ffff8804bd81b618] fsfilt_ldiskfs_brw_start at ffffffffa07bf792
#5 [ffff8804bd81b6c8] filter_commitrw_write at ffffffffa0a78cb8
#6 [ffff8804bd81b908] filter_commitrw at ffffffffa0a6b33d
#7 [ffff8804bd81b9c8] obd_commitrw at ffffffffa069df5a
#8 [ffff8804bd81ba48] ost_brw_write at ffffffffa06a7922
#9 [ffff8804bd81bbf8] ost_handle at ffffffffa06abcd5
#10 [ffff8804bd81bd68] ptlrpc_main at ffffffffa07103e9
#11 [ffff8804bd81bf48] kernel_thread at ffffffff810041aa
=======================================================

But since we are on an OSS, this cannot be caused by any "llog" activity; can we consider that we are back to a "pure" JBD2 issue here?

Comment by William Power [ 01/Aug/12 ]

Bruno, can you post/attach the full set of stack traces for this lockup?

Generated at Sat Feb 10 01:03:34 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.