[LU-4271] mds load goes very high and filesystem hangs after mounting mdt Created: 19/Nov/13  Updated: 14/Aug/14  Resolved: 14/Aug/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.5
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Mahmoud Hanafi Assignee: Bruno Faccini (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None

Attachments: File nbp8-mds1-dmesg.mar25.gz     File service200-20131119    
Issue Links:
Related
is related to LU-4335 MDS hangs due to mdt thread hung/inac... Resolved
Severity: 4
Rank (Obsolete): 11731

 Description   

After recovery of a crashed MDS, the system load goes to >800.
The filesystem is DOWN. We need help to bring the filesystem online!

Here is the error:
Lustre: Skipped 2 previous similar messages
Lustre: Service thread pid 7014 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Pid: 7014, comm: mdt_01

Call Trace:
[<ffffffff8151d552>] schedule_timeout+0x192/0x2e0
[<ffffffff8107bf80>] ? process_timeout+0x0/0x10
[<ffffffffa04e45e1>] cfs_waitq_timedwait+0x11/0x20 [libcfs]
[<ffffffffa0da2508>] osc_create+0x528/0xdc0 [osc]
[<ffffffff8105fab0>] ? default_wake_function+0x0/0x20
[<ffffffffa0e13337>] lov_check_and_create_object+0x187/0x570 [lov]
[<ffffffffa0e13a1b>] qos_remedy_create+0x1db/0x220 [lov]
[<ffffffffa0e1059a>] lov_fini_create_set+0x24a/0x1200 [lov]
[<ffffffffa0dfa0f2>] lov_create+0x792/0x1400 [lov]
[<ffffffffa11000d6>] ? mdd_get_md+0x96/0x2f0 [mdd]
[<ffffffff8105fab0>] ? default_wake_function+0x0/0x20
[<ffffffffa1120916>] ? mdd_read_unlock+0x26/0x30 [mdd]
[<ffffffffa110490e>] mdd_lov_create+0x9ee/0x1ba0 [mdd]
[<ffffffffa1116871>] mdd_create+0xf81/0x1a90 [mdd]
[<ffffffffa121edf3>] ? osd_oi_lookup+0x83/0x110 [osd_ldiskfs]
[<ffffffffa121956c>] ? osd_object_init+0xdc/0x3e0 [osd_ldiskfs]
[<ffffffffa124f3f7>] cml_create+0x97/0x250 [cmm]
[<ffffffffa118b5e1>] ? mdt_version_get_save+0x91/0xd0 [mdt]
[<ffffffffa11a106e>] mdt_reint_open+0x1aae/0x28a0 [mdt]
[<ffffffffa077a724>] ? lustre_msg_add_version+0x74/0xd0 [ptlrpc]
[<ffffffffa111956e>] ? md_ucred+0x1e/0x60 [mdd]
[<ffffffffa1189c81>] mdt_reint_rec+0x41/0xe0 [mdt]
[<ffffffffa1180ed4>] mdt_reint_internal+0x544/0x8e0 [mdt]
[<ffffffffa118153d>] mdt_intent_reint+0x1ed/0x530 [mdt]
[<ffffffffa117fc09>] mdt_intent_policy+0x379/0x690 [mdt]
[<ffffffffa0736351>] ldlm_lock_enqueue+0x361/0x8f0 [ptlrpc]
[<ffffffffa075c1ad>] ldlm_handle_enqueue0+0x48d/0xf50 [ptlrpc]
[<ffffffffa1180586>] mdt_enqueue+0x46/0x130 [mdt]
[<ffffffffa1175772>] mdt_handle_common+0x932/0x1750 [mdt]
[<ffffffffa1176665>] mdt_regular_handle+0x15/0x20 [mdt]
[<ffffffffa078ab4e>] ptlrpc_main+0xc4e/0x1a40 [ptlrpc]
[<ffffffffa0789f00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
[<ffffffff8100c0ca>] child_rip+0xa/0x20
[<ffffffffa0789f00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
[<ffffffffa0789f00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
[<ffffffff8100c0c0>] ? child_rip+0x0/0x20



 Comments   
Comment by Mahmoud Hanafi [ 19/Nov/13 ]

Console log showing the start of the issue.

Comment by Bruno Faccini (Inactive) [ 19/Nov/13 ]

Hello Mahmoud,
A load of 800 is quite huge, but it does not mean there are 800 threads runnable/schedulable at the same time, because threads in an UNinterruptible state are also taken into account.
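
(For reference, a minimal sketch of why D-state threads drive the load up, modeled on the mainline scheduler's load-average accounting; this is an approximation, not verbatim 2.6.32 source:)

/*
 * Approximate sketch, not verbatim kernel source: the load average samples
 * the per-runqueue count of "active" tasks, which includes uninterruptible
 * (D-state) tasks as well as runnable ones.  Hundreds of threads blocked in
 * the journal/quota layers therefore show up as a load of several hundred
 * even while the CPUs stay mostly idle.
 */
static long calc_load_fold_active(struct rq *this_rq)
{
        long nr_active, delta = 0;

        nr_active  = this_rq->nr_running;               /* runnable tasks */
        nr_active += (long)this_rq->nr_uninterruptible; /* D-state tasks  */

        if (nr_active != this_rq->calc_load_active) {
                delta = nr_active - this_rq->calc_load_active;
                this_rq->calc_load_active = nr_active;
        }
        return delta;
}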

I suppose the stack you attached is one that shows up frequently during the problem; are there others showing running/spinning processes instead of blocked ones?

This stack indicates we are waiting for an OSS reply; did you check the health of the OSSes?

Comment by Bruno Faccini (Inactive) [ 19/Nov/13 ]

Having a look at the console log you attached, I think what is more interesting is the fact that there are multiple threads reported as stuck in the JBD2 layer, and particularly the jbd2/dm-3-8 instances with stacks like:

sp                ip                Function (args)
0xffff881e643c9c60 0xffffffff8151c712 thread_return
0xffff881e643c9d28 0xffffffffa0c338df [jbd2]jbd2_journal_commit_transaction+0x19f (0xffff883f0c696000)
0xffff881e643c9e68 0xffffffffa0c3a6c8 [jbd2]kjournald2+0xb8 (0xffff883f0c696000)
0xffff881e643c9ee8 0xffffffff8108fb96 kthread+0x96 (0xffff881e641458e8)
0xffff881e643c9f48 0xffffffff8100c0ca child_rip+0xa (unknown, unknown)

This looks like the situation described in LU-81, so are the ChangeLogs activated?

Comment by Mahmoud Hanafi [ 19/Nov/13 ]

We were able to get things working by:
1. mounting the MGS
2. then all OSTs
3. then waiting for a bit
4. then the MDT
I checked that we have the patch that was part of LU-81.

Comment by Bruno Faccini (Inactive) [ 19/Nov/13 ]

Hello Mahmoud, since you have been able to restart your file-system, can we downgrade the priority from blocker?
Also, is the "dm-3-8" device, the one with the JBD2 ops stuck, the one where your MDT stands?

Comment by Jay Lan (Inactive) [ 19/Nov/13 ]

Please note that I cherry-picked patch set #3 of LU-2943, which has not been closed yet, into our server code. We were hit by the same problem as in LU-2943 multiple times.

Comment by Mahmoud Hanafi [ 19/Nov/13 ]

Yes we can downgrade it from blocker

dm-3-8 is the mdt
dm-2-8 is the mgs

I have gone through the RAID logs and nothing shows up there.
And there is nothing in the logs calling out any host-to-RAID errors.

This is the first trace that is printed. Wouldn't this suggest that it is stuck in ko2iblnd?
Nov 18 18:04:47 nbp8-mds1 kernel: INFO: task ldlm_cn_00:5870 blocked for more than 120 seconds.
Nov 18 18:04:47 nbp8-mds1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 18 18:04:51 nbp8-mds1 kernel: ldlm_cn_00 D 0000000000000003 0 5870 2 0x00000080
Nov 18 18:04:51 nbp8-mds1 kernel: ffff881e8189fb30 0000000000000046 0000000000000000 ffffffffa0925cd0
Nov 18 18:04:51 nbp8-mds1 kernel: ffff881e8189fad0 000000004a6d8511 ffff882000000000 ffff881e65f5bc27
Nov 18 18:04:51 nbp8-mds1 kernel: ffff881fac6d5098 ffff881e8189ffd8 000000000000fc40 ffff881fac6d5098
Nov 18 18:04:51 nbp8-mds1 kernel: Call Trace:
Nov 18 18:04:51 nbp8-mds1 kernel: [<ffffffffa0925cd0>] ? kiblnd_send+0x2a0/0x9e0 [ko2iblnd]
Nov 18 18:04:51 nbp8-mds1 kernel: [<ffffffffa0c3214a>] start_this_handle+0x27a/0x4a0 [jbd2]
Nov 18 18:04:51 nbp8-mds1 kernel: [<ffffffff8108ff00>] ? autoremove_wake_function+0x0/0x40
Nov 18 18:04:51 nbp8-mds1 kernel: [<ffffffffa0c32570>] jbd2_journal_start+0xd0/0x110 [jbd2]
Nov 18 18:04:51 nbp8-mds1 kernel: [<ffffffffa0c8d338>] ldiskfs_journal_start_sb+0x58/0x90 [ldiskfs]
Nov 18 18:04:52 nbp8-mds1 kernel: [<ffffffffa0d30017>] fsfilt_ldiskfs_start+0x77/0x5e0 [fsfilt_ldiskfs]
Nov 18 18:04:52 nbp8-mds1 kernel: [<ffffffffa07a9ac0>] llog_origin_handle_cancel+0x4b0/0xd70 [ptlrpc]
Nov 18 18:04:52 nbp8-mds1 kernel: [<ffffffffa04f9923>] ? cfs_alloc+0x63/0x90 [libcfs]
Nov 18 18:04:52 nbp8-mds1 kernel: [<ffffffffa0630ebf>] ? keys_fill+0x6f/0x1a0 [obdclass]
Nov 18 18:04:52 nbp8-mds1 kernel: [<ffffffffa076f71f>] ldlm_cancel_handler+0x1bf/0x5e0 [ptlrpc]
Nov 18 18:04:52 nbp8-mds1 kernel: [<ffffffffa079fb4e>] ptlrpc_main+0xc4e/0x1a40 [ptlrpc]
Nov 18 18:04:52 nbp8-mds1 kernel: [<ffffffffa079ef00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
Nov 18 18:04:52 nbp8-mds1 kernel: [<ffffffff8100c0ca>] child_rip+0xa/0x20
Nov 18 18:04:52 nbp8-mds1 kernel: [<ffffffffa079ef00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
Nov 18 18:04:52 nbp8-mds1 kernel: [<ffffffffa079ef00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
Nov 18 18:04:52 nbp8-mds1 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20

Comment by Mahmoud Hanafi [ 20/Nov/13 ]

What do these errors mean?
LustreError: 5906:0:(obd_class.h:503:obd_set_info_async()) obd_set_info_async: dev 0 no operation

We see a lot of these.

Comment by Bruno Faccini (Inactive) [ 25/Nov/13 ]

The "? kiblnd_send+0x2a0/0x9e0 [ko2iblnd]" entry in the stack is likely to be an old+stale reference found by the unwinder.

The obd_set_info_async() msgs should have been printed upon your MDS+MGS restart and are likely to occur during the massive MGS re-connect from all clients. It probably just means that target_handle_connect() generically calls obd_set_info_async() for the MGS device (when some cleanup need in the import/export is detected), but the MGS has no o_set_info_async method/operation available to proceed. So we can consider the message harmless.
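
(For context, the message comes from the generic OBD method dispatch in obd_class.h; the following is a paraphrased sketch of that pattern, not verbatim Lustre 2.1 source:)

/*
 * Paraphrased sketch, not verbatim Lustre 2.1 source: obd_set_info_async()
 * is a thin wrapper that looks up the target device's o_set_info_async
 * method.  The MGS obd does not register one, so the wrapper only prints
 * the "dev %d no operation" CERROR and returns, which is why the message
 * can be treated as harmless noise during the reconnect storm.
 */
static inline int obd_set_info_async(struct obd_export *exp,
                                     obd_count keylen, void *key,
                                     obd_count vallen, void *val,
                                     struct ptlrpc_request_set *set)
{
        struct obd_device *obd = exp->exp_obd;

        if (obd->obd_type == NULL ||
            obd->obd_type->typ_dt_ops->o_set_info_async == NULL) {
                CERROR("obd_set_info_async: dev %d no operation\n",
                       obd->obd_minor);
                return -EOPNOTSUPP;
        }
        return obd->obd_type->typ_dt_ops->o_set_info_async(exp, keylen, key,
                                                           vallen, val, set);
}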

Comment by Bruno Faccini (Inactive) [ 25/Nov/13 ]

I downgraded the priority to Major, is that ok? I don't think there is a lot more to be done for this ticket; my feeling is that we faced a JBD2 issue upon MDS restart+reconnection, and it is unlikely to re-occur, but if it does happen, we will need a crash-dump.

Comment by John Fuchs-Chesney (Inactive) [ 14/Mar/14 ]

Hello Mahmoud,
Do you want us to keep this issue open?
Have you seen any recurrence of the problem?
Thanks,
~ jfc.

Comment by Mahmoud Hanafi [ 26/Mar/14 ]

We have hit this bug again. Here are the details of the stack:

Lustre: 7240:0:(ldlm_lib.c:952:target_handle_connect()) Skipped 1 previous similar message
Lustre: nbp8-MDT0000: haven't heard from client fae2954d-57b1-3826-45b9-252e2516d6f0 (at 10.151.55.93@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff883f62fef400, cur 1395779273 expire 1395779123 last 1395779046
Lustre: 7228:0:(ldlm_lib.c:952:target_handle_connect()) MGS: connection from e3993dc5-8b59-86ee-9304-faf298c767b4@10.151.23.158@o2ib t0 exp (null) cur 1395779280 last 0
Lustre: 7228:0:(ldlm_lib.c:952:target_handle_connect()) Skipped 1 previous similar message
Lustre: 7398:0:(quota_interface.c:543:quota_chk_acq_common()) still haven't managed to acquire quota space from the quota master after 10 retries (err=0, rc=0)
INFO: task ldlm_cn_01:6369 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ldlm_cn_01    D 000000000000001d     0  6369      2 0x00000080
 ffff881e9b8cdb30 0000000000000046 0000000000000000 ffffffffa0925cd0
 ffff881e9b8cdad0 00000000fdc0d821 ffff883feeb58e00 ffff883e4235124d
 ffff881ff32bf098 ffff881e9b8cdfd8 000000000000fc40 ffff881ff32bf098
Call Trace:
 [<ffffffffa0925cd0>] ? kiblnd_send+0x2a0/0x9e0 [ko2iblnd]
 [<ffffffffa055c14a>] start_this_handle+0x27a/0x4a0 [jbd2]
 [<ffffffff8108ff00>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa055c570>] jbd2_journal_start+0xd0/0x110 [jbd2]
 [<ffffffffa08e6338>] ldiskfs_journal_start_sb+0x58/0x90 [ldiskfs]
 [<ffffffffa072c017>] fsfilt_ldiskfs_start+0x77/0x5e0 [fsfilt_ldiskfs]
 [<ffffffffa07a9ac0>] llog_origin_handle_cancel+0x4b0/0xd70 [ptlrpc]
 [<ffffffffa04f9923>] ? cfs_alloc+0x63/0x90 [libcfs]
 [<ffffffffa0630ebf>] ? keys_fill+0x6f/0x1a0 [obdclass]
 [<ffffffffa076f71f>] ldlm_cancel_handler+0x1bf/0x5e0 [ptlrpc]
 [<ffffffffa079fb4e>] ptlrpc_main+0xc4e/0x1a40 [ptlrpc]
 [<ffffffffa079ef00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
 [<ffffffff8100c0ca>] child_rip+0xa/0x20
 [<ffffffffa079ef00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
 [<ffffffffa079ef00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
INFO: task ldlm_cb_00:6370 blocked for more than 120 seconds.

Attached full dmesg and full process list to ftp site (nbp8-mds1-dmesg.mar25.gz)

Comment by Bruno Faccini (Inactive) [ 27/Mar/14 ]

Did you also successfully take a crash-dump of your MDS, as I requested in my update/comment dated "25/Nov/13 4:47 PM" for this ticket, and as the end of the dmesg you uploaded for this latest occurrence seems to indicate? If yes, can you upload it with the associated vmlinux (or kernel-debuginfo* RPMs) and Lustre modules?

BTW, at the time you dumped all thread stacks with the Alt+SysRq+T key-stroke, there were 502 threads stuck in jbd2_journal_start()/start_this_handle(), and 129 threads stuck in dqacq_handler()/... because one of them is itself also stuck in the journal/jbd2 layer via lustre_commit_dquot(), so the load should have been greater than 630 at that time ... But I am unable to find who is responsible for blocking all of these threads from the full stack listing alone.

Comment by Mahmoud Hanafi [ 27/Mar/14 ]

We did take a crash dump. I will encrypt and upload it. It will have to be accessed by US citizens only.

Comment by Mahmoud Hanafi [ 31/Mar/14 ]

The crash dump was uploaded and an email was sent to Cliff White.

Comment by Bruno Faccini (Inactive) [ 15/Apr/14 ]

Cliff, I wonder if you found the same situation in the crash-dump as the one I described from the all-threads stack log?

Comment by Cliff White (Inactive) [ 15/Apr/14 ]

Mahmoud, apparently I missed your email, very sorry, can you re-send the information on the dump?

Comment by Mahmoud Hanafi [ 15/Apr/14 ]

Cliff,

Email sent

-Mahmoud

Comment by Cliff White (Inactive) [ 16/Apr/14 ]

Mahmoud, you included a large number of kernel RPMs in the tarball; can you please specify exactly which version corresponds to the kernel running on the node where the vmcore was taken?

Comment by Bob Glossman (Inactive) [ 21/Apr/14 ]

I have been looking over the crash dump from the customer. As noted in previous comments, there is quite a large number of threads all stuck in jbd2_journal_start with the common calling sequence ... osd_trans_start -> ldiskfs_journal_start_sb -> jbd2_journal_start -> start_this_handle. It seems very likely there is a single other thread holding a lock or otherwise blocking all these callers of jbd2_journal_start, but so far I haven't been able to find the culprit.

Comment by Bob Glossman (Inactive) [ 21/Apr/14 ]

I note that in addition to the 500+ threads stuck in jbd2_journal_start, there is one thread stuck elsewhere in jbd2 code. The stack trace of this thread looks like:

PID: 7885   TASK: ffff881e506e2ae0  CPU: 6   COMMAND: "mdt_439"
 #0 [ffff881e506f38c0] schedule at ffffffff8151c712
 #1 [ffff881e506f3988] jbd2_log_wait_commit at ffffffffa0564325 [jbd2]
 #2 [ffff881e506f3a18] jbd2_journal_stop at ffffffffa055bacb [jbd2]
 #3 [ffff881e506f3a78] __ldiskfs_journal_stop at ffffffffa08e62a8 [ldiskfs]
 #4 [ffff881e506f3aa8] osd_trans_stop at ffffffffa0d26476 [osd_ldiskfs]
 #5 [ffff881e506f3ad8] mdd_trans_stop at ffffffffa0c2d4aa [mdd]
 #6 [ffff881e506f3ae8] mdd_attr_set at ffffffffa0c0ca5f [mdd]
 #7 [ffff881e506f3bc8] cml_attr_set at ffffffffa0d4fa76 [cmm]
 #8 [ffff881e506f3bf8] mdt_attr_set at ffffffffa0c9ec68 [mdt]
 #9 [ffff881e506f3c48] mdt_reint_setattr at ffffffffa0c9f2b5 [mdt]
#10 [ffff881e506f3cd8] mdt_reint_rec at ffffffffa0c98c81 [mdt]
#11 [ffff881e506f3cf8] mdt_reint_internal at ffffffffa0c8fed4 [mdt]
#12 [ffff881e506f3d48] mdt_reint at ffffffffa0c902b4 [mdt]
#13 [ffff881e506f3d68] mdt_handle_common at ffffffffa0c84772 [mdt]
#14 [ffff881e506f3db8] mdt_regular_handle at ffffffffa0c85665 [mdt]
#15 [ffff881e506f3dc8] ptlrpc_main at ffffffffa079fb4e [ptlrpc]
#16 [ffff881e506f3f48] kernel_thread at ffffffff8100c0ca

I can't see how this thread might be blocking all the others, but it is interesting that it's the only one I have found so far that is in similar code yet different from all the others.

Comment by Bob Glossman (Inactive) [ 21/Apr/14 ]

another odd outlier thread:

PID: 6425   TASK: ffff883fdc6c9500  CPU: 5   COMMAND: "jbd2/dm-3-8"
 #0 [ffff883fec013c60] schedule at ffffffff8151c712
 #1 [ffff883fec013d28] jbd2_journal_commit_transaction at ffffffffa055d8df [jbd2]
 #2 [ffff883fec013e68] kjournald2 at ffffffffa05646c8 [jbd2]
 #3 [ffff883fec013ee8] kthread at ffffffff8108fb96
 #4 [ffff883fec013f48] kernel_thread at ffffffff8100c0ca

I'm wondering if this might be a dup of LU-4794

Comment by Bruno Faccini (Inactive) [ 21/Apr/14 ]

Sure Bob, this last thread should be the one blocking all the others in the JBD2 layer! And again, like in one of my original updates on 19/Nov/2013, the concerned device is "dm-3-8", which Mahmoud confirmed to be the MDT.

And yes, it looks like a dup of LU-4794, but also of earlier tickets that were simply closed due to no new occurrence ...

What would be cool now would be to identify whether this last thread has been scheduled recently; if not, why not, and if yes, why it is looping and re-schedule()ing (t_updates != 0?) ...
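
(For reference, the wait that kjournald2 is parked in lives in jbd2_journal_commit_transaction(); the following is a simplified sketch of the RHEL6-era jbd2 logic, not verbatim source:)

/*
 * Simplified sketch of the jbd2 commit path, not verbatim source.  Every
 * jbd2_journal_start() increments t_updates and every jbd2_journal_stop()
 * decrements it; the commit thread loops here until the count drops to
 * zero.  A single handle that is never stopped keeps kjournald2 parked in
 * schedule() and, transitively, blocks every new jbd2_journal_start()
 * caller waiting for the commit to finish.
 */
spin_lock(&commit_transaction->t_handle_lock);
while (commit_transaction->t_updates) {
        DEFINE_WAIT(wait);

        prepare_to_wait(&journal->j_wait_updates, &wait,
                        TASK_UNINTERRUPTIBLE);
        if (commit_transaction->t_updates) {
                spin_unlock(&commit_transaction->t_handle_lock);
                spin_unlock(&journal->j_state_lock);
                schedule();                     /* <- the stack above */
                spin_lock(&journal->j_state_lock);
                spin_lock(&commit_transaction->t_handle_lock);
        }
        finish_wait(&journal->j_wait_updates, &wait);
}
spin_unlock(&commit_transaction->t_handle_lock);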

Comment by Bruno Faccini (Inactive) [ 02/May/14 ]

Bob, can you check whether you find the same kind of dead-lock scenario/stacks that Bobi detailed in LU-4794? BTW, I did not find it in the dmesg with full stack traces attached to this ticket.

Comment by Bruno Faccini (Inactive) [ 24/May/14 ]

I have been working "blindly" with Bob, who has direct access to the crash-dump, and here is the analysis of the deadlock from the crash-dump you provided.
Bob already posted two thread stacks above, where one is stuck in jbd2_journal_stop()/jbd2_log_wait_commit() synchronously waiting for the current transaction to be committed, and the other started the commit process, in jbd2_journal_commit_transaction(), but then schedule()'d.
The reason the committing thread schedule()'d is that it is waiting for the last handle to complete (i.e., we are missing a last journal_stop() that would bring journal_t->j_running_transaction->t_updates from 1 down to 0).
It took us some time to find who owns this last handle, but we finally found it, and it is stuck with the following stack:

PID: 19803  TASK: ffff883ff3754080  CPU: 1   COMMAND: "mdt_504"
 #0 [ffff883dc31f76b0] schedule at ffffffff8151c712
 #1 [ffff883dc31f7778] cfs_waitq_wait at ffffffffa04f960e [libcfs]
 #2 [ffff883dc31f7788] qctxt_wait_pending_dqacq at ffffffffa09bfe1b [lquota]
 #3 [ffff883dc31f7878] qctxt_adjust_qunit at ffffffffa09c5d41 [lquota]
 #4 [ffff883dc31f7908] quota_acquire_common at ffffffffa09c9653 [lquota]
 #5 [ffff883dc31f7938] quota_chk_acq_common at ffffffffa09cb722 [lquota]
 #6 [ffff883dc31f7a78] lquota_chkquota.clone.3 at ffffffffa0c0300b [mdd]
 #7 [ffff883dc31f7ae8] mdd_attr_set at ffffffffa0c0dcf6 [mdd]
 #8 [ffff883dc31f7bc8] cml_attr_set at ffffffffa0d4fa76 [cmm]
 #9 [ffff883dc31f7bf8] mdt_attr_set at ffffffffa0c9ec68 [mdt]
#10 [ffff883dc31f7c48] mdt_reint_setattr at ffffffffa0c9f2b5 [mdt]
#11 [ffff883dc31f7cd8] mdt_reint_rec at ffffffffa0c98c81 [mdt]
#12 [ffff883dc31f7cf8] mdt_reint_internal at ffffffffa0c8fed4 [mdt]
#13 [ffff883dc31f7d48] mdt_reint at ffffffffa0c902b4 [mdt]
#14 [ffff883dc31f7d68] mdt_handle_common at ffffffffa0c84772 [mdt]
#15 [ffff883dc31f7db8] mdt_regular_handle at ffffffffa0c85665 [mdt]
#16 [ffff883dc31f7dc8] ptlrpc_main at ffffffffa079fb4e [ptlrpc]
#17 [ffff883dc31f7f48] kernel_thread at ffffffff8100c0ca

This is confirmed by looking into the source code of mdd_attr_set(), where we can see the mdd_trans_start()/lquota_chkquota()/mdd_trans_stop() call sequence.

The reason this thread is waiting is that the qunit it wants to process is presently hashed/"inflight".

Looking further into other threads to find a reason for this blocked situation, we found a thread that is working on this same qunit but is itself stuck trying to start a new journal/transaction handle (like hundreds of other threads ...), with the following stack:

PID: 7460   TASK: ffff883f65bc9500  CPU: 25  COMMAND: "mdt_308"
 #0 [ffff883f65bcb2b0] schedule at ffffffff8151c712
 #1 [ffff883f65bcb378] start_this_handle at ffffffffa055c14a [jbd2]
 #2 [ffff883f65bcb438] jbd2_journal_start at ffffffffa055c570 [jbd2]
 #3 [ffff883f65bcb488] ldiskfs_journal_start_sb at ffffffffa08e6338 [ldiskfs]
 #4 [ffff883f65bcb498] lustre_quota_journal_start at ffffffffa072c61f [fsfilt_ldiskfs]
 #5 [ffff883f65bcb4a8] lustre_commit_dquot at ffffffffa0734968 [fsfilt_ldiskfs]
 #6 [ffff883f65bcb548] fsfilt_ldiskfs_dquot at ffffffffa072b604 [fsfilt_ldiskfs]
 #7 [ffff883f65bcb568] dqacq_handler at ffffffffa09d59fe [lquota]
 #8 [ffff883f65bcb628] schedule_dqacq at ffffffffa09c2529 [lquota]
 #9 [ffff883f65bcb728] qctxt_adjust_qunit at ffffffffa09c5cdc [lquota]
#10 [ffff883f65bcb7b8] quota_acquire_common at ffffffffa09c9653 [lquota]
#11 [ffff883f65bcb7e8] quota_chk_acq_common at ffffffffa09cb722 [lquota]
#12 [ffff883f65bcb928] lquota_chkquota.clone.1 at ffffffffa0c1c28b [mdd]
#13 [ffff883f65bcb998] mdd_rename at ffffffffa0c22c0a [mdd]
#14 [ffff883f65bcbb08] cml_rename at ffffffffa0d52b04 [cmm]
#15 [ffff883f65bcbb88] mdt_reint_rename at ffffffffa0c9da73 [mdt]
#16 [ffff883f65bcbcd8] mdt_reint_rec at ffffffffa0c98c81 [mdt]
#17 [ffff883f65bcbcf8] mdt_reint_internal at ffffffffa0c8fed4 [mdt]
#18 [ffff883f65bcbd48] mdt_reint at ffffffffa0c902b4 [mdt]
#19 [ffff883f65bcbd68] mdt_handle_common at ffffffffa0c84772 [mdt]
#20 [ffff883f65bcbdb8] mdt_regular_handle at ffffffffa0c85665 [mdt]
#21 [ffff883f65bcbdc8] ptlrpc_main at ffffffffa079fb4e [ptlrpc]
#22 [ffff883f65bcbf48] kernel_thread at ffffffff8100c0ca

and this thread is presently blocked because the current/running transaction is in the commit process!!
So, dead-lock ...

It seems this was possible because, in mdd_attr_set(), a journal_start()/mdd_trans_start() is done before starting a quota operation, which is never the case in the same layer's other routines, like mdd_rename()/mdd_[un]link()/mdd_create()/..., where all "heavy" quota operations are done outside of the mdd_trans_start()/mdd_trans_stop() boundaries, precisely to prevent such dead-lock situations.

So, a possible fix for this would be to change the code of mdd_attr_set() by simply moving the #ifdef HAVE_QUOTA_SUPPORT/#endif block out of the mdd_trans_start()/mdd_trans_stop() boundaries. I gave it a try at http://review.whamcloud.com/10443.
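
Schematically, the proposed re-ordering looks like the following (an illustrative sketch only, with elided arguments and placeholder comments; the actual change is the Gerrit patch above):

/*
 * Illustrative sketch of the proposed re-ordering in mdd_attr_set(); the
 * argument lists are elided and the real change is in the Gerrit patch
 * above.  The deadlock is possible because the quota check, which may
 * synchronously go to the quota master and open its own journal handles,
 * currently runs inside the caller's already-started handle.
 */

/* Current, deadlock-prone ordering: */
handle = mdd_trans_start(env, mdd);        /* jbd2_journal_start()        */
#ifdef HAVE_QUOTA_SUPPORT
rc = lquota_chkquota(...);                 /* may block on an inflight    */
                                           /* qunit whose owner needs a   */
                                           /* new journal handle          */
#endif
/* ... perform the attribute update under the handle ... */
mdd_trans_stop(env, mdd, rc, handle);      /* the journal_stop() the      */
                                           /* commit thread is waiting on */

/* Proposed ordering: do the quota check before opening the handle, as
 * mdd_create()/mdd_rename()/mdd_[un]link() already do: */
#ifdef HAVE_QUOTA_SUPPORT
rc = lquota_chkquota(...);
#endif
handle = mdd_trans_start(env, mdd);
/* ... perform the attribute update under the handle ... */
mdd_trans_stop(env, mdd, rc, handle);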

BTW, this problem is a pure b2_1/2.1.x one since all the concerned code has been re-written in 2.4.

Comment by John Fuchs-Chesney (Inactive) [ 29/Jul/14 ]

Hello Bruno and Mahmoud,

Can I ask if there is going to be more progress on this ticket?

Many thanks,
~ jfc.

Comment by Peter Jones [ 14/Aug/14 ]

As per NASA, it is OK to close this ticket.
