  Lustre / LU-2807

lockup in server completion ast -> lu_object_find_at

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Blocker
    • None
    • Affects Version: Lustre 2.4.0
    • 3
    • 6794

    Description

      Running racer I hit a problem multiple times where the completion AST callback gets stuck looking up some object.
      Alex thinks it's a not-fully-fixed race against object deletion of some sort.
      The stack trace looks like this:

      [175924.328073] INFO: task ptlrpc_hr01_003:16414 blocked for more than 120 seconds.
      [175924.328610] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [175924.329108] ptlrpc_hr01_0 D 0000000000000006  3952 16414      2 0x00000000
      [175924.329432]  ffff880076a19920 0000000000000046 0000000000000040 0000000000000286
      [175924.329950]  ffff880076a198a0 0000000000000286 0000000000000286 ffffc9000376b040
      [175924.330457]  ffff8800573a67b8 ffff880076a19fd8 000000000000fba8 ffff8800573a67b8
      [175924.330950] Call Trace:
      [175924.331191]  [<ffffffffa0743c36>] ? htable_lookup+0x1a6/0x1c0 [obdclass]
      [175924.331505]  [<ffffffffa041e79e>] cfs_waitq_wait+0xe/0x10 [libcfs]
      [175924.331807]  [<ffffffffa0744243>] lu_object_find_at+0xb3/0x360 [obdclass]
      [175924.332104]  [<ffffffff81057d60>] ? default_wake_function+0x0/0x20
      [175924.332403]  [<ffffffffa07413df>] ? keys_fill+0x6f/0x190 [obdclass]
      [175924.332746]  [<ffffffffa0744506>] lu_object_find+0x16/0x20 [obdclass]
      [175924.333035]  [<ffffffffa0549ea6>] mdt_object_find+0x56/0x170 [mdt]
      [175924.333398]  [<ffffffffa0586e63>] mdt_lvbo_fill+0x2f3/0x800 [mdt]
      [175924.333715]  [<ffffffffa0845c1a>] ldlm_server_completion_ast+0x18a/0x640 [ptlrpc]
      [175924.334204]  [<ffffffffa0845a90>] ? ldlm_server_completion_ast+0x0/0x640 [ptlrpc]
      [175924.334655]  [<ffffffffa081bbdc>] ldlm_work_cp_ast_lock+0xcc/0x200 [ptlrpc]
      [175924.334976]  [<ffffffffa085c18f>] ptlrpc_set_wait+0x6f/0x880 [ptlrpc]
      [175924.335264]  [<ffffffff81090154>] ? __init_waitqueue_head+0x24/0x40
      [175924.335559]  [<ffffffffa041e8a5>] ? cfs_waitq_init+0x15/0x20 [libcfs]
      [175924.335867]  [<ffffffffa085876e>] ? ptlrpc_prep_set+0x11e/0x300 [ptlrpc]
      [175924.336134]  [<ffffffffa081bb10>] ? ldlm_work_cp_ast_lock+0x0/0x200 [ptlrpc]
      [175924.336444]  [<ffffffffa081e19b>] ldlm_run_ast_work+0x1db/0x460 [ptlrpc]
      [175924.336767]  [<ffffffffa081eda4>] ldlm_reprocess_all+0x114/0x300 [ptlrpc]
      [175924.337067]  [<ffffffffa08372e3>] ldlm_cli_cancel_local+0x2b3/0x470 [ptlrpc]
      [175924.337445]  [<ffffffffa083bbab>] ldlm_cli_cancel+0x5b/0x360 [ptlrpc]
      [175924.337719]  [<ffffffffa083bf42>] ldlm_blocking_ast_nocheck+0x92/0x320 [ptlrpc]
      [175924.338177]  [<ffffffffa0819070>] ? lock_res_and_lock+0x30/0x50 [ptlrpc]
      [175924.338464]  [<ffffffffa0549d40>] mdt_blocking_ast+0x190/0x2a0 [mdt]
      [175924.338759]  [<ffffffffa042e401>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      [175924.339051]  [<ffffffff814faf3e>] ? _spin_unlock+0xe/0x10
      [175924.339339]  [<ffffffffa083f950>] ldlm_handle_bl_callback+0x130/0x400 [ptlrpc]
      [175924.339814]  [<ffffffffa0820cc6>] ldlm_lock_decref_internal+0x426/0xc80 [ptlrpc]
      [175924.340282]  [<ffffffff814faf3e>] ? _spin_unlock+0xe/0x10
      [175924.340614]  [<ffffffffa0712217>] ? class_handle2object+0x97/0x170 [obdclass]
      [175924.341175]  [<ffffffffa0821f49>] ldlm_lock_decref+0x39/0x90 [ptlrpc]
      [175924.341527]  [<ffffffffa087112b>] ptlrpc_hr_main+0x39b/0x760 [ptlrpc]
      [175924.341824]  [<ffffffff81057d60>] ? default_wake_function+0x0/0x20
      [175924.342141]  [<ffffffffa0870d90>] ? ptlrpc_hr_main+0x0/0x760 [ptlrpc]
      [175924.342444]  [<ffffffff8100c14a>] child_rip+0xa/0x20
      [175924.342734]  [<ffffffffa0870d90>] ? ptlrpc_hr_main+0x0/0x760 [ptlrpc]
      [175924.343068]  [<ffffffffa0870d90>] ? ptlrpc_hr_main+0x0/0x760 [ptlrpc]
      [175924.343376]  [<ffffffff8100c140>] ? child_rip+0x0/0x20
      

      Attachments

      Issue Links

      Activity

            I'm going to fix this issue by finding a field in ldlm_lock, say l_tree_node, to store the mdt_object when it's an intent operation, which finds the object first and then requests the DLM lock. Then in mdt_lvbo_fill(), it only calls mdt_object_find() if that field is NULL.

            jay Jinshan Xiong (Inactive) added a comment
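            The proposal above can be sketched with a toy model. All names here (model_object, model_lock, lvbo_fill_model) are illustrative stand-ins, not actual Lustre symbols: the point is that caching the object pointer taken at intent time in the lock lets the completion path skip the hash lookup that would block on a dying object.

```c
#include <stddef.h>

/* Toy model of the proposed fix; not actual Lustre code. */

struct model_object {
    int refcount;
    int dying;      /* object marked for deletion, still hashed */
};

struct model_lock {
    /* stands in for reusing a field such as l_tree_node */
    struct model_object *l_cached_obj;
};

/* Stands in for lu_object_find(): the real code blocks on a dying
 * object, which is the hang seen in this ticket. */
static struct model_object *model_find(struct model_object *obj,
                                       int *would_block)
{
    if (obj->dying) {
        *would_block = 1;
        return NULL;
    }
    obj->refcount++;
    return obj;
}

/* lvbo_fill with the fix: prefer the reference cached at intent time,
 * and only fall back to the blocking lookup when no cache exists. */
static struct model_object *lvbo_fill_model(struct model_lock *lock,
                                            struct model_object *obj,
                                            int *would_block)
{
    *would_block = 0;
    if (lock->l_cached_obj != NULL)
        return lock->l_cached_obj;  /* no hash lookup, no wait */
    return model_find(obj, would_block);
}
```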
            jay Jinshan Xiong (Inactive) added a comment - patch is at: http://review.whamcloud.com/5911

            I mean we can declare a new function, say mdt_object_lookup(), which will look up the hash table and make sure the object exists in the cache. mdt_object_lookup() also calls lu_object_find(), but with a new flag in lu_object_conf, say LOC_F_LOOKUP. With this flag, lu_object_find() will look up the hash table only, and of course, if the object is dying it will return -ENOENT.

            This assumes that the object must already be referenced by someone. For a getattr intent request this is true, but we need to check the other code paths to make sure.

            jay Jinshan Xiong (Inactive) added a comment
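            A minimal sketch of the proposed lookup-only semantics. LOC_F_LOOKUP is the flag name proposed in the comment above; everything else here (types, return conventions) is hypothetical, with -EAGAIN standing in for "would block waiting" in the non-lookup path:

```c
#include <stddef.h>
#include <errno.h>

/* Hypothetical model of the proposed LOC_F_LOOKUP semantics. */

enum { LOC_F_LOOKUP = 1 };   /* lookup-only: never insert, never wait */

struct cached_object {
    int hashed;
    int dying;
};

/* Stands in for lu_object_find(): with LOC_F_LOOKUP it consults the
 * cache only, and a dying object yields -ENOENT instead of blocking. */
static int object_find_model(struct cached_object *obj, int flags)
{
    if (obj == NULL || !obj->hashed)
        /* lookup-only fails; normal mode would allocate a new object */
        return (flags & LOC_F_LOOKUP) ? -ENOENT : 0;
    if (obj->dying)
        /* lookup-only fails; normal mode would wait for full death,
         * which is exactly the hang described in this ticket */
        return (flags & LOC_F_LOOKUP) ? -ENOENT : -EAGAIN;
    return 0; /* found alive: caller takes a reference */
}
```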

            Thanks Jinshan, so for you the problem has not been introduced by the LVB/layout-swap changes, only highlighted by them.

            And the fix you suggest is to give getattr the means to detect that an unlink occurred and the object is dying, via a new lu_object_lookup() method called just after it acquires the inodebits DLM lock, returning -ENOENT if the object is dying?

            bfaccini Bruno Faccini (Inactive) added a comment

            I think this is a race between unlink and getattr. Let's make up a test case for this race, say:
            1. client1's unlink reaches the MDT;
            2. before unlink enqueues its lock, client2 sends a getattr intent request;
            3. unlink acquires the inodebits DLM lock;
            4. before unlink releases the lock, getattr comes to acquire the lock and is blocked;
            5. unlink finishes and releases the lock; getattr's completion AST will be invoked;
            6. the problem should be reproduced.

            If this is the case, we can work out a lu_object_lookup(), and if the object is already killed or does not exist, -ENOENT should be returned; then -ENOENT should be returned to the getattr intent request too.

            jay Jinshan Xiong (Inactive) added a comment
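            The six-step interleaving above can be modeled as a tiny state machine (all names are illustrative, not Lustre symbols) to check that this ordering necessarily runs the completion AST against an object that is already dying:

```c
/* Tiny event-ordering model of the unlink vs getattr race. */

struct race_state {
    int lock_held_by_unlink;
    int object_dying;
    int completion_ast_saw_dying;
};

/* step 3: unlink takes the inodebits lock */
static void unlink_acquire(struct race_state *s)
{
    s->lock_held_by_unlink = 1;
}

/* step 5: unlink kills the object, then releases the lock,
 * which wakes the blocked getattr */
static void unlink_finish(struct race_state *s)
{
    s->object_dying = 1;
    s->lock_held_by_unlink = 0;
}

/* steps 4-6: getattr's completion AST; returns 0 while still blocked,
 * 1 once it actually ran, recording whether it saw a dying object */
static int getattr_completion(struct race_state *s)
{
    if (s->lock_held_by_unlink)
        return 0;
    s->completion_ast_saw_dying = s->object_dying;
    return 1;
}
```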

            The object lookup due to the LVB, which causes the deadlock, is the result of the recent integration of patches for layout-swap, mainly from LU-1876.

            I'll ask Jinshan if this scenario sounds familiar to him and if he has some idea of how to fix it without breaking layout-swap.

            bfaccini Bruno Faccini (Inactive) added a comment
            bfaccini Bruno Faccini (Inactive) added a comment (edited)

            The deadlock comes from the fact that thread 18000, which owns the last object reference, is waiting for lock completion, which is itself handled by thread 15574. But 15574 cannot complete the AST for this lock because, while parsing its l_cp_ast list, it is already running the completion for another lock on the same object; the object was marked dying in between, and due to the LVB an object lookup became necessary, causing it to wait for the object's full death, which can never happen.

            I am currently reviewing the concerned source code to propose a fix.


            Thus my understanding of the situation now: it is a deadlock between pid 18000, which holds a reference to the object and is waiting for the lock completion AST on it, and pid 15574, which is running the completion but is stuck waiting for the object to die, which cannot happen while the reference count is held.

            Can this be fixed by canceling the lock during object death?

            bfaccini Bruno Faccini (Inactive) added a comment
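            The circular wait described here can be captured in a small wait-for model (resource and thread labels are illustrative): pid 18000 holds the object reference and wants the completion AST to run, while pid 15574 drives the completion and wants the reference count to drop.

```c
/* Minimal wait-for model of the two-thread deadlock; not Lustre code. */

enum resource { RES_OBJECT_REF, RES_COMPLETION, RES_COUNT };

struct waiter {
    int holds[RES_COUNT];  /* resources this thread currently holds */
    int wants;             /* resource this thread is blocked on */
};

/* Returns 1 if every thread wants a resource held by some other
 * thread, i.e. nobody can make progress: the circular wait seen in
 * the stack dumps above. */
static int deadlocked(const struct waiter *t, int n)
{
    for (int i = 0; i < n; i++) {
        int held_elsewhere = 0;
        for (int j = 0; j < n; j++)
            if (j != i && t[j].holds[t[i].wants])
                held_elsewhere = 1;
        if (!held_elsewhere)
            return 0;   /* this thread's wait can be satisfied */
    }
    return 1;
}
```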

            Forget the 2nd part of my last update (I might be tired...): the pointer to the dying object in the 3rd thread (pid 18000) is NOT the parent but the child! So it is definitely the one that bumped the object's loh_ref/ref-count, and very probably the reason the 2 other threads have not been woken up.

            bfaccini Bruno Faccini (Inactive) added a comment
            bfaccini Bruno Faccini (Inactive) added a comment (edited)

            Hmm, in fact it seems there is a 3rd thread involved (stuck since about the same time, holding a pointer/reference to the object the 2 other/first threads wait for), and it is probably the cause of the remaining reference to the dying object:

            PID: 18000  TASK: ffff88004519a500  CPU: 2   COMMAND: "mdt00_004"
             #0 [ffff88002e71d820] schedule at ffffffff814f7c98
             #1 [ffff88002e71d8e8] cfs_waitq_wait at ffffffffa0a4a79e [libcfs]
             #2 [ffff88002e71d8f8] ldlm_completion_ast at ffffffffa0eabdba [ptlrpc]
             #3 [ffff88002e71d9a8] ldlm_cli_enqueue_local at ffffffffa0eab478 [ptlrpc]
             #4 [ffff88002e71da38] mdt_object_lock0 at ffffffffa0611889 [mdt]
             #5 [ffff88002e71dae8] mdt_object_lock at ffffffffa0612104 [mdt]
             #6 [ffff88002e71daf8] mdt_getattr_name_lock at ffffffffa0624ef9 [mdt]
             #7 [ffff88002e71dbb8] mdt_intent_getattr at ffffffffa0625ced [mdt]
             #8 [ffff88002e71dc18] mdt_intent_policy at ffffffffa06228fe [mdt]
             #9 [ffff88002e71dc58] ldlm_lock_enqueue at ffffffffa0e8c70a [ptlrpc]
            #10 [ffff88002e71dcb8] ldlm_handle_enqueue0 at ffffffffa0eb2df7 [ptlrpc]
            #11 [ffff88002e71dd28] mdt_enqueue at ffffffffa0622dd6 [mdt]
            #12 [ffff88002e71dd48] mdt_handle_common at ffffffffa0617f18 [mdt]
            #13 [ffff88002e71dd98] mds_regular_handle at ffffffffa06504c5 [mdt]
            #14 [ffff88002e71dda8] ptlrpc_server_handle_request at ffffffffa0ee2de3 [ptlrpc]
            #15 [ffff88002e71dea8] ptlrpc_main at ffffffffa0ee58ad [ptlrpc]
            #16 [ffff88002e71df48] kernel_thread at ffffffff8100c14a
            

            and after some disassembly to retrieve and resolve where the pointer to the dying object comes from for this thread, I was able to determine that it is the parent pointer in:

            /*
             * UPDATE lock should be taken against parent, and be release before exit;
             * child_bits lock should be taken against child, and be returned back:
             *            (1)normal request should release the child lock;
             *            (2)intent request will grant the lock to client.
             */
            static int mdt_getattr_name_lock(struct mdt_thread_info *info,
                                             struct mdt_lock_handle *lhc,
                                             __u64 child_bits,
                                             struct ldlm_reply *ldlm_rep)
            {
                    struct ptlrpc_request  *req       = mdt_info_req(info);
                    struct mdt_body        *reqbody   = NULL;
                    struct mdt_object      *parent    = info->mti_object;  <<<<<<<<<
            ....
            
            

            Thanks Oleg, I am working on the new crash-dump with the object reference that the debug message printed.

            It is interesting that the "dying object found but still hashed" condition was triggered earlier during your tests on the same node but did not induce thread hangs. This is likely a consequence of Bobijam's patch for LU-2942.

            Also interesting in our case is that it seems always 2 threads trigger the same dying-object condition, and with the help of the debug patch's trace I found that the concerned dying object is still hashed:

            [ 8468.524706] Lustre: 17062:0:(lu_object.c:565:htable_lookup()) found object ffff880029715ea8 dying but still hashed.
            [ 8468.540255] Lustre: 15574:0:(lu_object.c:565:htable_lookup()) found object ffff880029715ea8 dying but still hashed.
            
            PID: 17062  TASK: ffff88001cd863c0  CPU: 6   COMMAND: "mdt01_001"
             #0 [ffff8800b574da50] schedule at ffffffff814f7c98
             #1 [ffff8800b574db18] cfs_waitq_wait at ffffffffa0a4a79e [libcfs]
             #2 [ffff8800b574db28] lu_object_find_at at ffffffffa0d0af83 [obdclass]
             #3 [ffff8800b574dbe8] lu_object_find at ffffffffa0d0b246 [obdclass]
             #4 [ffff8800b574dbf8] mdt_object_find at ffffffffa060eea6 [mdt]
             #5 [ffff8800b574dc28] mdt_lvbo_fill at ffffffffa064c3e3 [mdt]
             #6 [ffff8800b574dcb8] ldlm_handle_enqueue0 at ffffffffa0eb2f47 [ptlrpc]
             #7 [ffff8800b574dd28] mdt_enqueue at ffffffffa0622dd6 [mdt]
             #8 [ffff8800b574dd48] mdt_handle_common at ffffffffa0617f18 [mdt]
             #9 [ffff8800b574dd98] mds_regular_handle at ffffffffa06504c5 [mdt]
            #10 [ffff8800b574dda8] ptlrpc_server_handle_request at ffffffffa0ee2de3 [ptlrpc]
            #11 [ffff8800b574dea8] ptlrpc_main at ffffffffa0ee58ad [ptlrpc]
            #12 [ffff8800b574df48] kernel_thread at ffffffff8100c14a
            
            PID: 15574  TASK: ffff880058d08340  CPU: 2   COMMAND: "ptlrpc_hr00_001"
             #0 [ffff8800affef850] schedule at ffffffff814f7c98
             #1 [ffff8800affef918] cfs_waitq_wait at ffffffffa0a4a79e [libcfs]
             #2 [ffff8800affef928] lu_object_find_at at ffffffffa0d0af83 [obdclass]
             #3 [ffff8800affef9e8] lu_object_find at ffffffffa0d0b246 [obdclass]
             #4 [ffff8800affef9f8] mdt_object_find at ffffffffa060eea6 [mdt]
             #5 [ffff8800affefa28] mdt_lvbo_fill at ffffffffa064c3e3 [mdt]
             #6 [ffff8800affefab8] ldlm_server_completion_ast at ffffffffa0eb3b8a [ptlrpc]
             #7 [ffff8800affefb28] ldlm_work_cp_ast_lock at ffffffffa0e89bdc [ptlrpc]
             #8 [ffff8800affefb58] ptlrpc_set_wait at ffffffffa0eca0ff [ptlrpc]
             #9 [ffff8800affefbf8] ldlm_run_ast_work at ffffffffa0e8c19b [ptlrpc]
            #10 [ffff8800affefc28] ldlm_reprocess_all at ffffffffa0e8cda4 [ptlrpc]
            #11 [ffff8800affefc78] ldlm_cli_cancel_local at ffffffffa0ea5303 [ptlrpc]
            #12 [ffff8800affefca8] ldlm_cli_cancel at ffffffffa0ea9bd0 [ptlrpc]
            #13 [ffff8800affefce8] ldlm_blocking_ast_nocheck at ffffffffa0ea9f67 [ptlrpc]
            #14 [ffff8800affefd18] mdt_blocking_ast at ffffffffa060ed40 [mdt]
            #15 [ffff8800affefd98] ldlm_handle_bl_callback at ffffffffa0ead970 [ptlrpc]
            #16 [ffff8800affefdc8] ldlm_lock_decref_internal at ffffffffa0e8ecc6 [ptlrpc]
            #17 [ffff8800affefe28] ldlm_lock_decref at ffffffffa0e8ff49 [ptlrpc]
            #18 [ffff8800affefe58] ptlrpc_hr_main at ffffffffa0edf11b [ptlrpc]
            #19 [ffff8800affeff48] kernel_thread at ffffffff8100c14a
            
            *(struct lu_object_header *)0xffff880029715ea8
            $4 = {
              loh_flags = 0x1,
              loh_ref = {
                counter = 0x1
              },
              loh_fid = {
                f_seq = 0x200000401,
                f_oid = 0x42,
                f_ver = 0x0
              },
              loh_attr = 0x8001,
              loh_hash = {
                next = 0x0,
                pprev = 0xffffc90005c96b08
              },
              loh_lru = {
                next = 0xffff880029715ee0,
                prev = 0xffff880029715ee0
              },
              loh_layers = {
                next = 0xffff880029715f18,
                prev = 0xffff880082dc0f08
              },
              loh_reference = {<No data fields>}
            }
            
            bfaccini Bruno Faccini (Inactive) added a comment
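            For reference, a small decoder for the header fields quoted in the dump above. The bit assignment (LU_OBJECT_HEARD_BANSHEE as bit 0 of loh_flags, meaning the object is dying) follows my reading of lu_object.h of that era and should be treated as an assumption. With it, loh_flags = 0x1, a non-NULL loh_hash.pprev, and loh_ref = 1 decode to exactly the "dying but still hashed, with one reference left" state discussed in this ticket.

```c
#include <stdint.h>

/* Decoder for the crash-dump fields; the bit name below is an
 * assumption based on lu_object.h, not a verified constant. */

#define LU_OBJECT_HEARD_BANSHEE_BIT 0   /* object is dying */

struct header_summary {
    int dying;
    int still_hashed;
    int has_refs;
};

static struct header_summary decode_header(uint32_t loh_flags,
                                           int loh_ref,
                                           uintptr_t hash_pprev)
{
    struct header_summary s;
    s.dying        = (loh_flags >> LU_OBJECT_HEARD_BANSHEE_BIT) & 1;
    /* a non-NULL pprev means the hlist node is still in a chain */
    s.still_hashed = hash_pprev != 0;
    s.has_refs     = loh_ref > 0;
    return s;
}
```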

            People

              jay Jinshan Xiong (Inactive)
              green Oleg Drokin
              Votes: 0
              Watchers: 8

              Dates

                Created:
                Updated:
                Resolved: