Details
Bug
Resolution: Cannot Reproduce
Major
Lustre 2.6.54 builds on both clients and servers.
Clients run Cray SLES11SP3; servers run CentOS (kernel 2.6.32-431.5.1.el6.x86_64).
Most recent commit on clients:
Ie7a2a98be8cc97db9af7a64476c06fc7321544eb
http://review.whamcloud.com/12142
Most recent commit on servers:
If24443955290b091fd22905dfb74b0d6a6d1b4e8
http://review.whamcloud.com/12490
Description
During DNE II testing (same run as LU-5883), both of our MDSes reported hung threads with this call trace:
Nov 6 20:22:21 perses-esf-mds001 kernel: Pid: 21117, comm: mdt02_008
Nov 6 20:22:21 perses-esf-mds001 kernel:
Nov 6 20:22:21 perses-esf-mds001 kernel: Call Trace:
Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0aaa1a1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e32230>] ? ldlm_expired_completion_wait+0x0/0x360 [ptlrpc]
Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e36df5>] ldlm_completion_ast+0x665/0x9a0 [ptlrpc]
Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e3619e>] ldlm_cli_enqueue_local+0x21e/0x810 [ptlrpc]
Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e36790>] ? ldlm_completion_ast+0x0/0x9a0 [ptlrpc]
Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa1559d70>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa156240d>] mdt_object_local_lock+0x1bd/0xa80 [mdt]
Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa1559d70>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e36790>] ? ldlm_completion_ast+0x0/0x9a0 [ptlrpc]
Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa1562d35>] mdt_object_lock_internal+0x65/0x360 [mdt]
Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa15630f4>] mdt_object_lock+0x14/0x20 [mdt]
Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa156dd6c>] mdt_getattr_name_lock+0xd9c/0x1a50 [mdt]
Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffff8128a0ea>] ? strlcpy+0x4a/0x60
Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e63774>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e65d40>] ? lustre_swab_ldlm_reply+0x0/0x40 [ptlrpc]
Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa156ef42>] mdt_intent_getattr+0x292/0x470 [mdt]
Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa155bc44>] mdt_intent_policy+0x494/0xcf0 [mdt]
Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e16549>] ldlm_lock_enqueue+0x129/0x9d0 [ptlrpc]
Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e4235b>] ldlm_handle_enqueue0+0x51b/0x1340 [ptlrpc]
Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0a9a4ce>] ? cfs_timer_arm+0xe/0x10 [libcfs]
Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0ec3842>] tgt_enqueue+0x62/0x1d0 [ptlrpc]
Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0ec40ce>] tgt_request_handle+0x71e/0xb10 [ptlrpc]
Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e73964>] ptlrpc_main+0xe64/0x1990 [ptlrpc]
Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffff81527c20>] ? thread_return+0x4e/0x76e
Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e72b00>] ? ptlrpc_main+0x0/0x1990 [ptlrpc]
Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffff8109aee6>] kthread+0x96/0xa0
Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffff8109ae50>] ? kthread+0x0/0xa0
Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Nov 6 20:22:21 perses-esf-mds001 kernel:
Patrick, could you please try current master to see if you can still reproduce the problem? Thanks.