[LU-1640] Test failure on test suite lustre-rsync-test, subtest test_2c Created: 17/Jul/12 Updated: 13/Dec/12 Resolved: 13/Dec/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.3.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Zhenyu Xu |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 6363 |
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com>. This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/dcdbf220-cd1f-11e1-957a-52540035b04c. The sub-test test_2c failed with the following error:
It seems the MDS is stuck for some reason
mdt00_000 D 0000000000000001 0 12165 2 0x00000080
ffff8800673a5aa0 0000000000000046 ffff8800779fdb40 ffff8800673a5b20
ffff8800673a5a50 ffffc900018c502c 0000000000000246 0000000000000246
ffff8800355fc6b8 ffff8800673a5fd8 000000000000f4e8 ffff8800355fc6b8
Call Trace:
[<ffffffffa053b5d4>] ? htable_lookup+0x1a4/0x1c0 [obdclass]
[<ffffffffa0ced77e>] cfs_waitq_wait+0xe/0x10 [libcfs]
[<ffffffffa053b6a0>] lu_object_find_at+0xb0/0x450 [obdclass]
[<ffffffff8105ea30>] ? default_wake_function+0x0/0x20
[<ffffffffa053ba7f>] lu_object_find_slice+0x1f/0x80 [obdclass]
[<ffffffffa095c160>] mdd_object_find+0x10/0x70 [mdd]
[<ffffffffa096395f>] mdd_path+0x35f/0x1060 [mdd]
[<ffffffffa053b67c>] ? lu_object_find_at+0x8c/0x450 [obdclass]
[<ffffffffa0963600>] ? mdd_path+0x0/0x1060 [mdd]
[<ffffffffa0af47da>] cml_path+0x6a/0x180 [cmm]
[<ffffffffa09c9db6>] ? mdt_object_find+0x66/0x170 [mdt]
[<ffffffffa09ce3ff>] mdt_get_info+0x64f/0xa90 [mdt]
[<ffffffffa09c9f0d>] ? mdt_unpack_req_pack_rep+0x4d/0x4d0 [mdt]
[<ffffffffa09d2922>] mdt_handle_common+0x922/0x1740 [mdt]
[<ffffffffa09d3815>] mdt_regular_handle+0x15/0x20 [mdt]
[<ffffffffa066757d>] ptlrpc_server_handle_request+0x40d/0xea0 [ptlrpc]
[<ffffffffa0ced65e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
[<ffffffffa065ea37>] ? ptlrpc_wait_event+0xa7/0x2a0 [ptlrpc]
[<ffffffff81051ba3>] ? __wake_up+0x53/0x70
[<ffffffffa0668b79>] ptlrpc_main+0xb69/0x1870 [ptlrpc]
[<ffffffffa0668010>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
[<ffffffff8100c14a>] child_rip+0xa/0x20
[<ffffffffa0668010>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
[<ffffffffa0668010>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
[<ffffffff8100c140>] ? child_rip+0x0/0x20 |
| Comments |
| Comment by Peter Jones [ 20/Jul/12 ] |
|
Bobijam, could you please look into this one? Thanks, Peter |
| Comment by Zhenyu Xu [ 21/Jul/12 ] |
|
Patch tracked at http://review.whamcloud.com/3439
obdclass: htable_lookup could miss a waking up signal
In lu_object_free(), a waking up signal is issued to hash bucket |
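For readers unfamiliar with this class of bug, the following is a generic sketch of a "missed wake-up", not the Lustre code and not the content of the patch above. It assumes a hypothetical `Bucket` class with a condition variable standing in for the hash bucket's wait queue. All names (`Bucket`, `free_object`, `naive_wait`, `safe_wait`) are invented for illustration. The point is only that a waiter who sleeps without rechecking state under the lock can miss a signal that was issued before it went to sleep.

```python
import threading


class Bucket:
    """Hypothetical stand-in for a hash bucket with a wait queue."""

    def __init__(self):
        self._cond = threading.Condition()
        self._dying = True  # an object in this bucket is still being freed

    def free_object(self):
        # Analogous in spirit to signalling waiters once the object is gone.
        with self._cond:
            self._dying = False
            self._cond.notify_all()

    def naive_wait(self, timeout):
        # Sleeps unconditionally: a signal sent before this call is lost,
        # so the wait can only end by timing out (returns False).
        with self._cond:
            return self._cond.wait(timeout)

    def safe_wait(self, timeout):
        # Rechecks the state under the lock before and after sleeping,
        # so an earlier signal cannot be missed (returns True).
        with self._cond:
            return self._cond.wait_for(lambda: not self._dying, timeout)
```

If `free_object()` runs before the waiter sleeps, `naive_wait` blocks until its timeout expires, while `safe_wait` returns immediately. In a kernel thread without a timeout, the naive pattern would leave the thread in uninterruptible sleep, which is consistent with the `D` state shown in the trace, though whether this is the exact mechanism here is for the patch review to confirm.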
| Comment by Peter Jones [ 08/Aug/12 ] |
|
As per Bobijam, this issue rarely occurs (it has not been seen in the last three tags), so its priority is being decreased to focus on more frequently hit issues. |
| Comment by Jian Yu [ 10/Oct/12 ] |
|
Lustre Tag: v2_3_0_RC2 The same issue occurred again: https://maloo.whamcloud.com/test_sets/664cd250-12ac-11e2-bd97-52540035b04c |
| Comment by Zhenyu Xu [ 13/Dec/12 ] |
|
Discussion moved to