[LU-3546] 2.4.0<->2.3 interop: sanity test 56w: MDS unknown opcode 61 Created: 02/Jul/13 Updated: 15/Aug/13 Resolved: 24/Jul/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.4.1, Lustre 2.5.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Jian Yu | Assignee: | Jinshan Xiong (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | mn1, yuc2 | ||
| Environment: |
Lustre server build: https://build.whamcloud.com/job/lustre-chris/25/ |
||
| Severity: | 3 |
| Rank (Obsolete): | 8924 |
| Description |
|
sanity test 56w hung as follows:
== sanity test 56w: check lfs_migrate -c stripe_count works == 00:42:51 (1372405371)
/usr/bin/lfs_migrate -y -c 6 /mnt/lustre/d0.sanity/d56w/file1
Console log on the MDS showed that:
00:39:22:Lustre: DEBUG MARKER: == sanity test 56w: check lfs_migrate -c stripe_count works == 00:42:51 (1372405371)
00:39:44:LustreError: 14887:0:(mdt_handler.c:3026:mdt_msg_check_version()) MDS unknown opcode 61
00:39:44:LustreError: 14887:0:(mdt_handler.c:3065:mdt_handle0()) mdt drops mal-formed request
00:40:25:Lustre: lustre-MDT0000: Client ca0ab3a3-a02f-6aeb-30e4-9241dc18b04d (at 10.10.4.121@tcp) reconnecting
00:40:25:LustreError: 22982:0:(mdt_handler.c:3026:mdt_msg_check_version()) MDS unknown opcode 61
00:40:25:LustreError: 22982:0:(mdt_handler.c:3065:mdt_handle0()) mdt drops mal-formed request
00:41:07:Lustre: lustre-MDT0000: Client ca0ab3a3-a02f-6aeb-30e4-9241dc18b04d (at 10.10.4.121@tcp) reconnecting
00:41:07:LustreError: 14887:0:(mdt_handler.c:3026:mdt_msg_check_version()) MDS unknown opcode 61
00:41:07:LustreError: 14887:0:(mdt_handler.c:3065:mdt_handle0()) mdt drops mal-formed request
00:41:49:Lustre: lustre-MDT0000: Client ca0ab3a3-a02f-6aeb-30e4-9241dc18b04d (at 10.10.4.121@tcp) reconnecting
Maloo report: https://maloo.whamcloud.com/test_sets/0219094e-e0f0-11e2-b3fd-52540035b04c |
| Comments |
| Comment by Jian Yu [ 02/Jul/13 ] |
|
sanity test 163 also hung:
== sanity test 163: kernel <-> userspace comms == 10:33:03 (1372699983)
Console log on the MDS showed that:
10:29:27:Lustre: DEBUG MARKER: == sanity test 163: kernel <-> userspace comms == 10:33:03 (1372699983)
10:29:27:LustreError: 13247:0:(mdt_handler.c:3026:mdt_msg_check_version()) MDS unknown opcode 59
10:29:27:LustreError: 13247:0:(mdt_handler.c:3065:mdt_handle0()) mdt drops mal-formed request
10:29:38:Lustre: lustre-MDT0000: Client ddfe9157-a350-ad6b-9ef9-8ce50157cd82 (at 10.10.4.125@tcp) reconnecting
10:29:38:LustreError: 13247:0:(mdt_handler.c:3026:mdt_msg_check_version()) MDS unknown opcode 59
10:29:38:LustreError: 13247:0:(mdt_handler.c:3065:mdt_handle0()) mdt drops mal-formed request
10:29:38:LustreError: 10855:0:(mdt_handler.c:3026:mdt_msg_check_version()) MDS unknown opcode 59
10:29:38:LustreError: 10855:0:(mdt_handler.c:3065:mdt_handle0()) mdt drops mal-formed request
10:30:30:Lustre: lustre-MDT0000: haven't heard from client ddfe9157-a350-ad6b-9ef9-8ce50157cd82 (at 10.10.4.125@tcp) in 51 seconds. I think it's dead, and I am evicting it. exp ffff88005bc78800, cur 1372700043 expire 1372700013 last 1372699992
Maloo report: https://maloo.whamcloud.com/test_sets/2bf3c3f6-e287-11e2-8171-52540035b04c |
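The numbers in the eviction message for test 163 are internally consistent: the "in 51 seconds" value equals cur - last (1372700043 - 1372699992), and expire sits 30 seconds before cur. A minimal C sketch of that arithmetic follows, assuming the server evicts a client whose last request predates expire. The `sketch_*` helpers are hypothetical, not Lustre functions, and the 30-second timeout is inferred from cur - expire in this single log line rather than taken from any Lustre setting.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical: cutoff time; a client silent since before this is "dead". */
static time_t sketch_expire(time_t cur, time_t evict_timeout)
{
	assert(evict_timeout > 0);
	return cur - evict_timeout;
}

/* Hypothetical: evict when the last request is older than the cutoff. */
static bool sketch_should_evict(time_t cur, time_t last, time_t evict_timeout)
{
	return last < sketch_expire(cur, evict_timeout);
}

/* The "in N seconds" figure from the message: time since the last request. */
static long sketch_silent_seconds(time_t cur, time_t last)
{
	return (long)(cur - last);
}
```

With cur = 1372700043, last = 1372699992 and the inferred 30-second timeout, `sketch_expire` gives 1372700013 (the logged expire), `sketch_silent_seconds` gives 51, and `sketch_should_evict` is true. The client was silent because its requests were being dropped rather than answered.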
| Comment by Andreas Dilger [ 02/Jul/13 ] |
|
The "lfs migrate" RPC is not supported by the 2.3 MDS, but isn't it a bug in the 2.x MDS code that it just drops the request instead of returning an error to the caller? It should definitely return -EPROTO, -EOPNOTSUPP, or similar. |
| Comment by Emoly Liu [ 02/Jul/13 ] |
|
The following opcodes are not supported by the 2.3 MDS:
MDS_HSM_STATE_GET = 54,
MDS_HSM_STATE_SET = 55,
MDS_HSM_ACTION = 56,
MDS_HSM_PROGRESS = 57,
MDS_HSM_REQUEST = 58,
MDS_HSM_CT_REGISTER = 59,
MDS_HSM_CT_UNREGISTER = 60,
MDS_SWAP_LAYOUTS = 61,
|
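To map the opcode numbers in the console logs back to names, the list above can be turned into a lookup table. This is a minimal C sketch that assumes only the values listed in the comment; the `sketch_*` names are hypothetical helpers, not Lustre API. With it, opcode 61 from test 56w resolves to MDS_SWAP_LAYOUTS and opcode 59 from test 163 resolves to MDS_HSM_CT_REGISTER.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Opcode values exactly as listed in this ticket. */
enum sketch_mds_opc {
	MDS_HSM_STATE_GET     = 54,
	MDS_HSM_STATE_SET     = 55,
	MDS_HSM_ACTION        = 56,
	MDS_HSM_PROGRESS      = 57,
	MDS_HSM_REQUEST       = 58,
	MDS_HSM_CT_REGISTER   = 59,
	MDS_HSM_CT_UNREGISTER = 60,
	MDS_SWAP_LAYOUTS      = 61,
};

struct sketch_opc_name {
	int         opc;
	const char *name;
};

static const struct sketch_opc_name sketch_new_opcs[] = {
	{ MDS_HSM_STATE_GET,     "MDS_HSM_STATE_GET" },
	{ MDS_HSM_STATE_SET,     "MDS_HSM_STATE_SET" },
	{ MDS_HSM_ACTION,        "MDS_HSM_ACTION" },
	{ MDS_HSM_PROGRESS,      "MDS_HSM_PROGRESS" },
	{ MDS_HSM_REQUEST,       "MDS_HSM_REQUEST" },
	{ MDS_HSM_CT_REGISTER,   "MDS_HSM_CT_REGISTER" },
	{ MDS_HSM_CT_UNREGISTER, "MDS_HSM_CT_UNREGISTER" },
	{ MDS_SWAP_LAYOUTS,      "MDS_SWAP_LAYOUTS" },
};

#define SKETCH_NR_OPCS (sizeof(sketch_new_opcs) / sizeof(sketch_new_opcs[0]))

/* Compile-time check that the table covers all eight listed opcodes. */
static_assert(SKETCH_NR_OPCS == 8, "ticket lists eight opcodes");

/* Name for an opcode from the list, or NULL if it is not in the list. */
static const char *sketch_opc_to_name(int opc)
{
	for (size_t i = 0; i < SKETCH_NR_OPCS; i++)
		if (sketch_new_opcs[i].opc == opc)
			return sketch_new_opcs[i].name;
	return NULL;
}

/* Reverse lookup: opcode for a listed name, or -1 if not listed. */
static int sketch_name_to_opc(const char *name)
{
	for (size_t i = 0; i < SKETCH_NR_OPCS; i++)
		if (strcmp(sketch_new_opcs[i].name, name) == 0)
			return sketch_new_opcs[i].opc;
	return -1;
}
```

A table of structs keeps each value next to its name, so adding an opcode is a one-line change and the two lookups cannot drift apart.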
| Comment by Jinshan Xiong (Inactive) [ 02/Jul/13 ] |
|
The patch is at http://review.whamcloud.com/6864. It is for master, but we should cherry-pick it to b2_3 and b2_4. |
| Comment by Andreas Dilger [ 02/Jul/13 ] |
|
This patch is also needed for 2.1.7. |
| Comment by Oleg Drokin [ 06/Jul/13 ] |
|
The patch was landed and cherry-picked to b2_1 and b2_4. |
| Comment by Jodi Levi (Inactive) [ 09/Jul/13 ] |
|
Can this ticket be closed now? Or does the patch still need to be cherry-picked? |
| Comment by Jodi Levi (Inactive) [ 11/Jul/13 ] |
|
The fix for the build failure needs to be cherry-picked to b2_1. |
| Comment by Oleg Drokin [ 19/Jul/13 ] |
|
Cherry-picked the compile fix to b2_1. |
| Comment by Jodi Levi (Inactive) [ 19/Jul/13 ] |
|
Can this be closed? Or is there more work to do? |
| Comment by Sarah Liu [ 15/Aug/13 ] |
|
Also hit this problem when testing interop between a 2.3.0 server and a 2.5 client: https://maloo.whamcloud.com/test_sets/4926deda-02b8-11e3-a4b4-52540035b04c |