[LU-3546] 2.4.0<->2.3 interop: sanity test 56w: MDS unknown opcode 61
Created: 02/Jul/13  Updated: 15/Aug/13  Resolved: 24/Jul/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.1, Lustre 2.5.0

Type: Bug
Priority: Blocker
Reporter: Jian Yu
Assignee: Jinshan Xiong (Inactive)
Resolution: Fixed
Votes: 0
Labels: mn1, yuc2
Environment:

Lustre server build: https://build.whamcloud.com/job/lustre-chris/25/
Lustre client build: http://build.whamcloud.com/job/lustre-b2_4/13/


Severity: 3
Rank (Obsolete): 8924

 Description   

sanity test 56w hung as follows:

== sanity test 56w: check lfs_migrate -c stripe_count works == 00:42:51 (1372405371)
/usr/bin/lfs_migrate -y -c 6 /mnt/lustre/d0.sanity/d56w/file1

Console log on the MDS showed that:

00:39:22:Lustre: DEBUG MARKER: == sanity test 56w: check lfs_migrate -c stripe_count works == 00:42:51 (1372405371)
00:39:44:LustreError: 14887:0:(mdt_handler.c:3026:mdt_msg_check_version()) MDS unknown opcode 61
00:39:44:LustreError: 14887:0:(mdt_handler.c:3065:mdt_handle0()) mdt drops mal-formed request
00:40:25:Lustre: lustre-MDT0000: Client ca0ab3a3-a02f-6aeb-30e4-9241dc18b04d (at 10.10.4.121@tcp) reconnecting
00:40:25:LustreError: 22982:0:(mdt_handler.c:3026:mdt_msg_check_version()) MDS unknown opcode 61
00:40:25:LustreError: 22982:0:(mdt_handler.c:3065:mdt_handle0()) mdt drops mal-formed request
00:41:07:Lustre: lustre-MDT0000: Client ca0ab3a3-a02f-6aeb-30e4-9241dc18b04d (at 10.10.4.121@tcp) reconnecting
00:41:07:LustreError: 14887:0:(mdt_handler.c:3026:mdt_msg_check_version()) MDS unknown opcode 61
00:41:07:LustreError: 14887:0:(mdt_handler.c:3065:mdt_handle0()) mdt drops mal-formed request
00:41:49:Lustre: lustre-MDT0000: Client ca0ab3a3-a02f-6aeb-30e4-9241dc18b04d (at 10.10.4.121@tcp) reconnecting

Maloo report: https://maloo.whamcloud.com/test_sets/0219094e-e0f0-11e2-b3fd-52540035b04c



 Comments   
Comment by Jian Yu [ 02/Jul/13 ]

sanity test 163 also hung:

== sanity test 163: kernel <-> userspace comms == 10:33:03 (1372699983)

Console log on MDS showed that:

10:29:27:Lustre: DEBUG MARKER: == sanity test 163: kernel <-> userspace comms == 10:33:03 (1372699983)
10:29:27:LustreError: 13247:0:(mdt_handler.c:3026:mdt_msg_check_version()) MDS unknown opcode 59
10:29:27:LustreError: 13247:0:(mdt_handler.c:3065:mdt_handle0()) mdt drops mal-formed request
10:29:38:Lustre: lustre-MDT0000: Client ddfe9157-a350-ad6b-9ef9-8ce50157cd82 (at 10.10.4.125@tcp) reconnecting
10:29:38:LustreError: 13247:0:(mdt_handler.c:3026:mdt_msg_check_version()) MDS unknown opcode 59
10:29:38:LustreError: 13247:0:(mdt_handler.c:3065:mdt_handle0()) mdt drops mal-formed request
10:29:38:LustreError: 10855:0:(mdt_handler.c:3026:mdt_msg_check_version()) MDS unknown opcode 59
10:29:38:LustreError: 10855:0:(mdt_handler.c:3065:mdt_handle0()) mdt drops mal-formed request
10:30:30:Lustre: lustre-MDT0000: haven't heard from client ddfe9157-a350-ad6b-9ef9-8ce50157cd82 (at 10.10.4.125@tcp) in 51 seconds. I think it's dead, and I am evicting it. exp ffff88005bc78800, cur 1372700043 expire 1372700013 last 1372699992

Maloo report: https://maloo.whamcloud.com/test_sets/2bf3c3f6-e287-11e2-8171-52540035b04c

Comment by Andreas Dilger [ 02/Jul/13 ]

The "lfs migrate" RPC is not supported by the 2.3 MDS, but I guess it is a bug in the 2.x MDS code that it just drops the request instead of returning an error to the caller. It should definitely return -EPROTO, -EOPNOTSUPP, or similar.
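
A minimal sketch of the behaviour suggested above, assuming a simplified dispatcher: an unrecognised opcode still gets a reply carrying -EOPNOTSUPP, so the client sees an error rather than timing out and reconnecting. The types, the function name `sketch_handle`, and the constant `SKETCH_FIRST_UNKNOWN_OPC` are hypothetical and are not taken from the Lustre source or from the actual patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical request/reply types for illustration only;
 * these are not Lustre's real structures. */
struct sketch_request {
    int opcode;
};

struct sketch_reply {
    int sent;   /* 1 once a reply has been queued for the client */
    int status; /* 0 on success, negative errno on failure */
};

/* Assumption: every opcode at or above this value is unknown to the
 * older server (54 is the lowest opcode reported as unsupported in
 * this ticket). */
enum { SKETCH_FIRST_UNKNOWN_OPC = 54 };

/* Always reply to the client; report unknown opcodes as -EOPNOTSUPP
 * instead of dropping the request silently. */
int sketch_handle(const struct sketch_request *req, struct sketch_reply *rep)
{
    assert(req != NULL && rep != NULL);

    rep->sent = 1;
    if (req->opcode >= SKETCH_FIRST_UNKNOWN_OPC) {
        rep->status = -EOPNOTSUPP;
        return rep->status;
    }

    /* A real handler would dispatch on the opcode here. */
    rep->status = 0;
    return 0;
}
```

With this shape, a request carrying opcode 61 yields a reply with status -EOPNOTSUPP, which the caller can surface as an error instead of hanging.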

Comment by Emoly Liu [ 02/Jul/13 ]

The following opcodes are not supported by 2.3 MDS.

        MDS_HSM_STATE_GET       = 54,
        MDS_HSM_STATE_SET       = 55,
        MDS_HSM_ACTION          = 56,
        MDS_HSM_PROGRESS        = 57,
        MDS_HSM_REQUEST         = 58,
        MDS_HSM_CT_REGISTER     = 59,
        MDS_HSM_CT_UNREGISTER   = 60,
        MDS_SWAP_LAYOUTS        = 61,
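
For reading the console logs in this ticket, the list above can be turned into a small lookup, sketched here under the assumption that only these eight opcodes matter. The enum values are copied from the list; the enum tag and the helper `sketch_opc_name` are hypothetical names, not part of Lustre.

```c
#include <stddef.h>

/* Opcode values copied from the list in this comment. */
enum sketch_mds_opc {
    MDS_HSM_STATE_GET     = 54,
    MDS_HSM_STATE_SET     = 55,
    MDS_HSM_ACTION        = 56,
    MDS_HSM_PROGRESS      = 57,
    MDS_HSM_REQUEST       = 58,
    MDS_HSM_CT_REGISTER   = 59,
    MDS_HSM_CT_UNREGISTER = 60,
    MDS_SWAP_LAYOUTS      = 61
};

/* Return the symbolic name of an opcode from the list above,
 * or NULL if the opcode is not one of them. */
const char *sketch_opc_name(int opc)
{
    switch (opc) {
    case MDS_HSM_STATE_GET:     return "MDS_HSM_STATE_GET";
    case MDS_HSM_STATE_SET:     return "MDS_HSM_STATE_SET";
    case MDS_HSM_ACTION:        return "MDS_HSM_ACTION";
    case MDS_HSM_PROGRESS:      return "MDS_HSM_PROGRESS";
    case MDS_HSM_REQUEST:       return "MDS_HSM_REQUEST";
    case MDS_HSM_CT_REGISTER:   return "MDS_HSM_CT_REGISTER";
    case MDS_HSM_CT_UNREGISTER: return "MDS_HSM_CT_UNREGISTER";
    case MDS_SWAP_LAYOUTS:      return "MDS_SWAP_LAYOUTS";
    default:                    return NULL;
    }
}
```

Applied to the logs, opcode 61 from test 56w maps to MDS_SWAP_LAYOUTS and opcode 59 from test 163 maps to MDS_HSM_CT_REGISTER.
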

Comment by Jinshan Xiong (Inactive) [ 02/Jul/13 ]

Patch is at http://review.whamcloud.com/6864. It is for master, but we should cherry-pick it to b2_3 and b2_4.

Comment by Andreas Dilger [ 02/Jul/13 ]

This patch is also needed for 2.1.7.

Comment by Oleg Drokin [ 06/Jul/13 ]

The patch was landed and cherry-picked to b2_1 and b2_4.

Comment by Jodi Levi (Inactive) [ 09/Jul/13 ]

Can this ticket be closed now? Or does the patch still need to be cherry picked?

Comment by Jodi Levi (Inactive) [ 11/Jul/13 ]

The fix for the build failure needs to be cherry-picked to b2_1.

Comment by Oleg Drokin [ 19/Jul/13 ]

Cherry-picked the compile fix to b2_1.

Comment by Jodi Levi (Inactive) [ 19/Jul/13 ]

Can this be closed? Or is there more work to do?

Comment by Sarah Liu [ 15/Aug/13 ]

Also hit this problem when testing interop between 2.3.0 server and 2.5 client:

https://maloo.whamcloud.com/test_sets/4926deda-02b8-11e3-a4b4-52540035b04c
