[LU-3187] Interop 2.3.64<->2.3/2.1: sanity test_180a: lustre-OST0000: client sent bad object 0x2ce2:2: rc = -EPROTO Created: 17/Apr/13 Updated: 20/May/13 Resolved: 13/May/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.4.0, Lustre 2.1.6 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Sarah Liu | Assignee: | Di Wang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | MB | ||
| Environment: |
server: lustre-master tag-2.3.64 build #1411 |
| Severity: | 3 |
| Rank (Obsolete): | 7775 |
| Description |
|
Hit the following error when running sanity subtest 180a:

client console:
Lustre: DEBUG MARKER: == sanity test 180a: test obdecho on osc == 09:59:45 (1366304385)
Lustre: 26569:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1366304386/real 1366304386] req@ffff88030bf2cc00 x1432673905016940/t0(0) o4->lustre-OST0000_osc@10.10.4.134@tcp:6/4 lens 488/448 e 0 to 1 dl 1366304398 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Lustre: lustre-OST0000_osc: Connection to lustre-OST0000 (at 10.10.4.134@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: lustre-OST0000_osc: Connection restored to lustre-OST0000 (at 10.10.4.134@tcp)
Lustre: 26569:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1366304398/real 1366304398] req@ffff88030bf2cc00 x1432673905016959/t0(0) o4->lustre-OST0000_osc@10.10.4.134@tcp:6/4 lens 488/448 e 0 to 1 dl 1366304410 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
Lustre: lustre-OST0000_osc: Connection to lustre-OST0000 (at 10.10.4.134@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: lustre-OST0000_osc: Connection restored to lustre-OST0000 (at 10.10.4.134@tcp)
Lustre: 26569:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1366304410/real 1366304410] req@ffff88030bf2cc00 x1432673905016978/t0(0) o4->lustre-OST0000_osc@10.10.4.134@tcp:6/4 lens 488/448 e 0 to 1 dl 1366304422 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
Lustre: lustre-OST0000_osc: Connection to lustre-OST0000 (at 10.10.4.134@tcp) was lost; in progress operations using this service will wait for recovery to complete

OST console:
Lustre: DEBUG MARKER: == sanity test 180a: test obdecho on osc == 01:56:15 (1366188975)
LustreError: 16656:0:(ost_handler.c:123:ost_validate_obdo()) lustre-OST0000: client 10.10.4.5@tcp sent bad object 0x2ce2:2: rc = -EPROTO
LustreError: 16656:0:(ost_handler.c:2107:ost_io_hpreq_handler()) invalid object ids
Lustre: lustre-OST0000: Client lustre-OST0000_osc_UUID (at 10.10.4.5@tcp) reconnecting
LustreError: 16656:0:(ost_handler.c:123:ost_validate_obdo()) lustre-OST0000: client 10.10.4.5@tcp sent bad object 0x2ce2:2: rc = -EPROTO
LustreError: 16656:0:(ost_handler.c:2107:ost_io_hpreq_handler()) invalid object ids
Lustre: lustre-OST0000: Client lustre-OST0000_osc_UUID (at 10.10.4.5@tcp) reconnecting
LustreError: 16656:0:(ost_handler.c:123:ost_validate_obdo()) lustre-OST0000: client 10.10.4.5@tcp sent bad object 0x2ce2:2: rc = -EPROTO |
| Comments |
| Comment by Keith Mannthey (Inactive) [ 17/Apr/13 ] |
|
Do you have a link to more logs? Your output only shows the start of 180a. Was this 2.3 <-> master interop testing? |
| Comment by Sarah Liu [ 18/Apr/13 ] |
|
The system just hung there. I have just updated the client console logs. What other logs do you need? |
| Comment by Sarah Liu [ 22/Apr/13 ] |
|
Also hit a similar issue between a 2.1.5 client and a 2.4 server in obdfilter-survey: https://maloo.whamcloud.com/test_sets/777db7c8-a78b-11e2-b3cc-52540035b04c

OST console:
10:02:39:LustreError: 31434:0:(ost_handler.c:123:ost_validate_obdo()) lustre-OST0003: client 10.10.4.117@tcp sent bad object 0x49a62:2: rc = -EPROTO
10:02:39:LustreError: 31434:0:(ost_handler.c:123:ost_validate_obdo()) Skipped 2799 previous similar messages
10:02:39:LustreError: 31434:0:(ost_handler.c:2107:ost_io_hpreq_handler()) invalid object ids
10:02:39:LustreError: 31434:0:(ost_handler.c:2107:ost_io_hpreq_handler()) Skipped 2799 previous similar messages
10:07:50:LustreError: 32677:0:(ofd_grant.c:255:ofd_grant_space_left()) lustre-OST0006: cli lustre-OST0006_osc_UUID/ffff88007ab45c00 left 4009984 < tot_grant 4750380 unstable 0 pending 0
10:07:52:LustreError: 32677:0:(ofd_grant.c:255:ofd_grant_space_left()) Skipped 239 previous similar messages
10:12:17:Lustre: lustre-OST0003: Client lustre-OST0003_osc_UUID (at 10.10.4.117@tcp) reconnecting
10:12:17:Lustre: Skipped 349 previous similar messages
10:12:40:LustreError: 31434:0:(ost_handler.c:123:ost_validate_obdo()) lustre-OST0003: client 10.10.4.117@tcp sent bad object 0x49a62:2: rc = -EPROTO
10:12:41:LustreError: 31434:0:(ost_handler.c:123:ost_validate_obdo()) Skipped 2799 previous similar messages
10:12:41:LustreError: 31434:0:(ost_handler.c:2107:ost_io_hpreq_handler()) invalid object ids
10:12:41:LustreError: 31434:0:(ost_handler.c:2107:ost_io_hpreq_handler()) Skipped 2799 previous similar messages
10:17:50:LustreError: 32686:0:(ofd_grant.c:255:ofd_grant_space_left()) lustre-OST0006: cli lustre-OST0006_osc_UUID/ffff88007ab45c00 left 4009984 < tot_grant 4750380 unstable 0 pending 0
10:17:50:LustreError: 32686:0:(ofd_grant.c:255:ofd_grant_space_left()) Skipped 239 previous similar messages |
| Comment by Andreas Dilger [ 07/May/13 ] |
|
This was introduced with the …
Bumping this to a blocker until we decide whether interop of echo_client and obdfilter-survey is a requirement for 2.4.0. I don't think this could easily be fixed after 2.4.0. Possible options for fixing this include:
If we don't consider this fix itself to be a blocker, it probably makes sense to have the clients and MDS send OBD_CONNECT_OSTFID and the servers accept this, so that it is at least possible to detect and fix this situation later. |
| Comment by Peter Jones [ 07/May/13 ] |
|
Di, could you please comment? Thanks, Peter |
| Comment by Di Wang [ 08/May/13 ] |
|
Hmm, the echo client does not use OBD_CONNECT_FID yet, so we can still add OBD_CONNECT_FID to the 2.4 echo client. Then ost_validate_obdo() can tell new and old echo clients apart. I will cook up a patch. |
| Comment by Di Wang [ 08/May/13 ] |
| Comment by Andreas Dilger [ 10/May/13 ] |
|
Minor fixup for b2_1: http://review.whamcloud.com/6319 |
| Comment by Jodi Levi (Inactive) [ 13/May/13 ] |
|
Patch landed to master. |