[LU-559] 1.8<->2.1 interop: LBUG: ASSERTION(diff >= 0) failed: lustre-OST0000: 1 - 33 = -32 Created: 01/Aug/11 Updated: 02/Jul/12 Resolved: 14/Aug/11 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.0, Lustre 1.8.6 |
| Fix Version/s: | Lustre 2.1.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Jian Yu | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Lustre Clients: Lustre Servers: |
||
| Severity: | 3 |
| Bugzilla ID: | 20324 |
| Rank (Obsolete): | 4923 |
| Description |
|
While running obdfilter-survey test 2a, the following LBUG occurred on the OSS node:
LustreError: 20123:0:(filter.c:3688:filter_handle_precreate()) ASSERTION(diff >= 0) failed: lustre-OST0000: 1 - 33 = -32
LustreError: 20123:0:(filter.c:3688:filter_handle_precreate()) LBUG
Pid: 20123, comm: ll_ost_119
Call Trace:
[<ffffffffa069d855>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa069de95>] lbug_with_loc+0x75/0xe0 [libcfs]
[<ffffffffa0be7a0d>] filter_create+0x153d/0x1570 [obdfilter]
[<ffffffffa0805498>] ? lustre_msg_check_version+0xc8/0xe0 [ptlrpc]
[<ffffffff8126d016>] ? vsnprintf+0x2b6/0x5f0
[<ffffffffa0808310>] ? lustre_swab_ost_body+0x0/0x10 [ptlrpc]
[<ffffffffa0bb7278>] ost_handle+0x48c8/0x4b90 [ost]
[<ffffffffa06a85d1>] ? libcfs_debug_vmsg2+0x4d1/0xb50 [libcfs]
[<ffffffffa0803144>] ? lustre_msg_get_opc+0x94/0x100 [ptlrpc]
[<ffffffffa080588c>] ? lustre_msg_get_status+0x3c/0xa0 [ptlrpc]
[<ffffffffa0813d3e>] ptlrpc_main+0xb8e/0x1900 [ptlrpc]
[<ffffffff8100c1ca>] child_rip+0xa/0x20
[<ffffffffa08131b0>] ? ptlrpc_main+0x0/0x1900 [ptlrpc]
[<ffffffff8100c1c0>] ? child_rip+0x0/0x20
Kernel panic - not syncing: LBUG
Pid: 20123, comm: ll_ost_119 Tainted: G ---------------- T 2.6.32-131.2.1.el6_lustre.x86_64 #1
Call Trace:
[<ffffffff814db1b8>] ? panic+0x78/0x143
[<ffffffffa069deeb>] ? lbug_with_loc+0xcb/0xe0 [libcfs]
[<ffffffffa0be7a0d>] ? filter_create+0x153d/0x1570 [obdfilter]
[<ffffffffa0805498>] ? lustre_msg_check_version+0xc8/0xe0 [ptlrpc]
[<ffffffff8126d016>] ? vsnprintf+0x2b6/0x5f0
[<ffffffffa0808310>] ? lustre_swab_ost_body+0x0/0x10 [ptlrpc]
[<ffffffffa0bb7278>] ? ost_handle+0x48c8/0x4b90 [ost]
[<ffffffffa06a85d1>] ? libcfs_debug_vmsg2+0x4d1/0xb50 [libcfs]
[<ffffffffa0803144>] ? lustre_msg_get_opc+0x94/0x100 [ptlrpc]
[<ffffffffa080588c>] ? lustre_msg_get_status+0x3c/0xa0 [ptlrpc]
[<ffffffffa0813d3e>] ? ptlrpc_main+0xb8e/0x1900 [ptlrpc]
[<ffffffff8100c1ca>] ? child_rip+0xa/0x20
[<ffffffffa08131b0>] ? ptlrpc_main+0x0/0x1900 [ptlrpc]
[<ffffffff8100c1c0>] ? child_rip+0x0/0x20
Maloo report: https://maloo.whamcloud.com/test_sets/39a9f35c-bc24-11e0-8bdf-52540025f9af
This is a known issue: bug 20324 |
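The arithmetic in the assertion message can be reproduced with a small sketch. It assumes that the two operands printed in the log are the object id carried in the create request and the last object id the OST has recorded for the target group; the log line itself does not label them, and `precreate_diff` is a hypothetical helper, not a function from filter.c.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not Lustre code): signed distance between the
 * object id in the request and the last id recorded on the OST.
 * A negative result is the condition that ASSERTION(diff >= 0) rejects.
 */
int64_t precreate_diff(uint64_t requested_id, uint64_t last_id)
{
    return (int64_t)requested_id - (int64_t)last_id;
}

/* Returns 1 when the inputs would trip the assertion, 0 otherwise. */
int precreate_would_lbug(uint64_t requested_id, uint64_t last_id)
{
    return precreate_diff(requested_id, last_id) < 0;
}
```

With the values from the log, `assert(precreate_diff(1, 33) == -32);` holds, so a request whose id is behind the recorded last id for the group it lands in is enough to hit the LBUG.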
| Comments |
| Comment by Peter Jones [ 03/Aug/11 ] |
|
Niu will look into this one |
| Comment by Niu Yawei (Inactive) [ 03/Aug/11 ] |
|
I don't see why the echo client can trigger pre-create on the OST, and it seems the debug log is missing from the Maloo test result. Hi, Yujian |
| Comment by Jian Yu [ 04/Aug/11 ] |
Yes, the LBUG could be reproduced consistently. |
| Comment by Niu Yawei (Inactive) [ 04/Aug/11 ] |
|
I found that the 2.0 code always resets oa->o_seq to 0 for requests from 1.8 clients, which makes the 1.8 echo client mistakenly use group 0.
/**
 * Validate oa from client.
 * 1. If the request comes from 1.8 clients, it will reset o_seq with MDT0.
 * 2. If the request comes from 2.0 clients, currently only RSVD seq and IDIF
 *    req are valid.
 *    a. for single MDS seq = FID_SEQ_OST_MDT0,
 *    b. for CMD, seq = FID_SEQ_OST_MDT0, FID_SEQ_OST_MDT1 - FID_SEQ_OST_MAX
 */
static int ost_validate_obdo(struct obd_export *exp, struct obdo *oa,
                             struct obd_ioobj *ioobj)
{
        if (oa != NULL && (!(oa->o_valid & OBD_MD_FLGROUP) ||
            !(exp->exp_connect_flags & OBD_CONNECT_FULL20))) {
                oa->o_seq = FID_SEQ_OST_MDT0;
                if (ioobj)
                        ioobj->ioo_seq = FID_SEQ_OST_MDT0;
I think we need to check OBD_MD_FLGROUP for 1.8 requests as well. Will make a patch soon. |
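The difference between the quoted condition and the proposed one can be sketched in isolation. This is a simplified model under stated assumptions, not the patch that landed: the `mock_*` structs and both `validate_seq_*` functions are hypothetical, the flag bit values are placeholders rather than the real definitions from the Lustre headers, and the group value 2 in the usage note is chosen only for illustration. `FID_SEQ_OST_MDT0` is taken as 0, following the comment above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit values for illustration only (assumptions). */
#define OBD_MD_FLGROUP     0x1ULL  /* request says o_seq (group) is valid */
#define OBD_CONNECT_FULL20 0x2ULL  /* peer negotiated full 2.0 protocol */
#define FID_SEQ_OST_MDT0   0ULL    /* group 0, per the ticket comment */

/* Hypothetical, stripped-down stand-ins for struct obdo / obd_export. */
struct mock_obdo   { uint64_t o_valid; uint64_t o_seq; };
struct mock_export { uint64_t exp_connect_flags; };

/* Quoted behaviour: reset o_seq unless FLGROUP is set AND the peer is 2.0. */
void validate_seq_current(const struct mock_export *exp, struct mock_obdo *oa)
{
    if (oa != NULL && (!(oa->o_valid & OBD_MD_FLGROUP) ||
                       !(exp->exp_connect_flags & OBD_CONNECT_FULL20)))
        oa->o_seq = FID_SEQ_OST_MDT0;
}

/* One reading of the proposal: honour FLGROUP whatever the client version. */
void validate_seq_proposed(const struct mock_export *exp, struct mock_obdo *oa)
{
    (void)exp; /* client version no longer influences the decision */
    if (oa != NULL && !(oa->o_valid & OBD_MD_FLGROUP))
        oa->o_seq = FID_SEQ_OST_MDT0;
}
```

For a 1.8 peer (`exp_connect_flags == 0`) sending a request with `OBD_MD_FLGROUP` set and `o_seq == 2`, `validate_seq_current` rewrites `o_seq` to 0, while `validate_seq_proposed` leaves it at 2. A request without `OBD_MD_FLGROUP` is reset to 0 by both.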
| Comment by Niu Yawei (Inactive) [ 04/Aug/11 ] |
|
The patch is being tracked at: http://review.whamcloud.com/1182 |
| Comment by Build Master (Inactive) [ 09/Aug/11 ] |
|
Integrated in Oleg Drokin : 97e7c60bd9b2135ff5d498009b3996507fe6654b
|
| Comment by Niu Yawei (Inactive) [ 14/Aug/11 ] |
|
landed for 2.1 |
| Comment by Jay Lan (Inactive) [ 02/Jul/12 ] |
|
Our 2.1.1 OSS server that contains this patch just crashed:
LustreError: 7204:0:(filter.c:3685:filter_handle_precreate()) ASSERTION(diff >= 0) failed: nbp1-OST006d: 42944172 - 42944239 = -67
We can still hit this LBUG with the commit. |