[LU-559] 1.8<->2.1 interop: LBUG: ASSERTION(diff >= 0) failed: lustre-OST0000: 1 - 33 = -32
Created: 01/Aug/11  Updated: 02/Jul/12  Resolved: 14/Aug/11

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.0, Lustre 1.8.6
Fix Version/s: Lustre 2.1.0

Type: Bug
Priority: Major
Reporter: Jian Yu
Assignee: Niu Yawei (Inactive)
Resolution: Fixed
Votes: 0
Labels: None
Environment:

Lustre Clients:
Tag: 1.8.6-wc1
Distro/Arch: RHEL6/x86_64 (kernel version: 2.6.32_131.2.1.el6)
Build: http://newbuild.whamcloud.com/job/lustre-b1_8/100/arch=x86_64,build_type=client,distro=el6,ib_stack=inkernel/
Network: IB (inkernel OFED)
ENABLE_QUOTA=yes

Lustre Servers:
Tag: v2_0_66_0
Distro/Arch: RHEL6/x86_64 (kernel version: 2.6.32-131.2.1.el6_lustre)
Build: http://newbuild.whamcloud.com/job/lustre-master/228/arch=x86_64,build_type=server,distro=el6,ib_stack=inkernel/
Network: IB (inkernel OFED)


Severity: 3
Bugzilla ID: 20324
Rank (Obsolete): 4923

 Description   

While running obdfilter-survey test 2a, the following LBUG occurred on the OSS node:

LustreError: 20123:0:(filter.c:3688:filter_handle_precreate()) ASSERTION(diff >= 0) failed: lustre-OST0000: 1 - 33 = -32
LustreError: 20123:0:(filter.c:3688:filter_handle_precreate()) LBUG
Pid: 20123, comm: ll_ost_119

Call Trace:
[<ffffffffa069d855>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa069de95>] lbug_with_loc+0x75/0xe0 [libcfs]
[<ffffffffa0be7a0d>] filter_create+0x153d/0x1570 [obdfilter]
[<ffffffffa0805498>] ? lustre_msg_check_version+0xc8/0xe0 [ptlrpc]
[<ffffffff8126d016>] ? vsnprintf+0x2b6/0x5f0
[<ffffffffa0808310>] ? lustre_swab_ost_body+0x0/0x10 [ptlrpc]
[<ffffffffa0bb7278>] ost_handle+0x48c8/0x4b90 [ost]
[<ffffffffa06a85d1>] ? libcfs_debug_vmsg2+0x4d1/0xb50 [libcfs]
[<ffffffffa0803144>] ? lustre_msg_get_opc+0x94/0x100 [ptlrpc]
[<ffffffffa080588c>] ? lustre_msg_get_status+0x3c/0xa0 [ptlrpc]
[<ffffffffa0813d3e>] ptlrpc_main+0xb8e/0x1900 [ptlrpc]
[<ffffffff8100c1ca>] child_rip+0xa/0x20
[<ffffffffa08131b0>] ? ptlrpc_main+0x0/0x1900 [ptlrpc]
[<ffffffff8100c1c0>] ? child_rip+0x0/0x20

Kernel panic - not syncing: LBUG
Pid: 20123, comm: ll_ost_119 Tainted: G           ---------------- T 2.6.32-131.2.1.el6_lustre.x86_64 #1
Call Trace:
[<ffffffff814db1b8>] ? panic+0x78/0x143
[<ffffffffa069deeb>] ? lbug_with_loc+0xcb/0xe0 [libcfs]
[<ffffffffa0be7a0d>] ? filter_create+0x153d/0x1570 [obdfilter]
[<ffffffffa0805498>] ? lustre_msg_check_version+0xc8/0xe0 [ptlrpc]
[<ffffffff8126d016>] ? vsnprintf+0x2b6/0x5f0
[<ffffffffa0808310>] ? lustre_swab_ost_body+0x0/0x10 [ptlrpc]
[<ffffffffa0bb7278>] ? ost_handle+0x48c8/0x4b90 [ost]
[<ffffffffa06a85d1>] ? libcfs_debug_vmsg2+0x4d1/0xb50 [libcfs]
[<ffffffffa0803144>] ? lustre_msg_get_opc+0x94/0x100 [ptlrpc]
[<ffffffffa080588c>] ? lustre_msg_get_status+0x3c/0xa0 [ptlrpc]
[<ffffffffa0813d3e>] ? ptlrpc_main+0xb8e/0x1900 [ptlrpc]
[<ffffffff8100c1ca>] ? child_rip+0xa/0x20
[<ffffffffa08131b0>] ? ptlrpc_main+0x0/0x1900 [ptlrpc]
[<ffffffff8100c1c0>] ? child_rip+0x0/0x20

Maloo report: https://maloo.whamcloud.com/test_sets/39a9f35c-bc24-11e0-8bdf-52540025f9af

This is a known issue: bug 20324
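For readers decoding the assertion message, the numbers can be read as a signed difference between two object IDs. The sketch below is a simplified, hypothetical model and not the code in filter.c: the helper names `precreate_diff` and `precreate_diff_ok` are invented for illustration, and reading the operands as "ID in the request" minus "last ID known on the OST" is an assumption based only on the message format.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a Lustre function): signed difference between
 * the object ID carried in a request and the last object ID the OST
 * knows for the same group. The operand meaning is an assumption. */
static int64_t precreate_diff(uint64_t requested_id, uint64_t last_id)
{
        return (int64_t)requested_id - (int64_t)last_id;
}

/* Returns 1 when the difference would satisfy a "diff >= 0" check,
 * 0 when it would trip the assertion. */
static int precreate_diff_ok(uint64_t requested_id, uint64_t last_id)
{
        return precreate_diff(requested_id, last_id) >= 0;
}
```

With the values from the log above, `precreate_diff(1, 33)` is -32, so `precreate_diff_ok(1, 33)` is 0, which corresponds to the failed assertion.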



 Comments   
Comment by Peter Jones [ 03/Aug/11 ]

Niu will look into this one

Comment by Niu Yawei (Inactive) [ 03/Aug/11 ]

I don't see why the echo client would trigger a pre-create on the OST, and the debug log seems to be missing from the Maloo test result.

Hi, Yujian
Is this bug easy to reproduce? Is it possible to get the debug log with D_INODE enabled? I want to see if the group passed to OST is correct. Thank you.

Comment by Jian Yu [ 04/Aug/11 ]

> Is this bug easy to reproduce? Is it possible to get the debug log with D_INODE enabled?

Yes, the LBUG could be reproduced consistently.
Please take a look at this report which contains the debug logs gathered by running the test with "PTLDEBUG=-1":
https://maloo.whamcloud.com/test_sets/095c50b0-be4e-11e0-8bdf-52540025f9af

Comment by Niu Yawei (Inactive) [ 04/Aug/11 ]

I found that the 2.0 code always resets oa->o_seq to 0 for requests from a 1.8 client, which makes the 1.8 echo client mistakenly use group 0.

/**
 * Validate oa from client.
 * 1. If the request comes from 1.8 clients, it will reset o_seq with MDT0.
 * 2. If the request comes from 2.0 clients, currently only RSVD seq and IDIF
 *    req are valid.
 *      a. for single MDS  seq = FID_SEQ_OST_MDT0,
 *      b. for CMD, seq = FID_SEQ_OST_MDT0, FID_SEQ_OST_MDT1 - FID_SEQ_OST_MAX
 */
static int ost_validate_obdo(struct obd_export *exp, struct obdo *oa,
                             struct obd_ioobj *ioobj)
{
        if (oa != NULL && (!(oa->o_valid & OBD_MD_FLGROUP) ||
            !(exp->exp_connect_flags & OBD_CONNECT_FULL20))) {
                oa->o_seq = FID_SEQ_OST_MDT0;
                if (ioobj)
                        ioobj->ioo_seq = FID_SEQ_OST_MDT0;

I think we also need to check OBD_MD_FLGROUP for requests from 1.8 clients. I will make a patch soon.
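The effect of the quoted condition can be sketched in isolation. This is a minimal stand-alone model, not the Lustre source and not the patch that landed: the `demo_*` structs, the `DEMO_*` constants and their values, and both function names are hypothetical, and the second function only illustrates one possible reading of "check OBD_MD_FLGROUP also for the 1.8 requests".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder values for illustration only; the real definitions live in
 * the Lustre headers and are not reproduced here. */
#define DEMO_MD_FLGROUP      0x1u  /* stands in for OBD_MD_FLGROUP */
#define DEMO_CONNECT_FULL20  0x1u  /* stands in for OBD_CONNECT_FULL20 */
#define DEMO_SEQ_OST_MDT0    0u    /* stands in for FID_SEQ_OST_MDT0 */

struct demo_obdo   { uint64_t o_valid; uint64_t o_seq; };
struct demo_export { uint64_t connect_flags; };

/* Mirrors the quoted condition: the sequence is reset when the group flag
 * is missing OR the client did not negotiate the full-2.0 connect flag. */
static void validate_seq_quoted(const struct demo_export *exp,
                                struct demo_obdo *oa)
{
        if (oa != NULL && (!(oa->o_valid & DEMO_MD_FLGROUP) ||
            !(exp->connect_flags & DEMO_CONNECT_FULL20)))
                oa->o_seq = DEMO_SEQ_OST_MDT0;
}

/* Assumed direction of the fix: reset only when the request carries no
 * group at all, regardless of the client version. */
static void validate_seq_proposed(const struct demo_export *exp,
                                  struct demo_obdo *oa)
{
        (void)exp; /* client version no longer influences the decision */
        if (oa != NULL && !(oa->o_valid & DEMO_MD_FLGROUP))
                oa->o_seq = DEMO_SEQ_OST_MDT0;
}
```

In this model, a request from a client without the full-2.0 flag that carries a non-zero group loses that group under `validate_seq_quoted` but keeps it under `validate_seq_proposed`; a request with no group flag set is reset to `DEMO_SEQ_OST_MDT0` in both cases.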

Comment by Niu Yawei (Inactive) [ 04/Aug/11 ]

The patch is being tracked at: http://review.whamcloud.com/1182

Comment by Build Master (Inactive) [ 09/Aug/11 ]

Integrated in lustre-master » x86_64,client,el5,ofa #248
LU-559 Keep the o_seq unchanged for 1.8 client

Oleg Drokin : 97e7c60bd9b2135ff5d498009b3996507fe6654b
Files :

  • lustre/ost/ost_handler.c
  • lustre/obdfilter/filter.c

Comment by Build Master (Inactive) [ 09/Aug/11 ]

Integrated in lustre-master » i686,client,el6,inkernel #248
LU-559 Keep the o_seq unchanged for 1.8 client

Oleg Drokin : 97e7c60bd9b2135ff5d498009b3996507fe6654b
Files :

  • lustre/obdfilter/filter.c
  • lustre/ost/ost_handler.c

Comment by Build Master (Inactive) [ 09/Aug/11 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #248
LU-559 Keep the o_seq unchanged for 1.8 client

Oleg Drokin : 97e7c60bd9b2135ff5d498009b3996507fe6654b
Files :

  • lustre/ost/ost_handler.c
  • lustre/obdfilter/filter.c

Comment by Build Master (Inactive) [ 09/Aug/11 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #248
LU-559 Keep the o_seq unchanged for 1.8 client

Oleg Drokin : 97e7c60bd9b2135ff5d498009b3996507fe6654b
Files :

  • lustre/ost/ost_handler.c
  • lustre/obdfilter/filter.c

Comment by Build Master (Inactive) [ 09/Aug/11 ]

Integrated in lustre-master » x86_64,client,sles11,inkernel #248
LU-559 Keep the o_seq unchanged for 1.8 client

Oleg Drokin : 97e7c60bd9b2135ff5d498009b3996507fe6654b
Files :

  • lustre/obdfilter/filter.c
  • lustre/ost/ost_handler.c

Comment by Build Master (Inactive) [ 09/Aug/11 ]

Integrated in lustre-master » i686,client,el5,inkernel #248
LU-559 Keep the o_seq unchanged for 1.8 client

Oleg Drokin : 97e7c60bd9b2135ff5d498009b3996507fe6654b
Files :

  • lustre/obdfilter/filter.c
  • lustre/ost/ost_handler.c

Comment by Build Master (Inactive) [ 09/Aug/11 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #248
LU-559 Keep the o_seq unchanged for 1.8 client

Oleg Drokin : 97e7c60bd9b2135ff5d498009b3996507fe6654b
Files :

  • lustre/ost/ost_handler.c
  • lustre/obdfilter/filter.c

Comment by Build Master (Inactive) [ 09/Aug/11 ]

Integrated in lustre-master » x86_64,server,el5,ofa #248
LU-559 Keep the o_seq unchanged for 1.8 client

Oleg Drokin : 97e7c60bd9b2135ff5d498009b3996507fe6654b
Files :

  • lustre/obdfilter/filter.c
  • lustre/ost/ost_handler.c

Comment by Build Master (Inactive) [ 09/Aug/11 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #248
LU-559 Keep the o_seq unchanged for 1.8 client

Oleg Drokin : 97e7c60bd9b2135ff5d498009b3996507fe6654b
Files :

  • lustre/ost/ost_handler.c
  • lustre/obdfilter/filter.c

Comment by Build Master (Inactive) [ 09/Aug/11 ]

Integrated in lustre-master » i686,server,el5,ofa #248
LU-559 Keep the o_seq unchanged for 1.8 client

Oleg Drokin : 97e7c60bd9b2135ff5d498009b3996507fe6654b
Files :

  • lustre/obdfilter/filter.c
  • lustre/ost/ost_handler.c

Comment by Build Master (Inactive) [ 09/Aug/11 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #248
LU-559 Keep the o_seq unchanged for 1.8 client

Oleg Drokin : 97e7c60bd9b2135ff5d498009b3996507fe6654b
Files :

  • lustre/obdfilter/filter.c
  • lustre/ost/ost_handler.c

Comment by Build Master (Inactive) [ 09/Aug/11 ]

Integrated in lustre-master » i686,server,el6,inkernel #248
LU-559 Keep the o_seq unchanged for 1.8 client

Oleg Drokin : 97e7c60bd9b2135ff5d498009b3996507fe6654b
Files :

  • lustre/ost/ost_handler.c
  • lustre/obdfilter/filter.c

Comment by Build Master (Inactive) [ 09/Aug/11 ]

Integrated in lustre-master » i686,client,el5,ofa #248
LU-559 Keep the o_seq unchanged for 1.8 client

Oleg Drokin : 97e7c60bd9b2135ff5d498009b3996507fe6654b
Files :

  • lustre/obdfilter/filter.c
  • lustre/ost/ost_handler.c

Comment by Build Master (Inactive) [ 09/Aug/11 ]

Integrated in lustre-master » i686,server,el5,inkernel #248
LU-559 Keep the o_seq unchanged for 1.8 client

Oleg Drokin : 97e7c60bd9b2135ff5d498009b3996507fe6654b
Files :

  • lustre/ost/ost_handler.c
  • lustre/obdfilter/filter.c

Comment by Niu Yawei (Inactive) [ 14/Aug/11 ]

landed for 2.1

Comment by Jay Lan (Inactive) [ 02/Jul/12 ]

Our 2.1.1 OSS server that contains this patch just crashed:

LustreError: 7204:0:(filter.c:3685:filter_handle_precreate()) ASSERTION(diff >= 0) failed: nbp1-OST006d: 42944172 - 42944239 = -67
LustreError: 7185:0:(filter.c:4143:filter_destroy()) nbp1-OST0055: can not find olg of group 0
LustreError: 7204:0:(filter.c:3685:filter_handle_precreate()) LBUG

We can still hit this LBUG even with the commit applied.
