[LU-662] 1.8<->2.1 interop: LBUG: ASSERTION(!range_is_exhausted(&seq->lcs_space)) failed Created: 06/Sep/11  Updated: 06/Apr/12  Resolved: 14/Sep/11

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.0
Fix Version/s: Lustre 2.1.0

Type: Bug Priority: Blocker
Reporter: Jian Yu Assignee: Hongchao Zhang
Resolution: Fixed Votes: 0
Labels: None
Environment:

Old Lustre Versions: 1.8.5 and 1.8.6-wc1
Lustre 1.8.6-wc1 Build: http://newbuild.whamcloud.com/job/lustre-b1_8/100/

New Lustre Version: 2.1.0
Lustre 2.1.0 Build: http://newbuild.whamcloud.com/job/lustre-master/276/
Network: TCP (1GigE)

Clean upgrade (Lustre servers and clients were all upgraded at once) from Lustre 1.8.5 and 1.8.6-wc1 to Lustre 2.1.0, with the following configuration:
OSS1: RHEL5/x86_64 upgrade from 1.8.6-wc1 to 2.1.0
OSS2: RHEL5/x86_64 upgrade from 1.8.5 to 2.1.0
MDS: RHEL5/x86_64 upgrade from 1.8.6-wc1 to 2.1.0
Client1: RHEL6/x86_64 upgrade from 1.8.6-wc1 to 2.1.0
Client2: RHEL5/x86_64 upgrade from 1.8.5 to 2.1.0

Test nodes:
OSS1: fat-amd-2
OSS2: fat-amd-3
MDS: fat-amd-1
Client1: client-12
Client2: client-13


Severity: 3
Rank (Obsolete): 4904

 Description   

After the clean upgrade, client-13 hit the following LBUG while running the metabench test:

Lustre: DEBUG MARKER: == parallel-scale test metabench: metabench == 23:53:25 (1315292005)
LustreError: 21857:0:(fid_request.c:213:seq_client_alloc_seq()) ASSERTION(!range_is_exhausted(&seq->lcs_space)) failed
LustreError: 21857:0:(fid_request.c:213:seq_client_alloc_seq()) LBUG
Pid: 21857, comm: metabench

Call Trace:
 [<ffffffff887b15f1>] libcfs_debug_dumpstack+0x51/0x60 [libcfs]
 [<ffffffff887b1b2a>] lbug_with_loc+0x7a/0xd0 [libcfs]
 [<ffffffff887bcde0>] cfs_tracefile_init+0x0/0x10a [libcfs]
 [<ffffffff88ab9cad>] seq_client_alloc_fid+0x4bd/0x810 [fid]
 [<ffffffff8001a870>] vsnprintf+0x3f8/0x627
 [<ffffffff8008e430>] default_wake_function+0x0/0xe
 [<ffffffff887bcaeb>] libcfs_debug_vmsg2+0x75b/0x9f0 [libcfs]
 [<ffffffff88b0bced>] mdc_fid_alloc+0x7d/0x100 [mdc]
 [<ffffffff88b1df45>] mdc_intent_lock+0x385/0x610 [mdc]
 [<ffffffff88d1748c>] ll_i2gids+0x6c/0xa0 [lustre]
 [<ffffffff88d177e0>] ll_md_blocking_ast+0x0/0x580 [lustre]
 [<ffffffff88971cd0>] ldlm_completion_ast+0x0/0x6a0 [ptlrpc]
 [<ffffffff88d17038>] ll_lookup_it+0x8c8/0xc90 [lustre]
 [<ffffffff88d177e0>] ll_md_blocking_ast+0x0/0x580 [lustre]
 [<ffffffff88d18166>] ll_lookup_nd+0x1e6/0x400 [lustre]
 [<ffffffff80022a76>] d_alloc+0x174/0x1a9
 [<ffffffff800370b8>] __lookup_hash+0x10b/0x12f
 [<ffffffff8001b227>] open_namei+0xf2/0x712
 [<ffffffff8002768a>] do_filp_open+0x1c/0x38
 [<ffffffff8001a061>] do_sys_open+0x44/0xbe
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

Kernel panic - not syncing: LBUG

Maloo report: https://maloo.whamcloud.com/test_sets/f2edb6ae-d863-11e0-8d02-52540025f9af
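The failed assertion says that the client's sequence space (seq->lcs_space) must not be empty at the point where seq_client_alloc_seq() takes a sequence from it. Below is a simplified, single-threaded C sketch of that invariant. It assumes the client refills an empty range before taking from it. struct seq_range, range_refill(), range_flush() and alloc_seq() are hypothetical stand-ins, not Lustre's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for seq->lcs_space:
 * sequences in [start, end) are still unused. */
struct seq_range {
    uint64_t start;
    uint64_t end;
};

/* Same idea as range_is_exhausted(): no sequences left in the range. */
static int range_is_exhausted(const struct seq_range *r)
{
    return r->start >= r->end;
}

/* Hypothetical refill, standing in for fetching a new range from the server. */
static void range_refill(struct seq_range *r, uint64_t start, uint64_t width)
{
    r->start = start;
    r->end = start + width;
}

/* Hypothetical flush: discards whatever is left in the cached range. */
static void range_flush(struct seq_range *r)
{
    r->start = 0;
    r->end = 0;
}

/* Single-threaded allocation: refill if empty, then hand out the next
 * sequence. The assert is the invariant the LBUG reports as violated. */
static uint64_t alloc_seq(struct seq_range *r, uint64_t *next_start,
                          uint64_t width)
{
    if (range_is_exhausted(r)) {
        range_refill(r, *next_start, width);
        *next_start += width;
    }
    assert(!range_is_exhausted(r));
    return r->start++;
}
```

In single-threaded use the assert always holds. If range_flush() runs between the refill and the check, the range is empty again and the assertion fires. That is the interleaving described in the second case of the analysis below.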



 Comments   
Comment by Peter Jones [ 07/Sep/11 ]

HongChao

Can you please look into this one as your top priority?

Thanks

Peter

Comment by Hongchao Zhang [ 08/Sep/11 ]

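There are two possible causes of this issue:

1. A bug in "seq_client_proc_write_width". If lprocfs_write_helper() fails, the function returns without releasing the semaphore it took:
...
cfs_down(&seq->lcs_sem);

rc = lprocfs_write_helper(buffer, count, &val);
if (rc) /* BUG: a cfs_up(&seq->lcs_sem) is needed here to release the semaphore */
        RETURN(rc);
...

2. "seq_client_flush" races with "seq_client_alloc_seq". "seq_client_alloc_seq" updates "lu_client_seq->lcs_space" without holding "lcs_sem", while "seq_client_flush" (called from "mdc_import_event" for the "IMP_EVENT_INACTIVE" event) can reinitialize it at the same time. If the flush empties lcs_space after the allocator has checked or refilled it, the allocator then finds the range exhausted, which would explain the failed ASSERTION(!range_is_exhausted(&seq->lcs_space)).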
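For the first case, the usual remedy is a single exit path that always releases the lock. Below is a minimal user-space sketch of that pattern. It uses a pthread mutex in place of cfs_down()/cfs_up() on lcs_sem. struct seq_client, parse_width() (standing in for lprocfs_write_helper()) and proc_write_width() are hypothetical names, not Lustre's code.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical client state: `lock` stands in for lcs_sem,
 * `width` for the value written through /proc. */
struct seq_client {
    pthread_mutex_t lock;
    unsigned long width;
};

/* Hypothetical parser standing in for lprocfs_write_helper(). */
static int parse_width(const char *buf, unsigned long *val)
{
    char *end;

    errno = 0;
    *val = strtoul(buf, &end, 10);
    if (errno != 0 || end == buf)
        return -EINVAL;
    return 0;
}

/* Every return after the lock is taken goes through "out", so a parse
 * error can no longer leave the lock held. */
static int proc_write_width(struct seq_client *seq, const char *buf)
{
    unsigned long val;
    int rc;

    pthread_mutex_lock(&seq->lock);

    rc = parse_width(buf, &val);
    if (rc)
        goto out;       /* the buggy version returned here while still locked */

    seq->width = val;
out:
    pthread_mutex_unlock(&seq->lock);
    return rc;
}
```

With the leaked lock, the next caller that takes the same lock would block forever. The goto-based single exit keeps the lock and unlock calls balanced, even when more error checks are added later.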

Comment by Jessica A. Popp (Inactive) [ 08/Sep/11 ]

Hi HongChao -

Please provide an update today on your thoughts on this so we can determine blocker status and decide whether we are ready to cut RC2 tomorrow.

Thanks,

Jessica

Comment by Hongchao Zhang [ 09/Sep/11 ]

The patch is at http://review.whamcloud.com/#change,1364
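The patch itself is not reproduced in this issue. One plausible shape for a fix to the second case is to make the allocator's check, refill and consume steps, and the flush, all run under the same lock. Below is a minimal user-space sketch of that idea, not the landed change. The pthread mutex stands in for lcs_sem. struct seq_client, client_alloc_seq() and client_flush() are hypothetical names, and next_start/width are hypothetical stand-ins for the server handing out new ranges.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical client: `lock` plays the role of lcs_sem and
 * [start, end) the role of lcs_space. */
struct seq_client {
    pthread_mutex_t lock;
    uint64_t start, end;
    uint64_t next_start;    /* hypothetical stand-in for the server */
    uint64_t width;         /* must be > 0 */
};

static int space_exhausted(const struct seq_client *c)
{
    return c->start >= c->end;
}

/* The emptiness check, the refill and the consumption all happen under
 * the lock, so a flush cannot slip in between them. */
static int client_alloc_seq(struct seq_client *c, uint64_t *seqnr)
{
    int rc = 0;

    pthread_mutex_lock(&c->lock);
    if (space_exhausted(c)) {
        c->start = c->next_start;
        c->end = c->next_start + c->width;
        c->next_start += c->width;
    }
    if (space_exhausted(c))
        rc = -1;            /* the state the LBUG asserted against */
    else
        *seqnr = c->start++;
    pthread_mutex_unlock(&c->lock);
    return rc;
}

/* Flush (e.g. on IMP_EVENT_INACTIVE): drop the cached range under the
 * same lock, so it can never interleave with client_alloc_seq(). */
static void client_flush(struct seq_client *c)
{
    pthread_mutex_lock(&c->lock);
    c->start = 0;
    c->end = 0;
    pthread_mutex_unlock(&c->lock);
}
```

A flush still throws away the unused remainder of the range. Because each refill starts past the previous one, sequences are never handed out twice. The allocator can no longer observe the range emptied between its refill and its use.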

Comment by Build Master (Inactive) [ 13/Sep/11 ]

Integrated in lustre-master #278 on the following targets:

  • lustre-master » x86_64,client,el5,inkernel
  • lustre-master » x86_64,client,el5,ofa
  • lustre-master » x86_64,client,el6,inkernel
  • lustre-master » x86_64,client,sles11,inkernel
  • lustre-master » x86_64,client,ubuntu1004,inkernel
  • lustre-master » x86_64,server,el5,inkernel
  • lustre-master » x86_64,server,el5,ofa
  • lustre-master » x86_64,server,el6,inkernel
  • lustre-master » i686,client,el5,inkernel
  • lustre-master » i686,client,el5,ofa
  • lustre-master » i686,client,el6,inkernel
  • lustre-master » i686,server,el5,inkernel
  • lustre-master » i686,server,el5,ofa
  • lustre-master » i686,server,el6,inkernel

LU-662 fix conflict between seq_client_flush and seq_client_alloc_fid

Oleg Drokin : d1feb5c774d4690a4d4c4828d734a2604438f923
Files :

  • lustre/fid/lproc_fid.c
  • lustre/fid/fid_request.c

Comment by Peter Jones [ 14/Sep/11 ]

Landed for 2.1

Comment by Christopher Morrone [ 06/Apr/12 ]

Just FYI, I think our BG/P system running our 1.8 client is hitting this, so it should go away when we upgrade our servers to 2.1.1+.

Generated at Sat Feb 10 01:09:13 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.