[LU-607] port bz24419 (ldlm namespace lock contention during oom) Created: 18/Aug/11  Updated: 28/Feb/18  Resolved: 28/Feb/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.0.0, Lustre 1.8.6
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Zhenyu Xu Assignee: Zhenyu Xu
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Bugzilla ID: 24419
Rank (Obsolete): 9726

 Description   

While running a regression test:

[2011-01-20 19:14:30][c0-0c0s5n0]Kernel panic - not syncing: oom_kill_process killing invalid app rcad_svcs.
[2011-01-20 19:14:30][c0-0c0s5n0]Pid: 5529, comm: stressapptest Tainted: P 2.6.32.24-0.2.1_1.0000.5704-cray_gem_c #1
[2011-01-20 19:14:30][c0-0c0s5n0]Call Trace:
[2011-01-20 19:14:30][c0-0c0s5n0] [<ffffffff810072b9>] try_stack_unwind+0x149/0x190
[2011-01-20 19:14:30][c0-0c0s5n0] [<ffffffff81005d90>] dump_trace+0x90/0x2f0
[2011-01-20 19:14:30][c0-0c0s5n0] [<ffffffff81006eb7>] show_trace_log_lvl+0x57/0x70
[2011-01-20 19:14:30][c0-0c0s5n0] [<ffffffff81006ee0>] show_trace+0x10/0x20
[2011-01-20 19:14:30][c0-0c0s5n0] [<ffffffff8125c69c>] dump_stack+0x72/0x7b
[2011-01-20 19:14:30][c0-0c0s5n0] [<ffffffff8125c71a>] panic+0x75/0x13b
[2011-01-20 19:14:30][c0-0c0s5n0] [<ffffffff810968e6>] __oom_kill_task+0xa6/0x190
[2011-01-20 19:14:30][c0-0c0s5n0] [<ffffffff81096db5>] oom_kill_process+0x245/0x2e0
[2011-01-20 19:14:30][c0-0c0s5n0] [<ffffffff81097296>] __out_of_memory+0x176/0x1e0
[2011-01-20 19:14:30][c0-0c0s5n0] [<ffffffff810977d9>] out_of_memory+0x4d9/0x560
[2011-01-20 19:14:30][c0-0c0s5n0] [<ffffffff8109ab72>] __alloc_pages_nodemask+0x662/0x680
[2011-01-20 19:14:30][c0-0c0s5n0] [<ffffffff810c1ba0>] alloc_page_vma+0x70/0x100
[2011-01-20 19:14:30][c0-0c0s5n0] [<ffffffff810b069f>] handle_mm_fault+0xbff/0xd00
[2011-01-20 19:14:30][c0-0c0s5n0] [<ffffffff8101f857>] do_page_fault+0x147/0x2c0
[2011-01-20 19:14:30][c0-0c0s5n0] [<ffffffff8125f7af>] page_fault+0x1f/0x30
[2011-01-20 19:14:30][c0-0c0s5n0] [<000000000041ae8a>] 0x41ae8a

crash> gdb list *(ldlm_pools_shrink+0x93)
0xffffffffa026a853 is in ldlm_pools_shrink (/usr/src/packages/BUILD/cray-lustre-1.8.4/lustre/ptlrpc/../../lustre/ldlm/ldlm_pool.c:1086).
1076         for (nr_ns = atomic_read(ldlm_namespace_nr(client));
1077              nr_ns > 0; nr_ns--)
1078         {
1079                 mutex_down(ldlm_namespace_lock(client));
1080                 if (list_empty(ldlm_namespace_list(client))) {
1081                         mutex_up(ldlm_namespace_lock(client));
1082                         return 0;
1083                 }
1084                 ns = ldlm_namespace_first_locked(client);
1085                 ldlm_namespace_get(ns);
1086                 ldlm_namespace_move_locked(ns, client);
1087                 mutex_up(ldlm_namespace_lock(client));
1088                 total += ldlm_pool_shrink(&ns->ns_pool, 0, gfp_mask);
1089                 ldlm_namespace_put(ns, 1);

Fix:

ldlm_namespace_free() removes the namespace from the list and frees its memory without checking the
namespace's refcount, while ldlm_pools_shrink() may concurrently take that namespace from the list and
call ldlm_pool_shrink() on it.
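
The race described above can be illustrated with a minimal user-space sketch. It is not the Lustre patch: the struct and every function name below (ns_register, ns_first_get, ns_put, ns_free) are hypothetical stand-ins, and a pthread mutex stands in for the namespace list lock. It assumes the general remedy of unlinking the namespace under the list lock first and releasing it only when the last reference is dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a namespace; not the real Lustre structure. */
struct ns {
    struct ns *next;
    int refcount;  /* protected by list_lock in this sketch */
    int on_list;   /* 1 while reachable through ns_list */
    int freed;     /* set where real code would release the memory */
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct ns *ns_list;

/* Add a namespace to the list; the caller keeps the initial reference. */
void ns_register(struct ns *n)
{
    pthread_mutex_lock(&list_lock);
    n->refcount = 1;
    n->freed = 0;
    n->on_list = 1;
    n->next = ns_list;
    ns_list = n;
    pthread_mutex_unlock(&list_lock);
}

/* Shrinker side: take the first namespace and pin it under the lock. */
struct ns *ns_first_get(void)
{
    struct ns *n;

    pthread_mutex_lock(&list_lock);
    n = ns_list;
    if (n != NULL)
        n->refcount++;
    pthread_mutex_unlock(&list_lock);
    return n;
}

/* Drop one reference; only the last put marks the namespace freed. */
void ns_put(struct ns *n)
{
    pthread_mutex_lock(&list_lock);
    assert(n->refcount > 0);
    if (--n->refcount == 0)
        n->freed = 1;
    pthread_mutex_unlock(&list_lock);
}

/* Teardown side: unlink first so no new user can find the namespace,
 * then drop the owner's reference instead of freeing unconditionally. */
void ns_free(struct ns *n)
{
    struct ns **pp;

    pthread_mutex_lock(&list_lock);
    for (pp = &ns_list; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == n) {
            *pp = n->next;
            n->next = NULL;
            n->on_list = 0;
            break;
        }
    }
    pthread_mutex_unlock(&list_lock);
    ns_put(n);
}
```

In this sketch, a shrinker that already holds a reference keeps the namespace alive past ns_free(), and a shrinker that arrives afterwards no longer finds it on the list.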



 Comments   
Comment by Zhenyu Xu [ 21/Aug/11 ]

b1_8 patch tracking at http://review.whamcloud.com/1273

Comment by Build Master (Inactive) [ 02/Sep/11 ]

Integrated in lustre-b1_8 » x86_64,client,el6,inkernel #124
LU-607 Avoid race between ldlm_pools_shrink and ldlm_namespace_free

Johann Lombardi : ba79e90a7028e2637e64367535715c81729f4cb2
Files :

  • lustre/ldlm/ldlm_resource.c
Comment by Build Master (Inactive) [ 02/Sep/11 ]

Integrated in lustre-b1_8 » x86_64,server,el5,inkernel #124
LU-607 Avoid race between ldlm_pools_shrink and ldlm_namespace_free

Johann Lombardi : ba79e90a7028e2637e64367535715c81729f4cb2
Files :

  • lustre/ldlm/ldlm_resource.c
Comment by Build Master (Inactive) [ 02/Sep/11 ]

Integrated in lustre-b1_8 » i686,client,el5,inkernel #124
LU-607 Avoid race between ldlm_pools_shrink and ldlm_namespace_free

Johann Lombardi : ba79e90a7028e2637e64367535715c81729f4cb2
Files :

  • lustre/ldlm/ldlm_resource.c
Comment by Build Master (Inactive) [ 02/Sep/11 ]

Integrated in lustre-b1_8 » i686,client,el5,ofa #124
LU-607 Avoid race between ldlm_pools_shrink and ldlm_namespace_free

Johann Lombardi : ba79e90a7028e2637e64367535715c81729f4cb2
Files :

  • lustre/ldlm/ldlm_resource.c
Comment by Build Master (Inactive) [ 02/Sep/11 ]

Integrated in lustre-b1_8 » x86_64,client,ubuntu1004,inkernel #124
LU-607 Avoid race between ldlm_pools_shrink and ldlm_namespace_free

Johann Lombardi : ba79e90a7028e2637e64367535715c81729f4cb2
Files :

  • lustre/ldlm/ldlm_resource.c
Comment by Build Master (Inactive) [ 02/Sep/11 ]

Integrated in lustre-b1_8 » x86_64,client,el5,inkernel #124
LU-607 Avoid race between ldlm_pools_shrink and ldlm_namespace_free

Johann Lombardi : ba79e90a7028e2637e64367535715c81729f4cb2
Files :

  • lustre/ldlm/ldlm_resource.c
Comment by Build Master (Inactive) [ 02/Sep/11 ]

Integrated in lustre-b1_8 » i686,client,el6,inkernel #124
LU-607 Avoid race between ldlm_pools_shrink and ldlm_namespace_free

Johann Lombardi : ba79e90a7028e2637e64367535715c81729f4cb2
Files :

  • lustre/ldlm/ldlm_resource.c
Comment by Build Master (Inactive) [ 02/Sep/11 ]

Integrated in lustre-b1_8 » x86_64,client,el5,ofa #124
LU-607 Avoid race between ldlm_pools_shrink and ldlm_namespace_free

Johann Lombardi : ba79e90a7028e2637e64367535715c81729f4cb2
Files :

  • lustre/ldlm/ldlm_resource.c
Comment by Build Master (Inactive) [ 02/Sep/11 ]

Integrated in lustre-b1_8 » i686,server,el5,inkernel #124
LU-607 Avoid race between ldlm_pools_shrink and ldlm_namespace_free

Johann Lombardi : ba79e90a7028e2637e64367535715c81729f4cb2
Files :

  • lustre/ldlm/ldlm_resource.c
Comment by Build Master (Inactive) [ 02/Sep/11 ]

Integrated in lustre-b1_8 » x86_64,server,el5,ofa #124
LU-607 Avoid race between ldlm_pools_shrink and ldlm_namespace_free

Johann Lombardi : ba79e90a7028e2637e64367535715c81729f4cb2
Files :

  • lustre/ldlm/ldlm_resource.c
Comment by Build Master (Inactive) [ 02/Sep/11 ]

Integrated in lustre-b1_8 » i686,server,el5,ofa #124
LU-607 Avoid race between ldlm_pools_shrink and ldlm_namespace_free

Johann Lombardi : ba79e90a7028e2637e64367535715c81729f4cb2
Files :

  • lustre/ldlm/ldlm_resource.c
Comment by Zhenyu Xu [ 04/Sep/11 ]

master patch tracking at http://review.whamcloud.com/1334

Comment by Vladimir V. Saveliev [ 16/Sep/11 ]

It looks like the patch introduces a regression: https://bugzilla.lustre.org/show_bug.cgi?id=24540

The patch moved ldlm_namespace_unregister() from ldlm_namespace_free_post() to
ldlm_namespace_free_prior().
However, in some cases ldlm_namespace_free_post() appears to be called on its own, without a preceding
call to ldlm_namespace_free_prior(). The namespace is then never unregistered, which trips the LASSERT
in ldlm_namespace_free_post().
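
The failure mode described in this comment can be sketched in a few lines of user-space C. All names below are hypothetical and only model the two-phase teardown; the "tolerant" variant is one possible way to handle a post-phase call that arrives without a prior phase, not necessarily what the Lustre developers chose.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a namespace going through two-phase teardown. */
struct demo_ns {
    bool registered;  /* still visible on the global list */
    bool freed;       /* memory would have been released */
};

void demo_unregister(struct demo_ns *ns)
{
    ns->registered = false;
}

/* Phase 1: after the patch, unregistration happens here. */
void demo_free_prior(struct demo_ns *ns)
{
    demo_unregister(ns);
}

/* Phase 2, strict: refuses to free a namespace that is still registered.
 * Returning false models the point where an assertion would fire. */
bool demo_free_post_strict(struct demo_ns *ns)
{
    if (ns->registered)
        return false;
    ns->freed = true;
    return true;
}

/* Phase 2, tolerant: unregisters on demand, so it also works when
 * phase 1 was skipped. */
bool demo_free_post_tolerant(struct demo_ns *ns)
{
    if (ns->registered)
        demo_unregister(ns);
    ns->freed = true;
    return true;
}
```

With the strict variant, calling the post phase alone fails for a namespace that is still registered, which mirrors the reported regression; the tolerant variant handles both call orders.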

Comment by Zhenyu Xu [ 19/Sep/11 ]

Thank you, Vladimir. I will keep watching bz24419 for updates.

Comment by Zhenyu Xu [ 21/Sep/11 ]

b1_8 reverting patch tracking at http://review.whamcloud.com/1407

Comment by Build Master (Inactive) [ 23/Sep/11 ]

Integrated in lustre-b1_8 » x86_64,server,el5,inkernel #127
Revert "LU-607 Avoid race between ldlm_pools_shrink and ldlm_namespace_free"

Johann Lombardi : 447794d5ebb71dbd39d7378944c3c9eeb230f8d0
Files :

  • lustre/ldlm/ldlm_resource.c
Comment by Build Master (Inactive) [ 23/Sep/11 ]

Integrated in lustre-b1_8 » x86_64,client,el5,inkernel #127
Revert "LU-607 Avoid race between ldlm_pools_shrink and ldlm_namespace_free"

Johann Lombardi : 447794d5ebb71dbd39d7378944c3c9eeb230f8d0
Files :

  • lustre/ldlm/ldlm_resource.c
Comment by Build Master (Inactive) [ 23/Sep/11 ]

Integrated in lustre-b1_8 » x86_64,client,el6,inkernel #127
Revert "LU-607 Avoid race between ldlm_pools_shrink and ldlm_namespace_free"

Johann Lombardi : 447794d5ebb71dbd39d7378944c3c9eeb230f8d0
Files :

  • lustre/ldlm/ldlm_resource.c
Comment by Build Master (Inactive) [ 23/Sep/11 ]

Integrated in lustre-b1_8 » x86_64,client,el5,ofa #127
Revert "LU-607 Avoid race between ldlm_pools_shrink and ldlm_namespace_free"

Johann Lombardi : 447794d5ebb71dbd39d7378944c3c9eeb230f8d0
Files :

  • lustre/ldlm/ldlm_resource.c
Comment by Build Master (Inactive) [ 23/Sep/11 ]

Integrated in lustre-b1_8 » x86_64,client,ubuntu1004,inkernel #127
Revert "LU-607 Avoid race between ldlm_pools_shrink and ldlm_namespace_free"

Johann Lombardi : 447794d5ebb71dbd39d7378944c3c9eeb230f8d0
Files :

  • lustre/ldlm/ldlm_resource.c
Comment by Build Master (Inactive) [ 23/Sep/11 ]

Integrated in lustre-b1_8 » i686,client,el6,inkernel #127
Revert "LU-607 Avoid race between ldlm_pools_shrink and ldlm_namespace_free"

Johann Lombardi : 447794d5ebb71dbd39d7378944c3c9eeb230f8d0
Files :

  • lustre/ldlm/ldlm_resource.c
Comment by Build Master (Inactive) [ 23/Sep/11 ]

Integrated in lustre-b1_8 » x86_64,server,el5,ofa #127
Revert "LU-607 Avoid race between ldlm_pools_shrink and ldlm_namespace_free"

Johann Lombardi : 447794d5ebb71dbd39d7378944c3c9eeb230f8d0
Files :

  • lustre/ldlm/ldlm_resource.c
Comment by Build Master (Inactive) [ 23/Sep/11 ]

Integrated in lustre-b1_8 » i686,client,el5,inkernel #127
Revert "LU-607 Avoid race between ldlm_pools_shrink and ldlm_namespace_free"

Johann Lombardi : 447794d5ebb71dbd39d7378944c3c9eeb230f8d0
Files :

  • lustre/ldlm/ldlm_resource.c
Comment by Build Master (Inactive) [ 23/Sep/11 ]

Integrated in lustre-b1_8 » i686,client,el5,ofa #127
Revert "LU-607 Avoid race between ldlm_pools_shrink and ldlm_namespace_free"

Johann Lombardi : 447794d5ebb71dbd39d7378944c3c9eeb230f8d0
Files :

  • lustre/ldlm/ldlm_resource.c
Comment by Build Master (Inactive) [ 23/Sep/11 ]

Integrated in lustre-b1_8 » i686,server,el5,inkernel #127
Revert "LU-607 Avoid race between ldlm_pools_shrink and ldlm_namespace_free"

Johann Lombardi : 447794d5ebb71dbd39d7378944c3c9eeb230f8d0
Files :

  • lustre/ldlm/ldlm_resource.c
Comment by Build Master (Inactive) [ 23/Sep/11 ]

Integrated in lustre-b1_8 » i686,server,el5,ofa #127
Revert "LU-607 Avoid race between ldlm_pools_shrink and ldlm_namespace_free"

Johann Lombardi : 447794d5ebb71dbd39d7378944c3c9eeb230f8d0
Files :

  • lustre/ldlm/ldlm_resource.c