Details
-
Bug
-
Resolution: Fixed
-
Minor
-
None
-
Lustre 2.0.0, Lustre 1.8.6
-
None
-
3
-
24,419
-
9726
Description
While running a regression test:
[2011-01-20 19:14:30][c0-0c0s5n0]Kernel panic - not syncing: oom_kill_process killing invalid app rcad_svcs.
[2011-01-20 19:14:30][c0-0c0s5n0]Pid: 5529, comm: stressapptest Tainted: P 2.6.32.24-0.2.1_1.0000.5704-cray_gem_c #1
[2011-01-20 19:14:30][c0-0c0s5n0]Call Trace:
[2011-01-20 19:14:30][c0-0c0s5n0] [<ffffffff810072b9>] try_stack_unwind+0x149/0x190
[2011-01-20 19:14:30][c0-0c0s5n0] [<ffffffff81005d90>] dump_trace+0x90/0x2f0
[2011-01-20 19:14:30][c0-0c0s5n0] [<ffffffff81006eb7>] show_trace_log_lvl+0x57/0x70
[2011-01-20 19:14:30][c0-0c0s5n0] [<ffffffff81006ee0>] show_trace+0x10/0x20
[2011-01-20 19:14:30][c0-0c0s5n0] [<ffffffff8125c69c>] dump_stack+0x72/0x7b
[2011-01-20 19:14:30][c0-0c0s5n0] [<ffffffff8125c71a>] panic+0x75/0x13b
[2011-01-20 19:14:30][c0-0c0s5n0] [<ffffffff810968e6>] __oom_kill_task+0xa6/0x190
[2011-01-20 19:14:30][c0-0c0s5n0] [<ffffffff81096db5>] oom_kill_process+0x245/0x2e0
[2011-01-20 19:14:30][c0-0c0s5n0] [<ffffffff81097296>] __out_of_memory+0x176/0x1e0
[2011-01-20 19:14:30][c0-0c0s5n0] [<ffffffff810977d9>] out_of_memory+0x4d9/0x560
[2011-01-20 19:14:30][c0-0c0s5n0] [<ffffffff8109ab72>] __alloc_pages_nodemask+0x662/0x680
[2011-01-20 19:14:30][c0-0c0s5n0] [<ffffffff810c1ba0>] alloc_page_vma+0x70/0x100
[2011-01-20 19:14:30][c0-0c0s5n0] [<ffffffff810b069f>] handle_mm_fault+0xbff/0xd00
[2011-01-20 19:14:30][c0-0c0s5n0] [<ffffffff8101f857>] do_page_fault+0x147/0x2c0
[2011-01-20 19:14:30][c0-0c0s5n0] [<ffffffff8125f7af>] page_fault+0x1f/0x30
[2011-01-20 19:14:30][c0-0c0s5n0] [<000000000041ae8a>] 0x41ae8a
crash> gdb list *(ldlm_pools_shrink+0x93)
0xffffffffa026a853 is in ldlm_pools_shrink (/usr/src/packages/BUILD/cray-lustre-1.8.4/lustre/ptlrpc/../../lustre/ldlm/ldlm_pool.c:1086).
1076 for (nr_ns = atomic_read(ldlm_namespace_nr(client));
1077 nr_ns > 0; nr_ns--)
1078 {
1079 mutex_down(ldlm_namespace_lock(client));
1080 if (list_empty(ldlm_namespace_list(client)))
1084 ns = ldlm_namespace_first_locked(client);
1085 ldlm_namespace_get(ns);
1086 ldlm_namespace_move_locked(ns, client);
1087 mutex_up(ldlm_namespace_lock(client));
1088 total += ldlm_pool_shrink(&ns->ns_pool, 0, gfp_mask);
1089 ldlm_namespace_put(ns, 1);
Fix:
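ldlm_namespace_free() removes the namespace from the list and frees its memory without checking the namespace's refcount, while ldlm_pools_shrink() may concurrently take the same namespace from the list and call ldlm_pool_shrink() on it. The shrinker can therefore end up working on a namespace that has already been freed.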
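The refcount rule described above can be sketched in user-space C. This is not the Lustre patch: `struct ns`, `ns_create`, `ns_first_get`, `ns_put`, `ns_free` and `ns_freed_count` are hypothetical names made up for the illustration, and a pthread mutex stands in for the namespace list lock. The only point shown is that the free path unlinks the namespace and drops the list's reference, and the memory is released by whoever drops the last reference.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a namespace; not the Lustre structure. */
struct ns {
    int refcount;      /* protected by list_lock in this sketch */
    struct ns *next;
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct ns *ns_list;     /* head of the namespace list */
static int ns_freed_count;     /* how many namespaces were really freed */

/* Create a namespace; the list itself holds the initial reference. */
struct ns *ns_create(void)
{
    struct ns *ns = calloc(1, sizeof(*ns));
    if (ns == NULL)
        return NULL;
    ns->refcount = 1;
    pthread_mutex_lock(&list_lock);
    ns->next = ns_list;
    ns_list = ns;
    pthread_mutex_unlock(&list_lock);
    return ns;
}

/* Shrinker side: pick the first namespace and pin it under the lock. */
struct ns *ns_first_get(void)
{
    pthread_mutex_lock(&list_lock);
    struct ns *ns = ns_list;
    if (ns != NULL)
        ns->refcount++;
    pthread_mutex_unlock(&list_lock);
    return ns;
}

/* Drop one reference; only the last put releases the memory. */
void ns_put(struct ns *ns)
{
    pthread_mutex_lock(&list_lock);
    assert(ns->refcount > 0);
    bool last = (--ns->refcount == 0);
    if (last)
        ns_freed_count++;
    pthread_mutex_unlock(&list_lock);
    if (last)
        free(ns);
}

/* Free side: unlink under the lock, then drop the list's reference.
 * If a shrinker still holds a reference, the memory stays valid. */
void ns_free(struct ns *ns)
{
    bool was_listed = false;

    pthread_mutex_lock(&list_lock);
    for (struct ns **pp = &ns_list; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == ns) {
            *pp = ns->next;
            was_listed = true;
            break;
        }
    }
    pthread_mutex_unlock(&list_lock);

    if (was_listed)
        ns_put(ns);
}
```

With this ordering, a shrinker that called `ns_first_get()` before `ns_free()` ran keeps a valid pointer until its own `ns_put()`, and a shrinker that arrives afterwards no longer finds the namespace on the list.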