[LU-4357] page allocation failure. mode:0x40 caused by missing __GFP_WAIT flag Created: 06/Dec/13  Updated: 17/Mar/14  Resolved: 26/Feb/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0, Lustre 2.6.0, Lustre 2.5.1, Lustre 2.4.3
Fix Version/s: Lustre 2.6.0, Lustre 2.5.1

Type: Bug Priority: Blocker
Reporter: Maloo Assignee: Emoly Liu
Resolution: Fixed Votes: 0
Labels: MB, mn4, patch
Environment:

server: lustre-master build # 1791 RHEL6 ldiskfs
client: 2.5.0


Issue Links:
Duplicate
duplicates LU-4622 Failure on test suite recovery-small ... Closed
is duplicated by LU-3851 Single-client scale test: ptlrpcd pag... Resolved
is duplicated by LU-4033 Failure on test suite parallel-scale-... Resolved
is duplicated by LU-3680 OOM crash: null_alloc_rs()) ASSERTION... Resolved
is duplicated by LU-3910 Interop 2.4.0<->2.5 failure on test s... Resolved
is duplicated by LU-4432 recovery-mds-scale test_failover_ost:... Resolved
is duplicated by LU-3598 Failure on test suite sanity-benchmar... Resolved
is duplicated by LU-3702 Failure on test suite parallel-scale ... Resolved
is duplicated by LU-4229 crash in NRS cleanup during mount fai... Resolved
Related
is related to LU-1436 Test failure on test suite sanity, su... Resolved
is related to LU-4053 client leaking objects/locks during IO Resolved
Severity: 3
Rank (Obsolete): 11936

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/5efd6668-5e3c-11e3-ae30-52540035b04c.

The sub-test test_iorssf failed with the following error:

ior failed! 1

Not sure if this is a dup of LU-3585, though LU-3585 has already been fixed. Client console:

12:02:03:Lustre: DEBUG MARKER: == parallel-scale test iorssf: iorssf == 11:54:49 (1386273289)
12:02:03:ptlrpcd_0: page allocation failure. order:1, mode:0x40
12:02:03:IOR: page allocation failure. order:1, mode:0x40
12:02:03:Pid: 22725, comm: IOR Not tainted 2.6.32-358.18.1.el6.x86_64 #1
12:02:03:Call Trace:
12:02:03: [<ffffffff8112c257>] ? __alloc_pages_nodemask+0x757/0x8d0
12:02:03: [<ffffffff81166d92>] ? kmem_getpages+0x62/0x170
12:02:03: [<ffffffff811679aa>] ? fallback_alloc+0x1ba/0x270
12:02:03: [<ffffffff811673ff>] ? cache_grow+0x2cf/0x320
12:02:03: [<ffffffff81167729>] ? ____cache_alloc_node+0x99/0x160
12:02:03: [<ffffffffa0967c08>] ? ptlrpc_new_bulk+0x48/0x280 [ptlrpc]
12:02:03: [<ffffffff811684f9>] ? __kmalloc+0x189/0x220
12:02:03: [<ffffffffa0967c08>] ? ptlrpc_new_bulk+0x48/0x280 [ptlrpc]
12:02:03: [<ffffffffa0967e98>] ? ptlrpc_prep_bulk_imp+0x58/0x190 [ptlrpc]
12:02:03: [<ffffffffa03ca054>] ? osc_brw_prep_request+0x294/0x11e0 [osc]
12:02:03: [<ffffffffa03de4ac>] ? osc_req_attr_set+0x16c/0x5b0 [osc]
12:02:04: [<ffffffffa088dd21>] ? cl_req_attr_set+0xd1/0x230 [obdclass]
12:02:04: [<ffffffffa03d01d0>] ? osc_build_rpc+0x870/0x1850 [osc]
12:02:04: [<ffffffffa03ea1a7>] ? osc_io_unplug0+0x1257/0x1f00 [osc]
12:02:04: [<ffffffffa0881285>] ? cl_page_slice_add+0x55/0x140 [obdclass]
12:02:04: [<ffffffffa03ecbd1>] ? osc_io_unplug+0x11/0x20 [osc]
12:02:04: [<ffffffffa03ecdb0>] ? osc_queue_sync_pages+0x1d0/0x360 [osc]
12:02:04: [<ffffffffa03def80>] ? osc_io_submit+0x340/0x4b0 [osc]
12:02:04: [<ffffffffa088d68c>] ? cl_io_submit_rw+0x6c/0x160 [obdclass]
12:02:04: [<ffffffffa04ba3c2>] ? lov_io_submit+0x3a2/0xbb0 [lov]
12:02:04: [<ffffffffa088d68c>] ? cl_io_submit_rw+0x6c/0x160 [obdclass]
12:02:04: [<ffffffffa088fc9e>] ? cl_io_read_page+0xae/0x170 [obdclass]
12:02:04: [<ffffffffa0883a77>] ? cl_page_assume+0xf7/0x220 [obdclass]
12:02:05: [<ffffffffa0768176>] ? ll_readpage+0x96/0x1a0 [lustre]
12:02:05: [<ffffffff81096de0>] ? wake_bit_function+0x0/0x50
12:02:05: [<ffffffff8111b7ec>] ? generic_file_aio_read+0x1fc/0x700
12:02:05: [<ffffffffa0797797>] ? vvp_io_read_start+0x257/0x470 [lustre]
12:02:05: [<ffffffffa088d7ea>] ? cl_io_start+0x6a/0x140 [obdclass]
12:02:05: [<ffffffffa0891ef4>] ? cl_io_loop+0xb4/0x1b0 [obdclass]
12:02:05: [<ffffffffa0739f9f>] ? ll_file_io_generic+0x33f/0x610 [lustre]
12:02:05: [<ffffffffa073a3af>] ? ll_file_aio_read+0x13f/0x2c0 [lustre]
12:02:05: [<ffffffffa073ac4c>] ? ll_file_read+0x16c/0x2a0 [lustre]
12:02:05: [<ffffffff81181a95>] ? vfs_read+0xb5/0x1a0
12:02:06: [<ffffffff81181bd1>] ? sys_read+0x51/0x90
12:02:06: [<ffffffff810dc685>] ? __audit_syscall_exit+0x265/0x290
12:02:06: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
12:02:06:Mem-Info:
12:02:06:Node 0 DMA per-cpu:
12:02:06:CPU    0: hi:    0, btch:   1 usd:   0
12:02:06:CPU    1: hi:    0, btch:   1 usd:   0
12:02:07:Node 0 DMA32 per-cpu:
12:02:07:CPU    0: hi:  186, btch:  31 usd:   0
12:02:07:CPU    1: hi:  186, btch:  31 usd:   4
12:02:07:active_anon:3193 inactive_anon:6004 isolated_anon:0
12:02:07: active_file:8566 inactive_file:234838 isolated_file:0
12:02:07: unevictable:0 dirty:1 writeback:0 unstable:0
12:02:07: free:13356 slab_reclaimable:3430 slab_unreclaimable:197269
12:02:07: mapped:4138 shmem:39 pagetables:1255 bounce:0
12:02:07:Node 0 DMA free:8236kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:5824kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15324kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:52kB slab_unreclaimable:1620kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
12:02:07:lowmem_reserve[]: 0 2003 2003 2003
12:02:07:Node 0 DMA32 free:45188kB min:44720kB low:55900kB high:67080kB active_anon:12772kB inactive_anon:24016kB active_file:34264kB inactive_file:933528kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052064kB mlocked:0kB dirty:4kB writeback:0kB mapped:16552kB shmem:156kB slab_reclaimable:13668kB slab_unreclaimable:787456kB kernel_stack:1904kB pagetables:5020kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
12:02:07:lowmem_reserve[]: 0 0 0 0
12:02:07:Node 0 DMA: 3*4kB 2*8kB 3*16kB 3*32kB 4*64kB 1*128kB 2*256kB 2*512kB 2*1024kB 2*2048kB 0*4096kB = 8236kB
12:02:08:Node 0 DMA32: 9673*4kB 6*8kB 23*16kB 2*32kB 2*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 1*4096kB = 45188kB
12:02:08:243500 total pagecache pages
12:02:08:0 pages in swap cache
12:02:08:Swap cache stats: add 0, delete 0, find 0/0
12:02:08:Free swap  = 4128760kB
12:02:08:Total swap = 4128760kB
12:02:08:Pid: 11613, comm: ptlrpcd_0 Not tainted 2.6.32-358.18.1.el6.x86_64 #1
12:02:08:Call Trace:
12:02:08: [<ffffffff8112c257>] ? __alloc_pages_nodemask+0x757/0x8d0
12:02:08: [<ffffffffa0707a78>] ? ksocknal_queue_tx_msg_v2+0x88/0xe0 [ksocklnd]
12:02:08: [<ffffffff81166d92>] ? kmem_getpages+0x62/0x170
12:02:08: [<ffffffff811679aa>] ? fallback_alloc+0x1ba/0x270
12:02:09: [<ffffffff811673ff>] ? cache_grow+0x2cf/0x320
12:02:09: [<ffffffff81167729>] ? ____cache_alloc_node+0x99/0x160
12:02:09: [<ffffffffa069aed7>] ? LNetMDAttach+0x157/0x5a0 [lnet]
12:02:09: [<ffffffff811684f9>] ? __kmalloc+0x189/0x220
12:02:09: [<ffffffffa069aed7>] ? LNetMDAttach+0x157/0x5a0 [lnet]
12:02:09: [<ffffffffa096fac5>] ? ptlrpc_register_bulk+0x265/0x9d0 [ptlrpc]
12:02:10: [<ffffffffa0970f22>] ? ptl_send_rpc+0x232/0xc40 [ptlrpc]
12:02:10: [<ffffffff81281734>] ? snprintf+0x34/0x40
12:02:10: [<ffffffffa05c77b1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
12:02:10: [<ffffffffa0966654>] ? ptlrpc_send_new_req+0x454/0x790 [ptlrpc]
12:02:10: [<ffffffffa096a3c8>] ? ptlrpc_check_set+0x888/0x1b40 [ptlrpc]
12:02:10: [<ffffffffa099520b>] ? ptlrpcd_check+0x53b/0x560 [ptlrpc]
12:02:10: [<ffffffffa099572b>] ? ptlrpcd+0x20b/0x370 [ptlrpc]
12:02:10: [<ffffffff81063410>] ? default_wake_function+0x0/0x20
12:02:10: [<ffffffffa0995520>] ? ptlrpcd+0x0/0x370 [ptlrpc]
12:02:10: [<ffffffff81096a36>] ? kthread+0x96/0xa0
12:02:10: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
12:02:10: [<ffffffff810969a0>] ? kthread+0x0/0xa0
12:02:10: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
12:02:11:Mem-Info:
12:02:11:Node 0 DMA per-cpu:
12:02:11:CPU    0: hi:    0, btch:   1 usd:   0
12:02:11:CPU    1: hi:    0, btch:   1 usd:   0
12:02:12:Node 0 DMA32 per-cpu:
12:02:12:CPU    0: hi:  186, btch:  31 usd:   0
12:02:12:CPU    1: hi:  186, btch:  31 usd:   4
12:02:12:active_anon:3193 inactive_anon:6004 isolated_anon:0
12:02:12: active_file:8566 inactive_file:234838 isolated_file:0
12:02:13: unevictable:0 dirty:1 writeback:0 unstable:0
12:02:13: free:13356 slab_reclaimable:3430 slab_unreclaimable:197269
12:02:13: mapped:4138 shmem:39 pagetables:1255 bounce:0
12:02:13:Node 0 DMA free:8236kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:5824kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15324kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:52kB slab_unreclaimable:1620kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
12:02:13:lowmem_reserve[]: 0 2003 2003 2003
12:02:13:Node 0 DMA32 free:45188kB min:44720kB low:55900kB high:67080kB active_anon:12772kB inactive_anon:24016kB active_file:34264kB inactive_file:933528kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052064kB mlocked:0kB dirty:4kB writeback:0kB mapped:16552kB shmem:156kB slab_reclaimable:13668kB slab_unreclaimable:787456kB kernel_stack:1904kB pagetables:5020kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
12:02:13:lowmem_reserve[]: 0 0 0 0
12:02:13:Node 0 DMA: 3*4kB 2*8kB 3*16kB 3*32kB 4*64kB 1*128kB 2*256kB 2*512kB 2*1024kB 2*2048kB 0*4096kB = 8236kB
12:02:14:Node 0 DMA32: 9673*4kB 6*8kB 23*16kB 2*32kB 2*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 1*4096kB = 45188kB
12:02:14:243500 total pagecache pages
12:02:14:0 pages in swap cache
12:02:14:Swap cache stats: add 0, delete 0, find 0/0
12:02:14:Free swap  = 4128760kB
12:02:15:Total swap = 4128760kB
12:02:15:524284 pages RAM
12:02:15:43709 pages reserved
12:02:15:259403 pages shared
12:02:16:221861 pages non-shared
12:02:16:LustreError: 22725:0:(osc_request.c:2161:osc_build_rpc()) prep_req failed: -12
12:02:16:LustreError: 22725:0:(osc_cache.c:2142:osc_check_rpcs()) Read request failed with -12
12:02:16:IOR: page allocation failure. order:1, mode:0x40
12:02:16:Pid: 22725, comm: IOR Not tainted 2.6.32-358.18.1.el6.x86_64 #1
12:02:16:Call Trace:
12:02:16: [<ffffffff8112c257>] ? __alloc_pages_nodemask+0x757/0x8d0
12:02:17: [<ffffffff81282796>] ? __const_udelay+0x46/0x50
12:02:17: [<ffffffff81166d92>] ? kmem_getpages+0x62/0x170
12:02:17: [<ffffffff811679aa>] ? fallback_alloc+0x1ba/0x270
12:02:17:524284 pages RAM
12:02:17:43709 pages reserved
12:02:17:259403 pages shared
12:02:17:221861 pages non-shared
12:02:18:LNetError: 11613:0:(lib-lnet.h:457:lnet_md_alloc()) LNET: out of memory at /var/lib/jenkins/workspace/lustre-b2_5/arch/x86_64/build_type/client/distro/el6/ib_stack/inkernel/BUILD/BUILD/lustre-2.5.0/lnet/include/lnet/lib-lnet.h:457 (tried to alloc '(md)' = 4208)


 Comments   
Comment by Oleg Drokin [ 09/Dec/13 ]

So if the servers became faster at writing (or caused clients to write faster), then we'd run out of memory sooner due to uncommitted pages.

I wonder if this is the issue here.

Comment by Phil Schwan (Inactive) [ 29/Dec/13 ]

We're experiencing this regularly – at least 5 times, I believe – with our new 2.5.0 installation, both during reads and writes (the latter of which, of course, causes corruption from the application's perspective, since the writes are asynchronous).

Lustre: Lustre: Build Version: 2.5.0-RC1--PRISTINE-2.6.32-279.14.1.el6.x86_64
Linux hnod0032 2.6.32-279.19.1.el6.x86_64 #1 SMP Wed Dec 19 07:05:20 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

The similar-looking LU-3851 requests /proc/slabinfo and /proc/meminfo, although by the time we discover this happening it's usually far too late to collect them.

Comment by Jian Yu [ 05/Jan/14 ]

Lustre client build: http://build.whamcloud.com/job/lustre-b2_5/5/
Lustre server build: http://build.whamcloud.com/job/lustre-b2_4/70/ (2.4.2)

While vetting the test report of https://maloo.whamcloud.com/test_sets/516dbb40-74fa-11e3-95ae-52540035b04c , I found the following error messages in client syslog:

Jan  3 05:42:11 wtm-24vm5 kernel: Lustre: DEBUG MARKER: == parallel-scale test iorssf: iorssf == 05:42:11 (1388756531)
Jan  3 05:42:11 wtm-24vm5 xinetd[1818]: EXIT: mshell status=0 pid=28818 duration=0(sec)
Jan  3 05:42:11 wtm-24vm5 xinetd[1818]: START: shell pid=28842 from=::ffff:10.10.16.245
Jan  3 05:42:11 wtm-24vm5 xinetd[1818]: EXIT: shell status=0 pid=28842 duration=0(sec)
Jan  3 05:42:15 wtm-24vm5 /usr/sbin/gmond[1834]: Error 1 sending the modular data for heartbeat#012
Jan  3 05:42:37 wtm-24vm5 kernel: ptlrpcd_1: page allocation failure. order:1, mode:0x40
Jan  3 05:42:37 wtm-24vm5 kernel: Pid: 2325, comm: ptlrpcd_1 Not tainted 2.6.32-358.18.1.el6.x86_64 #1
Jan  3 05:42:37 wtm-24vm5 kernel: Call Trace:
Jan  3 05:42:37 wtm-24vm5 kernel: [<ffffffff8112c257>] ? __alloc_pages_nodemask+0x757/0x8d0
Jan  3 05:42:37 wtm-24vm5 kernel: [<ffffffff81051439>] ? __wake_up_common+0x59/0x90
Jan  3 05:42:37 wtm-24vm5 kernel: [<ffffffff81166d92>] ? kmem_getpages+0x62/0x170
Jan  3 05:42:37 wtm-24vm5 kernel: [<ffffffff811679aa>] ? fallback_alloc+0x1ba/0x270
Jan  3 05:42:37 wtm-24vm5 kernel: [<ffffffff811673ff>] ? cache_grow+0x2cf/0x320
Jan  3 05:42:37 wtm-24vm5 kernel: [<ffffffff81167729>] ? ____cache_alloc_node+0x99/0x160
Jan  3 05:42:37 wtm-24vm5 kernel: [<ffffffffa0497ed7>] ? LNetMDAttach+0x157/0x5a0 [lnet]
Jan  3 05:42:37 wtm-24vm5 kernel: [<ffffffff811684f9>] ? __kmalloc+0x189/0x220
Jan  3 05:42:37 wtm-24vm5 kernel: [<ffffffffa0497ed7>] ? LNetMDAttach+0x157/0x5a0 [lnet]
Jan  3 05:42:37 wtm-24vm5 kernel: [<ffffffffa068dd85>] ? ptlrpc_register_bulk+0x265/0x9d0 [ptlrpc]
Jan  3 05:42:37 wtm-24vm5 kernel: [<ffffffffa068f1e2>] ? ptl_send_rpc+0x232/0xc40 [ptlrpc]
Jan  3 05:42:37 wtm-24vm5 kernel: [<ffffffff81281734>] ? snprintf+0x34/0x40
Jan  3 05:42:37 wtm-24vm5 kernel: [<ffffffffa03e77b1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
Jan  3 05:42:37 wtm-24vm5 kernel: [<ffffffffa0684894>] ? ptlrpc_send_new_req+0x454/0x790 [ptlrpc]
Jan  3 05:42:37 wtm-24vm5 kernel: [<ffffffffa0688608>] ? ptlrpc_check_set+0x888/0x1b40 [ptlrpc]
Jan  3 05:42:37 wtm-24vm5 kernel: [<ffffffffa06b354b>] ? ptlrpcd_check+0x53b/0x560 [ptlrpc]
Jan  3 05:42:37 wtm-24vm5 kernel: [<ffffffffa06b39e5>] ? ptlrpcd+0x185/0x370 [ptlrpc]
Jan  3 05:42:37 wtm-24vm5 kernel: [<ffffffff81063410>] ? default_wake_function+0x0/0x20
Jan  3 05:42:37 wtm-24vm5 kernel: [<ffffffffa06b3860>] ? ptlrpcd+0x0/0x370 [ptlrpc]
Jan  3 05:42:37 wtm-24vm5 kernel: [<ffffffff81096a36>] ? kthread+0x96/0xa0
Jan  3 05:42:37 wtm-24vm5 kernel: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
Jan  3 05:42:37 wtm-24vm5 kernel: [<ffffffff810969a0>] ? kthread+0x0/0xa0
Jan  3 05:42:37 wtm-24vm5 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
Comment by Jian Yu [ 12/Jan/14 ]

Lustre Build: http://build.whamcloud.com/job/lustre-b2_5/9/
Distro/Arch: RHEL6.4/x86_64
FSTYPE=ldiskfs

parallel-scale test iorssf hit the same issue:
https://maloo.whamcloud.com/test_sets/7267ac0e-7a02-11e3-86c8-52540035b04c

Comment by Jian Yu [ 17/Jan/14 ]

Lustre client build: http://build.whamcloud.com/job/lustre-b2_5/13/
Lustre server build: http://build.whamcloud.com/job/lustre-b2_4/70/ (2.4.2)

The same issue occurred:
https://maloo.whamcloud.com/test_sets/90f513be-7ed5-11e3-8a9b-52540035b04c

Comment by Jian Yu [ 28/Jan/14 ]

Another instance on the Lustre b2_5 branch:
https://maloo.whamcloud.com/test_sets/ca3e6bd0-8799-11e3-8928-52540035b04c

Comment by Cory Spitz [ 11/Feb/14 ]

2.5.0 broke many memory allocations due to the cfs wrapper conversion that dropped __GFP_WAIT. Is there another LU ticket tracking that problem? Ann K. @Cray has authored a patch that we can upload.

Comment by Ann Koehler (Inactive) [ 11/Feb/14 ]

Basic Problem: When the cfs wrappers around the memory allocation functions were dropped, the CFS_ALLOC_xxxx flags were replaced directly by the matching __GFP_xxxx kernel flags; thus CFS_ALLOC_IO was mapped directly to __GFP_IO. But if you look at the 2.4 code, you'll see that cfs_alloc_flags_to_gfp() - the function that used to map from CFS_ALLOC to __GFP - does a bit more than that: it adds __GFP_WAIT to every flag set except CFS_ALLOC_ATOMIC.

static unsigned int cfs_alloc_flags_to_gfp(u_int32_t flags)
{
        unsigned int mflags = 0;

        /* Every allocation may sleep (__GFP_WAIT) unless it is
         * explicitly atomic; this is the behavior the direct
         * CFS_ALLOC_* -> __GFP_* replacement lost. */
        if (flags & CFS_ALLOC_ATOMIC)
                mflags |= __GFP_HIGH;
        else
                mflags |= __GFP_WAIT;
        if (flags & CFS_ALLOC_NOWARN)
                mflags |= __GFP_NOWARN;
        if (flags & CFS_ALLOC_IO)
                mflags |= __GFP_IO;
        if (flags & CFS_ALLOC_FS)
                mflags |= __GFP_FS;
        if (flags & CFS_ALLOC_HIGHMEM)
                mflags |= __GFP_HIGHMEM;
        return mflags;
}
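
Incidentally, this mapping is also why the console messages above report mode:0x40: on these 2.6.32 kernels __GFP_IO is 0x40 and __GFP_WAIT is 0x10, so a bare __GFP_IO allocation logs as mode:0x40, while the intended GFP_NOFS would log as 0x50. A minimal standalone sketch of the arithmetic (flag values copied from the RHEL6 2.6.32 include/linux/gfp.h; this is an illustration, not Lustre code):

#include <stdio.h>

/* Flag values as defined in the RHEL6 2.6.32 include/linux/gfp.h. */
#define __GFP_WAIT      0x10u   /* allocator may sleep and run reclaim */
#define __GFP_IO        0x40u   /* allocator may start low-level I/O */
#define GFP_NOFS        (__GFP_WAIT | __GFP_IO)

int main(void)
{
        /* CFS_ALLOC_IO was mapped straight to __GFP_IO in 2.5.0 ... */
        printf("broken 2.5.0 mode: 0x%02x\n", __GFP_IO);  /* 0x40, as logged */
        /* ... whereas cfs_alloc_flags_to_gfp() used to add __GFP_WAIT. */
        printf("intended mode:    0x%02x\n", GFP_NOFS);   /* 0x50 */
        return 0;
}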

The missing __GFP_WAIT flag causes memory allocations to fail in 2.5 that seldom failed in earlier releases. One such failure is the one described in this ticket. Others we've seen are LBUGs/GPFs because a cl_env could not be allocated:

  • LustreError: 11138:0:(osc_lock.c:599:osc_lock_upcall()) LBUG
  • LustreError: 25614:0:(osc_lock.c:781:osc_ldlm_blocking_ast()) LBUG
  • LustreError: 8855:0:(lov_page.c:100:lov_page_own()) LBUG
  • BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
    IP: [<ffffffffa0549a07>] lu_context_key_get+0x17/0x60 [obdclass]

The fix is straightforward: add the __GFP_WAIT flag to the flag set passed to the kernel allocation functions in all cases except GFP_ATOMIC. In particular, I replaced __GFP_IO with GFP_NOFS, which is defined as __GFP_IO | __GFP_WAIT.
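
To make the shape of the change concrete, here is a minimal sketch, not the actual patch; alloc_md() is a hypothetical stand-in for callers such as lnet_md_alloc():

#include <linux/slab.h>

/* Minimal sketch of the fix's shape, not the actual patch: alloc_md()
 * is a hypothetical stand-in for callers such as lnet_md_alloc(). */
static void *alloc_md(size_t size)
{
        /* 2.5.0 (broken): CFS_ALLOC_IO became bare __GFP_IO (mode:0x40),
         * which cannot sleep or run reclaim, so order-1 allocations
         * failed under memory pressure:
         *
         *      return kmalloc(size, __GFP_IO);
         *
         * Fixed: GFP_NOFS == __GFP_WAIT | __GFP_IO lets the allocator
         * block and reclaim page cache while still preventing recursion
         * into the filesystem during the allocation. */
        return kmalloc(size, GFP_NOFS);
}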

Comment by Ann Koehler (Inactive) [ 11/Feb/14 ]

Patch submitted: http://review.whamcloud.com/#/c/9223/

Comment by Andreas Dilger [ 13/Feb/14 ]

This was introduced in http://review.whamcloud.com/2831 commit 1b2547843817b4b7adbeb87ea9b070d9cac35c90.

Tao, could you please make a version of this patch for the upstream kernel?

Comment by Peter Jones [ 18/Feb/14 ]

Ann,

Do you expect to be able to refresh the patch in the near future? If you have other commitments, would you mind if one of our engineers do so?

Thanks

Peter

Comment by Ann Koehler (Inactive) [ 18/Feb/14 ]

Peter,

I am pretty busy with other bugs at the moment. I have no objections at all to one of your engineers updating the patch. On the contrary, I would really appreciate it.

Thanks,
Ann

Comment by Peter Jones [ 18/Feb/14 ]

ok. Thanks for the quick feedback, Ann.

Emoly,

Could you please revise Ann's patch to reflect the review feedback from Andreas?

Thanks

Peter

Comment by Bob Glossman (Inactive) [ 25/Feb/14 ]

backport to b2_5:
http://review.whamcloud.com/9382

Comment by Peter Jones [ 26/Feb/14 ]

Landed for 2.5.1 and 2.6
