[LU-7843] Is ptlrpc_obd_ping returning (-ENOMEM) without issuing any errors Created: 03/Mar/16  Updated: 10/Mar/16  Resolved: 10/Mar/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.3
Fix Version/s: None

Type: Question/Request Priority: Minor
Reporter: Mahmoud Hanafi Assignee: John Fuchs-Chesney (Inactive)
Resolution: Done Votes: 0
Labels: None

Rank (Obsolete): 9223372036854775807

 Description   

Shouldn't ptlrpc_obd_ping() issue a CERROR() before returning ENOMEM?

int ptlrpc_obd_ping(struct obd_device *obd)
{
        int rc;
        struct ptlrpc_request *req;
        ENTRY;

        req = ptlrpc_prep_ping(obd->u.cli.cl_import);
        if (req == NULL)
                RETURN(-ENOMEM); /* <========= HERE */

        req->rq_send_state = LUSTRE_IMP_FULL;

        rc = ptlrpc_queue_wait(req);

        ptlrpc_req_finished(req);

        RETURN(rc);
}
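
To make the question concrete, here is a minimal user-space sketch of the pattern being asked about: log a message before returning -ENOMEM. It does not use Lustre code. `LOG_ERROR`, `prep_ping`, `obd_ping_logged`, and the `simulate_failure` switch are hypothetical stand-ins for illustration only (`LOG_ERROR` takes the place of CERROR(), `prep_ping` the place of ptlrpc_prep_ping()).

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a ping request; not the Lustre structure. */
struct ping_request {
        int send_state;
};

/* Hypothetical stand-in for CERROR(): print to stderr. */
#define LOG_ERROR(fmt, ...) fprintf(stderr, "error: " fmt, __VA_ARGS__)

/* Hypothetical allocator; simulate_failure forces the NULL path. */
static struct ping_request *prep_ping(int simulate_failure)
{
        if (simulate_failure)
                return NULL;
        return calloc(1, sizeof(struct ping_request));
}

/* Same shape as the function above, with a message on the failure path. */
static int obd_ping_logged(const char *target, int simulate_failure)
{
        struct ping_request *req = prep_ping(simulate_failure);

        if (req == NULL) {
                LOG_ERROR("%s: cannot allocate ping request: rc = %d\n",
                          target, -ENOMEM);
                return -ENOMEM;
        }

        /* The real function would queue the request and wait for a reply;
         * this sketch only releases the request and reports success. */
        free(req);
        return 0;
}
```

Calling `obd_ping_logged("testfs-OST0000", 1)` (the target name is made up) prints one line to stderr and returns -ENOMEM, while passing 0 for `simulate_failure` takes the normal path.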


 Comments   
Comment by Andreas Dilger [ 03/Mar/16 ]

Mahmoud, is there any particular reason you think that there should be an error printed to the console here? We could not possibly print a console message for every failure condition. In the case of ENOMEM, there typically isn't anything we can do about this anyway. The kernel itself will usually print an error in this case (unless __GFP_NOWARN is used), and for small allocations (4096 bytes or less) it typically isn't even possible for the allocation to fail unless there is fault injection.

Comment by Mahmoud Hanafi [ 08/Mar/16 ]

If the kernel is printing an error then that should be sufficient. We are tracking an issue where clients are getting evicted from the OST because the OST hasn't heard from the client.

So far we haven't found any network-layer issues. We have compute nodes that run very close to 100% memory utilization, and I was thinking that a failed malloc here could cause dropped pings on the client.

Comment by Oleg Drokin [ 08/Mar/16 ]

The requests are allocated with the GFP_NOFS flag, so no __GFP_NOWARN.

As a result, you'll see something like this in dmesg (a different message, but the same idea):

[4399061.470561] iscsi_trx: page allocation failure: order:4, mode:0x10c0d0
[4399061.470566] CPU: 0 PID: 3330006 Comm: iscsi_trx Not tainted 3.10.0-327.4.4.el7.x86_64 #1
[4399061.470568] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3403 01/22/2013
[4399061.470570]  000000000010c0d0 000000005ef58833 ffff88047a4dbb38 ffffffff8163515c
[4399061.470573]  ffff88047a4dbbc8 ffffffff8116ef70 0000000000000000 ffff88081fdb8000
[4399061.470576]  0000000000000004 000000000010c0d0 ffff88047a4dbbc8 000000005ef58833
[4399061.470578] Call Trace:
[4399061.470585]  [<ffffffff8163515c>] dump_stack+0x19/0x1b
[4399061.470588]  [<ffffffff8116ef70>] warn_alloc_failed+0x110/0x180
[4399061.470591]  [<ffffffff811736f8>] __alloc_pages_nodemask+0x9a8/0xb90
[4399061.470595]  [<ffffffff811b43d9>] alloc_pages_current+0xa9/0x170
[4399061.470597]  [<ffffffff8116deee>] __get_free_pages+0xe/0x50
[4399061.470599]  [<ffffffff811bf65e>] kmalloc_order_trace+0x2e/0xa0
[4399061.470601]  [<ffffffff811c1ef9>] __kmalloc+0x219/0x230
[4399061.470614]  [<ffffffffa0896f99>] iscsi_target_rx_thread+0x4c9/0xf80 [iscsi_target_mod]
[4399061.470617]  [<ffffffff81013588>] ? __switch_to+0xf8/0x4b0
[4399061.470620]  [<ffffffff8163a228>] ? __schedule+0x2d8/0x900
[4399061.470627]  [<ffffffffa0896ad0>] ? iscsi_target_tx_thread+0x200/0x200 [iscsi_target_mod]
[4399061.470630]  [<ffffffff810a5aef>] kthread+0xcf/0xe0
[4399061.470633]  [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[4399061.470636]  [<ffffffff81645818>] ret_from_fork+0x58/0x90
[4399061.470639]  [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[4399061.470640] Mem-Info:
[4399061.470641] Node 0 DMA per-cpu:
[4399061.470643] CPU    0: hi:    0, btch:   1 usd:   0
[4399061.470644] CPU    1: hi:    0, btch:   1 usd:   0
[4399061.470645] CPU    2: hi:    0, btch:   1 usd:   0
[4399061.470647] CPU    3: hi:    0, btch:   1 usd:   0
[4399061.470648] CPU    4: hi:    0, btch:   1 usd:   0
[4399061.470649] CPU    5: hi:    0, btch:   1 usd:   0
[4399061.470650] CPU    6: hi:    0, btch:   1 usd:   0
[4399061.470651] CPU    7: hi:    0, btch:   1 usd:   0
[4399061.470652] Node 0 DMA32 per-cpu:
[4399061.470654] CPU    0: hi:  186, btch:  31 usd:   0
[4399061.470655] CPU    1: hi:  186, btch:  31 usd:  19
[4399061.470656] CPU    2: hi:  186, btch:  31 usd:   0
[4399061.470657] CPU    3: hi:  186, btch:  31 usd:   0
[4399061.470658] CPU    4: hi:  186, btch:  31 usd:   0
[4399061.470659] CPU    5: hi:  186, btch:  31 usd:   0
[4399061.470660] CPU    6: hi:  186, btch:  31 usd:   0
[4399061.470661] CPU    7: hi:  186, btch:  31 usd:   0
[4399061.470662] Node 0 Normal per-cpu:
[4399061.470664] CPU    0: hi:  186, btch:  31 usd:   0
[4399061.470664] CPU    1: hi:  186, btch:  31 usd: 151
[4399061.470666] CPU    2: hi:  186, btch:  31 usd:   8
[4399061.470667] CPU    3: hi:  186, btch:  31 usd:   0
[4399061.470668] CPU    4: hi:  186, btch:  31 usd:   0
[4399061.470669] CPU    5: hi:  186, btch:  31 usd:   0
[4399061.470670] CPU    6: hi:  186, btch:  31 usd:   3
[4399061.470671] CPU    7: hi:  186, btch:  31 usd:  30
[4399061.470675] active_anon:4045992 inactive_anon:1353481 isolated_anon:0
 active_file:680481 inactive_file:1593972 isolated_file:0
 unevictable:0 dirty:11 writeback:0 unstable:0
 free:69327 slab_reclaimable:276364 slab_unreclaimable:24916
 mapped:15615 shmem:34897 pagetables:13297 bounce:0
 free_cma:0
[4399061.470678] Node 0 DMA free:15900kB min:32kB low:40kB high:48kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15984kB managed:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[4399061.470682] lowmem_reserve[]: 0 3234 31902 31902
[4399061.470685] Node 0 DMA32 free:128560kB min:6848kB low:8560kB high:10272kB active_anon:912176kB inactive_anon:1208272kB active_file:260392kB inactive_file:349184kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3559324kB managed:3313932kB mlocked:0kB dirty:4kB writeback:0kB mapped:6396kB shmem:11972kB slab_reclaimable:420312kB slab_unreclaimable:10420kB kernel_stack:688kB pagetables:3604kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:24 all_unreclaimable? no
[4399061.470689] lowmem_reserve[]: 0 0 28668 28668
[4399061.470691] Node 0 Normal free:132848kB min:60700kB low:75872kB high:91048kB active_anon:15271792kB inactive_anon:4205652kB active_file:2461532kB inactive_file:6026704kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:29882368kB managed:29356040kB mlocked:0kB dirty:40kB writeback:0kB mapped:56064kB shmem:127616kB slab_reclaimable:685144kB slab_unreclaimable:89244kB kernel_stack:6656kB pagetables:49584kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[4399061.470696] lowmem_reserve[]: 0 0 0 0
[4399061.470698] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (R) 3*4096kB (M) = 15900kB
[4399061.470707] Node 0 DMA32: 18886*4kB (UEM) 6591*8kB (UEM) 37*16kB (UEM) 6*32kB (E) 1*64kB (E) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 129120kB
[4399061.470714] Node 0 Normal: 9452*4kB (UEM) 10154*8kB (UEM) 441*16kB (UEM) 207*32kB (UEM) 18*64kB (EM) 1*128kB (M) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 134000kB
[4399061.470723] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[4399061.470724] 2342059 total pagecache pages
[4399061.470725] 32814 pages in swap cache
[4399061.470726] Swap cache stats: add 1238092, delete 1205278, find 643889/711117
[4399061.470727] Free swap  = 7800948kB
[4399061.470728] Total swap = 8388604kB
[4399061.470729] 8364419 pages RAM
[4399061.470730] 0 pages HighMem/MovableOnly
[4399061.470731] 192951 pages reserved

Comment by John Fuchs-Chesney (Inactive) [ 10/Mar/16 ]

Hello Mahmoud,

Can we check to see if you have what you need from this ticket?

Many thanks,
~ jfc.

Comment by Mahmoud Hanafi [ 10/Mar/16 ]

Yes please close this case.

Comment by John Fuchs-Chesney (Inactive) [ 10/Mar/16 ]

Thanks Mahmoud.
~ jfc.

Generated at Sat Feb 10 02:12:24 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.