[LU-8497] GPF after page allocation failure in osp_send_update_thread() Created: 11/Aug/16  Updated: 06/Dec/17  Resolved: 01/Dec/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.9.0
Fix Version/s: Lustre 2.11.0, Lustre 2.10.2

Type: Bug Priority: Minor
Reporter: John Hammond Assignee: John Hammond
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This is seen running racer with MDSCOUNT=4 this morning (v2_8_56_0-24-g272c136):

[ 1181.303743] Lustre: DEBUG MARKER: == racer test 1: racer on clients: t DURATION=300 == 
09:28:27 (1470925707)
[ 1182.650010] LustreError: 27979:0:(llog_cat.c:712:llog_cat_cancel_records()) lustre-MDT0
000-osp-MDT0002: fail to cancel 1 of 1 llog-records: rc = -116
[ 1182.662231] LustreError: 27979:0:(llog_cat.c:712:llog_cat_cancel_records()) Skipped 2 p
revious similar messages
[ 1183.479062] mdt_out01_000: page allocation failure. order:4, mode:0x50
[ 1183.484771] Pid: 27608, comm: mdt_out01_000 Not tainted 2.6.32-431.29.2.el6.lustre.x86_
64 #1
[ 1183.492551] Call Trace:
[ 1183.494672]  [<ffffffff81142941>] ? __alloc_pages_nodemask+0x791/0x950
[ 1183.500338]  [<ffffffff81183e8e>] ? kmem_getpages+0x6e/0x170
[ 1183.505207]  [<ffffffff81186819>] ? fallback_alloc+0x1c9/0x2b0
[ 1183.510370]  [<ffffffff81186077>] ? cache_grow+0x4c7/0x530
[ 1183.515511]  [<ffffffff811864fb>] ? ____cache_alloc_node+0xab/0x200
[ 1183.521154]  [<ffffffff81186c53>] ? __kmalloc+0x273/0x310
[ 1183.526091]  [<ffffffffa1666f84>] ? out_handle+0xa84/0x1870 [ptlrpc]
[ 1183.531889]  [<ffffffffa1666f84>] ? out_handle+0xa84/0x1870 [ptlrpc]
[ 1183.537747]  [<ffffffff81554a7a>] ? __mutex_lock_common+0x2ca/0x400
[ 1183.543492]  [<ffffffffa1657650>] ? req_can_reconstruct+0x50/0x130 [ptlrpc]
[ 1183.550049]  [<ffffffffa1657650>] ? req_can_reconstruct+0x50/0x130 [ptlrpc]
[ 1183.556371]  [<ffffffffa15fa5bc>] ? lustre_msg_get_opc+0x8c/0x100 [ptlrpc]
[ 1183.562476]  [<ffffffff815545ee>] ? mutex_unlock+0xe/0x10
[ 1183.567858]  [<ffffffffa165766e>] ? req_can_reconstruct+0x6e/0x130 [ptlrpc]
[ 1183.574021]  [<ffffffffa165e8ff>] ? tgt_request_handle+0x90f/0x1470 [ptlrpc]
[ 1183.580210]  [<ffffffffa160c752>] ? ptlrpc_main+0xd02/0x1800 [ptlrpc]
[ 1183.586378]  [<ffffffffa160ba50>] ? ptlrpc_main+0x0/0x1800 [ptlrpc]
[ 1183.592153]  [<ffffffff8109e856>] ? kthread+0x96/0xa0
[ 1183.597098]  [<ffffffff8100c30a>] ? child_rip+0xa/0x20
[ 1183.602225]  [<ffffffff815562e0>] ? _spin_unlock_irq+0x30/0x40
[ 1183.607772]  [<ffffffff8100bb10>] ? restore_args+0x0/0x30
[ 1183.612791]  [<ffffffff8109e7c0>] ? kthread+0x0/0xa0
[ 1183.617476]  [<ffffffff8100c300>] ? child_rip+0x0/0x20
[ 1183.622146] Mem-Info:
[ 1183.624184] Node 0 DMA per-cpu:
[ 1183.626999] CPU    0: hi:    0, btch:   1 usd:   0
[ 1183.631320] CPU    1: hi:    0, btch:   1 usd:   0
[ 1183.635664] CPU    2: hi:    0, btch:   1 usd:   0
[ 1183.639960] CPU    3: hi:    0, btch:   1 usd:   0
[ 1183.644777] Node 0 DMA32 per-cpu:
[ 1183.647943] CPU    0: hi:  186, btch:  31 usd:  29
[ 1183.652328] CPU    1: hi:  186, btch:  31 usd:  47
[ 1183.656776] CPU    2: hi:  186, btch:  31 usd:  46
[ 1183.661341] CPU    3: hi:  186, btch:  31 usd:   0
[ 1183.665760] active_anon:52142 inactive_anon:6 isolated_anon:0
[ 1183.665761]  active_file:56776 inactive_file:58980 isolated_file:0
[ 1183.665762]  unevictable:0 dirty:248 writeback:0 unstable:0
[ 1183.665763]  free:63417 slab_reclaimable:18898 slab_unreclaimable:98982
[ 1183.665764]  mapped:2060 shmem:97 pagetables:20216 bounce:0
[ 1183.693292] Node 0 DMA free:8340kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15364kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:7204kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[ 1183.727158] lowmem_reserve[]: 0 2004 2004 2004
[ 1183.731294] Node 0 DMA32 free:288564kB min:44720kB low:55900kB high:67080kB active_anon:208856kB inactive_anon:24kB active_file:208264kB inactive_file:208424kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052312kB mlocked:0kB dirty:1088kB writeback:0kB mapped:8164kB shmem:388kB slab_reclaimable:77228kB slab_unreclaimable:389060kB kernel_stack:13232kB pagetables:81316kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 1183.768744] lowmem_reserve[]: 0 0 0 0
[ 1183.772294] Node 0 DMA: 1*4kB 0*8kB 1*16kB 0*32kB 0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 2*2048kB 1*4096kB = 8340kB
[ 1183.783726] Node 0 DMA32: 30239*4kB 17514*8kB 3395*16kB 863*32kB 215*64kB 101*128kB 28*256kB 1*512kB 4*1024kB 2*2048kB 3*4096kB = 397852kB
[ 1183.798187] 70451 total pagecache pages
[ 1183.802374] 0 pages in swap cache
[ 1183.806010] Swap cache stats: add 0, delete 0, find 0/0
[ 1183.811691] Free swap  = 0kB
[ 1183.814789] Total swap = 0kB
[ 1183.824529] 524285 pages RAM
[ 1183.827700] 49009 pages reserved
[ 1183.831149] 291298 pages shared
[ 1183.834550] 215587 pages non-shared
[ 1183.838902] LustreError: 11-0: lustre-MDT0001-osp-MDT0002: operation out_update to node
 0@lo failed: rc = -12
[ 1183.849680] LustreError: 27973:0:(layout.c:2074:__req_capsule_get()) @@@ Wrong buffer for field `object_update_reply' (1 of 1) in format `OUT_UPDATE': 0 vs. 4096 (server)
[ 1183.849682]   req@ffff88006df62088 x1542376881533200/t0(0) o1000->lustre-MDT0001-osp-MDT0002@0@lo:24/4 lens 424/192 e 0 to 0 dl 1470925716 ref 2 fl Interpret:ReM/0/0 rc -12/-12
[ 1183.883591] general protection fault: 0000 [#1] SMP 
[ 1183.884386] last sysfs file: /sys/devices/system/cpu/possible
[ 1183.884386] CPU 3 
[ 1183.884386] Modules linked in: lustre(U) ofd(U) osp(U) lod(U) ost(U) mdt(U) mdd(U) mgs(U) osd_ldiskfs(U) ldiskfs(U) lquota(U) lfsck(U) obdecho(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc_gss(U) ptlrpc(U) obdclass(U) ksocklnd(U) lnet(U) libcfs(U) exportfs sha512_generic sha256_generic autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc ipv6 zfs(U) zcommon(U) znvpair(U) spl(U) zlib_deflate zavl(U) zunicode(U) microcode virtio_balloon virtio_net i2c_piix4 i2c_core ext4 jbd2 mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
[ 1183.884386] 
[ 1183.884386] Pid: 27973, comm: osp_up1-2 Not tainted 2.6.32-431.29.2.el6.lustre.x86_64 #1 Bochs Bochs
[ 1183.884386] RIP: 0010:[<ffffffffa0e4793f>]  [<ffffffffa0e4793f>] osp_send_update_thread+0x1bf/0x584 [osp]
[ 1183.884386] RSP: 0000:ffff880020b39dd0  EFLAGS: 00010282
[ 1183.884386] RAX: ffff880065e6c3e0 RBX: ffff880020a18208 RCX: 0000000000000000
[ 1183.884386] RDX: 000000000000000d RSI: 6b6b6b6b6b6b6b6b RDI: 0000000000000282
[ 1183.884386] RBP: ffff880020b39eb0 R08: 0000000000000000 R09: ffff8800729b4508
[ 1183.884386] R10: 09f911029d74e35b R11: 0000000000000000 R12: ffff8800197e7ed8
[ 1183.884386] R13: ffff8800197e7ee8 R14: ffff880020b39e10 R15: ffff880020b39e78
[ 1183.884386] FS:  0000000000000000(0000) GS:ffff88000d800000(0000) knlGS:0000000000000000
[ 1183.884386] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 1183.884386] CR2: 0000003d046acc6d CR3: 0000000001a85000 CR4: 00000000000006e0
[ 1183.884386] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1183.884386] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1183.884386] Process osp_up1-2 (pid: 27973, threadinfo ffff880020b38000, task ffff880016aa6480)
[ 1183.884386] Stack:
[ 1183.884386]  0000000000000000 ffff880016aa6480 ffff880016aa6480 ffff880020a18460
[ 1183.884386] <d> ffff8800197e7ee8 ffff8800197e7f28 ffff880016aa6480 00000000fffffff4
[ 1183.884386] <d> 0000000210000003 0000000000000000 ffff880018e162f0 ffff880020b39e28
[ 1183.884386] Call Trace:
[ 1183.884386]  [<ffffffff81061d90>] ? default_wake_function+0x0/0x20
[ 1183.884386]  [<ffffffffa0e47780>] ? osp_send_update_thread+0x0/0x584 [osp]
[ 1183.884386]  [<ffffffff8109e856>] kthread+0x96/0xa0
[ 1183.884386]  [<ffffffff8100c30a>] child_rip+0xa/0x20
[ 1183.884386]  [<ffffffff815562e0>] ? _spin_unlock_irq+0x30/0x40
[ 1183.884386]  [<ffffffff8100bb10>] ? restore_args+0x0/0x30
[ 1183.884386]  [<ffffffff8109e7c0>] ? kthread+0x0/0xa0


 Comments   
Comment by Oleg Drokin [ 11/Aug/16 ]

Possibly this was introduced by a recent landing of LU-8399

Comment by nasf (Inactive) [ 13/Aug/16 ]

Possibly this was introduced by a recent landing of LU-8399

The LU-8399 patch http://review.whamcloud.com/21330/ makes the cross-MDT RPC sender hold a reference on the object so that the failure callback does not have to search the objects table. Because the number of in-flight RPCs is very limited (it will not exceed the maximum RPC service thread count), the objects cached for this purpose should not be numerous enough to exhaust RAM. Another suspect is that the logic might fail to release the object's reference in some overlooked corner case, so that the accumulated objects exhaust RAM. But if such a case really existed, the most likely symptom would be that the system could not be cleaned up because of (at least one) object reference leak, and we would have seen many module-unload failures on Maloo. We have not hit that so far, so the LU-8399 patch should not be causing an object reference leak.

I have tested racer with 4 MDTs in my VM environment for many times, but cannot reproduce the trouble. So it will be helpful if you have more logs about that.

Comment by nasf (Inactive) [ 24/Aug/16 ]

Any further logs? Thanks!

Comment by Peter Jones [ 08/Sep/16 ]

Per John, this has not reproduced.

Comment by John Hammond [ 27/Sep/17 ]

Seeing this again while testing 2.10.1-RC1 (I doubt it ever went away). This is due to a use-after-free of the osp_thandle (oth) in osp_send_update_thread(), following an extra thandle put in osp_send_update_req() when the request status is -ENOMEM:

osp_send_update_req():
                if ((rc == -ENOMEM && req->rq_set == NULL) ||
                    (req->rq_transno == 0 && !req->rq_committed)) {
                        if (args->oaua_update != NULL) {
                                /* If osp_update_interpret is not being called,           
                                 * release the osp_thandle */
                                args->oaua_update = NULL;
                                osp_thandle_put(env, oth);
                        }

                        req->rq_cb_data = NULL;
                        rc = rc == 0 ? req->rq_status : rc;
                        osp_trans_callback(env, oth, rc);
                        osp_thandle_put(env, oth); /* HERE */
                        GOTO(out, rc);
                }
Comment by John Hammond [ 09/Nov/17 ]

The rc == -ENOMEM test is confusing local -ENOMEM (from ptlrpc_queue_wait() meaning that the request was never sent) with remote -ENOMEM (meaning that the request was sent and the interpreter was called). Note that the req->rq_set == NULL condition is always true after we leave ptlrpc_queue_wait().

Comment by Gerrit Updater [ 14/Nov/17 ]

John L. Hammond (john.hammond@intel.com) uploaded a new patch: https://review.whamcloud.com/30083
Subject: LU-8497 osp: handle remote -ENOMEM from osp_send_update_req()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: df1bbd79d276df84717a905b1a34be7e2fe43081

Comment by Andreas Dilger [ 29/Nov/17 ]

I think this was hit again during mds-survey testing: https://testing.hpdd.intel.com/test_sets/245a0ee6-d0a5-11e7-8027-52540065bddc

{noformat}

[33205.282827] mdt_out00_000: page allocation failure: order:4, mode:0x10c050
[33205.286444] CPU: 1 PID: 15315 Comm: mdt_out00_000 Tainted: G OE ------------ 3.10.0-693.5.2.el7_lustre.x86_64 #1
[33205.289410] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[33205.299931] Call Trace:
[33205.302118] [<ffffffff816a3e2d>] dump_stack+0x19/0x1b
[33205.304492] [<ffffffff81188820>] warn_alloc_failed+0x110/0x180
[33205.306852] [<ffffffff8169fe2a>] __alloc_pages_slowpath+0x6b6/0x724
[33205.309234] [<ffffffff8118cdb5>] __alloc_pages_nodemask+0x405/0x420
[33205.311580] [<ffffffff811d1078>] alloc_pages_current+0x98/0x110
[33205.313840] [<ffffffff8118761e>] __get_free_pages+0xe/0x40
[33205.316047] [<ffffffff811dca2e>] kmalloc_order_trace+0x2e/0xa0
[33205.318248] [<ffffffff811e05c1>] __kmalloc+0x211/0x230
[33205.322857] [<ffffffffc0b4fc5d>] out_handle+0xa5d/0x1920 [ptlrpc]
[33205.329424] [<ffffffffc0b4a9b5>] tgt_request_handle+0x925/0x13b0 [ptlrpc]
[33205.331567] [<ffffffffc0aeec3e>] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc]
[33205.335695] [<ffffffffc0af23e2>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
[33205.343358] [<ffffffff810b099f>] kthread+0xcf/0xe0
[33205.413346] LustreError: 11-0: lustre-MDT0002-osp-MDT0000: operation out_update to node 0@lo failed: rc = -12
[33205.417858] LustreError: 15331:0:(layout.c:2087:__req_capsule_get()) @@@ Wrong buffer for field `object_update_reply' (1 of 1) in format `OUT_UPDATE': 0 vs. 4096 (server)
[33205.417858] req@ffff88006fdc1500 x1584861919186832/t0(0) o1000->lustre-MDT0002-osp-MDT0000@0@lo:24/4 lens 376/192 e 0 to 0 dl 1511475148 ref 2 fl Interpret:ReM/0/0 rc -12/-12
[33205.427203] LustreError: 20691:0:(echo_client.c:2048:echo_md_destroy_internal()) Can not unlink child 1073743783: rc = -12
[33205.430025] LustreError: 20691:0:(echo_client.c:2109:echo_destroy_object()) Can not unlink child (null): rc = -12
[33205.433216] general protection fault: 0000 [#1] SMP
[33205.444034] CPU: 1 PID: 15331 Comm: osp_up2-0 Tainted: G OE ------------ 3.10.0-693.5.2.el7_lustre.x86_64 #1
[33205.444034] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[33205.444034] RIP: 0010:[<ffffffffc10938bd>] [<ffffffffc10938bd>] osp_send_update_thread+0x1dd/0x600 [osp]


Comment by Gerrit Updater [ 01/Dec/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30083/
Subject: LU-8497 osp: handle remote -ENOMEM from osp_send_update_req()
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: cef8983c8b5bf51b58df23a779769cc4b8ca8db5

Comment by Peter Jones [ 01/Dec/17 ]

Landed for 2.11

Comment by Gerrit Updater [ 04/Dec/17 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30354
Subject: LU-8497 osp: handle remote -ENOMEM from osp_send_update_req()
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: c72defc7b06499380e89c88b904ccf10d2041245

Comment by Gerrit Updater [ 06/Dec/17 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30354/
Subject: LU-8497 osp: handle remote -ENOMEM from osp_send_update_req()
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 2f7cf70d70db2e3a801644f6d7eaeb09311e322b

Generated at Sat Feb 10 02:18:04 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.