LU-8497: GPF after page allocation failure in osp_send_update_thread()

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.11.0, Lustre 2.10.2
    • Affects Version/s: Lustre 2.9.0
    • Labels: None
    • Severity: 3

    Description

      This is seen running racer with MDSCOUNT=4 this morning (v2_8_56_0-24-g272c136):

      [ 1181.303743] Lustre: DEBUG MARKER: == racer test 1: racer on clients: t DURATION=300 == 09:28:27 (1470925707)
      [ 1182.650010] LustreError: 27979:0:(llog_cat.c:712:llog_cat_cancel_records()) lustre-MDT0000-osp-MDT0002: fail to cancel 1 of 1 llog-records: rc = -116
      [ 1182.662231] LustreError: 27979:0:(llog_cat.c:712:llog_cat_cancel_records()) Skipped 2 previous similar messages
      [ 1183.479062] mdt_out01_000: page allocation failure. order:4, mode:0x50
      [ 1183.484771] Pid: 27608, comm: mdt_out01_000 Not tainted 2.6.32-431.29.2.el6.lustre.x86_64 #1
      [ 1183.492551] Call Trace:
      [ 1183.494672]  [<ffffffff81142941>] ? __alloc_pages_nodemask+0x791/0x950
      [ 1183.500338]  [<ffffffff81183e8e>] ? kmem_getpages+0x6e/0x170
      [ 1183.505207]  [<ffffffff81186819>] ? fallback_alloc+0x1c9/0x2b0
      [ 1183.510370]  [<ffffffff81186077>] ? cache_grow+0x4c7/0x530
      [ 1183.515511]  [<ffffffff811864fb>] ? ____cache_alloc_node+0xab/0x200
      [ 1183.521154]  [<ffffffff81186c53>] ? __kmalloc+0x273/0x310
      [ 1183.526091]  [<ffffffffa1666f84>] ? out_handle+0xa84/0x1870 [ptlrpc]
      [ 1183.531889]  [<ffffffffa1666f84>] ? out_handle+0xa84/0x1870 [ptlrpc]
      [ 1183.537747]  [<ffffffff81554a7a>] ? __mutex_lock_common+0x2ca/0x400
      [ 1183.543492]  [<ffffffffa1657650>] ? req_can_reconstruct+0x50/0x130 [ptlrpc]
      [ 1183.550049]  [<ffffffffa1657650>] ? req_can_reconstruct+0x50/0x130 [ptlrpc]
      [ 1183.556371]  [<ffffffffa15fa5bc>] ? lustre_msg_get_opc+0x8c/0x100 [ptlrpc]
      [ 1183.562476]  [<ffffffff815545ee>] ? mutex_unlock+0xe/0x10
      [ 1183.567858]  [<ffffffffa165766e>] ? req_can_reconstruct+0x6e/0x130 [ptlrpc]
      [ 1183.574021]  [<ffffffffa165e8ff>] ? tgt_request_handle+0x90f/0x1470 [ptlrpc]
      [ 1183.580210]  [<ffffffffa160c752>] ? ptlrpc_main+0xd02/0x1800 [ptlrpc]
      [ 1183.586378]  [<ffffffffa160ba50>] ? ptlrpc_main+0x0/0x1800 [ptlrpc]
      [ 1183.592153]  [<ffffffff8109e856>] ? kthread+0x96/0xa0
      [ 1183.597098]  [<ffffffff8100c30a>] ? child_rip+0xa/0x20
      [ 1183.602225]  [<ffffffff815562e0>] ? _spin_unlock_irq+0x30/0x40
      [ 1183.607772]  [<ffffffff8100bb10>] ? restore_args+0x0/0x30
      [ 1183.612791]  [<ffffffff8109e7c0>] ? kthread+0x0/0xa0
      [ 1183.617476]  [<ffffffff8100c300>] ? child_rip+0x0/0x20
      [ 1183.622146] Mem-Info:
      [ 1183.624184] Node 0 DMA per-cpu:
      [ 1183.626999] CPU    0: hi:    0, btch:   1 usd:   0
      [ 1183.631320] CPU    1: hi:    0, btch:   1 usd:   0
      [ 1183.635664] CPU    2: hi:    0, btch:   1 usd:   0
      [ 1183.639960] CPU    3: hi:    0, btch:   1 usd:   0
      [ 1183.644777] Node 0 DMA32 per-cpu:
      [ 1183.647943] CPU    0: hi:  186, btch:  31 usd:  29
      [ 1183.652328] CPU    1: hi:  186, btch:  31 usd:  47
      [ 1183.656776] CPU    2: hi:  186, btch:  31 usd:  46
      [ 1183.661341] CPU    3: hi:  186, btch:  31 usd:   0
      [ 1183.665760] active_anon:52142 inactive_anon:6 isolated_anon:0
      [ 1183.665761]  active_file:56776 inactive_file:58980 isolated_file:0
      [ 1183.665762]  unevictable:0 dirty:248 writeback:0 unstable:0
      [ 1183.665763]  free:63417 slab_reclaimable:18898 slab_unreclaimable:98982
      [ 1183.665764]  mapped:2060 shmem:97 pagetables:20216 bounce:0
      [ 1183.693292] Node 0 DMA free:8340kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15364kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:7204kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
      [ 1183.727158] lowmem_reserve[]: 0 2004 2004 2004
      [ 1183.731294] Node 0 DMA32 free:288564kB min:44720kB low:55900kB high:67080kB active_anon:208856kB inactive_anon:24kB active_file:208264kB inactive_file:208424kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052312kB mlocked:0kB dirty:1088kB writeback:0kB mapped:8164kB shmem:388kB slab_reclaimable:77228kB slab_unreclaimable:389060kB kernel_stack:13232kB pagetables:81316kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
      [ 1183.768744] lowmem_reserve[]: 0 0 0 0
      [ 1183.772294] Node 0 DMA: 1*4kB 0*8kB 1*16kB 0*32kB 0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 2*2048kB 1*4096kB = 8340kB
      [ 1183.783726] Node 0 DMA32: 30239*4kB 17514*8kB 3395*16kB 863*32kB 215*64kB 101*128kB 28*256kB 1*512kB 4*1024kB 2*2048kB 3*4096kB = 397852kB
      [ 1183.798187] 70451 total pagecache pages
      [ 1183.802374] 0 pages in swap cache
      [ 1183.806010] Swap cache stats: add 0, delete 0, find 0/0
      [ 1183.811691] Free swap  = 0kB
      [ 1183.814789] Total swap = 0kB
      [ 1183.824529] 524285 pages RAM
      [ 1183.827700] 49009 pages reserved
      [ 1183.831149] 291298 pages shared
      [ 1183.834550] 215587 pages non-shared
      [ 1183.838902] LustreError: 11-0: lustre-MDT0001-osp-MDT0002: operation out_update to node 0@lo failed: rc = -12
      [ 1183.849680] LustreError: 27973:0:(layout.c:2074:__req_capsule_get()) @@@ Wrong buffer for field `object_update_reply' (1 of 1) in format `OUT_UPDATE': 0 vs. 4096 (server)
      [ 1183.849682]   req@ffff88006df62088 x1542376881533200/t0(0) o1000->lustre-MDT0001-osp-MDT0002@0@lo:24/4 lens 424/192 e 0 to 0 dl 1470925716 ref 2 fl Interpret:ReM/0/0 rc -12/-12
      [ 1183.883591] general protection fault: 0000 [#1] SMP 
      [ 1183.884386] last sysfs file: /sys/devices/system/cpu/possible
      [ 1183.884386] CPU 3 
      [ 1183.884386] Modules linked in: lustre(U) ofd(U) osp(U) lod(U) ost(U) mdt(U) mdd(U) mgs(U) osd_ldiskfs(U) ldiskfs(U) lquota(U) lfsck(U) obdecho(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc_gss(U) ptlrpc(U) obdclass(U) ksocklnd(U) lnet(U) libcfs(U) exportfs sha512_generic sha256_generic autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc ipv6 zfs(U) zcommon(U) znvpair(U) spl(U) zlib_deflate zavl(U) zunicode(U) microcode virtio_balloon virtio_net i2c_piix4 i2c_core ext4 jbd2 mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
      [ 1183.884386] 
      [ 1183.884386] Pid: 27973, comm: osp_up1-2 Not tainted 2.6.32-431.29.2.el6.lustre.x86_64 #1 Bochs Bochs
      [ 1183.884386] RIP: 0010:[<ffffffffa0e4793f>]  [<ffffffffa0e4793f>] osp_send_update_thread+0x1bf/0x584 [osp]
      [ 1183.884386] RSP: 0000:ffff880020b39dd0  EFLAGS: 00010282
      [ 1183.884386] RAX: ffff880065e6c3e0 RBX: ffff880020a18208 RCX: 0000000000000000
      [ 1183.884386] RDX: 000000000000000d RSI: 6b6b6b6b6b6b6b6b RDI: 0000000000000282
      [ 1183.884386] RBP: ffff880020b39eb0 R08: 0000000000000000 R09: ffff8800729b4508
      [ 1183.884386] R10: 09f911029d74e35b R11: 0000000000000000 R12: ffff8800197e7ed8
      [ 1183.884386] R13: ffff8800197e7ee8 R14: ffff880020b39e10 R15: ffff880020b39e78
      [ 1183.884386] FS:  0000000000000000(0000) GS:ffff88000d800000(0000) knlGS:0000000000000000
      [ 1183.884386] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      [ 1183.884386] CR2: 0000003d046acc6d CR3: 0000000001a85000 CR4: 00000000000006e0
      [ 1183.884386] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 1183.884386] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [ 1183.884386] Process osp_up1-2 (pid: 27973, threadinfo ffff880020b38000, task ffff880016aa6480)
      [ 1183.884386] Stack:
      [ 1183.884386]  0000000000000000 ffff880016aa6480 ffff880016aa6480 ffff880020a18460
      [ 1183.884386] <d> ffff8800197e7ee8 ffff8800197e7f28 ffff880016aa6480 00000000fffffff4
      [ 1183.884386] <d> 0000000210000003 0000000000000000 ffff880018e162f0 ffff880020b39e28
      [ 1183.884386] Call Trace:
      [ 1183.884386]  [<ffffffff81061d90>] ? default_wake_function+0x0/0x20
      [ 1183.884386]  [<ffffffffa0e47780>] ? osp_send_update_thread+0x0/0x584 [osp]
      [ 1183.884386]  [<ffffffff8109e856>] kthread+0x96/0xa0
      [ 1183.884386]  [<ffffffff8100c30a>] child_rip+0xa/0x20
      [ 1183.884386]  [<ffffffff815562e0>] ? _spin_unlock_irq+0x30/0x40
      [ 1183.884386]  [<ffffffff8100bb10>] ? restore_args+0x0/0x30
      [ 1183.884386]  [<ffffffff8109e7c0>] ? kthread+0x0/0xa0
      

      Attachments

        Activity

          [LU-8497] GPF after page allocation failure in osp_send_update_thread()

          gerrit Gerrit Updater added a comment -

          John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30354/
          Subject: LU-8497 osp: handle remote -ENOMEM from osp_send_update_req()
          Project: fs/lustre-release
          Branch: b2_10
          Current Patch Set:
          Commit: 2f7cf70d70db2e3a801644f6d7eaeb09311e322b

          gerrit Gerrit Updater added a comment -

          Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30354
          Subject: LU-8497 osp: handle remote -ENOMEM from osp_send_update_req()
          Project: fs/lustre-release
          Branch: b2_10
          Current Patch Set: 1
          Commit: c72defc7b06499380e89c88b904ccf10d2041245

          pjones Peter Jones added a comment -

          Landed for 2.11


          gerrit Gerrit Updater added a comment -

          Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30083/
          Subject: LU-8497 osp: handle remote -ENOMEM from osp_send_update_req()
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: cef8983c8b5bf51b58df23a779769cc4b8ca8db5

          adilger Andreas Dilger added a comment -

          I think this was hit again during mds-survey testing: https://testing.hpdd.intel.com/test_sets/245a0ee6-d0a5-11e7-8027-52540065bddc

          {noformat}

          [33205.282827] mdt_out00_000: page allocation failure: order:4, mode:0x10c050
          [33205.286444] CPU: 1 PID: 15315 Comm: mdt_out00_000 Tainted: G OE ------------ 3.10.0-693.5.2.el7_lustre.x86_64 #1
          [33205.289410] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
          [33205.299931] Call Trace:
          [33205.302118] [<ffffffff816a3e2d>] dump_stack+0x19/0x1b
          [33205.304492] [<ffffffff81188820>] warn_alloc_failed+0x110/0x180
          [33205.306852] [<ffffffff8169fe2a>] __alloc_pages_slowpath+0x6b6/0x724
          [33205.309234] [<ffffffff8118cdb5>] __alloc_pages_nodemask+0x405/0x420
          [33205.311580] [<ffffffff811d1078>] alloc_pages_current+0x98/0x110
          [33205.313840] [<ffffffff8118761e>] __get_free_pages+0xe/0x40
          [33205.316047] [<ffffffff811dca2e>] kmalloc_order_trace+0x2e/0xa0
          [33205.318248] [<ffffffff811e05c1>] __kmalloc+0x211/0x230
          [33205.322857] [<ffffffffc0b4fc5d>] out_handle+0xa5d/0x1920 [ptlrpc]
          [33205.329424] [<ffffffffc0b4a9b5>] tgt_request_handle+0x925/0x13b0 [ptlrpc]
          [33205.331567] [<ffffffffc0aeec3e>] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc]
          [33205.335695] [<ffffffffc0af23e2>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
          [33205.343358] [<ffffffff810b099f>] kthread+0xcf/0xe0
          [33205.413346] LustreError: 11-0: lustre-MDT0002-osp-MDT0000: operation out_update to node 0@lo failed: rc = -12
          [33205.417858] LustreError: 15331:0:(layout.c:2087:__req_capsule_get()) @@@ Wrong buffer for field `object_update_reply' (1 of 1) in format `OUT_UPDATE': 0 vs. 4096 (server)
          [33205.417858] req@ffff88006fdc1500 x1584861919186832/t0(0) o1000->lustre-MDT0002-osp-MDT0000@0@lo:24/4 lens 376/192 e 0 to 0 dl 1511475148 ref 2 fl Interpret:ReM/0/0 rc -12/-12
          [33205.427203] LustreError: 20691:0:(echo_client.c:2048:echo_md_destroy_internal()) Can not unlink child 1073743783: rc = -12
          [33205.430025] LustreError: 20691:0:(echo_client.c:2109:echo_destroy_object()) Can not unlink child (null): rc = -12
          [33205.433216] general protection fault: 0000 1 SMP
          [33205.444034] CPU: 1 PID: 15331 Comm: osp_up2-0 Tainted: G OE ------------ 3.10.0-693.5.2.el7_lustre.x86_64 #1
          [33205.444034] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
          [33205.444034] RIP: 0010:[<ffffffffc10938bd>] [<ffffffffc10938bd>] osp_send_update_thread+0x1dd/0x600 [osp]

          
          
          adilger Andreas Dilger added a comment - I think this was hit again during mds-survey testing: https://testing.hpdd.intel.com/test_sets/245a0ee6-d0a5-11e7-8027-52540065bddc {nofromat} [33205.282827] mdt_out00_000: page allocation failure: order:4, mode:0x10c050 [33205.286444] CPU: 1 PID: 15315 Comm: mdt_out00_000 Tainted: G OE ------------ 3.10.0-693.5.2.el7_lustre.x86_64 #1 [33205.289410] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007 [33205.299931] Call Trace: [33205.302118] [<ffffffff816a3e2d>] dump_stack+0x19/0x1b [33205.304492] [<ffffffff81188820>] warn_alloc_failed+0x110/0x180 [33205.306852] [<ffffffff8169fe2a>] __alloc_pages_slowpath+0x6b6/0x724 [33205.309234] [<ffffffff8118cdb5>] __alloc_pages_nodemask+0x405/0x420 [33205.311580] [<ffffffff811d1078>] alloc_pages_current+0x98/0x110 [33205.313840] [<ffffffff8118761e>] __get_free_pages+0xe/0x40 [33205.316047] [<ffffffff811dca2e>] kmalloc_order_trace+0x2e/0xa0 [33205.318248] [<ffffffff811e05c1>] __kmalloc+0x211/0x230 [33205.322857] [<ffffffffc0b4fc5d>] out_handle+0xa5d/0x1920 [ptlrpc] [33205.329424] [<ffffffffc0b4a9b5>] tgt_request_handle+0x925/0x13b0 [ptlrpc] [33205.331567] [<ffffffffc0aeec3e>] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc] [33205.335695] [<ffffffffc0af23e2>] ptlrpc_main+0xa92/0x1e40 [ptlrpc] [33205.343358] [<ffffffff810b099f>] kthread+0xcf/0xe0 [33205.413346] LustreError: 11-0: lustre-MDT0002-osp-MDT0000: operation out_update to node 0@lo failed: rc = -12 [33205.417858] LustreError: 15331:0:(layout.c:2087:__req_capsule_get()) @@@ Wrong buffer for field `object_update_reply' (1 of 1) in format `OUT_UPDATE': 0 vs. 4096 (server) [33205.417858] req@ffff88006fdc1500 x1584861919186832/t0(0) o1000->lustre-MDT0002-osp-MDT0000@0@lo:24/4 lens 376/192 e 0 to 0 dl 1511475148 ref 2 fl Interpret:ReM/0/0 rc -12/-12 [33205.427203] LustreError: 20691:0:(echo_client.c:2048:echo_md_destroy_internal()) Can not unlink child 1073743783: rc = -12 [33205.430025] LustreError: 20691:0:(echo_client.c:2109:echo_destroy_object()) Can not unlink child (null): rc = -12 [33205.433216] general protection fault: 0000 1 SMP [33205.444034] CPU: 1 PID: 15331 Comm: osp_up2-0 Tainted: G OE ------------ 3.10.0-693.5.2.el7_lustre.x86_64 #1 [33205.444034] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007 [33205.444034] RIP: 0010: [<ffffffffc10938bd>] [<ffffffffc10938bd>] osp_send_update_thread+0x1dd/0x600 [osp]

          gerrit Gerrit Updater added a comment -

          John L. Hammond (john.hammond@intel.com) uploaded a new patch: https://review.whamcloud.com/30083
          Subject: LU-8497 osp: handle remote -ENOMEM from osp_send_update_req()
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: df1bbd79d276df84717a905b1a34be7e2fe43081

          jhammond John Hammond added a comment - - edited

          The rc == -ENOMEM test is confusing local -ENOMEM (from ptlrpc_queue_wait() meaning that the request was never sent) with remote -ENOMEM (meaning that the request was sent and the interpreter was called). Note that the req->rq_set == NULL condition is always true after we leave ptlrpc_queue_wait().
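          To make the two cases concrete, the sketch below is a toy userspace analogue rather than Lustre code: toy_queue_wait(), struct toy_req and its fields are invented for illustration. A call can return -ENOMEM either because it failed locally before the request was ever sent (so no completion callback will run), or because the request was sent and the remote side replied -ENOMEM (so the callback path already owns part of the cleanup); the return code alone cannot tell the caller which cleanup it still owes.

          #include <errno.h>
          #include <stdio.h>

          /* Toy stand-ins for the RPC request and its completion callback. */
          struct toy_req {
                  int rq_status;      /* status carried back from the "server" */
                  int callback_armed; /* completion callback ("interpreter") will run */
          };

          /* Returns -ENOMEM in two very different situations:
           *  - local failure: allocation failed before the request was sent, so no
           *    callback will ever run and the caller still owns all cleanup;
           *  - remote failure: the request was sent, the server replied -ENOMEM, and
           *    the callback path already owns part of the cleanup. */
          static int toy_queue_wait(struct toy_req *req, int fail_locally)
          {
                  if (fail_locally)
                          return -ENOMEM;          /* never sent, callback not armed */

                  req->callback_armed = 1;         /* interpreter will (or did) run */
                  req->rq_status = -ENOMEM;        /* server-side allocation failure */
                  return req->rq_status;
          }

          int main(void)
          {
                  struct toy_req req = { 0, 0 };
                  int rc = toy_queue_wait(&req, 0 /* fail_locally */);

                  /* Checking rc alone, as the rc == -ENOMEM test does, conflates the
                   * two cases; whether the callback ran is what decides who must drop
                   * the last reference. */
                  if (rc == -ENOMEM && !req.callback_armed)
                          printf("local -ENOMEM: caller must release its reference\n");
                  else if (rc == -ENOMEM)
                          printf("remote -ENOMEM: callback owns the release; an extra put here is the bug\n");
                  return 0;
          }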

          jhammond John Hammond added a comment -

          Seeing this again while testing 2.10.1-RC1 (I doubt it ever went away). This is due to a use after free of our in osp_send_update_thread() following an extra thandle put in osp_send_update_req() when the request status is -ENOMEM:

          osp_send_update_req():
                          if ((rc == -ENOMEM && req->rq_set == NULL) ||
                              (req->rq_transno == 0 && !req->rq_committed)) {
                                  if (args->oaua_update != NULL) {
                                          /* If osp_update_interpret is not being called,           
                                           * release the osp_thandle */
                                          args->oaua_update = NULL;
                                          osp_thandle_put(env, oth);
                                  }
          
                                  req->rq_cb_data = NULL;
                                  rc = rc == 0 ? req->rq_status : rc;
                                  osp_trans_callback(env, oth, rc);
                                  osp_thandle_put(env, oth); /* HERE */
                                  GOTO(out, rc);
                          }
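          Below is a minimal userspace refcounting sketch of that failure mode, using invented names (struct toy_handle, toy_put()) rather than the real osp structures: one reference is held by the sender thread and one by the completion path, and the extra put from the error branch drops the count to zero while the thread still holds its pointer, which is the use after free behind the GPF.

          #include <stdio.h>
          #include <stdlib.h>

          /* Minimal refcounted object standing in for the osp thandle/update request. */
          struct toy_handle {
                  int refcount;
          };

          static void toy_put(struct toy_handle *h)
          {
                  if (--h->refcount == 0) {
                          printf("handle freed\n");
                          free(h);
                  }
          }

          int main(void)
          {
                  struct toy_handle *h = calloc(1, sizeof(*h));

                  if (h == NULL)
                          return 1;

                  /* One reference held by the update thread, one held by the
                   * request's callback data. */
                  h->refcount = 2;

                  toy_put(h);  /* completion/interpret path drops the reference it owns */
                  toy_put(h);  /* extra put in the error branch: the count hits zero and
                                * the object is freed while the update thread still holds
                                * a pointer to it */

                  /* Any later toy_put(h) or h->refcount access by the "thread" would be
                   * the use after free (left commented out here on purpose). */
                  /* toy_put(h); */
                  return 0;
          }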
          
          pjones Peter Jones added a comment -

          As per John this has not reproduced

          yong.fan nasf (Inactive) added a comment - - edited

          Any further logs? Thanks!


          yong.fan nasf (Inactive) added a comment -

          Possibly this was introduced by the recent landing of LU-8399.

          The LU-8399 patch http://review.whamcloud.com/21330/ makes the cross-MDT RPC sender hold a reference on the object so that the failure callback does not have to search the object table. Because the number of in-flight RPCs is limited (it will not exceed the maximum RPC service thread count), the objects cached for this purpose should not be numerous enough to exhaust RAM. Another possibility is that the logic misses releasing the object reference in some overlooked corner case, so that the accumulated objects exhaust RAM. But if such a case really existed, the most likely symptom would be that the system could not be cleaned up because of even a single leaked object reference, and we should have seen many module-unload failures on Maloo. We have not hit that so far, so the LU-8399 patch should not have caused an object reference leak.

          I have run racer with 4 MDTs in my VM environment many times but cannot reproduce the problem, so it would be helpful if you have more logs.

          People

            Assignee: jhammond John Hammond
            Reporter: jhammond John Hammond
            Votes: 0
            Watchers: 7

            Dates

              Created:
              Updated:
              Resolved: