Details
-
Bug
-
Resolution: Fixed
-
Minor
-
Lustre 2.9.0
-
None
-
3
-
9223372036854775807
Description
This is seen running racer with MDSCOUNT=4 this morning (v2_8_56_0-24-g272c136):
[ 1181.303743] Lustre: DEBUG MARKER: == racer test 1: racer on clients: t DURATION=300 == 09:28:27 (1470925707) [ 1182.650010] LustreError: 27979:0:(llog_cat.c:712:llog_cat_cancel_records()) lustre-MDT0 000-osp-MDT0002: fail to cancel 1 of 1 llog-records: rc = -116 [ 1182.662231] LustreError: 27979:0:(llog_cat.c:712:llog_cat_cancel_records()) Skipped 2 p revious similar messages [ 1183.479062] mdt_out01_000: page allocation failure. order:4, mode:0x50 [ 1183.484771] Pid: 27608, comm: mdt_out01_000 Not tainted 2.6.32-431.29.2.el6.lustre.x86_ 64 #1 [ 1183.492551] Call Trace: [ 1183.494672] [<ffffffff81142941>] ? __alloc_pages_nodemask+0x791/0x950 [ 1183.500338] [<ffffffff81183e8e>] ? kmem_getpages+0x6e/0x170 [ 1183.505207] [<ffffffff81186819>] ? fallback_alloc+0x1c9/0x2b0 [ 1183.510370] [<ffffffff81186077>] ? cache_grow+0x4c7/0x530 [ 1183.515511] [<ffffffff811864fb>] ? ____cache_alloc_node+0xab/0x200 [ 1183.521154] [<ffffffff81186c53>] ? __kmalloc+0x273/0x310 [ 1183.526091] [<ffffffffa1666f84>] ? out_handle+0xa84/0x1870 [ptlrpc] [ 1183.531889] [<ffffffffa1666f84>] ? out_handle+0xa84/0x1870 [ptlrpc] [ 1183.537747] [<ffffffff81554a7a>] ? __mutex_lock_common+0x2ca/0x400 [ 1183.543492] [<ffffffffa1657650>] ? req_can_reconstruct+0x50/0x130 [ptlrpc] [ 1183.550049] [<ffffffffa1657650>] ? req_can_reconstruct+0x50/0x130 [ptlrpc] [ 1183.556371] [<ffffffffa15fa5bc>] ? lustre_msg_get_opc+0x8c/0x100 [ptlrpc] [ 1183.562476] [<ffffffff815545ee>] ? mutex_unlock+0xe/0x10 [ 1183.567858] [<ffffffffa165766e>] ? req_can_reconstruct+0x6e/0x130 [ptlrpc] [ 1183.574021] [<ffffffffa165e8ff>] ? tgt_request_handle+0x90f/0x1470 [ptlrpc] [ 1183.580210] [<ffffffffa160c752>] ? ptlrpc_main+0xd02/0x1800 [ptlrpc] [ 1183.586378] [<ffffffffa160ba50>] ? ptlrpc_main+0x0/0x1800 [ptlrpc] [ 1183.592153] [<ffffffff8109e856>] ? kthread+0x96/0xa0 [ 1183.597098] [<ffffffff8100c30a>] ? child_rip+0xa/0x20 [ 1183.602225] [<ffffffff815562e0>] ? _spin_unlock_irq+0x30/0x40 [ 1183.607772] [<ffffffff8100bb10>] ? restore_args+0x0/0x30 [ 1183.612791] [<ffffffff8109e7c0>] ? kthread+0x0/0xa0 [ 1183.617476] [<ffffffff8100c300>] ? child_rip+0x0/0x20 [ 1183.622146] Mem-Info: [ 1183.624184] Node 0 DMA per-cpu: [ 1183.626999] CPU 0: hi: 0, btch: 1 usd: 0 [ 1183.631320] CPU 1: hi: 0, btch: 1 usd: 0 [ 1183.635664] CPU 2: hi: 0, btch: 1 usd: 0 [ 1183.639960] CPU 3: hi: 0, btch: 1 usd: 0 [ 1183.644777] Node 0 DMA32 per-cpu: [ 1183.647943] CPU 0: hi: 186, btch: 31 usd: 29 [ 1183.652328] CPU 1: hi: 186, btch: 31 usd: 47 [ 1183.656776] CPU 2: hi: 186, btch: 31 usd: 46 [ 1183.661341] CPU 3: hi: 186, btch: 31 usd: 0 [ 1183.665760] active_anon:52142 inactive_anon:6 isolated_anon:0 [ 1183.665761] active_file:56776 inactive_file:58980 isolated_file:0 [ 1183.665762] unevictable:0 dirty:248 writeback:0 unstable:0 [ 1183.665763] free:63417 slab_reclaimable:18898 slab_unreclaimable:98982 [ 1183.665764] mapped:2060 shmem:97 pagetables:20216 bounce:0 [ 1183.693292] Node 0 DMA free:8340kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15364kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:7204kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [ 1183.727158] lowmem_reserve[]: 0 2004 2004 2004 [ 1183.731294] Node 0 DMA32 free:288564kB min:44720kB low:55900kB high:67080kB active_anon:208856kB inactive_anon:24kB active_file:208264kB inactive_file:208424kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052312kB mlocked:0kB dirty:1088kB writeback:0kB mapped:8164kB shmem:388kB slab_reclaimable:77228kB slab_unreclaimable:389060kB kernel_stack:13232kB pagetables:81316kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 1183.768744] lowmem_reserve[]: 0 0 0 0 [ 1183.772294] Node 0 DMA: 1*4kB 0*8kB 1*16kB 0*32kB 0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 2*2048kB 1*4096kB = 8340kB [ 1183.783726] Node 0 DMA32: 30239*4kB 17514*8kB 3395*16kB 863*32kB 215*64kB 101*128kB 28*256kB 1*512kB 4*1024kB 2*2048kB 3*4096kB = 397852kB [ 1183.798187] 70451 total pagecache pages [ 1183.802374] 0 pages in swap cache [ 1183.806010] Swap cache stats: add 0, delete 0, find 0/0 [ 1183.811691] Free swap = 0kB [ 1183.814789] Total swap = 0kB [ 1183.824529] 524285 pages RAM [ 1183.827700] 49009 pages reserved [ 1183.831149] 291298 pages shared [ 1183.834550] 215587 pages non-shared [ 1183.838902] LustreError: 11-0: lustre-MDT0001-osp-MDT0002: operation out_update to node[ 1183.814789] Total swap = 0kB [ 1183.824529] 524285 pages RAM [ 1183.827700] 49009 pages reserved [ 1183.831149] 291298 pages shared [ 1183.834550] 215587 pages non-shared [ 1183.838902] LustreError: 11-0: lustre-MDT0001-osp-MDT0002: operation out_update to node 0@lo failed: rc = -12 [ 1183.849680] LustreError: 27973:0:(layout.c:2074:__req_capsule_get()) @@@ Wrong buffer for field `object_update_reply' (1 of 1) in format `OUT_UPDATE': 0 vs. 4096 (server) [ 1183.849682] req@ffff88006df62088 x1542376881533200/t0(0) o1000->lustre-MDT0001-osp-MDT0002@0@lo:24/4 lens 424/192 e 0 to 0 dl 1470925716 ref 2 fl Interpret:ReM/0/0 rc -12/-12 [ 1183.883591] general protection fault: 0000 [#1] SMP [ 1183.884386] last sysfs file: /sys/devices/system/cpu/possible [ 1183.884386] CPU 3 [ 1183.884386] Modules linked in: lustre(U) ofd(U) osp(U) lod(U) ost(U) mdt(U) mdd(U) mgs(U) osd_ldiskfs(U) ldiskfs(U) lquota(U) lfsck(U) obdecho(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc_gss(U) ptlrpc(U) obdclass(U) ksocklnd(U) lnet(U) libcfs(U) exportfs sha512_generic sha256_generic autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc ipv6 zfs(U) zcommon(U) znvpair(U) spl(U) zlib_deflate zavl(U) zunicode(U) microcode virtio_balloon virtio_net i2c_piix4 i2c_core ext4 jbd2 mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs] [ 1183.884386] [ 1183.884386] Pid: 27973, comm: osp_up1-2 Not tainted 2.6.32-431.29.2.el6.lustre.x86_64 #1 Bochs Bochs [ 1183.884386] RIP: 0010:[<ffffffffa0e4793f>] [<ffffffffa0e4793f>] osp_send_update_thread+0x1bf/0x584 [osp] [ 1183.884386] RSP: 0000:ffff880020b39dd0 EFLAGS: 00010282 [ 1183.884386] RAX: ffff880065e6c3e0 RBX: ffff880020a18208 RCX: 0000000000000000 [ 1183.884386] RDX: 000000000000000d RSI: 6b6b6b6b6b6b6b6b RDI: 0000000000000282 [ 1183.884386] RBP: ffff880020b39eb0 R08: 0000000000000000 R09: ffff8800729b4508 [ 1183.884386] R10: 09f911029d74e35b R11: 0000000000000000 R12: ffff8800197e7ed8 [ 1183.884386] R13: ffff8800197e7ee8 R14: ffff880020b39e10 R15: ffff880020b39e78 [ 1183.884386] FS: 0000000000000000(0000) GS:ffff88000d800000(0000) knlGS:0000000000000000 [ 1183.884386] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b [ 1183.884386] CR2: 0000003d046acc6d CR3: 0000000001a85000 CR4: 00000000000006e0 [ 1183.884386] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1183.884386] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 1183.884386] Process osp_up1-2 (pid: 27973, threadinfo ffff880020b38000, task ffff880016aa6480) [ 1183.884386] Stack: [ 1183.884386] 0000000000000000 ffff880016aa6480 ffff880016aa6480 ffff880020a18460 [ 1183.884386] <d> ffff8800197e7ee8 ffff8800197e7f28 ffff880016aa6480 00000000fffffff4 [ 1183.884386] <d> 0000000210000003 0000000000000000 ffff880018e162f0 ffff880020b39e28 [ 1183.884386] Call Trace: [ 1183.884386] [<ffffffff81061d90>] ? default_wake_function+0x0/0x20 [ 1183.884386] [<ffffffffa0e47780>] ? osp_send_update_thread+0x0/0x584 [osp] [ 1183.884386] [<ffffffff8109e856>] kthread+0x96/0xa0 [ 1183.884386] [<ffffffff8100c30a>] child_rip+0xa/0x20 [ 1183.884386] [<ffffffff815562e0>] ? _spin_unlock_irq+0x30/0x40 [ 1183.884386] [<ffffffff8100bb10>] ? restore_args+0x0/0x30 [ 1183.884386] [<ffffffff8109e7c0>] ? kthread+0x0/0xa0
John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30354/
Subject:
LU-8497osp: handle remote -ENOMEM from osp_send_update_req()Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 2f7cf70d70db2e3a801644f6d7eaeb09311e322b