Lustre / LU-17384

OOMkiller invoked on lustre OSS nodes under IOR

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.14.0, Lustre 2.15.0
    • Environment:
      Clients: Lustre 2.12
      Servers: Lustre 2.14 and 2.15 (both tested and reproduced)
      TOSS: 4.6-6 and 4.7-2 (both reproduced)
    • Severity: 4
    • 9223372036854775807

    Description

      During SWL testing for TOSS 4.6-6rc3 and also 4.7-2rc2, we found that an IOR run could trigger an OOM on an OSS node.

      We were able to reproduce this issue using IOR under srun.

      The following srun/ior command was used:

      srun -N 70 -n 7840 /g/g0/carbonne/ior/src/ior -a MPIIO -i 5 -b 256MB -t 128MB -v -g -F -C -w -W -r -o /p/lflood/carbonne/oomtest/ior_1532/ior
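
      (For readers less familiar with IOR, a quick gloss of the options used above, per the standard IOR option set:)

        -a MPIIO   use the MPIIO backend
        -i 5       run 5 iterations
        -b 256MB   block size written per task
        -t 128MB   transfer size per I/O call
        -v, -g     verbose output; barriers between test phases
        -F         file-per-process
        -C         reorder tasks for the read phase so each task reads data written by another node (defeats client-side caching)
        -w, -W     write, then read back and verify the written data
        -r         read
        -o         path to the test file(s)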
      

      An example occurred at 2023-10-17 12:31:28 on garter5; see the console log.

      The Mem-Info output from one of the oom-killer console log messages is:

       

      Mem-Info:
      active_anon:22868 inactive_anon:69168 isolated_anon:0
       active_file:357 inactive_file:770 isolated_file:250
       unevictable:10785 dirty:0 writeback:0
       slab_reclaimable:185039 slab_unreclaimable:2082954
       mapped:12536 shmem:46663 pagetables:2485 bounce:0
       free:134668 free_pcp:203 free_cma:0
      
      Node 0 active_anon:75888kB inactive_anon:87304kB active_file:1840kB
       inactive_file:1464kB  unevictable:43080kB isolated(anon):0kB
       isolated(file):208kB mapped:19680kB dirty:0kB writeback:0kB
       shmem:127712kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 26624kB
       writeback_tmp:0kB kernel_stack:31416kB pagetables:3896kB
       all_unreclaimable? no
      
      Node 0 DMA free:11264kB min:4kB low:16kB high:28kB active_anon:0kB
       inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB
       writepending:0kB present: 15996kB managed:15360kB mlocked:0kB bounce:0kB
       free_pcp:0kB local_pcp:0kB free_cma:0kB
       lowmem_reserve[]: 0 1183 94839 94839 94839
      
      Node 0 DMA32 free:375156kB min:556kB low:1764kB high:2972kB
       active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:4kB
       unevictable:0kB writepending:0kB present:1723228kB managed:1325704kB
       mlocked:0kB bounce:0kB free_pcp:260kB local_pcp:0kB free_cma:0kB
       lowmem_reserve[]: 0 0 93655 93655 93655
      
      Node 0 Normal free:46072kB min:44044kB low:139944kB high:235844kB
       active_anon:75888kB inactive_anon:87304kB active_file:1860kB
       inactive_file:1584kB unevictable: 43080kB writepending:0kB
       present:97517568kB managed:95912024kB mlocked:43080kB bounce:0kB
       free_pcp:372kB local_pcp:0kB free_cma:0kB lowmem_reserve[]: 0 0 0 0 0
      
      Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
       1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11264kB
      
      Node 0 DMA32: 3*4kB (M) 66*8kB (UM) 202*16kB (UM) 152*32kB (UM)
       168*64kB (UM) 85*128kB (UM) 24*256kB (UM) 20*512kB (UM) 11*1024kB (UM)
       7*2048kB (UM) 74*4096kB (# M) = 375356kB
      
      Node 0 Normal: 151*4kB (MEH) 853*8kB (UMEH) 640*16kB (MEH) 412*32kB (MEH)
       132*64kB (ME) 33*128kB (UE) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 43524kB
      
      Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
      
      Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
      
      Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
      
      Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
      
      53515 total pagecache pages
      0 pages in swap cache
      Swap cache stats: add 0, delete 0, find 0/0
      Free swap  = 0kB
      Total swap = 0kB
      49980022 pages RAM
      0 pages HighMem/MovableOnly
      896433 pages reserved
      0 pages hwpoisoned
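
      (For scale, a rough conversion of the headline counters above from 4 KiB pages to GiB:)

      slab_unreclaimable: 2082954 pages x 4 KiB ≈ 7.9 GiB
      slab_reclaimable:    185039 pages x 4 KiB ≈ 0.7 GiB
      free:                134668 pages x 4 KiB ≈ 0.5 GiB
      pages RAM:         49980022 pages x 4 KiB ≈ 190.7 GiB (≈ 187 GiB after the 896433 reserved pages)

      The counters in the dump therefore account for only a small fraction of physical memory, which is consistent with the ARC usage reported later in this ticket.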
      

      =============================================================

      local Jira ticket:  TOSS-6158

      Attachments

        Activity

          [LU-17384] OOMkiller invoked on lustre OSS nodes under IOR

          I retested the issue on lustre 2.15.4_4 with ZFS 2.2.3/4 and I can no longer reproduce it.
          At this point we can close this issue and note that the problem is fixed starting with ZFS 2.2.3.

          carbonneau Eric Carbonneau (Inactive) added a comment

          Adding the arc_summary output for the OOM:
          arc_summary

          ------------------------------------------------------------------------
          ZFS Subsystem Report Tue Feb 13 12:22:54 2024
          Linux 4.18.0-513.11.1.1toss.t4.x86_64 2.1.14_1llnl-1
          Machine: garter5 (x86_64) 2.1.14_1llnl-1

          ARC status: HEALTHY
          Memory throttle count: 0

          ARC size (current): 123.8 % 115.9 GiB
          Target size (adaptive): 100.0 % 93.6 GiB
          Min size (hard limit): 6.2 % 5.9 GiB
          Max size (high water): 16:1 93.6 GiB
          Most Frequently Used (MFU) cache size: 66.7 % 6.0 GiB
          Most Recently Used (MRU) cache size: 33.3 % 3.0 GiB
          Metadata cache size (hard limit): 75.0 % 70.2 GiB
          Metadata cache size (current): 1.1 % 794.2 MiB
          Dnode cache size (hard limit): 10.0 % 7.0 GiB
          Dnode cache size (current): 0.2 % 16.5 MiB

          ARC hash breakdown:
          Elements max: 394.5k
          Elements current: 95.8 % 377.8k
          Collisions: 10.1k
          Chain max: 2
          Chains: 2.2k

          ARC misc:
          Deleted: 731.6k
          Mutex misses: 4.3M
          Eviction skips: 325.9M
          Eviction skips due to L2 writes: 0
          L2 cached evictions: 0 Bytes
          L2 eligible evictions: 200.2 GiB
          L2 eligible MFU evictions: 72.7 % 145.5 GiB
          L2 eligible MRU evictions: 27.3 % 54.7 GiB
          L2 ineligible evictions: 716.4 GiB

          ARC total accesses (hits + misses): 7.5M
          Cache hit ratio: 81.0 % 6.0M
          Cache miss ratio: 19.0 % 1.4M
          Actual hit ratio (MFU + MRU hits): 80.9 % 6.0M
          Data demand efficiency: 58.3 % 319.8k
          Data prefetch efficiency: 0.4 % 1.0M

          Cache hits by cache type:
          Most frequently used (MFU): 87.3 % 5.3M
          Most recently used (MRU): 12.7 % 763.4k
          Most frequently used (MFU) ghost: 2.8 % 171.0k
          Most recently used (MRU) ghost: 2.3 % 139.8k

          Cache hits by data type:
          Demand data: 3.1 % 186.3k
          Prefetch data: 0.1 % 3.9k
          Demand metadata: 96.8 % 5.8M
          Prefetch metadata: < 0.1 % 469

          Cache misses by data type:
          Demand data: 9.4 % 133.4k
          Prefetch data: 71.3 % 1.0M
          Demand metadata: 18.9 % 268.9k
          Prefetch metadata: 0.3 % 4.6k

          DMU prefetch efficiency: 926.8k
          Hit ratio: 86.2 % 799.0k
          Miss ratio: 13.8 % 127.8k

          L2ARC not detected, skipping section

          Solaris Porting Layer (SPL):
          spl_hostid 0
          spl_hostid_path /etc/hostid
          spl_kmem_alloc_max 1048576
          spl_kmem_alloc_warn 65536
          spl_kmem_cache_kmem_threads 4
          spl_kmem_cache_magazine_size 0
          spl_kmem_cache_max_size 32
          spl_kmem_cache_obj_per_slab 8
          spl_kmem_cache_reclaim 0
          spl_kmem_cache_slab_limit 16384
          spl_max_show_tasks 512
          spl_panic_halt 0
          spl_schedule_hrtimeout_slack_us 0
          spl_taskq_kick 0
          spl_taskq_thread_bind 0
          spl_taskq_thread_dynamic 0
          spl_taskq_thread_priority 1
          spl_taskq_thread_sequential 4

          Tunables:
          dbuf_cache_hiwater_pct 10
          dbuf_cache_lowater_pct 10
          dbuf_cache_max_bytes 18446744073709551615
          dbuf_cache_shift 5
          dbuf_metadata_cache_max_bytes 18446744073709551615
          dbuf_metadata_cache_shift 6
          dbuf_mutex_cache_shift 0
          dmu_object_alloc_chunk_shift 7
          dmu_prefetch_max 134217728
          ignore_hole_birth 1
          l2arc_exclude_special 0
          l2arc_feed_again 1
          l2arc_feed_min_ms 200
          l2arc_feed_secs 1
          l2arc_headroom 2
          l2arc_headroom_boost 200
          l2arc_meta_percent 33
          l2arc_mfuonly 0
          l2arc_noprefetch 1
          l2arc_norw 0
          l2arc_rebuild_blocks_min_l2size 1073741824
          l2arc_rebuild_enabled 1
          l2arc_trim_ahead 0
          l2arc_write_boost 8388608
          l2arc_write_max 8388608
          metaslab_aliquot 1048576
          metaslab_bias_enabled 1
          metaslab_debug_load 0
          metaslab_debug_unload 0
          metaslab_df_max_search 16777216
          metaslab_df_use_largest_segment 0
          metaslab_force_ganging 16777217
          metaslab_fragmentation_factor_enabled 1
          metaslab_lba_weighting_enabled 1
          metaslab_preload_enabled 1
          metaslab_unload_delay 32
          metaslab_unload_delay_ms 600000
          send_holes_without_birth_time 1
          spa_asize_inflation 24
          spa_config_path /etc/zfs/zpool.cache
          spa_load_print_vdev_tree 0
          spa_load_verify_data 1
          spa_load_verify_metadata 1
          spa_load_verify_shift 4
          spa_slop_shift 5
          vdev_file_logical_ashift 9
          vdev_file_physical_ashift 9
          vdev_removal_max_span 32768
          vdev_validate_skip 0
          zap_iterate_prefetch 1
          zfetch_array_rd_sz 67108864
          zfetch_max_distance 67108864
          zfetch_max_idistance 67108864
          zfetch_max_sec_reap 2
          zfetch_max_streams 500
          zfetch_min_distance 4194304
          zfetch_min_sec_reap 1
          zfs_abd_scatter_enabled 1
          zfs_abd_scatter_max_order 10
          zfs_abd_scatter_min_size 1536
          zfs_admin_snapshot 0
          zfs_allow_redacted_dataset_mount 0
          zfs_arc_average_blocksize 8192
          zfs_arc_dnode_limit 0
          zfs_arc_dnode_limit_percent 10
          zfs_arc_dnode_reduce_percent 10
          zfs_arc_evict_batch_limit 10
          zfs_arc_eviction_pct 200
          zfs_arc_grow_retry 0
          zfs_arc_lotsfree_percent 10
          zfs_arc_max 0
          zfs_arc_meta_adjust_restarts 4096
          zfs_arc_meta_limit 0
          zfs_arc_meta_limit_percent 75
          zfs_arc_meta_min 0
          zfs_arc_meta_prune 10000
          zfs_arc_meta_strategy 1
          zfs_arc_min 0
          zfs_arc_min_prefetch_ms 0
          zfs_arc_min_prescient_prefetch_ms 0
          zfs_arc_p_dampener_disable 1
          zfs_arc_p_min_shift 0
          zfs_arc_pc_percent 0
          zfs_arc_prune_task_threads 1
          zfs_arc_shrink_shift 0
          zfs_arc_shrinker_limit 10000
          zfs_arc_sys_free 0
          zfs_async_block_max_blocks 18446744073709551615
          zfs_autoimport_disable 1
          zfs_btree_verify_intensity 0
          zfs_checksum_events_per_second 20
          zfs_commit_timeout_pct 5
          zfs_compressed_arc_enabled 1
          zfs_condense_indirect_commit_entry_delay_ms 0
          zfs_condense_indirect_obsolete_pct 25
          zfs_condense_indirect_vdevs_enable 1
          zfs_condense_max_obsolete_bytes 1073741824
          zfs_condense_min_mapping_bytes 131072
          zfs_dbgmsg_enable 1
          zfs_dbgmsg_maxsize 4194304
          zfs_dbuf_state_index 0
          zfs_ddt_data_is_special 1
          zfs_deadman_checktime_ms 60000
          zfs_deadman_enabled 1
          zfs_deadman_failmode wait
          zfs_deadman_synctime_ms 600000
          zfs_deadman_ziotime_ms 300000
          zfs_dedup_prefetch 0
          zfs_default_bs 9
          zfs_default_ibs 17
          zfs_delay_min_dirty_percent 60
          zfs_delay_scale 500000
          zfs_delete_blocks 20480
          zfs_dirty_data_max 68719476736
          zfs_dirty_data_max_max 180388626432
          zfs_dirty_data_max_max_percent 25
          zfs_dirty_data_max_percent 10
          zfs_dirty_data_sync_percent 20
          zfs_disable_ivset_guid_check 0
          zfs_dmu_offset_next_sync 1
          zfs_embedded_slog_min_ms 64
          zfs_expire_snapshot 300
          zfs_fallocate_reserve_percent 110
          zfs_flags 0
          zfs_free_bpobj_enabled 1
          zfs_free_leak_on_eio 0
          zfs_free_min_time_ms 1000
          zfs_history_output_max 1048576
          zfs_immediate_write_sz 32768
          zfs_initialize_chunk_size 1048576
          zfs_initialize_value 16045690984833335022
          zfs_keep_log_spacemaps_at_export 0
          zfs_key_max_salt_uses 400000000
          zfs_livelist_condense_new_alloc 0
          zfs_livelist_condense_sync_cancel 0
          zfs_livelist_condense_sync_pause 0
          zfs_livelist_condense_zthr_cancel 0
          zfs_livelist_condense_zthr_pause 0
          zfs_livelist_max_entries 500000
          zfs_livelist_min_percent_shared 75
          zfs_lua_max_instrlimit 100000000
          zfs_lua_max_memlimit 104857600
          zfs_max_async_dedup_frees 100000
          zfs_max_log_walking 5
          zfs_max_logsm_summary_length 10
          zfs_max_missing_tvds 0
          zfs_max_nvlist_src_size 0
          zfs_max_recordsize 16777216
          zfs_metaslab_find_max_tries 100
          zfs_metaslab_fragmentation_threshold 70
          zfs_metaslab_max_size_cache_sec 3600
          zfs_metaslab_mem_limit 25
          zfs_metaslab_segment_weight_enabled 1
          zfs_metaslab_switch_threshold 2
          zfs_metaslab_try_hard_before_gang 0
          zfs_mg_fragmentation_threshold 95
          zfs_mg_noalloc_threshold 0
          zfs_min_metaslabs_to_flush 1
          zfs_multihost_fail_intervals 0
          zfs_multihost_history 1000
          zfs_multihost_import_intervals 20
          zfs_multihost_interval 1000
          zfs_multilist_num_sublists 0
          zfs_no_scrub_io 0
          zfs_no_scrub_prefetch 0
          zfs_nocacheflush 0
          zfs_nopwrite_enabled 1
          zfs_object_mutex_size 64
          zfs_obsolete_min_time_ms 500
          zfs_override_estimate_recordsize 0
          zfs_pd_bytes_max 52428800
          zfs_per_txg_dirty_frees_percent 30
          zfs_prefetch_disable 0
          zfs_read_history 0
          zfs_read_history_hits 0
          zfs_rebuild_max_segment 1048576
          zfs_rebuild_scrub_enabled 1
          zfs_rebuild_vdev_limit 67108864
          zfs_reconstruct_indirect_combinations_max 4096
          zfs_recover 0
          zfs_recv_queue_ff 20
          zfs_recv_queue_length 16777216
          zfs_recv_write_batch_size 1048576
          zfs_removal_ignore_errors 0
          zfs_removal_suspend_progress 0
          zfs_remove_max_segment 16777216
          zfs_resilver_disable_defer 0
          zfs_resilver_min_time_ms 3000
          zfs_scan_blkstats 0
          zfs_scan_checkpoint_intval 7200
          zfs_scan_fill_weight 3
          zfs_scan_ignore_errors 0
          zfs_scan_issue_strategy 0
          zfs_scan_legacy 0
          zfs_scan_max_ext_gap 2097152
          zfs_scan_mem_lim_fact 15
          zfs_scan_mem_lim_soft_fact 20
          zfs_scan_report_txgs 0
          zfs_scan_strict_mem_lim 0
          zfs_scan_suspend_progress 0
          zfs_scan_vdev_limit 16777216
          zfs_scrub_min_time_ms 1000
          zfs_send_corrupt_data 0
          zfs_send_no_prefetch_queue_ff 20
          zfs_send_no_prefetch_queue_length 1048576
          zfs_send_queue_ff 20
          zfs_send_queue_length 16777216
          zfs_send_unmodified_spill_blocks 1
          zfs_slow_io_events_per_second 20
          zfs_spa_discard_memory_limit 16777216
          zfs_special_class_metadata_reserve_pct 25
          zfs_sync_pass_deferred_free 2
          zfs_sync_pass_dont_compress 8
          zfs_sync_pass_rewrite 2
          zfs_sync_taskq_batch_pct 75
          zfs_traverse_indirect_prefetch_limit 32
          zfs_trim_extent_bytes_max 134217728
          zfs_trim_extent_bytes_min 32768
          zfs_trim_metaslab_skip 0
          zfs_trim_queue_limit 10
          zfs_trim_txg_batch 32
          zfs_txg_history 100
          zfs_txg_timeout 5
          zfs_unflushed_log_block_max 131072
          zfs_unflushed_log_block_min 1000
          zfs_unflushed_log_block_pct 400
          zfs_unflushed_log_txg_max 1000
          zfs_unflushed_max_mem_amt 1073741824
          zfs_unflushed_max_mem_ppm 1000
          zfs_unlink_suspend_progress 0
          zfs_user_indirect_is_special 1
          zfs_vdev_aggregate_trim 0
          zfs_vdev_aggregation_limit 1048576
          zfs_vdev_aggregation_limit_non_rotating 131072
          zfs_vdev_async_read_max_active 3
          zfs_vdev_async_read_min_active 1
          zfs_vdev_async_write_active_max_dirty_percent 60
          zfs_vdev_async_write_active_min_dirty_percent 30
          zfs_vdev_async_write_max_active 10
          zfs_vdev_async_write_min_active 2
          zfs_vdev_cache_bshift 16
          zfs_vdev_cache_max 16384
          zfs_vdev_cache_size 0
          zfs_vdev_default_ms_count 200
          zfs_vdev_default_ms_shift 29
          zfs_vdev_initializing_max_active 1
          zfs_vdev_initializing_min_active 1
          zfs_vdev_max_active 1000
          zfs_vdev_max_auto_ashift 14
          zfs_vdev_min_auto_ashift 9
          zfs_vdev_min_ms_count 16
          zfs_vdev_mirror_non_rotating_inc 0
          zfs_vdev_mirror_non_rotating_seek_inc 1
          zfs_vdev_mirror_rotating_inc 0
          zfs_vdev_mirror_rotating_seek_inc 5
          zfs_vdev_mirror_rotating_seek_offset 1048576
          zfs_vdev_ms_count_limit 131072
          zfs_vdev_nia_credit 5
          zfs_vdev_nia_delay 5
          zfs_vdev_open_timeout_ms 1000
          zfs_vdev_queue_depth_pct 1000
          zfs_vdev_raidz_impl cycle [fastest] original scalar sse2 ssse3 avx2 avx512f avx512bw
          zfs_vdev_read_gap_limit 32768
          zfs_vdev_rebuild_max_active 3
          zfs_vdev_rebuild_min_active 1
          zfs_vdev_removal_max_active 2
          zfs_vdev_removal_min_active 1
          zfs_vdev_scheduler unused
          zfs_vdev_scrub_max_active 3
          zfs_vdev_scrub_min_active 1
          zfs_vdev_sync_read_max_active 10
          zfs_vdev_sync_read_min_active 10
          zfs_vdev_sync_write_max_active 10
          zfs_vdev_sync_write_min_active 10
          zfs_vdev_trim_max_active 2
          zfs_vdev_trim_min_active 1
          zfs_vdev_write_gap_limit 4096
          zfs_vnops_read_chunk_size 1048576
          zfs_wrlog_data_max 137438953472
          zfs_zevent_len_max 512
          zfs_zevent_retain_expire_secs 900
          zfs_zevent_retain_max 2000
          zfs_zil_clean_taskq_maxalloc 1048576
          zfs_zil_clean_taskq_minalloc 1024
          zfs_zil_clean_taskq_nthr_pct 100
          zil_maxblocksize 131072
          zil_min_commit_timeout 5000
          zil_nocacheflush 0
          zil_replay_disable 0
          zil_slog_bulk 786432
          zio_deadman_log_all 0
          zio_dva_throttle_enabled 0
          zio_requeue_io_start_cut_in_line 1
          zio_slow_io_ms 30000
          zio_taskq_batch_pct 80
          zio_taskq_batch_tpq 0
          zvol_inhibit_dev 0
          zvol_major 230
          zvol_max_discard_blocks 16384
          zvol_prefetch_bytes 131072
          zvol_request_sync 0
          zvol_threads 32
          zvol_volmode 1

          VDEV cache disabled, skipping section

          ZIL committed transactions: 0
          Commit requests: 10
          Flushes to stable storage: 10
          Transactions to SLOG storage pool: 0 Bytes 0
          Transactions to non-SLOG storage pool: 0 Bytes 0

          carbonneau Eric Carbonneau (Inactive) added a comment

          We've done more testing and gathered more information for your review:

          To start, the ZFS and Lustre versions required to reproduce the OOM:

          ZFS version: 2.1.14_1llnl-1
          Lustre version: lustre-2.15.4_1.llnl
          RAM total available: 187 GiB

          FIRST RUN:

          zfs_arc_max was left at the default (0).

          I also booted the kernel with slab_nomerge to pinpoint the culprit slab, if any.
          No culprit slab was found; nothing jumped to the top of slabtop.
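
          (slab_nomerge is a boot-time kernel parameter; a minimal sketch of how it is typically applied and verified on a RHEL-based system such as TOSS — the grubby invocation is an assumption about the local boot setup:)

          # add the parameter for all installed kernels, then reboot
          grubby --update-kernel=ALL --args="slab_nomerge"
          # after reboot, confirm it is active
          grep -o slab_nomerge /proc/cmdline
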
          We also checked arcstat to see what was going on in the ARC during testing.

          Command used: arcstat 1

           time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  size     c  avail
          11:29:21   19K  6.8K     34   569    4  6.3K   99     0    0   93G   93G    25G
          11:29:22   19K  6.9K     34   550    4  6.3K   99     0    0   93G   93G    25G
          11:29:23   20K  7.0K     34   546    3  6.5K   99     0    0   93G   93G    25G
          11:29:24   20K  7.2K     34   600    4  6.6K   99     0    0   93G   93G    25G
          11:29:25   21K  7.5K     35   633    4  6.9K   99     0    0   94G   93G    24G
          11:29:26   21K  7.6K     34   602    4  7.0K   99     0    0   93G   93G    25G
          11:29:27   21K  7.8K     36   633    4  7.1K   99     0    0   94G   93G    25G
          11:29:28   22K  8.0K     35   616    4  7.4K   99     0    0   94G   93G    24G
          11:29:29   23K  9.0K     38  1.2K    7  7.8K   99   503    3   94G   93G    24G
          11:29:30   22K   10K     45  2.7K   17  7.8K   99  2.0K   13   95G   93G    23G
          11:29:31   23K   10K     46  2.7K   17  8.2K  100  2.0K   13   97G   93G    21G
          11:29:32   24K   11K     46  2.8K   17  8.3K  100  2.1K   13   99G   93G    20G
          11:29:33   24K   11K     46  2.8K   17  8.8K  100  2.1K   13  101G   93G    18G
          11:29:34   24K   11K     46  2.7K   17  8.7K  100  2.0K   13  103G   93G    16G
          11:29:35   26K   12K     47  2.9K   17  9.5K  100  2.1K   13  105G   93G    13G
          11:29:36   26K   12K     47  2.8K   16  9.6K  100  2.0K   12  108G   93G    11G
          11:29:37   26K   12K     47  2.8K   16  9.7K  100  2.0K   12  110G   93G   8.7G
          11:29:38   27K   13K     47  2.8K   16   10K  100  2.0K   12  113G   93G   5.9G
          11:29:39   27K   13K     47  2.9K   16   10K  100  2.0K   12  116G   93G   3.1G
          11:29:40   27K   13K     48  2.7K   16   10K  100  1.9K   12  118G   93G    10M
          11:29:41   41K   15K     37  5.7K   17  9.9K  100  4.9K   15  121G   93G  -3.6G
          

          At that point we were OOMed.
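
          (arcstat reads from the ARC kstats; the same size/target figures can also be sampled directly if arcstat is not available — a minimal sketch, assuming a standard OpenZFS-on-Linux layout:)

          # current ARC size, adaptive target (c) and hard cap (c_max), in bytes
          awk '$1 ~ /^(size|c|c_max)$/ {print $1, $3}' /proc/spl/kstat/zfs/arcstats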
          --------------------------------------------------------------------------------------------------------------------------

          SECOND RUN:

          For the second run we set zfs_arc_max to 47 GiB.
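
          (For reference, one common way to apply and verify this cap at runtime is via the module parameter interface — a sketch; whether the limit was applied this way or at module load is not stated here:)

          # cap the ARC at 47 GiB
          echo $((47 * 1024 * 1024 * 1024)) > /sys/module/zfs/parameters/zfs_arc_max
          # confirm the value took effect
          cat /sys/module/zfs/parameters/zfs_arc_max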

          Continuing to monitor arcstat, we can see the ARC size going right through the 47 GiB limit:

          arcstat 1:

              time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  size     c  avail
          11:57:19   19K  9.0K     47  3.1K   23  5.9K  100  2.5K   20   76G   47G    41G
          11:57:20   19K  9.1K     47  3.1K   23  6.1K  100  2.5K   19   77G   47G    40G
          11:57:21   20K  9.4K     47  3.0K   22  6.4K  100  2.4K   18   78G   47G    39G
          11:57:22   20K  9.6K     47  3.1K   22  6.5K  100  2.5K   18   79G   47G    38G
          11:57:23   21K  10.0K     47  3.2K   22  6.8K  100  2.5K   18   80G   47G    37G
          11:57:24   21K  10.0K     47  3.1K   21  6.9K  100  2.4K   17   81G   47G    36G
          11:57:25   21K   10K     47  3.2K   21  7.2K  100  2.5K   18   82G   47G    35G
          11:57:26   22K   10K     47  3.2K   21  7.4K  100  2.5K   17   84G   47G    34G
          11:57:27   22K   10K     47  3.0K   20  7.8K  100  2.4K   16   85G   47G    32G
          11:57:28   23K   11K     47  3.2K   20  8.0K  100  2.5K   16   87G   47G    30G
          11:57:29   23K   11K     47  3.1K   19  8.3K  100  2.4K   16   89G   47G    29G
          11:57:30   24K   11K     48  3.2K   20  8.7K  100  2.5K   16   91G   47G    27G
          11:57:31   24K   12K     48  3.2K   19  8.9K  100  2.5K   16   93G   47G    24G
          11:57:32   25K   12K     48  3.1K   18  9.4K  100  2.4K   15   95G   47G    22G
          11:57:33   26K   12K     48  3.2K   19  9.6K  100  2.4K   15   98G   47G    20G
          11:57:34   26K   13K     49  3.2K   19  9.9K  100  2.5K   15  100G   47G    17G
          11:57:35   27K   13K     49  3.3K   18   10K  100  2.5K   15  103G   47G    14G
          11:57:36   28K   14K     49  3.4K   19   10K  100  2.6K   15  106G   47G    11G
          11:57:37   28K   14K     49  3.3K   18   11K  100  2.5K   14  109G   47G   8.4G
          11:57:38   29K   14K     50  3.3K   18   11K  100  2.6K   14  113G   47G   5.0G
          11:57:39   30K   15K     50  3.3K   17   11K  100  2.5K   14  116G   47G   669M
          11:57:40   44K   17K     39  6.1K   18   11K  100  5.3K   16  120G   47G  -3.9G
          

          ------------------------------------------------------------------------------------------------------------

          I will look into ZFS with our ZFS developers and update the ticket.

           

           

          carbonneau Eric Carbonneau (Inactive) added a comment - edited

          Originally I thought this was related to cgroups, which is a client-side issue, but that was because I had missed the "OSS" in the summary.

          The majority of memory usage looks to be in "slab_reclaimable:185039 slab_unreclaimable:2082954" or at least I can't see anything else reported in the meminfo dump.  Are you able to capture /proc/slabinfo or slabtop from the OSS while the IOR is running, and see what is using the majority of memory?
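
          (A sketch of one way to capture that while the IOR is running — the output path and sampling interval are illustrative:)

          # sample the top slab caches every 10 seconds for the duration of the run
          while sleep 10; do
              date
              slabtop -o -s c | head -20
          done >> /tmp/oss-slabtop.log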

          This might relate to the use of deferred fput on the server, which can accumulate over time if the server has been running for a long time. There were two recent patches related to this that landed on master, but these may only be relevant for osd-ldiskfs and not osd-zfs (which I assume is the case here):

          https://review.whamcloud.com/51731 "LU-16973 osd: adds SB_KERNMOUNT flag"
          https://review.whamcloud.com/51805 "LU-16973 ptlrpc: flush delayed file desc if idle"

          adilger Andreas Dilger added a comment

          carbonneau, ofaaland,

          it would be useful to include the actual stack traces from the OSS when the OOM is hit, not just the meminfo.  Otherwise it is difficult to know what is actually allocating the memory.  Sometimes it is just an innocent bystander process, but in many cases the actual offender is caught because it is the one allocating memory the most frequently...
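
          (The OOM killer writes the triggering task's stack trace and a process dump to the kernel log; a sketch of extracting them on the OSS — the grep context lengths are arbitrary:)

          # pull OOM reports and surrounding stack traces from the kernel ring buffer
          dmesg -T | grep -B 5 -A 60 'invoked oom-killer'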

          adilger Andreas Dilger added a comment

          I forgot to mention that the issue occurs during the read operations on the OSS. During the write operations the OSS memory usage was constant.

           

          carbonneau Eric Carbonneau (Inactive) added a comment

          People

            Assignee: pjones Peter Jones
            Reporter: carbonneau Eric Carbonneau (Inactive)
            Votes: 0
            Watchers: 6

            Dates

              Created:
              Updated:
              Resolved: