      During SWL for TOSS 4.6-6rc3 and also 4.7-2rc2, we found that an IOR run could trigger an OOM on an OSS node.

      We were able to reproduce this issue using IOR under srun.

      The following srun/ior command was used:

      srun -N 70 -n 7840 /g/g0/carbonne/ior/src/ior -a MPIIO -i 5 -b 256MB -t 128MB -v -g -F -C -w -W -r -o /p/lflood/carbonne/oomtest/ior_1532/ior
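
      For a rough sense of the load this command generates, the sketch below
      works out tasks per node and the aggregate data written per iteration.
      It assumes IOR interprets "MB" as MiB and that the segment count is left
      at its default of 1 (no -s option is given); the function name
      ior_sizing is ours, for illustration only.

```python
# Rough sizing of the IOR run, derived only from the srun/ior options shown.
# Assumptions: "MB" means MiB, and one segment per task (no -s given).

def ior_sizing(nodes, tasks, block_mib, transfer_mib, iterations):
    """Return basic per-iteration figures for a file-per-process IOR run."""
    return {
        "tasks_per_node": tasks // nodes,
        "transfers_per_block": block_mib // transfer_mib,
        "aggregate_gib_per_iteration": tasks * block_mib / 1024,
        "iterations": iterations,
    }

# -N 70 -n 7840, -b 256MB, -t 128MB, -i 5
sizing = ior_sizing(nodes=70, tasks=7840, block_mib=256,
                    transfer_mib=128, iterations=5)

for key, value in sizing.items():
    print(f"{key}: {value}")
```

      Under those assumptions each write phase moves about 1960 GiB in
      128 MiB transfers from 112 tasks per node, and -W and -r each add a
      read pass over the same data.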

      One example occurred at 2023-10-17 12:31:28 on garter5; see the console log.

      The Mem-info from one set of oom-killer console log messages is:


      active_anon:22868 inactive_anon:69168 isolated_anon:0
       active_file:357 inactive_file:770 isolated_file:250
       unevictable:10785 dirty:0 writeback:0
       slab_reclaimable:185039 slab_unreclaimable:2082954
       mapped:12536 shmem:46663 pagetables:2485 bounce:0
       free:134668 free_pcp:203 free_cma:0
      Node 0 active_anon:75888kB inactive_anon:87304kB active_file:1840kB
       inactive_file:1464kB  unevictable:43080kB isolated(anon):0kB
       isolated(file):208kB mapped:19680kB dirty:0kB writeback:0kB
       shmem:127712kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 26624kB
       writeback_tmp:0kB kernel_stack:31416kB pagetables:3896kB
       all_unreclaimable? no
      Node 0 DMA free:11264kB min:4kB low:16kB high:28kB active_anon:0kB
       inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB
       writepending:0kB present: 15996kB managed:15360kB mlocked:0kB bounce:0kB
       free_pcp:0kB local_pcp:0kB free_cma:0kB
       lowmem_reserve[]: 0 1183 94839 94839 94839
      Node 0 DMA32 free:375156kB min:556kB low:1764kB high:2972kB
       active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:4kB
       unevictable:0kB writepending:0kB present:1723228kB managed:1325704kB
       mlocked:0kB bounce:0kB free_pcp:260kB local_pcp:0kB free_cma:0kB
       lowmem_reserve[]: 0 0 93655 93655 93655
      Node 0 Normal free:46072kB min:44044kB low:139944kB high:235844kB
       active_anon:75888kB inactive_anon:87304kB active_file:1860kB
       inactive_file:1584kB unevictable: 43080kB writepending:0kB
       present:97517568kB managed:95912024kB mlocked:43080kB bounce:0kB
       free_pcp:372kB local_pcp:0kB free_cma:0kB lowmem_reserve[]: 0 0 0 0 0
      Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
       1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11264kB
      Node 0 DMA32: 3*4kB (M) 66*8kB (UM) 202*16kB (UM) 152*32kB (UM)
       168*64kB (UM) 85*128kB (UM) 24*256kB (UM) 20*512kB (UM) 11*1024kB (UM)
       7*2048kB (UM) 74*4096kB (# M) = 375356kB
      Node 0 Normal: 151*4kB (MEH) 853*8kB (UMEH) 640*16kB (MEH) 412*32kB (MEH)
       132*64kB (ME) 33*128kB (UE) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 43524kB
      Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
      Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
      Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
      Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
      53515 total pagecache pages
      0 pages in swap cache
      Swap cache stats: add 0, delete 0, find 0/0
      Free swap  = 0kB
      Total swap = 0kB
      49980022 pages RAM
      0 pages HighMem/MovableOnly
      896433 pages reserved
      0 pages hwpoisoned
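
      The global counters in the dump above are in pages, while the per-zone
      lines are in kB. The sketch below converts a few counters to GiB and
      re-adds the Node 0 Normal free list. It assumes a 4 KiB page size, which
      is consistent with the 4kB smallest entries in the free lists; the
      helper names are ours, for illustration only.

```python
import re

PAGE_KIB = 4  # assumption: 4 KiB pages (matches the 4kB order-0 free-list entries)

# Global counters copied from the Mem-info dump (values are page counts).
MEMINFO = """\
active_anon:22868 inactive_anon:69168 isolated_anon:0
active_file:357 inactive_file:770 isolated_file:250
unevictable:10785 dirty:0 writeback:0
slab_reclaimable:185039 slab_unreclaimable:2082954
mapped:12536 shmem:46663 pagetables:2485 bounce:0
free:134668 free_pcp:203 free_cma:0
"""

# Free-list line for Node 0 Normal, copied from the dump.
NORMAL_FREE_LIST = (
    "151*4kB (MEH) 853*8kB (UMEH) 640*16kB (MEH) 412*32kB (MEH) "
    "132*64kB (ME) 33*128kB (UE) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB"
)

TOTAL_RAM_PAGES = 49980022  # "pages RAM" line


def parse_counters(text):
    """Turn 'name:value' tokens into a dict of integer page counts."""
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)", text)}


def pages_to_gib(pages, page_kib=PAGE_KIB):
    """Convert a page count to GiB."""
    return pages * page_kib / (1024 * 1024)


def free_list_blocks(line):
    """Return (count, block_size_kB) pairs from a 'N*SIZEkB' free-list line."""
    return [(int(n), int(size)) for n, size in re.findall(r"(\d+)\*(\d+)kB", line)]


counters = parse_counters(MEMINFO)
for name in ("slab_unreclaimable", "slab_reclaimable", "free"):
    print(f"{name}: {pages_to_gib(counters[name]):.2f} GiB")
print(f"total RAM: {pages_to_gib(TOTAL_RAM_PAGES):.2f} GiB")

blocks = free_list_blocks(NORMAL_FREE_LIST)
total_kb = sum(n * size for n, size in blocks)
largest_kb = max(size for n, size in blocks if n > 0)
print(f"Node 0 Normal free list: {total_kb} kB, largest free block {largest_kb} kB")
```

      With these inputs the Node 0 Normal free list adds up to the 43524kB
      printed in the dump, and its largest free block is 128kB.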


      local Jira ticket:  TOSS-6158




