[LU-3897] Hang up in ldlm_pools_shrink under OOM Created: 06/Sep/13  Updated: 05/Dec/13  Resolved: 05/Dec/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.3
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Sebastien Buisson (Inactive) Assignee: Zhenyu Xu
Resolution: Not a Bug Votes: 0
Labels: None

Attachments: File UJF_crash_foreach_bt.gz     Zip Archive UJF_dmesg.zip    
Severity: 3
Rank (Obsolete): 10187

 Description   

Hi,

Several Bull customers running Lustre 2.1.x have hit a hang in ldlm_pools_shrink() on a Lustre client (login node).
The system was hung and a dump was initiated from the BMC by sending an NMI.
The dump shows there was no more activity on the system: the 12 CPUs are idle (swapper).
A lot of processes are in page_fault(), blocked in ldlm_pools_shrink().
I have attached the output of the "foreach bt" crash command. Let me know if you need the vmcore file.

Each time, we can see a lot of OOM messages in the syslog of the dump files.

This issue looks like LU-2468.

Thanks,
Sebastien.



 Comments   
Comment by Peter Jones [ 06/Sep/13 ]

Bobijam

What do you recommend here?

Thanks

Peter

Comment by Zhenyu Xu [ 09/Sep/13 ]

Oleg,

Would you consider http://review.whamcloud.com/#/c/4954/ ? I don't know whether Vitaly has addressed the issue somewhere.

Comment by Oleg Drokin [ 09/Sep/13 ]

I doubt that patch would help.

Sebastien, can we get the kernel dmesg log please? I wonder what's in there.

It seems that something forgot to release the namespace lock on the client (at least I don't see anything running in the area guarded by this lock; everybody seems to be waiting to acquire it). Possibly there is some place that forgets to release it on an error exit path, and hopefully there is some clue in the dmesg.

Comment by Sebastien Buisson (Inactive) [ 09/Sep/13 ]

Hi,

This is the dmesg requested by Oleg, taken from the crash dump.

Sebastien.

Comment by Zhenyu Xu [ 08/Nov/13 ]

Hmm, why doesn't the dmesg have any Lustre log information? I only see call traces and the mem-info report in it.

Comment by Sebastien Buisson (Inactive) [ 08/Nov/13 ]

Probably there was so much information dumped in the dmesg regarding the OOM issue that it pushed out the Lustre logs.

From the dump, do you know how we can access the memory containing the Lustre debug logs?

Comment by Zhenyu Xu [ 13/Nov/13 ]

I haven't found relevant info to determine the root cause.

What is the memory usage on the clients? I wonder whether the client is using too much memory for locks or for the data cache.

What is the "lctl get_param ldlm.namespaces.*.lru_size" output?
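For reference, on a running client the lock LRU can be inspected and, if needed, capped with something like the following (the 50000 value is only an illustrative cap, not a recommendation):

  lctl get_param ldlm.namespaces.*.lru_size
  lctl set_param ldlm.namespaces.*.lru_size=50000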

Comment by Sebastien Buisson (Inactive) [ 14/Nov/13 ]

Hi,

It is not easy to get the lru_size from the crash dump.
We dumped the 'struct obd_device' entries in the obd_devs array, then followed the obd_namespace pointer to reach the 'ns_nr_unused' field, which corresponds to the lru_size information.

All values are 0.
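For reference, the crash session looked roughly like the following (addresses elided; the struct and field names are reconstructed from the Lustre 2.1 sources, so treat this as a sketch):

crash> p obd_devs
crash> struct obd_device.obd_namespace <obd_device_addr>
crash> struct ldlm_namespace.ns_nr_unused <namespace_addr>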

HTH,
Sebastien.

Comment by Zhenyu Xu [ 27/Nov/13 ]

If lru_size is 0, it means locks do not take much memory. What is the "lctl get_param llite.*.max_cached_mb" output? Can you set it to a smaller value to keep less data cached on the client side?
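On the client this would be something along these lines (the 8192 value is only an example, to be tuned to the node's memory):

  lctl get_param llite.*.max_cached_mb
  lctl set_param llite.*.max_cached_mb=8192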

Comment by Sebastien Buisson (Inactive) [ 29/Nov/13 ]

Hi,

In the crash dump I looked at the value of ll_async_page_max, which is 12365796, so max_cached_mb is around 48 GB, i.e. 3/4 of total node memory.
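For clarity, the conversion from pages (assuming 4 KB pages):

  12365796 pages * 4 KB/page = 49463184 KB ≈ 48305 MB, i.e. roughly 48 GB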

But in any case, remember there is a regression in CLIO in 2.1 that causes this max_cached_mb value to never be used!

So there is no way to limit the Lustre data cache.

Sebastien.

Comment by Jinshan Xiong (Inactive) [ 03/Dec/13 ]

Hi Sebastien, can you please show me the output of `slabtop'?

Comment by Sebastien Buisson (Inactive) [ 03/Dec/13 ]

Hi,

I have no idea how I can run slabtop inside crash :/
I have uploaded a tarball to the ftp server, under uploads/LU-3897. It contains the vmcore, the vmlinux, and the Lustre modules.

Maybe you will be able to get the information you are looking for.

Thanks,
Sebastien.

Comment by Jinshan Xiong (Inactive) [ 03/Dec/13 ]

`kmem -s' will print out the slabinfo in this case. Thanks for the crash dump, I will take a look.

Comment by Jinshan Xiong (Inactive) [ 03/Dec/13 ]

The system ran out of memory, as shown by the output of `kmem -i':

crash> kmem -i 
              PAGES        TOTAL      PERCENTAGE
 TOTAL MEM  16487729      62.9 GB         ----
      FREE    37266     145.6 MB    0% of TOTAL MEM
      USED  16450463      62.8 GB   99% of TOTAL MEM
    SHARED      401       1.6 MB    0% of TOTAL MEM
   BUFFERS       99       396 KB    0% of TOTAL MEM
    CACHED       57       228 KB    0% of TOTAL MEM
      SLAB    22513      87.9 MB    0% of TOTAL MEM

TOTAL SWAP   255857     999.4 MB         ----
 SWAP USED   255856     999.4 MB   99% of TOTAL SWAP
 SWAP FREE        1         4 KB    0% of TOTAL SWAP

Both system memory and swap space were used up. Page cache and slab cache were normal, no excessive usage at all.

It turned out that most of the memory was used for anonymous memory mappings.

crash> kmem -V
  VM_STAT:
          NR_FREE_PAGES: 37266
       NR_INACTIVE_ANON: 956162
         NR_ACTIVE_ANON: 15158596
       NR_INACTIVE_FILE: 0
         NR_ACTIVE_FILE: 99
         NR_UNEVICTABLE: 0
               NR_MLOCK: 0
          NR_ANON_PAGES: 16115037

There were 16115037 anonymous pages (NR_ANON_PAGES), which is about 61 GB (16115037 pages * 4 KB ≈ 61.5 GB). I guess the application is probably using too much memory or is badly written. I don't think we can help in this case.

Comment by Jinshan Xiong (Inactive) [ 05/Dec/13 ]

Please reopen this ticket if you have more questions.
