  1. Lustre
  2. LU-1325

Loading a large enough binary from Lustre triggers the OOM killer during page_fault while a large amount of memory is available

Details

    • Bug
    • Resolution: Duplicate
    • Major
    • None
    • Lustre 2.1.0
    • None
    • 3
    • 6414

    Description

      While loading a large enough binary, we hit the OOM killer during page_fault even though the system still has a lot of free memory (in our case, 60 GB free on a node with 64 GB installed).

      The problem does not appear if the binary is not big enough or if there is not enough concurrency. A simple ls works, and so does a small program, but once the binary grows to a few MB with some DSOs around it and is run with mpirun, the page fault appears to be interrupted by a signal in cl_lock_state_wait. The error code is then returned up to ll_fault0, where it is replaced by VM_FAULT_ERROR, which triggers the OOM killer.

      Here is an extract from the collected trace (the full trace is attached):
      (cl_lock.c:986:cl_lock_state_wait()) Process leaving (rc=18446744073709551612 : -4 : fffffffffffffffc)
      (cl_lock.c:1310:cl_enqueue_locked()) Process leaving (rc=18446744073709551612 : -4 : fffffffffffffffc)
      (cl_lock.c:2175:cl_lock_request()) Process leaving (rc=18446744073709551612 : -4 : fffffffffffffffc)
      (cl_io.c:393:cl_lockset_lock_one()) Process leaving (rc=18446744073709551612 : -4 : fffffffffffffffc)
      (cl_io.c:444:cl_lockset_lock()) Process leaving (rc=18446744073709551612 : -4 : fffffffffffffffc)
      (cl_io.c:479:cl_io_lock()) Process leaving (rc=18446744073709551612 : -4 : fffffffffffffffc)
      (cl_io.c:1033:cl_io_loop()) Process leaving (rc=18446744073709551612 : -4 : fffffffffffffffc)
      (llite_mmap.c:298:ll_fault0()) Process leaving (rc=51 : 51 : 33)
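
      To make the return codes in this trace easier to read, here is a minimal Python sketch that decodes them. It reinterprets the unsigned 64-bit rc as a signed errno and splits the ll_fault0 return value into fault flags. The VM_FAULT_* bit values are an assumption taken from Linux kernel headers of that era and should be checked against the kernel actually running on the node; the helper names are illustrative only and are not part of Lustre.

```python
import errno

# Assumed flag values (include/linux/mm.h in kernels of that era);
# verify against the headers of the kernel actually in use.
VM_FAULT_FLAGS = {
    0x0001: "VM_FAULT_OOM",
    0x0002: "VM_FAULT_SIGBUS",
    0x0010: "VM_FAULT_HWPOISON",
    0x0020: "VM_FAULT_HWPOISON_LARGE",
}


def to_signed64(value):
    """Reinterpret an unsigned 64-bit value as a signed 64-bit value."""
    return value - (1 << 64) if value >= (1 << 63) else value


def errno_name(rc):
    """Return the symbolic errno for a negative kernel return code."""
    return errno.errorcode.get(-rc, "unknown") if rc < 0 else "OK"


def fault_flags(rc):
    """List the assumed VM_FAULT_* bits set in a fault handler return value."""
    return [name for bit, name in sorted(VM_FAULT_FLAGS.items()) if rc & bit]


# rc reported by cl_lock_state_wait() and its callers in the trace
rc = to_signed64(18446744073709551612)
print(rc, errno_name(rc))

# rc reported by ll_fault0() in the trace
print(fault_flags(51))
```

      Under these assumptions, the first value decodes to -4 (EINTR), which matches an interrupted wait, and 51 (0x33) has every error bit set, including VM_FAULT_OOM. That would explain why the fault is reported as out of memory even though memory is available.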

      We are able to reproduce the problem at will on the customer system by submitting, through the batch scheduler, an MPI job of 32 cores on 2 nodes (16 cores per node). I have not been able to reproduce it on another system.

      I also tried to identify the culprit signal by setting panic_on_oom, but unfortunately the signal seems to have been cleared during the OOM handling. Using strace is too complicated with the MPI layer.

      Alex.

          People

            jay Jinshan Xiong (Inactive)
            louveta Alexandre Louvet