  1. Lustre
  2. LU-16674

read contention on "job_stats" "/proc" file

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0

    Description

      At CEA, we observed heavy contention on job_stats when a large number of jobs are tracked.

      In some critical cases (incorrect job_name pattern or a lot of read accesses on job_stats), this can cause the target to freeze.

      When the proc file "job_stats" is read, the rwlock "ojs_lock" is taken in read mode to walk the list of jobs. While the file is being read, no job stat entry can be added or removed, so the target processes that need to do so must wait for the write lock.

      I think we can avoid this kind of contention in two ways:

      Save the last job read

      On each read, the process walks the list of jobs from its head to find the entry corresponding to the file offset, so the cost of a read grows with the offset.

      static void *lprocfs_jobstats_seq_start(struct seq_file *p, loff_t *pos)
      .....
      off--;
      /* linear walk from the list head on every read */
      list_for_each_entry(job, &stats->ojs_list, js_list) {
              if (!off--)
                      return job;
      }

      This could be improved by saving the last job accessed and its corresponding offset, so that the next read can resume from that entry instead of rescanning the list from the head.
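
      A minimal user-space sketch of the cached-offset idea, not the Lustre code: struct job, struct job_cursor and job_seek() are hypothetical names, and a plain singly linked list stands in for ojs_list. It assumes the cached entry stays valid between reads; real code would have to pin it (for example with a reference) or drop the cache when the entry is removed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a job stat entry; not the Lustre struct job_stat. */
struct job {
	int id;
	struct job *next;
};

/* Hypothetical cursor: last entry returned and its 0-based offset. */
struct job_cursor {
	struct job *last;
	long last_off;
	unsigned long steps;	/* nodes walked so far, for illustration only */
};

/* Return the entry at offset off, resuming from the cached entry if possible. */
static struct job *job_seek(struct job *head, struct job_cursor *c, long off)
{
	struct job *j = head;
	long i = 0;

	assert(c != NULL);
	if (off < 0)
		return NULL;

	/* The list is forward-only, so the cache helps only for off >= last_off. */
	if (c->last != NULL && off >= c->last_off) {
		j = c->last;
		i = c->last_off;
	}

	while (j != NULL && i < off) {
		j = j->next;
		i++;
		c->steps++;
	}

	/* Remember where we stopped so the next read can resume here. */
	if (j != NULL) {
		c->last = j;
		c->last_off = off;
	}
	return j;
}
```

      With this cursor, reading offsets 0, 1, 2, ... in order walks one node per read instead of rescanning from the head each time. A read at an offset before the cached one still falls back to a full walk.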

      Use RCU lock instead of rwlock

      Protecting the job stat list with RCU instead of the rwlock should make read accesses contention free: readers of job_stats would no longer block the processes that add or remove entries.
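
      To illustrate why RCU-style readers take no lock, here is a small user-space model of the publish/subscribe half of RCU using C11 atomics. It is only an analogy under stated assumptions: job_node, job_list, job_publish() and job_count() are hypothetical names, a single writer is assumed, and entry removal with its grace period is not modeled. In the kernel, the corresponding primitives would be list_add_rcu() on the writer side and list_for_each_entry_rcu() under rcu_read_lock() on the reader side, with list_del_rcu() followed by synchronize_rcu() or kfree_rcu() for removal.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical entry type; not the Lustre struct job_stat. */
struct job_node {
	int id;
	_Atomic(struct job_node *) next;
};

struct job_list {
	_Atomic(struct job_node *) head;
};

/*
 * Writer side (single writer assumed; kernel writers would still serialize
 * among themselves with a spinlock). The node is fully initialized first,
 * then made visible with a release store, which plays the role of
 * rcu_assign_pointer().
 */
static void job_publish(struct job_list *l, struct job_node *n, int id)
{
	n->id = id;
	atomic_init(&n->next,
		    atomic_load_explicit(&l->head, memory_order_relaxed));
	atomic_store_explicit(&l->head, n, memory_order_release);
}

/*
 * Reader side: no lock is taken. Acquire loads stand in for
 * rcu_dereference(), so a reader sees either the old list or the new node
 * with its fields already initialized.
 */
static int job_count(struct job_list *l)
{
	int count = 0;
	struct job_node *j = atomic_load_explicit(&l->head,
						  memory_order_acquire);

	while (j != NULL) {
		count++;
		j = atomic_load_explicit(&j->next, memory_order_acquire);
	}
	return count;
}
```

      The reader never writes to shared state, so concurrent readers do not delay the writer. What this model leaves out is the hard part of RCU: an entry may only be freed after every reader that could still see it has finished.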

          People

            eaujames Etienne Aujames