LU-13490

readahead thread breaks read stats in jobstats

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Affects Version/s: Lustre 2.14.0
    • Fix Version/s: Lustre 2.14.0
    • Component/s: None

    Description

      Parallel readahead was introduced by LU-12043 (commit c279167), so kernel threads now perform readahead in parallel, but this caused a regression that breaks read stats in jobstats.
      Here is a reproducer.

      [root@mgs ~]# lctl conf_param vLustre.sys.jobid_var=procname_uid
      
      [root@client ~]# ior -w -t 1m -b 1g -e -o /vLustre/out/file -k
      [root@client ~]# echo 3 > /proc/sys/vm/drop_caches 
      [root@client ~]# ior -r -t 1m -b 1g -e -o /vLustre/out/file -k
      
      [root@oss1 ~]# lctl get_param obdfilter.*.job_stats
      obdfilter.vLustre-OST0000.job_stats=
      job_stats:
      - job_id:          ior.0
        snapshot_time:   1588138284
        read_bytes:      { samples:          16, unit: bytes, min: 1048576, max: 4194304, sum:        62914560 }
        write_bytes:     { samples:         256, unit: bytes, min: 4194304, max: 4194304, sum:      1073741824 }
        getattr:         { samples:           0, unit:  reqs }
        setattr:         { samples:           0, unit:  reqs }
        punch:           { samples:           0, unit:  reqs }
        sync:            { samples:           1, unit:  reqs }
        destroy:         { samples:           0, unit:  reqs }
        create:          { samples:           0, unit:  reqs }
        statfs:          { samples:           0, unit:  reqs }
        get_info:        { samples:           0, unit:  reqs }
        set_info:        { samples:           0, unit:  reqs }
        quotactl:        { samples:           0, unit:  reqs }
      - job_id:          kworker/u4:1.0
        snapshot_time:   1588138285
        read_bytes:      { samples:         135, unit: bytes, min: 4194304, max: 4194304, sum:       566231040 }
        write_bytes:     { samples:           0, unit: bytes, min:       0, max:       0, sum:               0 }
        getattr:         { samples:           0, unit:  reqs }
        setattr:         { samples:           0, unit:  reqs }
        punch:           { samples:           0, unit:  reqs }
        sync:            { samples:           0, unit:  reqs }
        destroy:         { samples:           0, unit:  reqs }
        create:          { samples:           0, unit:  reqs }
        statfs:          { samples:           0, unit:  reqs }
        get_info:        { samples:           0, unit:  reqs }
        set_info:        { samples:           0, unit:  reqs }
        quotactl:        { samples:           0, unit:  reqs }
      - job_id:          kworker/u4:3.0
        snapshot_time:   1588138284
        read_bytes:      { samples:         106, unit: bytes, min: 4194304, max: 4194304, sum:       444596224 }
        write_bytes:     { samples:           0, unit: bytes, min:       0, max:       0, sum:               0 }
        getattr:         { samples:           0, unit:  reqs }
        setattr:         { samples:           0, unit:  reqs }
        punch:           { samples:           0, unit:  reqs }
        sync:            { samples:           0, unit:  reqs }
        destroy:         { samples:           0, unit:  reqs }
        create:          { samples:           0, unit:  reqs }
        statfs:          { samples:           0, unit:  reqs }
        get_info:        { samples:           0, unit:  reqs }
        set_info:        { samples:           0, unit:  reqs }
        quotactl:        { samples:           0, unit:  reqs }
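
A quick sanity check on the output above (plain arithmetic, not Lustre code, with the read_bytes sums copied from the job_stats listing): the three entries add up to exactly the 1 GiB that ior read back, but roughly 94% of it is charged to the kworker job IDs instead of ior.0.

```python
# read_bytes sums copied from the job_stats output above (not Lustre code).
read_sums = {
    "ior.0":          62914560,   # reads charged to the real application job
    "kworker/u4:1.0": 566231040,  # reads charged to readahead worker threads
    "kworker/u4:3.0": 444596224,
}

total = sum(read_sums.values())
readahead = total - read_sums["ior.0"]

print(total)              # 1073741824, i.e. exactly the 1 GiB that ior read
print(readahead / total)  # 0.94140625, ~94% misattributed to kworker jobids
```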
      

      Tracking read stats per kernel thread rather than per the real application PID is a bad idea: it makes it impossible to see read stats per job ID.
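
The direction of the fix can be illustrated with a small, hypothetical user-space sketch (Python, not Lustre code; all names here are made up for illustration): the submitter's job ID is captured in the work item itself, so stats are charged to the real job regardless of which worker thread performs the readahead.

```python
import queue
import threading

# Hypothetical sketch (not Lustre code): store the submitting job's ID in the
# work item, so stats are charged to the real job no matter which worker
# thread actually performs the readahead.
stats = {}
work = queue.Queue()

def submit_readahead(jobid, nbytes):
    # Capture the submitter's jobid at submission time.
    work.put((jobid, nbytes))

def readahead_worker():
    while True:
        item = work.get()
        if item is None:
            break
        jobid, nbytes = item
        # Keying stats by threading.current_thread().name here would reproduce
        # the bug (entries under "kworker/..."); using the jobid carried with
        # the work item attributes the read to the originating job.
        stats[jobid] = stats.get(jobid, 0) + nbytes

worker = threading.Thread(target=readahead_worker, name="kworker/u4:1")
worker.start()
submit_readahead("ior.0", 4 * 1024 * 1024)  # two 4 MiB readahead chunks
submit_readahead("ior.0", 4 * 1024 * 1024)
work.put(None)                              # shut the worker down
worker.join()
print(stats)  # {'ior.0': 8388608} -- both reads charged to the real job
```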

      Attachments

        Activity

          [LU-13490] readahead thread breaks read stats in jobstats
          adilger Andreas Dilger made changes -
          Link New: This issue is related to DDN-2473 [ DDN-2473 ]
          lixi_wc Li Xi made changes -
          Labels Original: exap
          pjones Peter Jones made changes -
          Resolution New: Fixed [ 1 ]
          Status Original: Open [ 1 ] New: Resolved [ 5 ]
          wshilong Wang Shilong (Inactive) made changes -
          Labels New: exap
          pjones Peter Jones made changes -
          Fix Version/s New: Lustre 2.14.0 [ 14490 ]
          pjones Peter Jones made changes -
          Assignee Original: WC Triage [ wc-triage ] New: Wang Shilong [ wshilong ]
          sihara Shuichi Ihara made changes -
          Link New: This issue is related to EX-1134 [ EX-1134 ]
          sihara Shuichi Ihara made changes -
          Description Original: (description text without the LU-12043/commit c279167 reference) New: (current description text, quoted in full above)
          sihara Shuichi Ihara created issue -

          People

            Assignee: wshilong Wang Shilong (Inactive)
            Reporter: sihara Shuichi Ihara
            Votes: 0
            Watchers: 9
