[LU-13490] readahead thread breaks read stats in jobstats Created: 29/Apr/20 Updated: 23/Sep/21 Resolved: 14/May/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.14.0 |
| Fix Version/s: | Lustre 2.14.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Shuichi Ihara | Assignee: | Wang Shilong (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Severity: | 2 | ||
| Rank (Obsolete): | 9223372036854775807 | ||
| Description |
|
Parallel readahead, introduced by a recent change, breaks per-job read statistics in job_stats. Steps to reproduce:
[root@mgs ~]# lctl conf_param vLustre.sys.jobid_var=procname_uid
[root@client ~]# ior -w -t 1m -b 1g -e -o /vLustre/out/file -k
[root@client ~]# echo 3 > /proc/sys/vm/drop_caches
[root@client ~]# ior -r -t 1m -b 1g -e -o /vLustre/out/file -k
[root@oss1 ~]# lctl get_param obdfilter.*.job_stats
obdfilter.vLustre-OST0000.job_stats=
job_stats:
- job_id: ior.0
snapshot_time: 1588138284
read_bytes: { samples: 16, unit: bytes, min: 1048576, max: 4194304, sum: 62914560 }
write_bytes: { samples: 256, unit: bytes, min: 4194304, max: 4194304, sum: 1073741824 }
getattr: { samples: 0, unit: reqs }
setattr: { samples: 0, unit: reqs }
punch: { samples: 0, unit: reqs }
sync: { samples: 1, unit: reqs }
destroy: { samples: 0, unit: reqs }
create: { samples: 0, unit: reqs }
statfs: { samples: 0, unit: reqs }
get_info: { samples: 0, unit: reqs }
set_info: { samples: 0, unit: reqs }
quotactl: { samples: 0, unit: reqs }
- job_id: kworker/u4:1.0
snapshot_time: 1588138285
read_bytes: { samples: 135, unit: bytes, min: 4194304, max: 4194304, sum: 566231040 }
write_bytes: { samples: 0, unit: bytes, min: 0, max: 0, sum: 0 }
getattr: { samples: 0, unit: reqs }
setattr: { samples: 0, unit: reqs }
punch: { samples: 0, unit: reqs }
sync: { samples: 0, unit: reqs }
destroy: { samples: 0, unit: reqs }
create: { samples: 0, unit: reqs }
statfs: { samples: 0, unit: reqs }
get_info: { samples: 0, unit: reqs }
set_info: { samples: 0, unit: reqs }
quotactl: { samples: 0, unit: reqs }
- job_id: kworker/u4:3.0
snapshot_time: 1588138284
read_bytes: { samples: 106, unit: bytes, min: 4194304, max: 4194304, sum: 444596224 }
write_bytes: { samples: 0, unit: bytes, min: 0, max: 0, sum: 0 }
getattr: { samples: 0, unit: reqs }
setattr: { samples: 0, unit: reqs }
punch: { samples: 0, unit: reqs }
sync: { samples: 0, unit: reqs }
destroy: { samples: 0, unit: reqs }
create: { samples: 0, unit: reqs }
statfs: { samples: 0, unit: reqs }
get_info: { samples: 0, unit: reqs }
set_info: { samples: 0, unit: reqs }
quotactl: { samples: 0, unit: reqs }
It is a bad idea to track read stats under the kernel thread rather than the real application's PID: with that accounting it is no longer possible to see read stats per job ID. |
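For context, the output above follows from how procname_uid jobids are built. The sketch below is a simplified illustration, not the actual Lustre code (the helper name is made up): the jobid attached to each RPC is "<command name>.<uid>" of the current task, so I/O issued from a readahead kworker is tagged with the worker's name ("kworker/u4:1.0") instead of the application's ("ior.0").

/* Simplified illustration of the procname_uid jobid idea; names are
 * hypothetical and this is not the actual Lustre implementation. */
#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/cred.h>
#include <linux/user_namespace.h>

static void example_procname_uid_jobid(char *jobid, size_t len)
{
        /* "ior.0" when called from the application, but
         * "kworker/u4:1.0" when called from a readahead worker */
        snprintf(jobid, len, "%s.%u", current->comm,
                 from_kuid(&init_user_ns, current_fsuid()));
}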
| Comments |
| Comment by Andreas Dilger [ 29/Apr/20 ] |
|
There is an exception table for jobid that skips specific thread names, but I don't think it can work in this case. Maybe it would be better to check whether the thread has PF_KTHREAD set? Also, I think it is possible to cache the jobid in struct ll_inode_info, so if that were set it would report the correct jobid to the OSS. |
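A minimal sketch of the PF_KTHREAD idea (the helper name is hypothetical, not from the Lustre tree): special-case jobid collection when the submitting context is a kernel thread, since the parallel-readahead workers run in kworkers that have PF_KTHREAD set.

/* Hypothetical helper sketching the suggested check; not Lustre code. */
#include <linux/sched.h>

static bool example_jobid_ctx_is_kthread(void)
{
        /* PF_KTHREAD is set for all kernel threads, including the
         * workqueue workers that service parallel readahead */
        return (current->flags & PF_KTHREAD) != 0;
}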
| Comment by Peter Jones [ 29/Apr/20 ] |
|
Shilong, could you please advise? Thanks, Peter |
| Comment by Wang Shilong (Inactive) [ 30/Apr/20 ] |
|
One possible way to solve the issue would be to pass the original task_struct down to the jobid code and derive the jobid information from that task_struct, which should fix this problem. |
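A rough illustration of this approach (the function and its signature are illustrative only, not the interface used by the landed patch): the readahead work item remembers the task that queued it, and the jobid is derived from that task rather than from current, which would be the kworker.

/* Illustrative sketch only; not the actual signature from the patch. */
#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/cred.h>
#include <linux/user_namespace.h>

static void example_jobid_from_task(char *jobid, size_t len,
                                    struct task_struct *task)
{
        if (!task)
                task = current; /* fall back to the calling context */
        snprintf(jobid, len, "%s.%u", task->comm,
                 from_kuid(&init_user_ns, task_uid(task)));
}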
| Comment by Gerrit Updater [ 30/Apr/20 ] |
|
Wang Shilong (wshilong@ddn.com) uploaded a new patch: https://review.whamcloud.com/38426 |
| Comment by Gerrit Updater [ 14/May/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38426/ |
| Comment by Peter Jones [ 14/May/20 ] |
|
Landed for 2.14 |