[LU-14111] Report per-target eviction count Created: 03/Nov/20  Updated: 03/Feb/24  Resolved: 14/Feb/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Improvement Priority: Minor
Reporter: Aurelien Degremont (Inactive) Assignee: Aurelien Degremont (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
Rank (Obsolete): 9223372036854775807

 Description   

Eviction is a standard mechanism for Lustre targets to protect themselves against dead or misbehaving clients.

On a live filesystem, evictions eventually happen, and it would be useful for sysadmins to have an exact counter to monitor them and take action if needed.

I will propose a patch that adds an eviction counter to obd_device, increments it when an eviction occurs, and exposes it through lctl get_param.
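
As a rough illustration of how a sysadmin might consume such a counter, the sketch below parses `name=value` lines in the style printed by lctl get_param and reports which targets saw new evictions between two samples. The parameter name `eviction_count`, the `testfs` target names, and both helper functions are assumptions made for this example; they are not taken from this ticket.

```python
def parse_get_param(text):
    """Parse 'name=value' lines into a dict mapping name -> int.

    Lines without '=' or with a non-integer value are skipped.
    """
    counters = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        name, _, value = line.rpartition("=")
        try:
            counters[name] = int(value)
        except ValueError:
            continue
    return counters


def new_evictions(previous, current):
    """Return {name: delta} for counters that grew since the previous sample.

    A counter that went down (for example after a target restart) is ignored
    here rather than reported as a negative delta.
    """
    deltas = {}
    for name, value in current.items():
        delta = value - previous.get(name, 0)
        if delta > 0:
            deltas[name] = delta
    return deltas


# Hypothetical samples; the parameter and target names are assumed.
before = parse_get_param(
    "obdfilter.testfs-OST0000.eviction_count=3\n"
    "obdfilter.testfs-OST0001.eviction_count=0\n"
)
after = parse_get_param(
    "obdfilter.testfs-OST0000.eviction_count=5\n"
    "obdfilter.testfs-OST0001.eviction_count=0\n"
)
print(new_evictions(before, after))
```

In a real monitoring script the two text samples would come from running lctl get_param at an interval; that step is left out so the sketch stays self-contained.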



 Comments   
Comment by Gerrit Updater [ 03/Nov/20 ]

Aurelien Degremont (degremoa@amazon.com) uploaded a new patch: https://review.whamcloud.com/40528
Subject: LU-14111 obdclass: count eviction per obd_device
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: ecfa119aafe911e5ee8949d5b082fd4dc742e35f

Comment by Andreas Dilger [ 07/Nov/20 ]

Actually, this information is already in osc.*.import:

    connection:
       failover_nids: [ 192.168.20.1@tcp ]
       current_connection: 192.168.20.1@tcp
       connection_attempts: 2623
       generation: 2623
       in-progress_invalidations: 0
       idle: 0 sec

so it makes sense to include this information there.
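
For readers who want to pull fields out of a block like the one above, here is a minimal sketch that reads the `connection:` section into a dictionary. It is a hand-rolled, standard-library-only parser for this flat layout and not a general YAML parser; a real YAML library would normally be the better choice. The function name `parse_section` is hypothetical, and the sketch does not assume which field reflects evictions.

```python
SAMPLE = """\
connection:
   failover_nids: [ 192.168.20.1@tcp ]
   current_connection: 192.168.20.1@tcp
   connection_attempts: 2623
   generation: 2623
   in-progress_invalidations: 0
   idle: 0 sec
"""


def parse_section(text):
    """Parse flat 'key: value' lines, grouping indented lines under the
    most recent unindented 'section:' header. Values are kept as strings."""
    result = {}
    section = None
    for raw in text.splitlines():
        if not raw.strip():
            continue
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # not a 'key: value' line
        value = value.strip()
        indented = raw[0].isspace()
        if not indented and value == "":
            # An unindented key with no value opens a new section.
            section = result.setdefault(key, {})
        elif indented and section is not None:
            section[key] = value
        else:
            result[key] = value
            section = None
    return result


conn = parse_section(SAMPLE)["connection"]
print(int(conn["generation"]), int(conn["connection_attempts"]))
```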

Comment by Aurelien Degremont (Inactive) [ 09/Nov/20 ]

But the patch tracks this server-side, not client-side.

 

I was wondering where a good place would be to report this data server-side. As I understand it, the move away from /proc is pushing toward giving new data its own /sys entry rather than adding it to a more complex output file.

 

Comment by Aurelien Degremont (Inactive) [ 10/Nov/20 ]

(moving this out of the patch review as this is not related)

The upstream kernel folks are just starting to come to this realization, and trying to do crazy things like adding a syscall to read from an array of fd's at the same time, but it is just more efficient to have all of the values in a single file that is formatted for easy parsing (YAML).

I have been thinking for a while that lots of get_param entries have a YAML-like or almost YAML-compatible syntax and that this is a good path forward, but I've never seen any commitment or official recommendation that this is the way to go and that those params should be made YAML-compatible as much as possible. Is there one?

 

Comment by Andreas Dilger [ 10/Nov/20 ]

I don't know if there was ever any "official" documentation to that effect, but at WC, YAML has definitely been adopted as the standard format for new "complex" parameter files ever since IML started to be developed.

In my experience, it is possible to create YAML-compliant files (I use http://yaml-online-parser.appspot.com/ to verify this) that are both machine-readable and human-readable. Examples of "new" complex files include osc.*.import, obdfilter.*.exports.*.export, obdfilter.*.job_stats, obdfilter.*.lfsck_status, and others. There is also an "lfs getstripe --yaml" option for dumping file layouts in YAML format, and "lctl --device MGS llog_print $fsname-client" and "lctl --device MGS llog_print params" to dump the config records.

Ideally, we could also convert old "complex" files (e.g. brw_stats, but with a new filename) over to YAML format as well, but that hasn't happened yet.

Comment by Gerrit Updater [ 14/Feb/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/40528/
Subject: LU-14111 obdclass: count eviction per obd_device
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 3c69d46e1766480c0ffd1bef840b4e167b4cf88e

Comment by Peter Jones [ 14/Feb/23 ]

Landed for 2.16

Comment by James A Simmons [ 25/Aug/23 ]

New maloo test fails in interop with 2.15

Comment by James A Simmons [ 25/Aug/23 ]

I see a discussion of YAML output for various parameters. I'm working on a YAML netlink version for the debugfs issue. It should provide the ability to express any stats in YAML format when requested.

Comment by Gerrit Updater [ 25/Aug/23 ]

"James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52098
Subject: LU-14111 tests: only support recovery-small test 146 for 2.15.54+
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 77f9f3232f685b10518596ac69f2961ba7c342fa

Comment by Gerrit Updater [ 06/Sep/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52098/
Subject: LU-14111 tests: only support recovery-small test 146 for 2.15.54+
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: b034dd27dd39483e40f91ea82d3f5c62b514ec54
