[LU-8674] A lightweight internal policy engine of Lustre for HSM, OST pool migration, file heat, inotify and so on Created: 07/Oct/16  Updated: 02/Jun/17  Resolved: 02/Jun/17

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Minor
Reporter: Li Xi (Inactive) Assignee: Li Xi (Inactive)
Resolution: Fixed Votes: 0
Labels: cea

Rank (Obsolete): 9223372036854775807

 Description   

Lustre features such as HSM and SSD-based OST pools have enabled many new
use cases, which makes data management of a Lustre file system a daily task.
The Robinhood Policy Engine can perform various kinds of data management
based on pre-configured rules and has proven to be a versatile tool for
managing large Lustre file systems. However, Robinhood requires an external
machine with strong CPU, memory and storage, and setting it up and
configuring it properly takes extra effort from users. That is why we (DDN)
are proposing a policy engine that is implemented completely inside Lustre.
To avoid performance regressions and complexity, this policy engine is
implemented in a very lightweight way. That means it supports only a limited
set of use cases, possibly far fewer than Robinhood does. However, this new
policy engine could still be useful for many use cases, especially the
relatively simple ones.

The core component of the policy engine is an arithmetic unit that calculates
the value of a rule, which users can configure at run time. The rule is an
arithmetic expression. An expression is either 1) a number, or 2) a constant
name, or 3) a system attribute name, or 4) an object attribute name, or 5) two
expressions combined by an operator.

The values of all expressions are calculated as unsigned 64-bit numbers, so
any unsigned 64-bit number can be used in an expression.
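
As a rough illustration of what unsigned 64-bit arithmetic implies, the
following sketch emulates the wraparound that a C uint64_t would show. The
names MASK64 and u64 are made up for this example and do not come from the
patch.

```python
MASK64 = (1 << 64) - 1  # all 64 bits set


def u64(value):
    """Reduce a Python integer to its unsigned 64-bit representation."""
    return value & MASK64


# Subtracting below zero wraps around instead of going negative.
print(hex(u64(0 - 1)))   # 0xffffffffffffffff
# Adding past the maximum wraps back to zero.
print(u64(MASK64 + 1))   # 0
```

The wraparound matters for rules that subtract, because a result that would
be negative becomes a very large number instead.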

A constant name is simply an alias for a 64-bit number that is pre-defined
in the Lustre code, so the value of the constant is fixed in advance. A
typical rule for HSM might use the constants
hsma_bit_[archive|restore|remove|cancel] to indicate the HSM actions that
should be taken after evaluating the rule.
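
One way to picture how such constants could map a rule value to actions is
shown below. The bit positions are assumptions chosen for illustration only
(the real values are whatever the Lustre code defines), and actions_for is a
hypothetical helper.

```python
# Hypothetical bit assignments, for illustration only.
HSM_ACTION_BITS = {
    "hsma_bit_archive": 1 << 0,
    "hsma_bit_restore": 1 << 1,
    "hsma_bit_remove": 1 << 2,
    "hsma_bit_cancel": 1 << 3,
}


def actions_for(value):
    """Return the names of all action bits that are set in a rule value."""
    return [name for name, bit in HSM_ACTION_BITS.items() if value & bit]


# A rule value of 0 requests nothing; a value with bits 0 and 2 set
# requests the archive and remove actions under these assumed values.
print(actions_for(0))
print(actions_for(HSM_ACTION_BITS["hsma_bit_archive"]
                  | HSM_ACTION_BITS["hsma_bit_remove"]))
```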

A system attribute is a system-wide attribute of Lustre or the kernel. The
free space on an OST is a typical example of system state, and the number of
free inodes on an MDT is another. The date and time is an example of a system
attribute that is independent of Lustre. When an expression is evaluated, the
current value of the system attribute is used. Since arithmetic values are
64-bit, all attribute values are 64-bit numbers.

An object attribute name can be any attribute name that is available from
the corresponding Lustre objects, usually MDT objects and OST objects.
Available object attribute names include, but are not limited to, the
attributes that can be read by getattr() syscall, such as atime, mtime,
ctime, size, mode, uid, gid, blocks, type, flags, nlink, rdev, blksize, etc.
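
To make the idea of attribute names resolving to 64-bit numbers concrete,
here is a userspace sketch that builds such a name-to-value table from
os.stat(). It is only an analogy: the engine itself reads attributes from
MDT/OST objects inside Lustre, and object_attributes and FIELDS are
hypothetical names.

```python
import os
import time

MASK64 = (1 << 64) - 1

# Attribute names from the description that os.stat() can supply.
FIELDS = ("atime", "mtime", "ctime", "size", "mode", "uid", "gid",
          "nlink", "blocks", "rdev", "blksize")


def object_attributes(path):
    """Map attribute names to unsigned 64-bit integers for one file."""
    st = os.stat(path)
    attrs = {}
    for name in FIELDS:
        value = getattr(st, "st_" + name, None)  # some fields are Unix-only
        if value is not None:
            attrs[name] = int(value) & MASK64
    return attrs


attrs = object_attributes(".")
attrs["sys_time"] = int(time.time())  # a system attribute, not per-object
print(sorted(attrs))
```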

An operator can be almost any integer operation available in the C
language, including arithmetic operators (+, -, *, /, %), relational
operators (==, !=, >, >=, <, <=), logical operators (&&, ||), and bitwise
operators (&, |, ^, <<, >>).

To simplify parsing of the expression in Lustre, the expression of the rule
must be written in Polish notation
(https://en.wikipedia.org/wiki/Polish_notation). A rule that triggers the
HSM archive action if the modification timestamp of the file is more than
1 minute earlier than the system time could be set with the following command:

echo -n "& - >= mtime - sys_time 60 1 hsma_bit_archive" > /proc/fs/lustre/mdt/lustre-MDT0000/hsm_policy_rule
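
The rule above can be traced with a small userspace evaluator. This is a
minimal sketch, not the in-kernel implementation: evaluate and BINARY_OPS are
hypothetical names, the attribute values are invented, and hsma_bit_archive
is assumed to be 1 purely for the demonstration.

```python
import operator

MASK64 = (1 << 64) - 1

BINARY_OPS = {
    "+": operator.add, "-": operator.sub, "*": operator.mul,
    "/": operator.floordiv, "%": operator.mod,
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
    "&": operator.and_, "|": operator.or_, "^": operator.xor,
    "<<": operator.lshift, ">>": operator.rshift,
    "&&": lambda a, b: bool(a) and bool(b),
    "||": lambda a, b: bool(a) or bool(b),
}


def evaluate(rule, names):
    """Evaluate a Polish-notation rule using unsigned 64-bit arithmetic."""
    tokens = iter(rule.split())

    def parse():
        token = next(tokens)
        if token in BINARY_OPS:
            left = parse()    # first operand follows the operator
            right = parse()   # second operand follows the first
            return int(BINARY_OPS[token](left, right)) & MASK64
        if token in names:
            return names[token] & MASK64
        return int(token, 0) & MASK64

    value = parse()
    if next(tokens, None) is not None:
        raise ValueError("trailing tokens after expression")
    return value


RULE = "& - >= mtime - sys_time 60 1 hsma_bit_archive"
old_file = {"mtime": 1000, "sys_time": 1100, "hsma_bit_archive": 1}
new_file = {"mtime": 1090, "sys_time": 1100, "hsma_bit_archive": 1}
print(evaluate(RULE, old_file))  # 1: modified more than 60 s ago
print(evaluate(RULE, new_file))  # 0: modified within the last 60 s
```

The trick in the rule is the "- ... 1" step: the comparison yields 1 for a
recent file and 0 for an old one, so subtracting 1 gives either 0 or an
all-ones mask (by unsigned wraparound), which the final "&" applies to
hsma_bit_archive.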

Currently, Lustre does not optimize the expression when it is set, even
when the optimization is obvious. For example, "&& 0 expression1" is
essentially equal to "0", yet the value of "expression1" is still
evaluated when the value of the entire expression is computed. That means the
expression should be optimized, either manually or with an external tool,
before it is set. An external userspace tool that converts normal (infix)
notation with parentheses to Polish notation and optimizes the expression at
the same time could be really helpful for users.
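
A sketch of what such a helper tool might look like follows. It is not an
existing utility: to_polish and its helpers are hypothetical, C operator
precedence is assumed, only binary operators, identifiers and decimal numbers
are handled, and the only optimizations are a few zero/non-zero shortcuts.

```python
import re

# Binary operators with C-like precedence (higher binds tighter).
PRECEDENCE = {
    "||": 1, "&&": 2, "|": 3, "^": 4, "&": 5,
    "==": 6, "!=": 6, "<": 7, "<=": 7, ">": 7, ">=": 7,
    "<<": 8, ">>": 8, "+": 9, "-": 9, "*": 10, "/": 10, "%": 10,
}
TOKEN = re.compile(
    r"\s*(\d+|[A-Za-z_]\w*|<<|>>|<=|>=|==|!=|&&|\|\||[-+*/%<>&|^()])")


def tokenize(text):
    text, pos, out = text.strip(), 0, []
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError("unexpected character at offset %d" % pos)
        out.append(match.group(1))
        pos = match.end()
    return out


def is_zero(operand):
    return operand.isdigit() and int(operand) == 0


def is_nonzero(operand):
    return operand.isdigit() and int(operand) != 0


def combine(op, left, right):
    """Emit 'op left right', folding a few trivially constant cases."""
    if op in ("&&", "&", "*") and (is_zero(left) or is_zero(right)):
        return "0"
    if op == "||" and (is_nonzero(left) or is_nonzero(right)):
        return "1"
    return "%s %s %s" % (op, left, right)


def parse(tokens, min_prec=1):
    """Precedence climbing; returns the sub-expression in Polish notation."""
    if not tokens:
        raise ValueError("unexpected end of expression")
    token = tokens.pop(0)
    if token == "(":
        left = parse(tokens)
        if not tokens or tokens.pop(0) != ")":
            raise ValueError("missing closing parenthesis")
    elif token == ")" or token in PRECEDENCE:
        raise ValueError("unexpected token %r" % token)
    else:
        left = token
    while tokens and PRECEDENCE.get(tokens[0], 0) >= min_prec:
        op = tokens.pop(0)
        right = parse(tokens, PRECEDENCE[op] + 1)  # left-associative
        left = combine(op, left, right)
    return left


def to_polish(text):
    tokens = tokenize(text)
    result = parse(tokens)
    if tokens:
        raise ValueError("unexpected token %r" % tokens[0])
    return result


print(to_polish("((mtime >= sys_time - 60) - 1) & hsma_bit_archive"))
# & - >= mtime - sys_time 60 1 hsma_bit_archive
print(to_polish("0 && (size > 1024)"))
# 0
```

Because Polish notation needs no parentheses, the converter can build the
output string directly while parsing instead of constructing a tree first.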

The configured rule can be evaluated either synchronously or asynchronously.
Synchronous evaluation means evaluating the expression in the context of a
service thread. For example, when a file is being accessed, the expression of
the rule is calculated in the service thread. The policy engine triggers the
corresponding actions if the value of the expression matches a predefined
pattern. To avoid performance regressions, the speed of synchronous
evaluation is critical, which is why synchronous evaluation supports only
one rule.

However, after synchronous evaluation triggers an action job, asynchronous
evaluation can be done on multiple pre-configured rules when handling
the action job. Asynchronous evaluation is done in a dedicated thread pool
of the policy engine, so asynchronous evaluation causes no performance
regression. The service thread pool could also scan the whole OST/MDT from
time to time to find the objects that match the rules.
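
The division of labour between the single synchronous rule and the
asynchronous rules can be pictured with a small thread-pool sketch.
Everything here is illustrative: rules are plain Python callables rather
than Lustre expressions, and run_policy, the job queue and the object
dictionaries are hypothetical names that do not come from the patch.

```python
import queue
import threading


def run_policy(objects, trigger_rule, async_rules, workers=2):
    """Queue a job per triggering object; workers evaluate all async rules."""
    jobs = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            obj = jobs.get()
            if obj is None:          # sentinel: no more jobs
                break
            for condition, action in async_rules:
                if condition(obj):
                    with lock:
                        results.append((obj["name"], action))

    pool = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in pool:
        thread.start()

    # "Service thread" path: one cheap rule decides whether to queue a job.
    for obj in objects:
        if trigger_rule(obj):
            jobs.put(obj)

    for _ in pool:
        jobs.put(None)
    for thread in pool:
        thread.join()
    return sorted(results)


files = [{"name": "a", "size": 10}, {"name": "b", "size": 2000}]
rules = [(lambda o: o["size"] > 1000, "archive")]
print(run_policy(files, lambda o: o["size"] > 1000, rules))
# [('b', 'archive')]
```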

A set of rules like "condition1 -> action1", "condition2 -> action2",
and "condition3 -> action3" could be configured for asynchronous evaluation
by the policy engine. In order for synchronous evaluation to trigger jobs
properly, a rule that is equal to, but more optimized than,
"|| condition1 || condition2 condition3" should be set.

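Deriving the unoptimized synchronous rule from a list of asynchronous
conditions is mechanical, as the following sketch suggests. The helper name
combine_conditions is hypothetical, each condition is assumed to already be a
complete Polish-notation expression, and the optimization step is not shown.

```python
def combine_conditions(conditions):
    """OR together Polish-notation conditions into one trigger rule."""
    if not conditions:
        raise ValueError("at least one condition is required")
    combined = conditions[-1]
    # Wrap from right to left so the first condition ends up outermost.
    for condition in reversed(conditions[:-1]):
        combined = "|| %s %s" % (condition, combined)
    return combined


print(combine_conditions(["condition1", "condition2", "condition3"]))
# || condition1 || condition2 condition3
```
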
Obviously, this policy engine has some limitations, and everything that
this policy engine can do for HSM can also be accomplished by using
Robinhood.

Even though the current code only supports HSM, this policy engine could
potentially be used for other features that need configurable policies.
The following is a list of such features:

  • Data migration between SSD OST pool and normal OST pool. The policy engine
    could use a new feature named file heat to decide which data to move to SSD
    pool.
  • RPC classification in the NRS TBF policy. Currently, the NRS TBF policy
    classifies RPCs based on NID/JobID. By using the expression of this
    policy engine, the TBF policy could classify RPCs based on an expression
    of RPC attributes that can be configured by users. This could enable many
    more use cases than the existing classification.
  • Inotify is a useful feature for monitoring file system events, but
    Lustre itself doesn't support system-wide inotify. By using the lightweight
    policy engine, a notification mechanism that might be more powerful and
    efficient than inotify could be implemented for Lustre. In order to act like
    inotify, when the pre-configured rule is matched, instead of applying
    background actions, this policy engine could send a notification to the
    watching application. Because an expression could be used to filter the
    desired events from the original source, the extra overhead such as RPCs
    caused by notification could be minimized.
  • Cache management tuning on different levels. The policy engine could be used
    in cache management systems in order to make the decision of data prefetching
    or cache eviction.


 Comments   
Comment by Gerrit Updater [ 07/Oct/16 ]

Li Xi (lixi@ddn.com) uploaded a new patch: http://review.whamcloud.com/23014
Subject: LU-8674 obd: Add a general policy engine
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d6add7688c92d3ac6be218cc14e29977c49b73d9

Comment by Peter Jones [ 07/Oct/16 ]

Thanks for the suggestion Li Xi

Comment by Li Xi (Inactive) [ 02/Jun/17 ]

I am closing this because a user-space policy engine looks much more flexible than this.
