[LU-5615] Lustre 2.5.2 with CGROUP Created: 12/Sep/14  Updated: 09/Oct/21  Resolved: 09/Oct/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.2
Fix Version/s: None

Type: Bug
Priority: Major
Reporter: Atul Yadav
Assignee: WC Triage
Resolution: Won't Do
Votes: 0
Labels: None
Environment:

Lustre 2.5.2 CentOS 6.5 CGROUP


Severity: 3
Rank (Obsolete): 15706

 Description   

Dear Team,

We are trying to set up cgroups in our Lustre environment.
Our Lustre setup consists of:
MDT
OST

Please share guidance or a simple setup example of cgroups with Lustre.

Thank You
Atul Yadav



 Comments   
Comment by Richard Henwood (Inactive) [ 12/Sep/14 ]

Hi Atul,

Cgroups allow you to allocate resources (such as CPU time, system memory, network bandwidth, or combinations of these resources) among user-defined groups of tasks (processes) running on a system [1]. Given the wide scope of cgroups, please provide a use case for your setup to focus the discussion.

thanks,
Richard

1. https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/ch01.html
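
As a point of reference for the cgroup mechanism described above, here is a minimal sketch of a cgroup v1 cpuset. It assumes the cpuset controller is mounted at /cgroup/cpuset (an assumed path; set `CG_ROOT` to your own mount point), and the group name `demo` is hypothetical. It only applies to tasks that have a PID which can be written into the group's `tasks` file.

```shell
# Assumed mount point of the cgroup v1 cpuset controller; override as needed.
CG_ROOT="${CG_ROOT:-/cgroup/cpuset}"

# make_cpuset NAME CPUS MEMS
# Create a cpuset group and restrict it to the given CPUs and memory nodes.
# cpuset.cpus and cpuset.mems must both be set before tasks can be added.
make_cpuset() {
    mkdir -p "$CG_ROOT/$1" &&
    echo "$2" > "$CG_ROOT/$1/cpuset.cpus" &&
    echo "$3" > "$CG_ROOT/$1/cpuset.mems"
}

# add_task NAME PID
# Move one task into the group by writing its PID to the tasks file.
add_task() {
    echo "$2" > "$CG_ROOT/$1/tasks"
}
```

For example, `make_cpuset demo 0-1 0` followed by `add_task demo 1234` (run as root, with 1234 standing in for a real PID) would confine that task to CPUs 0 and 1 on memory node 0.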

Comment by Atul Yadav [ 15/Sep/14 ]

Dear Team,

Thanks for the info.
However, after going through the document, we learned that cgroups require a PID in order to bind a task to a CPU.

In Lustre, we are unable to locate the PID.

Please guide us in identifying the PID of the Lustre service.

Thank you
Atul Yadav

Comment by Richard Henwood (Inactive) [ 15/Sep/14 ]

Hi Atul,

To get an idea of the different threads that Lustre employs, I suggest searching the Operations Manual for 'thread':
https://build.hpdd.intel.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.xhtml

You will see that Lustre uses multiple threads across multiple machines. Identifying the PID of the Lustre service is therefore different depending on which machine you are on.

I suggest it would be more helpful for you to describe the system behaviour you intend to achieve by using cgroups, i.e. why are you looking into cgroups on Lustre?

best regards,
Richard

Comment by Atul Yadav [ 15/Sep/14 ]

Dear Team,

We want to configure cgroups so that the MDS and OSS services run exclusively on cpu0 and cpu1.
For that, we need to identify the MDS and OSS services.

Please guide us in completing this activity.

Regards,
Atul Yadav

Comment by Robert Read (Inactive) [ 15/Sep/14 ]

There is no PID for an "MDS" or "OSS" service because all Lustre services run in the kernel. As Richard pointed out, numerous kernel threads implement aspects of the services. We also have some support for binding some service threads to specific CPUs (https://build.hpdd.intel.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.xhtml#dbdoclet.mdsbinding), though that is primarily to optimize RPC handling and not for isolation. I believe the only way to isolate multiple Lustre services on a single physical node is to run them in virtual machines.
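
To see the kernel threads mentioned here, something like the following sketch may help. It filters `ps` output by thread name and prints the CPU each thread last ran on. The name pattern `^(mdt|ll_ost)` is an assumption about how the service threads are named on a given build; check the actual names in your own `ps` output first.

```shell
# filter_threads PATTERN
# Reads "PID PSR COMM" lines on stdin and prints those whose thread name
# (third column) matches the extended regular expression PATTERN.
filter_threads() {
    awk -v pat="$1" '$3 ~ pat { print $1, $2, $3 }'
}

# Usage on a server (the pattern is an assumption, adjust to your thread names):
#   ps -eo pid,psr,comm --no-headers | filter_threads '^(mdt|ll_ost)'
```

The `psr` column is the processor a thread is currently assigned to, so running this before and after changing a binding parameter gives a rough view of where the threads land.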

Comment by Atul Yadav [ 16/Sep/14 ]

Dear Team,

Thanks for the information and guidance.
But where can we set "mds_num_cpt" to bind the MDS to CPU0?

Thank You
Atul Yadav

Comment by Richard Henwood (Inactive) [ 16/Sep/14 ]

I believe the section you need on configuring thread counts is immediately above in the Operations Manual:
https://build.hpdd.intel.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.xhtml#dbdoclet.mdstuning

Please share your experiences.
best regards,
Richard

Comment by Atul Yadav [ 16/Sep/14 ]

Dear Admin,

As per the information, we added the following parameters to the module configuration file:
options lnet networks=tcp0(eth0)
options mdt mdt_num_cpts=[0-1]

But when we load the lustre module, our mdt parameter does not appear, whereas the lnet parameter does:
[root@io1 ~]# modprobe -v lustre
insmod /lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/extra/kernel/net/lustre/libcfs.ko
insmod /lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/extra/kernel/fs/lustre/lvfs.ko
insmod /lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/extra/kernel/net/lustre/lnet.ko networks=tcp0(eth0)
insmod /lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/extra/kernel/fs/lustre/obdclass.ko
insmod /lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/extra/kernel/fs/lustre/ptlrpc.ko
insmod /lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/extra/kernel/fs/lustre/fld.ko
insmod /lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/extra/kernel/fs/lustre/fid.ko
insmod /lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/extra/kernel/fs/lustre/mdc.ko
insmod /lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/extra/kernel/fs/lustre/osc.ko
insmod /lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/extra/kernel/fs/lustre/lov.ko
insmod /lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/extra/kernel/fs/lustre/lustre.ko

Please guide us.

Thank You
Atul Yadav

Comment by Richard Henwood (Inactive) [ 16/Sep/14 ]

In the documentation, the example given is:

options mds ...

Can you repeat this, using 'mds' instead of 'mdt'?

Comment by Atul Yadav [ 17/Sep/14 ]

Dear Admin,

We still get the same output after changing "mdt" to "mds".

Thank You
Atul Yadav

Comment by Atul Yadav [ 17/Sep/14 ]

Dear team,

Thanks, it is now working fine.

We will check and update you.

Thank You
Atul Yadav

Comment by Atul Yadav [ 17/Sep/14 ]

Dear Team,

The output of the commands is given below:
[root@IO1 ~]# modprobe -v mdt
insmod /lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/extra/kernel/net/lustre/libcfs.ko
insmod /lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/extra/kernel/fs/lustre/lvfs.ko
insmod /lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/extra/kernel/net/lustre/lnet.ko networks=tcp0(em1),o2ib0(ib0),o2ib1(ib1) forwarding="enabled"
insmod /lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/extra/kernel/fs/lustre/obdclass.ko
insmod /lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/extra/kernel/fs/lustre/ptlrpc.ko
insmod /lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/extra/kernel/fs/lustre/lquota.ko
insmod /lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/extra/kernel/fs/lustre/fld.ko
insmod /lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/extra/kernel/fs/lustre/fid.ko
insmod /lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/extra/kernel/fs/lustre/mdt.ko mds_num_cpts="[0-3]"
[root@IO1 ~]# modprobe -v lustre
insmod /lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/extra/kernel/fs/lustre/mdc.ko
insmod /lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/extra/kernel/fs/lustre/osc.ko
insmod /lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/extra/kernel/fs/lustre/lov.ko
insmod /lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/extra/kernel/fs/lustre/lustre.ko

Thank You
Atul Yadav
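
For anyone repeating this, the output above shows the parameter attached to mdt.ko as `mds_num_cpts` when `modprobe -v mdt` is run, while the earlier `modprobe -v lustre` output never loads mdt.ko at all. The sketch below checks that a modprobe configuration file carries an `options` line in that form. The file path /etc/modprobe.d/lustre.conf is an assumption, and `check_opt` is a hypothetical helper, not a Lustre tool.

```shell
# check_opt FILE MODULE PARAM
# Succeeds if FILE contains a line of the form "options MODULE ... PARAM=...".
check_opt() {
    grep -Eq "^options[[:space:]]+$2[[:space:]]+(.*[[:space:]])?$3=" "$1"
}

# Usage (file path is an assumption):
#   check_opt /etc/modprobe.d/lustre.conf mdt mds_num_cpts && modprobe -v mdt
```

With a line such as `options mdt mds_num_cpts=[0-1]` in the file, the check succeeds; with `options mdt mdt_num_cpts=[0-1]` it fails, because the parameter name does not match.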

Comment by Richard Henwood (Inactive) [ 17/Sep/14 ]

Thanks for working through this. I have a couple of requests:

1. Please share your progress and any learnings.

2. Your work has identified an error in the manual. I have corrected this error. Can you please review my proposed change here:
http://review.whamcloud.com/#/c/11938/

Generated at Sat Feb 10 01:53:01 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.