[LU-11211] Performance degradation in mdtest Created: 04/Aug/18  Updated: 12/Aug/18

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Abe Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-11213 DNE3: remote mkdir() in ROOT/ by default Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

We are observing performance degradation in mdtest IOPS testing, plateauing at around 50K IOPS with ZFS.

Configuration has DNE and DOM; however, it seems only one MDT out of the 3 MDTs is being utilized:

 

Steps for the configuration in place:

 

[root@mds-201 ~]# zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
mdtpool 744G 2.35G 742G - 0% 0% 1.00x ONLINE -
mdtpool1 744G 12.1M 744G - 0% 0% 1.00x ONLINE -
mdtpool2 372G 8.96M 372G - 0% 0% 1.00x ONLINE -
[root@mds-201 ~]#

 

lfs setstripe -E 1M -L mdt -E -1 /mnt/lustre/domdir
lfs setstripe -L mdt --component-end 1M /mnt/lustre/domdir/MDT0
lfs setstripe -L mdt --component-end 1M /mnt/lustre/domdir/MDT1
lfs setstripe -L mdt --component-end 1M /mnt/lustre/domdir/MDT5

 

 

snapshots of the results:

Command line used: ./mdtest -d /mnt/lustre/domdir-mdts/testdir-4 -n 47662 -F -e -u -i 1
Path: /mnt/lustre/domdir-mdts
FS: 351.4 TiB Used FS: 0.0% Inodes: 184.4 Mi Used Inodes: 0.0%

44 tasks, 2097128 files

SUMMARY: (of 1 iterations)
Operation Max Min Mean Std Dev
--------- --- --- ---- -------
File creation : 18572.644 18572.644 18572.644 0.000
File stat : 22206.703 22206.703 22206.703 0.000
File read : 66375.392 66375.392 66375.392 0.000
File removal : 11525.913 11525.913 11525.913 0.000
Tree creation : 1016.260 1016.260 1016.260 0.000
Tree removal : 2.698 2.698 2.698 0.000

– finished at 08/03/2018 09:41:01 –
================= 11 client END =======================

 

thanks,

Abe

 

 



 Comments   
Comment by Andreas Dilger [ 04/Aug/18 ]

There are two independent layout parameters that you need to configure for your testing - the directory layout, which controls the MDT where the files will be created, and the file layout, which controls where the data will be located. The lfs mkdir command works very similarly to the lfs setstripe command, and can also be used as lfs setdirstripe to create new subdirectories with non-default parameters.
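
For example, to check both layout types on an existing directory, a quick sanity check along these lines should work (using the paths from this ticket; adjust to your setup):

lfs getstripe -d /mnt/lustre/domdir      # default file layout inherited by new files
lfs getdirstripe /mnt/lustre/domdir      # directory layout (MDT stripe count/index)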

There are two possible options for distributing files across MDTs:

  • remote directories, which split the namespace at directory boundaries across MDTs
  • striped directories, which distribute a single (typically very large or active) directory across multiple MDTs

If you want to create a set of directories for multiple threads/jobs, create one or more remote directories on multiple MDTs as below:

lfs setstripe -E 1M -L mdt -E -1 /mnt/lustre/domdir
for mdt_idx in {1..4}; do
    lfs mkdir -i $mdt_idx /mnt/lustre/domdir/dir-$mdt_idx
done

The file layout will be inherited by the new subdirectories below domdir, and the files/directories created within each dir-N subdirectory will be created on the specific MDT.
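
To confirm the placement, something like the following should do (a sketch assuming the dir-N names created by the loop above; output field names may differ slightly by lfs version):

for mdt_idx in {1..4}; do
    # lmv_stripe_offset in the output shows the MDT index holding each remote directory
    lfs getdirstripe /mnt/lustre/domdir/dir-$mdt_idx
done
touch /mnt/lustre/domdir/dir-1/f0
lfs getstripe -m /mnt/lustre/domdir/dir-1/f0    # prints the MDT index holding the new file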

If you want to create a single directory that distributes files within the directory across multiple MDTs:

lfs mkdir -c 4 /mnt/lustre/domdir
lfs setstripe -E 1M -L mdt -E -1 /mnt/lustre/domdir

The files/directories created within the domdir directory will be distributed across the MDTs based on the filename hash, so the distribution is not guaranteed to be uniform if you are testing with e.g. 4 threads and 4 MDTs. With larger numbers of threads, the files and directories created within this specific directory will be spread relatively evenly between MDTs. Lower-level subdirectories will typically be created on the same MDT as the "top level" subdirectory itself, unless you use the "-D" option, which causes the default lfs mkdir settings to be inherited by newly-created subdirectories as well.

Note that the creation of remote or striped directories is itself fairly slow, but creating files within the striped or remote subdirectories scales fairly well.

Comment by Abe [ 04/Aug/18 ]

Hi Andreas,

 

I tried the lfs CLI, but the striped directories are not being created:

 

Do these have to be run on the client or the MDS server?

[root@client1-221 ~]# lfs setstripe -E 1M -L mdt -E -1 /mnt/lustre/domdir

[root@client1-221 ~]# for mdt_idx in {1..3}; do
> lfs mkdir -i $mdt_idx /mnt/lustre/domdir/dir-$mdt_idx
> done
lfs mkdir: unable to open '/mnt/lustre/domdir/dir-1': Not a directory (20)
lfs setdirstripe: cannot create stripe dir '/mnt/lustre/domdir/dir-1': Not a directory
lfs mkdir: unable to open '/mnt/lustre/domdir/dir-2': Not a directory (20)
lfs setdirstripe: cannot create stripe dir '/mnt/lustre/domdir/dir-2': Not a directory
lfs mkdir: unable to open '/mnt/lustre/domdir/dir-3': Not a directory (20)
lfs setdirstripe: cannot create stripe dir '/mnt/lustre/domdir/dir-3': Not a directory
[root@client1-221 ~]# ls -l /mnt/lustre/domdir
-rw-r--r-- 1 root root 0 Aug 3 07:58 /mnt/lustre/domdir

 

thanks,

Abe

 

 

Comment by Andreas Dilger [ 04/Aug/18 ]

Also, what does "lfs df -i" and "lctl dl" on the client show? It almost seems like the client is not connected to the MDTs.

Comment by Abe [ 04/Aug/18 ]

Hi Andreas,

This is what is shown on the client:

[root@client1-221 ~]# lfs df -i
UUID Inodes IUsed IFree IUse% Mounted on
tempAA-MDT0000_UUID 5849109 665 5848444 0% /mnt/lustre[MDT:0]
tempAA-MDT0001_UUID 121145392 454 121144938 0% /mnt/lustre[MDT:1]
tempAA-MDT0005_UUID 11049225 695 11048530 0% /mnt/lustre[MDT:5]
tempAA-OST0001_UUID 1474016764 5948 1474010816 0% /mnt/lustre[OST:1]
tempAA-OST0002_UUID 1474016697 5881 1474010816 0% /mnt/lustre[OST:2]
tempAA-OST0003_UUID 1474016698 5882 1474010816 0% /mnt/lustre[OST:3]
tempAA-OST0004_UUID 1474016889 6073 1474010816 0% /mnt/lustre[OST:4]
tempAA-OST0005_UUID 1474013785 2937 1474010848 0% /mnt/lustre[OST:5]
tempAA-OST0006_UUID 1474016025 5241 1474010784 0% /mnt/lustre[OST:6]
tempAA-OST0007_UUID 1474021210 10490 1474010720 0% /mnt/lustre[OST:7]
tempAA-OST0008_UUID 1474021849 11129 1474010720 0% /mnt/lustre[OST:8]

filesystem_summary: 138043726 1814 138041912 0% /mnt/lustre

[root@client1-221 ~]# lctl dl
0 UP mgc MGC10.10.10.200@o2ib 0273e362-9f5c-c316-8ecc-9d49c6d30d7e 4
1 UP lov tempAA-clilov-ffff880f8daf0000 dbfabe5b-94f9-0e2b-627d-73d1797b90b1 3
2 UP lmv tempAA-clilmv-ffff880f8daf0000 dbfabe5b-94f9-0e2b-627d-73d1797b90b1 4
3 UP mdc tempAA-MDT0000-mdc-ffff880f8daf0000 dbfabe5b-94f9-0e2b-627d-73d1797b90b1 4
4 UP mdc tempAA-MDT0001-mdc-ffff880f8daf0000 dbfabe5b-94f9-0e2b-627d-73d1797b90b1 4
5 UP mdc tempAA-MDT0005-mdc-ffff880f8daf0000 dbfabe5b-94f9-0e2b-627d-73d1797b90b1 4
6 UP osc tempAA-OST0001-osc-ffff880f8daf0000 dbfabe5b-94f9-0e2b-627d-73d1797b90b1 4
7 UP osc tempAA-OST0002-osc-ffff880f8daf0000 dbfabe5b-94f9-0e2b-627d-73d1797b90b1 4
8 UP osc tempAA-OST0003-osc-ffff880f8daf0000 dbfabe5b-94f9-0e2b-627d-73d1797b90b1 4
9 UP osc tempAA-OST0004-osc-ffff880f8daf0000 dbfabe5b-94f9-0e2b-627d-73d1797b90b1 4
10 UP osc tempAA-OST0005-osc-ffff880f8daf0000 dbfabe5b-94f9-0e2b-627d-73d1797b90b1 4
11 UP osc tempAA-OST0006-osc-ffff880f8daf0000 dbfabe5b-94f9-0e2b-627d-73d1797b90b1 4
12 UP osc tempAA-OST0007-osc-ffff880f8daf0000 dbfabe5b-94f9-0e2b-627d-73d1797b90b1 4
13 UP osc tempAA-OST0008-osc-ffff880f8daf0000 dbfabe5b-94f9-0e2b-627d-73d1797b90b1 4
[root@client1-221 ~]#

 

thanks,

Abe

Comment by Andreas Dilger [ 04/Aug/18 ]

How did you manage to get MDT0005 without any of the intervening MDTs? It might be that we don't handle discontiguous MDT indices very well, but that wouldn't explain why MDT0001 is failing. Also, you appear to be missing OST0000. Not sure if that is related, but not standard in any case.

It might be that the problem is an omission on my part. I thought the "domdir" directory already existed, but if not then the first "lfs setstripe ... /mnt/lustre/domdir" command will create it as a regular file. Sorry for the confusion. Instead, please remove that file first and run "mkdir /mnt/lustre/domdir", or use some other new directory for testing.
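
In other words, the sequence needs to be (a sketch using the paths from this ticket; the stat line is just a sanity check and must report "directory", not a regular file):

mkdir /mnt/lustre/domdir
lfs setstripe -E 1M -L mdt -E -1 /mnt/lustre/domdir
stat -c %F /mnt/lustre/domdir      # sanity check: should print "directory"
lfs mkdir -i 1 /mnt/lustre/domdir/dir-1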

Comment by Abe [ 04/Aug/18 ]

Hi Andreas,
I went ahead and removed the directory and recreated the stripe, but
lfs mkdir -i 1 /mnt/lustre/domdir/dir-1 is still failing!!

1. rm -rf /mnt/lustre/domdir
2. lfs setstripe -E 1M -L mdt -E -1 /mnt/lustre/domdir/dir-1
3. lfs mkdir -i 1 /mnt/lustre/domdir/dir-1
lfs mkdir: unable to open '/mnt/lustre/domdir/dir-1': Not a directory (20)
lfs setdirstripe: cannot create stripe dir '/mnt/lustre/domdir/dir-1': Not a directory

thanks,
Abe

Comment by Abe [ 05/Aug/18 ]

Hi Andreas,

I went ahead and rebuilt the fs and it looks cleaner now; however, I'm still getting the error from: lfs mkdir -i 3 /mnt/lustre/domdir/dir-3

[root@client1-221 ~]# ls -l /mnt/lustre/domdir
-rw-r--r-- 1 root root 0 Aug 4 11:52 /mnt/lustre/domdir

lfs mkdir: unable to open '/mnt/lustre/domdir/dir-3': Not a directory (20)
lfs setdirstripe: cannot create stripe dir '/mnt/lustre/domdir/dir-3': Not a directory

[root@client1-221 ~]# lfs df -i
UUID Inodes IUsed IFree IUse% Mounted on
tempAA-MDT0000_UUID 132450214 336 132449878 0% /mnt/lustre[MDT:0]
tempAA-MDT0001_UUID 134052115 321 134051794 0% /mnt/lustre[MDT:1]
tempAA-MDT0002_UUID 67025428 321 67025107 0% /mnt/lustre[MDT:2]
tempAA-OST0001_UUID 1474012554 490 1474012064 0% /mnt/lustre[OST:1]
tempAA-OST0002_UUID 1474012554 490 1474012064 0% /mnt/lustre[OST:2]
tempAA-OST0003_UUID 1474012554 490 1474012064 0% /mnt/lustre[OST:3]
tempAA-OST0004_UUID 1474012554 490 1474012064 0% /mnt/lustre[OST:4]
tempAA-OST0005_UUID 1474012554 490 1474012064 0% /mnt/lustre[OST:5]
tempAA-OST0006_UUID 1474012554 490 1474012064 0% /mnt/lustre[OST:6]
tempAA-OST0007_UUID 1474012554 490 1474012064 0% /mnt/lustre[OST:7]
tempAA-OST0008_UUID 1474012554 490 1474012064 0% /mnt/lustre[OST:8]

filesystem_summary: 333527757 978 333526779 0% /mnt/lustre

[root@client1-221 ~]# lctl dl
0 UP mgc MGC10.10.10.200@o2ib e3cd38f8-4513-cd31-7a60-c5e06e69a891 4
1 UP lov tempAA-clilov-ffff88020f67b000 5f494a9f-147d-a145-4dc6-1e9ee03117b6 3
2 UP lmv tempAA-clilmv-ffff88020f67b000 5f494a9f-147d-a145-4dc6-1e9ee03117b6 4
3 UP mdc tempAA-MDT0000-mdc-ffff88020f67b000 5f494a9f-147d-a145-4dc6-1e9ee03117b6 4
4 UP mdc tempAA-MDT0001-mdc-ffff88020f67b000 5f494a9f-147d-a145-4dc6-1e9ee03117b6 4
5 UP mdc tempAA-MDT0002-mdc-ffff88020f67b000 5f494a9f-147d-a145-4dc6-1e9ee03117b6 4
6 UP osc tempAA-OST0005-osc-ffff88020f67b000 5f494a9f-147d-a145-4dc6-1e9ee03117b6 4
7 UP osc tempAA-OST0006-osc-ffff88020f67b000 5f494a9f-147d-a145-4dc6-1e9ee03117b6 4
8 UP osc tempAA-OST0007-osc-ffff88020f67b000 5f494a9f-147d-a145-4dc6-1e9ee03117b6 4
9 UP osc tempAA-OST0008-osc-ffff88020f67b000 5f494a9f-147d-a145-4dc6-1e9ee03117b6 4
10 UP osc tempAA-OST0001-osc-ffff88020f67b000 5f494a9f-147d-a145-4dc6-1e9ee03117b6 4
11 UP osc tempAA-OST0002-osc-ffff88020f67b000 5f494a9f-147d-a145-4dc6-1e9ee03117b6 4
12 UP osc tempAA-OST0003-osc-ffff88020f67b000 5f494a9f-147d-a145-4dc6-1e9ee03117b6 4
13 UP osc tempAA-OST0004-osc-ffff88020f67b000 5f494a9f-147d-a145-4dc6-1e9ee03117b6 4
[root@client1-221 ~]#

 

Thanks,

Abe

Comment by Andreas Dilger [ 05/Aug/18 ]

Abe, to be clear, you must create the directory before the "lfs setstripe" command:

mkdir /mnt/lustre/domdir
lfs setstripe -E 1M -L mdt -E -1 /mnt/lustre/domdir
lfs mkdir -i ...
Comment by Abe [ 05/Aug/18 ]

This seems to have worked after removing and recreating the dir.

How do we issue the mdtest command with the -d directory option?

Do we specify all 3 created directories for the 3 MDTs, as in:

 

./mdtest -d /mnt/lustre/domdir/dir-0 /mnt/lustre/domdir/dir-1 /mnt/lustre/domdir/dir-2

 

thanks,

Abe

 

Comment by Abe [ 05/Aug/18 ]

Hi Andreas,

It seems to only use one MDT; it is not using the other MDTs.

When we issue the mdtest command, do we need to specify all the directories:

e.g.: ./mdtest -d /mnt/lustre/domdir/dir-0 /mnt/lustre/domdir/dir-1 /mnt/lustre/domdir/dir-2 ?
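
(For reference: depending on the mdtest version, multiple target directories can be passed in a single -d argument separated by '@' rather than by spaces; check mdtest -h for your build. A hypothetical invocation:)

./mdtest -d /mnt/lustre/domdir/dir-0@/mnt/lustre/domdir/dir-1@/mnt/lustre/domdir/dir-2 -n 47662 -F -e -u -i 1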

 

[root@mds-201 ~]# zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
mdtpool 744G 2.31G 742G - 0% 0% 1.00x ONLINE -
mdtpool1 744G 8.99M 744G - 0% 0% 1.00x ONLINE -
mdtpool2 372G 8.99M 372G - 0% 0% 1.00x ONLINE -
mdtpool3 372G 8.88M 372G - 0% 0% 1.00x ONLINE -

 

thanks,

Abe

Comment by Andreas Dilger [ 05/Aug/18 ]

If you want all of the subdirectories under domdir to also be created as remote directories, you could try the following:

lfs mkdir -c -1 /mnt/lustre/domdir
lfs setdirstripe -D -c -1 /mnt/lustre/domdir
lfs setstripe -E 1M -L mdt -E -1 /mnt/lustre/domdir

This will make every subdirectory striped across all MDTs, and this setting will be inherited by further subdirectories. That is not necessarily ideal for every directory, but it will allow you to distribute the mdtest workload across multiple MDTs more easily. We are working on better ways to achieve this goal, but this may be sufficient for your current needs.
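
A quick way to verify the default took effect (the -D flag of lfs getdirstripe prints the inherited default layout; exact output fields may vary by version):

lfs getdirstripe -D /mnt/lustre/domdir       # default: stripe count -1 (all MDTs)
mkdir /mnt/lustre/domdir/check
lfs getdirstripe /mnt/lustre/domdir/check    # new subdirectory should show multiple stripes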

Comment by Abe [ 07/Aug/18 ]

Hi Andreas,

This seems to have worked; the workload got distributed across the MDTs:

[root@mds-201 ~]# zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
mdtpool 744G 124G 620G - 43% 16% 1.00x ONLINE -
mdtpool1 744G 124G 620G - 43% 16% 1.00x ONLINE -
mdtpool2 372G 124G 248G - 65% 33% 1.00x ONLINE -
mdtpool3 372G 123G 249G - 66% 33% 1.00x ONLINE -
mdtpool4 372G 123G 249G - 66% 33% 1.00x ONLINE -
[root@mds-201 ~]#

 

Performance has gone up by 10%, until CPU utilization went up to 100%.

If we add another MDS server, will DNE work so that the workload gets distributed across the 2 MDS servers using the same namespace?

 

thanks,

Abe

 

 

Comment by Andreas Dilger [ 07/Aug/18 ]

In our past testing, adding a second MDT on the same MDS improved performance by about 50%, but with enough clients the increase in performance with a separate MDS per MDT was much better, about 90%.

Comment by Abe [ 07/Aug/18 ]

Hi Andreas,

 

I'm adding a 2nd MDS server with its own MDTs. How do the clients mount the same namespace from two separate servers that have different IP addresses?

e.g:

On the client servers (will they have 2 separate mounts?):

1st MDS server mount:

mount -t lustre 10.10.10.200@o2ib:10.10.10.201@o2ib:/tempAA /mnt/lustre

And the 2nd MDS server mount:

mount -t lustre 10.10.10.200@o2ib:10.10.10.202@o2ib:/tempAA /mnt/lustre

thanks,

Abe

 

Comment by Andreas Dilger [ 07/Aug/18 ]

Abe, the IP address (or more correctly "Lustre NID") listed on the client mount is the address and failover for the MGS. This will typically be located on MDS0 with MDT0000. It is preferred to have the MGS use a separate device so that it can be failed over independently of MDT0000.

In any case, the clients do not need to change anything for their mount command, since there is only a single MGS for the filesystem. The actual connections to the MDT(s) are handled internally by the Lustre configuration log in the same way as with OSTs.
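
Schematically, the client mount should only ever name the MGS NID(s), no matter how many MDS nodes exist (placeholder NIDs below, not from this cluster):

mount -t lustre <mgs_primary>@o2ib:<mgs_failover>@o2ib:/tempAA /mnt/lustre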

Comment by Abe [ 10/Aug/18 ]

Hi Andreas,

Below is the config for 2 MDS servers, 1 MGS server, and 1 client.

When I mount the client, access to the fs /mnt/lustre is very slow.

Is there something wrong with the way I'm mounting the client?

Do I need to specify the NIDs for the MGS and the 2 MDS servers in the mount command?

I do see an error when I tried to mount the client:

 

[root@client1-221 ~]# mount -t lustre 10.10.10.251@o2ib:10.10.10.201@o2ib:10.10.10.200@o2ib:/tempAA /mnt/lustre
[root@client1-221 ~]#
[  134.067195] LNet: 1355:0:(o2iblnd.c:943:kiblnd_create_conn()) peer 10.10.10.251@o2ib - queue depth reduced from 128 to 63 to allow for qp creation
[  134.237519] LustreError: 1935:0:(mgc_request.c:1576:mgc_apply_recover_logs()) mgc: cannot find uuid by nid 10.10.10.200@o2ib
[  134.239654] Lustre: 1935:0:(mgc_request.c:1802:mgc_process_recover_nodemap_log()) MGC10.10.10.251@o2ib: error processing recovery log tempAA-cliir: rc = -2
[  134.239726] LustreError: 1935:0:(mgc_request.c:2132:mgc_process_log()) MGC10.10.10.251@o2ib: recover log tempAA-cliir failed, not fatal: rc = -2
[  134.251092] Lustre: Mounted tempAA-client
[root@client1-221 ~]# ls -l /mnt/lustre
^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C

 

 

2 MDS config:

MDS 1:

[root@mds-201 ~]# lctl dl
  0 UP osd-zfs tempAA-MDT0000-osd tempAA-MDT0000-osd_UUID 18
  1 UP mgc MGC10.10.10.251@o2ib ba5bb4ef-c13e-30b7-318b-ba23b06f65bd 4
  2 UP mds MDS MDS_uuid 2
  3 UP lod tempAA-MDT0000-mdtlov tempAA-MDT0000-mdtlov_UUID 3
  4 UP mdt tempAA-MDT0000 tempAA-MDT0000_UUID 56
  5 UP mdd tempAA-MDD0000 tempAA-MDD0000_UUID 3
  6 UP qmt tempAA-QMT0000 tempAA-QMT0000_UUID 3
  7 UP lwp tempAA-MDT0000-lwp-MDT0000 tempAA-MDT0000-lwp-MDT0000_UUID 4
  8 UP osd-zfs tempAA-MDT0001-osd tempAA-MDT0001-osd_UUID 17
  9 UP lod tempAA-MDT0001-mdtlov tempAA-MDT0001-mdtlov_UUID 3
10 UP mdt tempAA-MDT0001 tempAA-MDT0001_UUID 34
11 UP mdd tempAA-MDD0001 tempAA-MDD0001_UUID 3
12 UP osp tempAA-MDT0000-osp-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
13 UP lwp tempAA-MDT0000-lwp-MDT0001 tempAA-MDT0000-lwp-MDT0001_UUID 4
14 UP osd-zfs tempAA-MDT0002-osd tempAA-MDT0002-osd_UUID 17
15 UP lod tempAA-MDT0002-mdtlov tempAA-MDT0002-mdtlov_UUID 3
16 UP mdt tempAA-MDT0002 tempAA-MDT0002_UUID 32
17 UP mdd tempAA-MDD0002 tempAA-MDD0002_UUID 3
18 UP osp tempAA-MDT0000-osp-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
19 UP osp tempAA-MDT0001-osp-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
20 UP lwp tempAA-MDT0000-lwp-MDT0002 tempAA-MDT0000-lwp-MDT0002_UUID 4
21 UP osd-zfs tempAA-MDT0003-osd tempAA-MDT0003-osd_UUID 17
22 UP lod tempAA-MDT0003-mdtlov tempAA-MDT0003-mdtlov_UUID 3
23 UP mdt tempAA-MDT0003 tempAA-MDT0003_UUID 32
24 UP mdd tempAA-MDD0003 tempAA-MDD0003_UUID 3
25 UP osp tempAA-MDT0000-osp-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
26 UP osp tempAA-MDT0001-osp-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
27 UP osp tempAA-MDT0002-osp-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
28 UP lwp tempAA-MDT0000-lwp-MDT0003 tempAA-MDT0000-lwp-MDT0003_UUID 4
29 UP osd-zfs tempAA-MDT0004-osd tempAA-MDT0004-osd_UUID 17
30 UP lod tempAA-MDT0004-mdtlov tempAA-MDT0004-mdtlov_UUID 3
31 UP mdt tempAA-MDT0004 tempAA-MDT0004_UUID 32
32 UP mdd tempAA-MDD0004 tempAA-MDD0004_UUID 3
33 UP osp tempAA-MDT0000-osp-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
34 UP osp tempAA-MDT0001-osp-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
35 UP osp tempAA-MDT0002-osp-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
36 UP osp tempAA-MDT0003-osp-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
37 UP lwp tempAA-MDT0000-lwp-MDT0004 tempAA-MDT0000-lwp-MDT0004_UUID 4
38 UP osp tempAA-MDT0004-osp-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
39 UP osp tempAA-MDT0003-osp-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
40 UP osp tempAA-MDT0004-osp-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
41 UP osp tempAA-MDT0002-osp-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
42 UP osp tempAA-MDT0003-osp-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
43 UP osp tempAA-MDT0004-osp-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
44 UP osp tempAA-MDT0001-osp-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
45 UP osp tempAA-MDT0002-osp-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
46 UP osp tempAA-MDT0003-osp-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
47 UP osp tempAA-MDT0004-osp-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
48 UP osp tempAA-OST0005-osc-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
49 UP osp tempAA-OST0006-osc-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
50 UP osp tempAA-OST0005-osc-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
51 UP osp tempAA-OST0006-osc-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
52 UP osp tempAA-OST0005-osc-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
53 UP osp tempAA-OST0006-osc-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
54 UP osp tempAA-OST0005-osc-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
55 UP osp tempAA-OST0006-osc-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
56 UP osp tempAA-OST0005-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
57 UP osp tempAA-OST0006-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
58 UP osp tempAA-OST0007-osc-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
59 UP osp tempAA-OST0007-osc-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
60 UP osp tempAA-OST0007-osc-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
61 UP osp tempAA-OST0007-osc-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
62 UP osp tempAA-OST0007-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
63 UP osp tempAA-OST0008-osc-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
64 UP osp tempAA-OST0008-osc-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
65 UP osp tempAA-OST0008-osc-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
66 UP osp tempAA-OST0008-osc-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
67 UP osp tempAA-OST0008-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
68 UP osp tempAA-OST0001-osc-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
69 UP osp tempAA-OST0002-osc-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
70 UP osp tempAA-OST0003-osc-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
71 UP osp tempAA-OST0001-osc-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
72 UP osp tempAA-OST0002-osc-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
73 UP osp tempAA-OST0003-osc-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
74 UP osp tempAA-OST0001-osc-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
75 UP osp tempAA-OST0002-osc-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
76 UP osp tempAA-OST0003-osc-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
77 UP osp tempAA-OST0001-osc-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
78 UP osp tempAA-OST0002-osc-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
79 UP osp tempAA-OST0003-osc-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
80 UP osp tempAA-OST0001-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
81 UP osp tempAA-OST0002-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
82 UP osp tempAA-OST0003-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
83 UP osp tempAA-OST0004-osc-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
84 UP osp tempAA-OST0004-osc-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
85 UP osp tempAA-OST0004-osc-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
86 UP osp tempAA-OST0004-osc-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
87 UP osp tempAA-OST0004-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4

[root@mds-201 ~]# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,size=65200336k,nr_inodes=16300084,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_prio,net_cls)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
configfs on /sys/kernel/config type configfs (rw,relatime)
/dev/mapper/centos-root on / type xfs (rw,relatime,attr2,inode64,noquota)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=35,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=785)
mqueue on /dev/mqueue type mqueue (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)
/dev/sda2 on /boot type xfs (rw,relatime,attr2,inode64,noquota)
/dev/mapper/centos-home on /home type xfs (rw,relatime,attr2,inode64,noquota)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=13042812k,mode=700)
mdtpool/mdt on /mnt/lustre/mdt type lustre (ro,svname=tempAA-MDT0000,mgsnode=10.10.10.251@o2ib:10.10.10.201@o2ib,osd=osd-zfs)
mdtpool1/mdt1 on /mnt/lustre/mdt1 type lustre (ro,svname=tempAA-MDT0001,mgsnode=10.10.10.251@o2ib:10.10.10.201@o2ib,osd=osd-zfs)
mdtpool2/mdt2 on /mnt/lustre/mdt2 type lustre (ro,svname=tempAA-MDT0002,mgsnode=10.10.10.251@o2ib:10.10.10.201@o2ib,osd=osd-zfs)
mdtpool3/mdt3 on /mnt/lustre/mdt3 type lustre (ro,svname=tempAA-MDT0003,mgsnode=10.10.10.251@o2ib:10.10.10.201@o2ib,osd=osd-zfs)
mdtpool4/mdt4 on /mnt/lustre/mdt4 type lustre (ro,svname=tempAA-MDT0004,mgsnode=10.10.10.251@o2ib:10.10.10.201@o2ib,osd=osd-zfs)
[root@mds-201 ~]#

 

MDS 2:

[root@mgs-200 ~]# lctl dl
  0 UP osd-zfs tempAA-MDT0000-osd tempAA-MDT0000-osd_UUID 18
  1 UP mgc MGC10.10.10.251@o2ib 4dfe3fbf-5953-a8e6-3fb6-9eebff8592e3 4
  2 UP mds MDS MDS_uuid 2
  3 UP lod tempAA-MDT0000-mdtlov tempAA-MDT0000-mdtlov_UUID 3
  4 UP mdt tempAA-MDT0000 tempAA-MDT0000_UUID 2
  5 UP mdd tempAA-MDD0000 tempAA-MDD0000_UUID 3
  6 UP qmt tempAA-QMT0000 tempAA-QMT0000_UUID 3
  7 UP osp tempAA-MDT0001-osp-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
  8 UP osp tempAA-MDT0002-osp-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
  9 UP osp tempAA-MDT0003-osp-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
10 UP osp tempAA-MDT0004-osp-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
11 UP lwp tempAA-MDT0000-lwp-MDT0000 tempAA-MDT0000-lwp-MDT0000_UUID 4
12 UP osp tempAA-OST0005-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
13 UP osp tempAA-OST0006-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
14 UP osp tempAA-OST0007-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
15 UP osp tempAA-OST0008-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
16 UP osp tempAA-OST0001-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
17 UP osp tempAA-OST0002-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
18 UP osp tempAA-OST0003-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
19 UP osp tempAA-OST0004-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4

 

 

 

MGS:

[root@sbb-client1 ~]# lctl dl
  0 UP osd-zfs MGS-osd MGS-osd_UUID 4
  1 UP mgs MGS MGS 18
  2 UP mgc MGC10.10.10.251@o2ib 6f0306ce-d9bf-1556-1288-9800b8b62090 4
[root@sbb-client1 ~]# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,size=32833268k,nr_inodes=8208317,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_prio,net_cls)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
configfs on /sys/kernel/config type configfs (rw,relatime)
/dev/mapper/rhel-root on / type xfs (rw,relatime,attr2,inode64,noquota)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=34,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=17604)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
mqueue on /dev/mqueue type mqueue (rw,relatime)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)
/dev/mapper/rhel-home on /home type xfs (rw,relatime,attr2,inode64,noquota)
/dev/sda2 on /boot type xfs (rw,relatime,attr2,inode64,noquota)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=6569624k,mode=700)
mgspool/mgt on /mnt/lustre/mgt type lustre (ro,svname=MGS,nosvc,mgs,osd=osd-zfs)
[root@sbb-client1 ~]#

 

 

client1 config:

[root@client1-221 ~]# lctl dl
  0 UP mgc MGC10.10.10.251@o2ib 8d56ab9b-2220-9262-99f0-7558b40523ba 4
  1 UP lov tempAA-clilov-ffff88105b505800 05280644-a903-6f5e-abfc-21a149e8384b 3
  2 UP lmv tempAA-clilmv-ffff88105b505800 05280644-a903-6f5e-abfc-21a149e8384b 4
  3 UP mdc tempAA-MDT0000-mdc-ffff88105b505800 05280644-a903-6f5e-abfc-21a149e8384b 4
  4 UP mdc tempAA-MDT0001-mdc-ffff88105b505800 05280644-a903-6f5e-abfc-21a149e8384b 4
  5 UP mdc tempAA-MDT0002-mdc-ffff88105b505800 05280644-a903-6f5e-abfc-21a149e8384b 4
  6 UP mdc tempAA-MDT0003-mdc-ffff88105b505800 05280644-a903-6f5e-abfc-21a149e8384b 4
  7 UP mdc tempAA-MDT0004-mdc-ffff88105b505800 05280644-a903-6f5e-abfc-21a149e8384b 4
  8 UP osc tempAA-OST0005-osc-ffff88105b505800 05280644-a903-6f5e-abfc-21a149e8384b 4
  9 UP osc tempAA-OST0006-osc-ffff88105b505800 05280644-a903-6f5e-abfc-21a149e8384b 4
10 UP osc tempAA-OST0007-osc-ffff88105b505800 05280644-a903-6f5e-abfc-21a149e8384b 4
11 UP osc tempAA-OST0008-osc-ffff88105b505800 05280644-a903-6f5e-abfc-21a149e8384b 4
12 UP osc tempAA-OST0001-osc-ffff88105b505800 05280644-a903-6f5e-abfc-21a149e8384b 4
13 UP osc tempAA-OST0002-osc-ffff88105b505800 05280644-a903-6f5e-abfc-21a149e8384b 4
14 UP osc tempAA-OST0003-osc-ffff88105b505800 05280644-a903-6f5e-abfc-21a149e8384b 4
15 UP osc tempAA-OST0004-osc-ffff88105b505800 05280644-a903-6f5e-abfc-21a149e8384b 4
[root@client1-221 ~]# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,size=32836108k,nr_inodes=8209027,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_prio,net_cls)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
configfs on /sys/kernel/config type configfs (rw,relatime)
/dev/mapper/centos-root on / type xfs (rw,relatime,attr2,inode64,noquota)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=33,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
mqueue on /dev/mqueue type mqueue (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
/dev/sda2 on /boot type xfs (rw,relatime,attr2,inode64,noquota)
/dev/sda1 on /boot/efi type vfat (rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=winnt,errors=remount-ro)
/dev/mapper/centos-home on /home type xfs (rw,relatime,attr2,inode64,noquota)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=6569420k,mode=700)
10.10.10.251@o2ib:10.10.10.201@o2ib:/tempAA on /mnt/lustre type lustre (rw,lazystatfs)
[root@client1-221 ~]#

 

thanks,

Abe

Comment by Abe [ 10/Aug/18 ]

Also, a note here: when I try to mount the fs on the client with

mount -t lustre 10.10.10.251@o2ib:10.10.10.201@o2ib:/tempAA /mnt/lustre

I get this error in dmesg on the client:

[ 1810.256672] LustreError: 2179:0:(mgc_request.c:1576:mgc_apply_recover_logs()) mgc: cannot find uuid by nid 10.10.10.200@o2ib
[ 1810.257342] Lustre: 2179:0:(mgc_request.c:1802:mgc_process_recover_nodemap_log()) MGC10.10.10.251@o2ib: error processing recovery log tempAA-cliir: rc = -2
[ 1810.257396] LustreError: 2179:0:(mgc_request.c:2132:mgc_process_log()) MGC10.10.10.251@o2ib: recover log tempAA-cliir failed, not fatal: rc = -2
[ 1810.267968] Lustre: Mounted tempAA-client

 

thanks,

Abe

 

Comment by Andreas Dilger [ 10/Aug/18 ]

As I previously mentioned, you should NOT specify the MDS NID on the mount command line. Only the MGS NID (primary and backup) should be on the mount command line. Sometimes the MGS is on the same node as the MDS, but with DNE there may be many MDS nodes, and they should definitely NOT be listed. This is likely causing the slow mount as the client is trying to contact the MGS on each of the listed NIDs.

I also see that in your example, you have an "MDT0000" listed on both MDS1 and MDS2. That is not a valid configuration, as each MDT needs to have a different index. Having multiple MDT0000 devices in the same filesystem would cause severe corruption.

Comment by Abe [ 10/Aug/18 ]

Hi Andreas,

 

I have modified mds-200 to have only MDT0006 and not MDT0000.

Also, we are only using one MGS (10.10.10.251) without a backup, and 2 MDS servers (mds-200 (10.10.10.200) and mds-201 (10.10.10.201)),

and the command used on the client to mount the fs:

mount -t lustre 10.10.10.251@o2ib:/tempAA /mnt/lustre

but accessing the filesystem is still slow!

[root@client1-221 ~]# mkdir /mnt/lustre/aadomdir

hangs!!

 

 

mds #1

mkfs --mgs --fsname=tempAA --reformat --servicenode=10.10.10.200@o2ib --servicenode=10.10.10.200@o2ib --mgsnode=10.10.10.251@o2ib --mgsnode=10.10.10.251@o2ib --backfstype=zfs mgspool6/mdt6
mkfs.lustre --mdt --fsname=tempAA --reformat --index=0 --servicenode=10.10.10.251@o2ib --servicenode=10.10.10.200@o2ib --mgsnode=10.10.10.10.251@o2ib --mgsnode=10.10.10.200@o2ib --backfstype=zfs mdtpool6/mdt6
 mkfs.lustre --mdt --fsname=tempAA --reformat --index=0 --servicenode=10.10.10.251@o2ib --servicenode=10.10.10.200@o2ib --mgsnode=10.10.10.251@o2ib --mgsnode=10.10.10.200@o2ib --backfstype=zfs mdtpool6/mdt6
mkfs.lustre --mdt --fsname=tempAA --reformat --index=0 --servicenode=10.10.10.251@o2ib --servicenode=10.10.10.200@o2ib --mgsnode=10.10.10.251@o2ib --mgsnode=10.10.10.200@o2ib --backfstype=zfs mdtpool6/mdt6
mkfs.lustre --mdt --fsname=tempAA --reformat --index=0 --servicenode=10.10.10.251@o2ib --servicenode=10.10.10.200@o2ib --mgsnode=10.10.10.251@o2ib --mgsnode=10.10.10.200@o2ib --backfstype=zfs mdtpool7/mdt7

[root@mds-200 ~]# lctl dl
0 UP osd-zfs tempAA-MDT0006-osd tempAA-MDT0006-osd_UUID 18
1 UP mgc MGC10.10.10.251@o2ib 560e021a-03ca-b753-fc54-483243dda809 4
2 UP mds MDS MDS_uuid 2
3 UP lod tempAA-MDT0006-mdtlov tempAA-MDT0006-mdtlov_UUID 3
4 UP mdt tempAA-MDT0006 tempAA-MDT0006_UUID 34
5 UP mdd tempAA-MDD0006 tempAA-MDD0006_UUID 3
6 UP osp tempAA-MDT0000-osp-MDT0006 tempAA-MDT0006-mdtlov_UUID 4
7 UP osp tempAA-MDT0001-osp-MDT0006 tempAA-MDT0006-mdtlov_UUID 4
8 UP osp tempAA-MDT0002-osp-MDT0006 tempAA-MDT0006-mdtlov_UUID 4
9 UP osp tempAA-MDT0003-osp-MDT0006 tempAA-MDT0006-mdtlov_UUID 4
10 UP osp tempAA-MDT0004-osp-MDT0006 tempAA-MDT0006-mdtlov_UUID 4
11 UP osp tempAA-OST0005-osc-MDT0006 tempAA-MDT0006-mdtlov_UUID 4
12 UP osp tempAA-OST0006-osc-MDT0006 tempAA-MDT0006-mdtlov_UUID 4
13 UP osp tempAA-OST0007-osc-MDT0006 tempAA-MDT0006-mdtlov_UUID 4
14 UP osp tempAA-OST0008-osc-MDT0006 tempAA-MDT0006-mdtlov_UUID 4
15 UP osp tempAA-OST0001-osc-MDT0006 tempAA-MDT0006-mdtlov_UUID 4
16 UP osp tempAA-OST0002-osc-MDT0006 tempAA-MDT0006-mdtlov_UUID 4
17 UP osp tempAA-OST0003-osc-MDT0006 tempAA-MDT0006-mdtlov_UUID 4
18 UP osp tempAA-OST0004-osc-MDT0006 tempAA-MDT0006-mdtlov_UUID 4
19 UP lwp tempAA-MDT0000-lwp-MDT0006 tempAA-MDT0000-lwp-MDT0006_UUID 4

 

mds #2

mkfs.lustre --mdt --fsname=$NAME --reformat --index=0 --servicenode=$MGS_NID --servicenode=$MDT_NID --mgsnode=$MGS_NID --mgsnode=$MDT_NID --backfstype=zfs mdtpool/mdt
+ mkfs.lustre --mdt --fsname=tempAA --reformat --index=0 --servicenode=10.10.10.251@o2ib --servicenode=10.10.10.201@o2ib --mgsnode=10.10.10.251@o2ib --mgsnode=10.10.10.201@o2ib --backfstype=zfs mdtpool/mdt

mkfs.lustre --mdt --fsname=$NAME --reformat --index=1 --servicenode=$MGS_NID --servicenode=$MDT_NID --mgsnode=$MGS_NID --mgsnode=$MDT_NID --backfstype=zfs mdtpool1/mdt1
+ mkfs.lustre --mdt --fsname=tempAA --reformat --index=1 --servicenode=10.10.10.251@o2ib --servicenode=10.10.10.201@o2ib --mgsnode=10.10.10.251@o2ib --mgsnode=10.10.10.201@o2ib --backfstype=zfs mdtpool1/mdt1

 mkfs.lustre --mdt --fsname=$NAME --reformat --index=3 --servicenode=$MGS_NID --servicenode=$MDT_NID --mgsnode=$MGS_NID --mgsnode=$MDT_NID --backfstype=zfs mdtpool3/mdt3
+ mkfs.lustre --mdt --fsname=tempAA --reformat --index=3 --servicenode=10.10.10.251@o2ib --servicenode=10.10.10.201@o2ib --mgsnode=10.10.10.251@o2ib --mgsnode=10.10.10.201@o2ib --backfstype=zfs mdtpool3/mdt3

mkfs.lustre --mdt --fsname=$NAME --reformat --index=4 --servicenode=$MGS_NID --servicenode=$MDT_NID --mgsnode=$MGS_NID --mgsnode=$MDT_NID --backfstype=zfs mdtpool4/mdt4
+ mkfs.lustre --mdt --fsname=tempAA --reformat --index=4 --servicenode=10.10.10.251@o2ib --servicenode=10.10.10.201@o2ib --mgsnode=10.10.10.251@o2ib --mgsnode=10.10.10.201@o2ib --backfstype=zfs mdtpool4/mdt4

mount -t lustre mdtpool/mdt /mnt/lustre/mdt
+ mount -t lustre mdtpool/mdt /mnt/lustre/mdt

[root@mds-201 ~]# lctl dl
0 UP osd-zfs tempAA-MDT0000-osd tempAA-MDT0000-osd_UUID 19
1 UP mgc MGC10.10.10.251@o2ib ba5bb4ef-c13e-30b7-318b-ba23b06f65bd 4
2 UP mds MDS MDS_uuid 2
3 UP lod tempAA-MDT0000-mdtlov tempAA-MDT0000-mdtlov_UUID 3
4 UP mdt tempAA-MDT0000 tempAA-MDT0000_UUID 66
5 UP mdd tempAA-MDD0000 tempAA-MDD0000_UUID 3
6 UP qmt tempAA-QMT0000 tempAA-QMT0000_UUID 3
7 UP lwp tempAA-MDT0000-lwp-MDT0000 tempAA-MDT0000-lwp-MDT0000_UUID 4
8 UP osd-zfs tempAA-MDT0001-osd tempAA-MDT0001-osd_UUID 18
9 UP lod tempAA-MDT0001-mdtlov tempAA-MDT0001-mdtlov_UUID 3
10 UP mdt tempAA-MDT0001 tempAA-MDT0001_UUID 36
11 UP mdd tempAA-MDD0001 tempAA-MDD0001_UUID 3
12 UP osp tempAA-MDT0000-osp-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
13 UP lwp tempAA-MDT0000-lwp-MDT0001 tempAA-MDT0000-lwp-MDT0001_UUID 4
14 UP osd-zfs tempAA-MDT0002-osd tempAA-MDT0002-osd_UUID 18
15 UP lod tempAA-MDT0002-mdtlov tempAA-MDT0002-mdtlov_UUID 3
16 UP mdt tempAA-MDT0002 tempAA-MDT0002_UUID 34
17 UP mdd tempAA-MDD0002 tempAA-MDD0002_UUID 3
18 UP osp tempAA-MDT0000-osp-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
19 UP osp tempAA-MDT0001-osp-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
20 UP lwp tempAA-MDT0000-lwp-MDT0002 tempAA-MDT0000-lwp-MDT0002_UUID 4
21 UP osd-zfs tempAA-MDT0003-osd tempAA-MDT0003-osd_UUID 18
22 UP lod tempAA-MDT0003-mdtlov tempAA-MDT0003-mdtlov_UUID 3
23 UP mdt tempAA-MDT0003 tempAA-MDT0003_UUID 34
24 UP mdd tempAA-MDD0003 tempAA-MDD0003_UUID 3
25 UP osp tempAA-MDT0000-osp-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
26 UP osp tempAA-MDT0001-osp-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
27 UP osp tempAA-MDT0002-osp-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
28 UP lwp tempAA-MDT0000-lwp-MDT0003 tempAA-MDT0000-lwp-MDT0003_UUID 4
29 UP osd-zfs tempAA-MDT0004-osd tempAA-MDT0004-osd_UUID 18
30 UP lod tempAA-MDT0004-mdtlov tempAA-MDT0004-mdtlov_UUID 3
31 UP mdt tempAA-MDT0004 tempAA-MDT0004_UUID 34
32 UP mdd tempAA-MDD0004 tempAA-MDD0004_UUID 3
33 UP osp tempAA-MDT0000-osp-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
34 UP osp tempAA-MDT0001-osp-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
35 UP osp tempAA-MDT0002-osp-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
36 UP osp tempAA-MDT0003-osp-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
37 UP lwp tempAA-MDT0000-lwp-MDT0004 tempAA-MDT0000-lwp-MDT0004_UUID 4
38 UP osp tempAA-MDT0004-osp-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
39 UP osp tempAA-MDT0003-osp-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
40 UP osp tempAA-MDT0004-osp-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
41 UP osp tempAA-MDT0002-osp-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
42 UP osp tempAA-MDT0003-osp-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
43 UP osp tempAA-MDT0004-osp-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
44 UP osp tempAA-MDT0001-osp-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
45 UP osp tempAA-MDT0002-osp-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
46 UP osp tempAA-MDT0003-osp-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
47 UP osp tempAA-MDT0004-osp-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
48 UP osp tempAA-OST0005-osc-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
49 UP osp tempAA-OST0006-osc-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
50 UP osp tempAA-OST0005-osc-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
51 UP osp tempAA-OST0006-osc-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
52 UP osp tempAA-OST0005-osc-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
53 UP osp tempAA-OST0006-osc-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
54 UP osp tempAA-OST0005-osc-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
55 UP osp tempAA-OST0006-osc-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
56 UP osp tempAA-OST0005-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
57 UP osp tempAA-OST0006-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
58 UP osp tempAA-OST0007-osc-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
59 UP osp tempAA-OST0007-osc-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
60 UP osp tempAA-OST0007-osc-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
61 UP osp tempAA-OST0007-osc-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
62 UP osp tempAA-OST0007-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
63 UP osp tempAA-OST0008-osc-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
64 UP osp tempAA-OST0008-osc-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
65 UP osp tempAA-OST0008-osc-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
66 UP osp tempAA-OST0008-osc-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
67 UP osp tempAA-OST0008-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
68 UP osp tempAA-OST0001-osc-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
69 UP osp tempAA-OST0002-osc-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
70 UP osp tempAA-OST0003-osc-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
71 UP osp tempAA-OST0001-osc-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
72 UP osp tempAA-OST0002-osc-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
73 UP osp tempAA-OST0003-osc-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
74 UP osp tempAA-OST0001-osc-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
75 UP osp tempAA-OST0002-osc-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
76 UP osp tempAA-OST0003-osc-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
77 UP osp tempAA-OST0001-osc-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
78 UP osp tempAA-OST0002-osc-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
79 UP osp tempAA-OST0003-osc-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
80 UP osp tempAA-OST0001-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
81 UP osp tempAA-OST0002-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
82 UP osp tempAA-OST0003-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
83 UP osp tempAA-OST0004-osc-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
84 UP osp tempAA-OST0004-osc-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
85 UP osp tempAA-OST0004-osc-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
86 UP osp tempAA-OST0004-osc-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
87 UP osp tempAA-OST0004-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
88 UP osp tempAA-MDT0006-osp-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
89 UP osp tempAA-MDT0006-osp-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
90 UP osp tempAA-MDT0006-osp-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
91 UP osp tempAA-MDT0006-osp-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
92 UP osp tempAA-MDT0006-osp-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
[root@mds-201 ~]#

 

mgs config:

zpool create -f -O canmount=off -o cachefile=none mgspool sdb

mkfs.lustre --mgs --fsname=$NAME --reformat --servicenode=$MGS_NID --servicenode=$MDT_NID --mgsnode=$MGS_NID --mgsnode=$MDT_NID --backfstype=zfs mgspool/mgt
+ mkfs.lustre --mgs --fsname=tempAA --reformat --servicenode=10.10.10.251@o2ib --servicenode=10.10.10.201@o2ib --mgsnode=10.10.10.251@o2ib --mgsnode=10.10.10.201@o2ib --backfstype=zfs mgspool/mgt

lctl dl
0 UP osd-zfs MGS-osd MGS-osd_UUID 4
1 UP mgs MGS MGS 18
2 UP mgc MGC10.10.10.251@o2ib 6f0306ce-d9bf-1556-1288-9800b8b62090 4

zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
mgspool 5.44T 9.02M 5.44T - 0% 0% 1.00x ONLINE -

 

client mount:

[root@client1-221 ~]# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,size=32836108k,nr_inodes=8209027,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_prio,net_cls)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
configfs on /sys/kernel/config type configfs (rw,relatime)
/dev/mapper/centos-root on / type xfs (rw,relatime,attr2,inode64,noquota)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=33,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
mqueue on /dev/mqueue type mqueue (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
/dev/sda2 on /boot type xfs (rw,relatime,attr2,inode64,noquota)
/dev/sda1 on /boot/efi type vfat (rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=winnt,errors=remount-ro)
/dev/mapper/centos-home on /home type xfs (rw,relatime,attr2,inode64,noquota)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=6569420k,mode=700)
10.10.10.251@o2ib:/tempAA on /mnt/lustre type lustre (rw,lazystatfs)
[root@client1-221 ~]#

thanks,

Abe

Comment by Andreas Dilger [ 10/Aug/18 ]

Abe,
it looks like you are doing something wrong with your formatting commands. You show:

mds #1
mkfs.lustre --mdt --fsname=tempAA --reformat --index=0 --servicenode=10.10.10.251@o2ib --servicenode=10.10.10.200@o2ib --mgsnode=10.10.10.251@o2ib --mgsnode=10.10.10.200@o2ib --backfstype=zfs mdtpool6/mdt6
mkfs.lustre --mdt --fsname=tempAA --reformat --index=0 --servicenode=10.10.10.251@o2ib --servicenode=10.10.10.200@o2ib --mgsnode=10.10.10.251@o2ib --mgsnode=10.10.10.200@o2ib --backfstype=zfs mdtpool7/mdt7

which means you have two "tempAA-MDT0000" (same --index=0 option for both MDTs) formatted on mds #1, but on two different ZFS datasets. That is not good. You also show:

mds #2

mkfs.lustre --mdt --fsname=tempAA --reformat --index=0 --servicenode=10.10.10.251@o2ib --servicenode=10.10.10.201@o2ib --mgsnode=10.10.10.251@o2ib --mgsnode=10.10.10.201@o2ib --backfstype=zfs mdtpool/mdt

That means you have another "tempAA-MDT0000" on mds #2. There should only be a single MDT0000 in the whole filesystem. You need to use a unique --index=N option for each MDT, so --index=6 for mdtpool6/mdt6 and --index=7 for mdtpool7/mdt7 at format time.

It is likely that the current filesystem is corrupted, so I would suggest reformatting it from scratch, since it is only a test filesystem.
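
Concretely, the mds #1 MDTs would be formatted along these lines (a sketch using the pool names from this thread; add whatever extra --servicenode/--mgsnode options your failover layout needs, as long as each --index is unique across the filesystem):

mkfs.lustre --mdt --fsname=tempAA --reformat --index=6 --servicenode=10.10.10.200@o2ib --mgsnode=10.10.10.251@o2ib --backfstype=zfs mdtpool6/mdt6
mkfs.lustre --mdt --fsname=tempAA --reformat --index=7 --servicenode=10.10.10.200@o2ib --mgsnode=10.10.10.251@o2ib --backfstype=zfs mdtpool7/mdt7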

the command used on the client to mount fs: 

mount -t lustre 10.10.10.251@o2ib:/tempAA /mnt/lustre

This appears to be correct for a single MGS node, once the other issues are fixed up.

Comment by Abe [ 12/Aug/18 ]

Hi Andreas,

The fs is more accessible now after making sure mdtpool6 & 7 are using indices 6 and 7.

The MDTs are participating in the mdtest workload, except for mdtpool7; not sure why this is the case, since they are all configured the same.

Also, the ZFS pools go to a degraded state after running the mdtest for 5 min.

Any insight on this?

[root@mds-201 ~]# zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
mdtpool 744G 235M 744G - 0% 0% 1.00x ONLINE -
mdtpool1 744G 238M 744G - 0% 0% 1.00x ONLINE -
mdtpool2 372G 239M 372G - 0% 0% 1.00x ONLINE -
mdtpool3 372G 235M 372G - 0% 0% 1.00x ONLINE -
mdtpool4 372G 239M 372G - 0% 0% 1.00x DEGRADED ---> degraded

[root@mgs-200 ~]# zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
mdtpool6 372G 146M 372G - 0% 0% 1.00x DEGRADED -.> degraded
mdtpool7 744G 8.95M 744G - 0% 0% 1.00x ONLINE - --> not participating..

pool: mdtpool4
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-4J
scan: none requested
config:

NAME STATE READ WRITE CKSUM
mdtpool4 DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
sdx ONLINE 0 0 0
sdz FAULTED 0 0 0 corrupted data
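
Per the "action" line above, once the faulted disk is physically replaced, something like the following should resilver the mirror (the replacement device name is hypothetical):

zpool replace mdtpool4 sdz <new_device>
zpool status mdtpool4      # watch resilver progress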

 

 

 

thanks,

Abe

 

Comment by Andreas Dilger [ 12/Aug/18 ]

If the pool is degraded like this, it means there is some problem with the devices below the Lustre level. One possibility, since you are seeing problems with two zpools, is that the devices are configured incorrectly and a disk is shared between the two pools. Alternatively, it is possible there is a marginal cable or power supply that has problems under heavy load.
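
One way to check for accidentally shared devices is to compare disk serials/WWNs across the servers; the same WWN showing up on two nodes would indicate a shared disk. A sketch, assuming a reasonably recent lsblk:

# run on each MDS, then diff the outputs between nodes
lsblk -d -o NAME,SIZE,SERIAL,WWN | sort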

Comment by Abe [ 12/Aug/18 ]

Hi Andreas,

there is definitely a problem with the power supply for one of the clients,

so I will replace the power supplies tomorrow:

[root@client1-221 ~]#
Message from syslogd@client1-221 at Aug 12 03:49:37 ...
kernel:NMI watchdog: BUG: soft lockup - CPU#13 stuck for 22s! [ldlm_bl_07:5391]

Message from syslogd@client1-221 at Aug 12 03:49:41 ...
kernel:NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [ldlm_bl_01:5003]

Message from syslogd@client1-221 at Aug 12 03:49:41 ...
kernel:NMI watchdog: BUG: soft lockup - CPU#16 stuck for 23s! [ldlm_bl_13:10002]

Message from syslogd@client1-221 at Aug 12 03:49:41 ...
kernel:NMI watchdog: BUG: soft lockup - CPU#25 stuck for 22s! [ldlm_bl_08:5498]

 

Not sure about the SSD drives being shared as the root cause of the degradation; I think ZFS does not allow a pool configuration with shared SSDs.

I wonder if there is a way to check whether the SSDs are shared across the MDS servers.

Output of lsblk on the SSD JBOD:

[root@mds-201 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 5.7T 0 disk
├─sda1 8:1 0 1M 0 part
├─sda2 8:2 0 1G 0 part /boot
└─sda3 8:3 0 5.7T 0 part
  ├─centos-root 253:0 0 50G 0 lvm /
  ├─centos-swap 253:1 0 4G 0 lvm [SWAP]
  └─centos-home 253:2 0 5.6T 0 lvm /home
sdb 8:16 0 372.6G 0 disk
├─sdb1 8:17 0 372.6G 0 part
└─sdb9 8:25 0 8M 0 part
sdc 8:32 0 372.6G 0 disk
├─sdc1 8:33 0 372.6G 0 part
└─sdc9 8:41 0 8M 0 part
sdd 8:48 0 745.2G 0 disk
├─sdd1 8:49 0 745.2G 0 part
└─sdd9 8:57 0 8M 0 part
sde 8:64 0 745.2G 0 disk
├─sde1 8:65 0 745.2G 0 part
└─sde9 8:73 0 8M 0 part
sdf 8:80 0 745.2G 0 disk
├─sdf1 8:81 0 745.2G 0 part
└─sdf9 8:89 0 8M 0 part
sdg 8:96 0 745.2G 0 disk
├─sdg1 8:97 0 745.2G 0 part
└─sdg9 8:105 0 8M 0 part
sdh 8:112 0 372.6G 0 disk
├─sdh1 8:113 0 372.6G 0 part
└─sdh9 8:121 0 8M 0 part
sdi 8:128 0 372.6G 0 disk
├─sdi1 8:129 0 372.6G 0 part
└─sdi9 8:137 0 8M 0 part
sdj 8:144 0 372.6G 0 disk
├─sdj1 8:145 0 372.6G 0 part
└─sdj9 8:153 0 8M 0 part
sdl 8:176 0 372.6G 0 disk
├─sdl1 8:177 0 372.6G 0 part
└─sdl9 8:185 0 8M 0 part
sdm 8:192 0 372.6G 0 disk
├─sdm1 8:193 0 372.6G 0 part
└─sdm9 8:201 0 8M 0 part
sdn 8:208 0 372.6G 0 disk
├─sdn1 8:209 0 372.6G 0 part
└─sdn9 8:217 0 8M 0 part
sdo 8:224 0 372.6G 0 disk
├─sdo1 8:225 0 372.6G 0 part
└─sdo9 8:233 0 8M 0 part
sdp 8:240 0 372.6G 0 disk
├─sdp1 8:241 0 372.6G 0 part
└─sdp9 8:249 0 8M 0 part
sdq 65:0 0 372.6G 0 disk
├─sdq1 65:1 0 372.6G 0 part
└─sdq9 65:9 0 8M 0 part
sdr 65:16 0 745.2G 0 disk
├─sdr1 65:17 0 745.2G 0 part
└─sdr9 65:25 0 8M 0 part
sds 65:32 0 745.2G 0 disk
├─sds1 65:33 0 745.2G 0 part
└─sds9 65:41 0 8M 0 part
sdt 65:48 0 745.2G 0 disk
├─sdt1 65:49 0 745.2G 0 part
└─sdt9 65:57 0 8M 0 part
sdu 65:64 0 745.2G 0 disk
├─sdu1 65:65 0 745.2G 0 part
└─sdu9 65:73 0 8M 0 part
sdv 65:80 0 372.6G 0 disk
├─sdv1 65:81 0 372.6G 0 part
└─sdv9 65:89 0 8M 0 part
sdw 65:96 0 372.6G 0 disk
├─sdw1 65:97 0 372.6G 0 part
└─sdw9 65:105 0 8M 0 part
sdx 65:112 0 372.6G 0 disk
├─sdx1 65:113 0 372.6G 0 part
└─sdx9 65:121 0 8M 0 part
sdz 65:144 0 372.6G 0 disk
├─sdz1 65:145 0 372.6G 0 part
└─sdz9 65:153 0 8M 0 part
sdaa 65:160 0 372.6G 0 disk
├─sdaa1 65:161 0 372.6G 0 part
└─sdaa9 65:169 0 8M 0 part
sdab 65:176 0 372.6G 0 disk
├─sdab1 65:177 0 372.6G 0 part
└─sdab9 65:185 0 8M 0 part
sdac 65:192 0 372.6G 0 disk
├─sdac1 65:193 0 372.6G 0 part
└─sdac9 65:201 0 8M 0 part
[root@mds-201 ~]#

 

[root@mds-201 ~]# zpool status |more
pool: mdtpool
state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-9P
scan: none requested
config:

NAME STATE READ WRITE CKSUM
mdtpool DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
sdd ONLINE 0 0 0
sde DEGRADED 0 0 39 too many errors

errors: No known data errors

pool: mdtpool1
state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-9P
scan: none requested
config:

NAME STATE READ WRITE CKSUM
mdtpool1 DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
sdf DEGRADED 0 0 31 too many errors
sdg ONLINE 0 0 0

errors: No known data errors

 

 

thanks,

Abe

 

 
