Details
Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: Lustre 2.11.0
Labels: None
Severity: 3
Rank: 9223372036854775807
Description
We are observing performance degradation in mdtest IOPS testing, at around 50K IOPS, with ZFS.
The configuration has DNE and DOM; however, it seems only one MDT out of the 3 MDTs is being utilized.
Steps for the configuration in place:
[root@mds-201 ~]# zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
mdtpool 744G 2.35G 742G - 0% 0% 1.00x ONLINE -
mdtpool1 744G 12.1M 744G - 0% 0% 1.00x ONLINE -
mdtpool2 372G 8.96M 372G - 0% 0% 1.00x ONLINE -
[root@mds-201 ~]#
lfs setstripe -E 1M -L mdt -E -1 /mnt/lustre/domdir
lfs setstripe -L mdt --component-end 1M /mnt/lustre/domdir/MDT0
lfs setstripe -L mdt --component-end 1M /mnt/lustre/domdir/MDT1
lfs setstripe -L mdt --component-end 1M /mnt/lustre/domdir/MDT5
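As a point of comparison, one way to get all MDTs participating is to create the test directories explicitly on each MDT with `lfs mkdir -i <index>`; a minimal sketch (the directory names are hypothetical, and the indexes assume the three-MDT layout described above):

```shell
# Create one directory per MDT so metadata load is spread across them.
# -i selects the MDT index that will hold the new directory.
lfs mkdir -i 0 /mnt/lustre/domdir/mdt0-dir
lfs mkdir -i 1 /mnt/lustre/domdir/mdt1-dir
lfs mkdir -i 2 /mnt/lustre/domdir/mdt2-dir

# Verify which MDT each directory actually landed on.
lfs getdirstripe /mnt/lustre/domdir/mdt0-dir
```

These commands need a mounted Lustre client, so they are shown here as a configuration sketch rather than something runnable standalone.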
Snapshot of the results:
Command line used: ./mdtest -d /mnt/lustre/domdir-mdts/testdir-4 -n 47662 -F -e -u -i 1
Path: /mnt/lustre/domdir-mdts
FS: 351.4 TiB Used FS: 0.0% Inodes: 184.4 Mi Used Inodes: 0.0%
44 tasks, 2097128 files
SUMMARY: (of 1 iterations)
Operation Max Min Mean Std Dev
--------- --- --- ---- -------
File creation : 18572.644 18572.644 18572.644 0.000
File stat : 22206.703 22206.703 22206.703 0.000
File read : 66375.392 66375.392 66375.392 0.000
File removal : 11525.913 11525.913 11525.913 0.000
Tree creation : 1016.260 1016.260 1016.260 0.000
Tree removal : 2.698 2.698 2.698 0.000
-- finished at 08/03/2018 09:41:01 --
================= 11 client END =======================
thanks,
Abe
Attachments
Issue Links
- is related to: LU-11213 DNE3: remote mkdir() in ROOT/ by default - Resolved
Activity
If the pool is degraded like this, it means there is some problem with the devices below the Lustre level. One possibility, if you are seeing problems with two zpools, is that the devices are configured incorrectly and a disk is shared between the two pools. Alternately, it is possible there is a marginal cable or power supply that has problems under heavy load.
Hi Andreas,
The fs is more accessible now after making sure mdtpool6 and mdtpool7 use indexes 6 and 7.
The MDTs are participating in the mdtest workload except for mdtpool7; not sure why this is the case, since they are all configured the same.
Also, the ZFS pools go to a degraded state within about 5 minutes of starting the mdtest run.
Any insight on this?
[root@mds-201 ~]# zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
mdtpool 744G 235M 744G - 0% 0% 1.00x ONLINE -
mdtpool1 744G 238M 744G - 0% 0% 1.00x ONLINE -
mdtpool2 372G 239M 372G - 0% 0% 1.00x ONLINE -
mdtpool3 372G 235M 372G - 0% 0% 1.00x ONLINE -
mdtpool4 372G 239M 372G - 0% 0% 1.00x DEGRADED ---> degraded
[root@mgs-200 ~]# zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
mdtpool6 372G 146M 372G - 0% 0% 1.00x DEGRADED --> degraded
mdtpool7 744G 8.95M 744G - 0% 0% 1.00x ONLINE - --> not participating..
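A quick way to spot unhealthy pools across nodes is to filter the HEALTH column of the scriptable `zpool list` output; a small sketch using sample lines that mirror the listing above (on a live node you would pipe `zpool list -H -o name,health` into the filter instead):

```shell
# Print every pool whose HEALTH column is not ONLINE.
flag_unhealthy() {
    awk '$2 != "ONLINE" { print $1 ": " $2 }'
}

# Sample input mirroring the output above; on a real node, use:
#   zpool list -H -o name,health | flag_unhealthy
printf '%s\n' \
    'mdtpool4 DEGRADED' \
    'mdtpool6 DEGRADED' \
    'mdtpool7 ONLINE' | flag_unhealthy
```

With the sample lines this prints `mdtpool4: DEGRADED` and `mdtpool6: DEGRADED`, making a degraded pool easy to catch from a cron job or monitoring hook.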
pool: mdtpool4
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-4J
scan: none requested
config:
NAME STATE READ WRITE CKSUM
mdtpool4 DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
sdx ONLINE 0 0 0
sdz FAULTED 0 0 0 corrupted data
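Per the `action:` line in the status output above, the faulted mirror member can be swapped out with `zpool replace`; a sketch where `sdnew` is a placeholder for whatever spare device is actually available (an assumption, not a real device on this system):

```shell
# Replace the faulted member (sdz) of mirror-0 with a spare disk.
# 'sdnew' is a hypothetical replacement device name.
zpool replace mdtpool4 sdz sdnew

# Watch resilver progress until the pool returns to ONLINE.
zpool status mdtpool4
```

This only helps if the underlying hardware problem (bad disk, cable, or shared device between pools) has actually been resolved first.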
thanks,
Abe
Abe,
It looks like you are doing something wrong with your formatting commands. You show:
mds #1
mkfs.lustre --mdt --fsname=tempAA --reformat --index=0 --servicenode=10.10.10.251@o2ib --servicenode=10.10.10.200@o2ib --mgsnode=10.10.10.251@o2ib --mgsnode=10.10.10.200@o2ib --backfstype=zfs mdtpool6/mdt6
mkfs.lustre --mdt --fsname=tempAA --reformat --index=0 --servicenode=10.10.10.251@o2ib --servicenode=10.10.10.200@o2ib --mgsnode=10.10.10.251@o2ib --mgsnode=10.10.10.200@o2ib --backfstype=zfs mdtpool7/mdt7
which means you have two "tempAA-MDT0000" (same --index=0 option for both MDTs) formatted on mds #1, but on two different ZFS datasets. That is not good. You also show:
mds #2
mkfs.lustre --mdt --fsname=tempAA --reformat --index=0 --servicenode=10.10.10.251@o2ib --servicenode=10.10.10.201@o2ib --mgsnode=10.10.10.251@o2ib --mgsnode=10.10.10.201@o2ib --backfstype=zfs mdtpool/mdt
That means you have another "tempAA-MDT0000" on mds #2. There should only be a single MDT0000 in the whole filesystem. You need to use a unique --index=N option for each MDT, so --index=6 for mdtpool6/mdt6 and --index=7 for mdtpool7/mdt7 at format time.
It is likely that the current filesystem is corrupted, so I would suggest reformatting it from scratch, since it is only a test filesystem.
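For illustration, a corrected version of the mds #1 format commands with unique indexes would look roughly like this (NIDs copied from the commands quoted above; this is a sketch of the fix being described, not verbatim commands to run, and `--reformat` destroys any existing data):

```shell
# Each MDT gets its own unique --index; everything else stays as before.
mkfs.lustre --mdt --fsname=tempAA --reformat --index=6 \
    --servicenode=10.10.10.251@o2ib --servicenode=10.10.10.200@o2ib \
    --mgsnode=10.10.10.251@o2ib --mgsnode=10.10.10.200@o2ib \
    --backfstype=zfs mdtpool6/mdt6

mkfs.lustre --mdt --fsname=tempAA --reformat --index=7 \
    --servicenode=10.10.10.251@o2ib --servicenode=10.10.10.200@o2ib \
    --mgsnode=10.10.10.251@o2ib --mgsnode=10.10.10.200@o2ib \
    --backfstype=zfs mdtpool7/mdt7
```

The index becomes part of the target name (tempAA-MDT0006, tempAA-MDT0007), which is why every MDT in the filesystem must use a distinct value.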
The command used on the client to mount the fs: mount -t lustre 10.10.10.251@o2ib:/tempAA /mnt/lustre
This appears to be correct for a single MGS node, once the other issues are fixed up.
Hi Andreas,
I have modified mds200 to have only MDT0006 and not MDT0000.
Also, we are only using one MGS (10.10.10.251) without a backup, and 2 MDS servers (MDS200 (10.10.10.200) and MDS201 (10.10.10.201)),
and the command used on the client to mount the fs:
mount -t lustre 10.10.10.251@o2ib:/tempAA /mnt/lustre
but accessing the filesystem is still slow!
[root@client1-221 ~]# mkdir /mnt/lustre/aadomdir
hangs!
mds #1
mkfs --mgs --fsname=tempAA --reformat --servicenode=10.10.10.200@o2ib --servicenode=10.10.10.200@o2ib --mgsnode=10.10.10.251@o2ib --mgsnode=10.10.10.251@o2ib --backfstype=zfs mgspool6/mdt6
mkfs.lustre --mdt --fsname=tempAA --reformat --index=0 --servicenode=10.10.10.251@o2ib --servicenode=10.10.10.200@o2ib --mgsnode=10.10.10.10.251@o2ib --mgsnode=10.10.10.200@o2ib --backfstype=zfs mdtpool6/mdt6
mkfs.lustre --mdt --fsname=tempAA --reformat --index=0 --servicenode=10.10.10.251@o2ib --servicenode=10.10.10.200@o2ib --mgsnode=10.10.10.251@o2ib --mgsnode=10.10.10.200@o2ib --backfstype=zfs mdtpool6/mdt6
mkfs.lustre --mdt --fsname=tempAA --reformat --index=0 --servicenode=10.10.10.251@o2ib --servicenode=10.10.10.200@o2ib --mgsnode=10.10.10.251@o2ib --mgsnode=10.10.10.200@o2ib --backfstype=zfs mdtpool6/mdt6
mkfs.lustre --mdt --fsname=tempAA --reformat --index=0 --servicenode=10.10.10.251@o2ib --servicenode=10.10.10.200@o2ib --mgsnode=10.10.10.251@o2ib --mgsnode=10.10.10.200@o2ib --backfstype=zfs mdtpool7/mdt7
[root@mds-200 ~]# lctl dl
0 UP osd-zfs tempAA-MDT0006-osd tempAA-MDT0006-osd_UUID 18
1 UP mgc MGC10.10.10.251@o2ib 560e021a-03ca-b753-fc54-483243dda809 4
2 UP mds MDS MDS_uuid 2
3 UP lod tempAA-MDT0006-mdtlov tempAA-MDT0006-mdtlov_UUID 3
4 UP mdt tempAA-MDT0006 tempAA-MDT0006_UUID 34
5 UP mdd tempAA-MDD0006 tempAA-MDD0006_UUID 3
6 UP osp tempAA-MDT0000-osp-MDT0006 tempAA-MDT0006-mdtlov_UUID 4
7 UP osp tempAA-MDT0001-osp-MDT0006 tempAA-MDT0006-mdtlov_UUID 4
8 UP osp tempAA-MDT0002-osp-MDT0006 tempAA-MDT0006-mdtlov_UUID 4
9 UP osp tempAA-MDT0003-osp-MDT0006 tempAA-MDT0006-mdtlov_UUID 4
10 UP osp tempAA-MDT0004-osp-MDT0006 tempAA-MDT0006-mdtlov_UUID 4
11 UP osp tempAA-OST0005-osc-MDT0006 tempAA-MDT0006-mdtlov_UUID 4
12 UP osp tempAA-OST0006-osc-MDT0006 tempAA-MDT0006-mdtlov_UUID 4
13 UP osp tempAA-OST0007-osc-MDT0006 tempAA-MDT0006-mdtlov_UUID 4
14 UP osp tempAA-OST0008-osc-MDT0006 tempAA-MDT0006-mdtlov_UUID 4
15 UP osp tempAA-OST0001-osc-MDT0006 tempAA-MDT0006-mdtlov_UUID 4
16 UP osp tempAA-OST0002-osc-MDT0006 tempAA-MDT0006-mdtlov_UUID 4
17 UP osp tempAA-OST0003-osc-MDT0006 tempAA-MDT0006-mdtlov_UUID 4
18 UP osp tempAA-OST0004-osc-MDT0006 tempAA-MDT0006-mdtlov_UUID 4
19 UP lwp tempAA-MDT0000-lwp-MDT0006 tempAA-MDT0000-lwp-MDT0006_UUID 4
mds #2
mkfs.lustre --mdt --fsname=$NAME --reformat --index=0 --servicenode=$MGS_NID --servicenode=$MDT_NID --mgsnode=$MGS_NID --mgsnode=$MDT_NID --backfstype=zfs mdtpool/mdt
+ mkfs.lustre --mdt --fsname=tempAA --reformat --index=0 --servicenode=10.10.10.251@o2ib --servicenode=10.10.10.201@o2ib --mgsnode=10.10.10.251@o2ib --mgsnode=10.10.10.201@o2ib --backfstype=zfs mdtpool/mdt
mkfs.lustre --mdt --fsname=$NAME --reformat --index=1 --servicenode=$MGS_NID --servicenode=$MDT_NID --mgsnode=$MGS_NID --mgsnode=$MDT_NID --backfstype=zfs mdtpool1/mdt1
+ mkfs.lustre --mdt --fsname=tempAA --reformat --index=1 --servicenode=10.10.10.251@o2ib --servicenode=10.10.10.201@o2ib --mgsnode=10.10.10.251@o2ib --mgsnode=10.10.10.201@o2ib --backfstype=zfs mdtpool1/mdt1
mkfs.lustre --mdt --fsname=$NAME --reformat --index=3 --servicenode=$MGS_NID --servicenode=$MDT_NID --mgsnode=$MGS_NID --mgsnode=$MDT_NID --backfstype=zfs mdtpool3/mdt3
+ mkfs.lustre --mdt --fsname=tempAA --reformat --index=3 --servicenode=10.10.10.251@o2ib --servicenode=10.10.10.201@o2ib --mgsnode=10.10.10.251@o2ib --mgsnode=10.10.10.201@o2ib --backfstype=zfs mdtpool3/mdt3
mkfs.lustre --mdt --fsname=$NAME --reformat --index=4 --servicenode=$MGS_NID --servicenode=$MDT_NID --mgsnode=$MGS_NID --mgsnode=$MDT_NID --backfstype=zfs mdtpool4/mdt4
+ mkfs.lustre --mdt --fsname=tempAA --reformat --index=4 --servicenode=10.10.10.251@o2ib --servicenode=10.10.10.201@o2ib --mgsnode=10.10.10.251@o2ib --mgsnode=10.10.10.201@o2ib --backfstype=zfs mdtpool4/mdt4
mount -t lustre mdtpool/mdt /mnt/lustre/mdt
+ mount -t lustre mdtpool/mdt /mnt/lustre/mdt
[root@mds-201 ~]# lctl dl
0 UP osd-zfs tempAA-MDT0000-osd tempAA-MDT0000-osd_UUID 19
1 UP mgc MGC10.10.10.251@o2ib ba5bb4ef-c13e-30b7-318b-ba23b06f65bd 4
2 UP mds MDS MDS_uuid 2
3 UP lod tempAA-MDT0000-mdtlov tempAA-MDT0000-mdtlov_UUID 3
4 UP mdt tempAA-MDT0000 tempAA-MDT0000_UUID 66
5 UP mdd tempAA-MDD0000 tempAA-MDD0000_UUID 3
6 UP qmt tempAA-QMT0000 tempAA-QMT0000_UUID 3
7 UP lwp tempAA-MDT0000-lwp-MDT0000 tempAA-MDT0000-lwp-MDT0000_UUID 4
8 UP osd-zfs tempAA-MDT0001-osd tempAA-MDT0001-osd_UUID 18
9 UP lod tempAA-MDT0001-mdtlov tempAA-MDT0001-mdtlov_UUID 3
10 UP mdt tempAA-MDT0001 tempAA-MDT0001_UUID 36
11 UP mdd tempAA-MDD0001 tempAA-MDD0001_UUID 3
12 UP osp tempAA-MDT0000-osp-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
13 UP lwp tempAA-MDT0000-lwp-MDT0001 tempAA-MDT0000-lwp-MDT0001_UUID 4
14 UP osd-zfs tempAA-MDT0002-osd tempAA-MDT0002-osd_UUID 18
15 UP lod tempAA-MDT0002-mdtlov tempAA-MDT0002-mdtlov_UUID 3
16 UP mdt tempAA-MDT0002 tempAA-MDT0002_UUID 34
17 UP mdd tempAA-MDD0002 tempAA-MDD0002_UUID 3
18 UP osp tempAA-MDT0000-osp-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
19 UP osp tempAA-MDT0001-osp-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
20 UP lwp tempAA-MDT0000-lwp-MDT0002 tempAA-MDT0000-lwp-MDT0002_UUID 4
21 UP osd-zfs tempAA-MDT0003-osd tempAA-MDT0003-osd_UUID 18
22 UP lod tempAA-MDT0003-mdtlov tempAA-MDT0003-mdtlov_UUID 3
23 UP mdt tempAA-MDT0003 tempAA-MDT0003_UUID 34
24 UP mdd tempAA-MDD0003 tempAA-MDD0003_UUID 3
25 UP osp tempAA-MDT0000-osp-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
26 UP osp tempAA-MDT0001-osp-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
27 UP osp tempAA-MDT0002-osp-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
28 UP lwp tempAA-MDT0000-lwp-MDT0003 tempAA-MDT0000-lwp-MDT0003_UUID 4
29 UP osd-zfs tempAA-MDT0004-osd tempAA-MDT0004-osd_UUID 18
30 UP lod tempAA-MDT0004-mdtlov tempAA-MDT0004-mdtlov_UUID 3
31 UP mdt tempAA-MDT0004 tempAA-MDT0004_UUID 34
32 UP mdd tempAA-MDD0004 tempAA-MDD0004_UUID 3
33 UP osp tempAA-MDT0000-osp-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
34 UP osp tempAA-MDT0001-osp-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
35 UP osp tempAA-MDT0002-osp-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
36 UP osp tempAA-MDT0003-osp-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
37 UP lwp tempAA-MDT0000-lwp-MDT0004 tempAA-MDT0000-lwp-MDT0004_UUID 4
38 UP osp tempAA-MDT0004-osp-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
39 UP osp tempAA-MDT0003-osp-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
40 UP osp tempAA-MDT0004-osp-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
41 UP osp tempAA-MDT0002-osp-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
42 UP osp tempAA-MDT0003-osp-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
43 UP osp tempAA-MDT0004-osp-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
44 UP osp tempAA-MDT0001-osp-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
45 UP osp tempAA-MDT0002-osp-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
46 UP osp tempAA-MDT0003-osp-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
47 UP osp tempAA-MDT0004-osp-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
48 UP osp tempAA-OST0005-osc-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
49 UP osp tempAA-OST0006-osc-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
50 UP osp tempAA-OST0005-osc-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
51 UP osp tempAA-OST0006-osc-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
52 UP osp tempAA-OST0005-osc-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
53 UP osp tempAA-OST0006-osc-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
54 UP osp tempAA-OST0005-osc-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
55 UP osp tempAA-OST0006-osc-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
56 UP osp tempAA-OST0005-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
57 UP osp tempAA-OST0006-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
58 UP osp tempAA-OST0007-osc-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
59 UP osp tempAA-OST0007-osc-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
60 UP osp tempAA-OST0007-osc-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
61 UP osp tempAA-OST0007-osc-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
62 UP osp tempAA-OST0007-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
63 UP osp tempAA-OST0008-osc-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
64 UP osp tempAA-OST0008-osc-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
65 UP osp tempAA-OST0008-osc-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
66 UP osp tempAA-OST0008-osc-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
67 UP osp tempAA-OST0008-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
68 UP osp tempAA-OST0001-osc-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
69 UP osp tempAA-OST0002-osc-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
70 UP osp tempAA-OST0003-osc-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
71 UP osp tempAA-OST0001-osc-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
72 UP osp tempAA-OST0002-osc-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
73 UP osp tempAA-OST0003-osc-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
74 UP osp tempAA-OST0001-osc-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
75 UP osp tempAA-OST0002-osc-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
76 UP osp tempAA-OST0003-osc-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
77 UP osp tempAA-OST0001-osc-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
78 UP osp tempAA-OST0002-osc-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
79 UP osp tempAA-OST0003-osc-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
80 UP osp tempAA-OST0001-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
81 UP osp tempAA-OST0002-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
82 UP osp tempAA-OST0003-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
83 UP osp tempAA-OST0004-osc-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
84 UP osp tempAA-OST0004-osc-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
85 UP osp tempAA-OST0004-osc-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
86 UP osp tempAA-OST0004-osc-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
87 UP osp tempAA-OST0004-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
88 UP osp tempAA-MDT0006-osp-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
89 UP osp tempAA-MDT0006-osp-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
90 UP osp tempAA-MDT0006-osp-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
91 UP osp tempAA-MDT0006-osp-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
92 UP osp tempAA-MDT0006-osp-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
[root@mds-201 ~]#
mgs config:
zpool create -f -O canmount=off -o cachefile=none mgspool sdb
mkfs.lustre --mgs --fsname=$NAME --reformat --servicenode=$MGS_NID --servicenode=$MDT_NID --mgsnode=$MGS_NID --mgsnode=$MDT_NID --backfstype=zfs mgspool/mgt
+ mkfs.lustre --mgs --fsname=tempAA --reformat --servicenode=10.10.10.251@o2ib --servicenode=10.10.10.201@o2ib --mgsnode=10.10.10.251@o2ib --mgsnode=10.10.10.201@o2ib --backfstype=zfs mgspool/mgt
lctl dl
0 UP osd-zfs MGS-osd MGS-osd_UUID 4
1 UP mgs MGS MGS 18
2 UP mgc MGC10.10.10.251@o2ib 6f0306ce-d9bf-1556-1288-9800b8b62090 4
zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
mgspool 5.44T 9.02M 5.44T - 0% 0% 1.00x ONLINE -
client mount:
[root@client1-221 ~]# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,size=32836108k,nr_inodes=8209027,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_prio,net_cls)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
configfs on /sys/kernel/config type configfs (rw,relatime)
/dev/mapper/centos-root on / type xfs (rw,relatime,attr2,inode64,noquota)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=33,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
mqueue on /dev/mqueue type mqueue (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
/dev/sda2 on /boot type xfs (rw,relatime,attr2,inode64,noquota)
/dev/sda1 on /boot/efi type vfat (rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=winnt,errors=remount-ro)
/dev/mapper/centos-home on /home type xfs (rw,relatime,attr2,inode64,noquota)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=6569420k,mode=700)
10.10.10.251@o2ib:/tempAA on /mnt/lustre type lustre (rw,lazystatfs)
[root@client1-221 ~]#
thanks,
Abe
As I previously mentioned, you should NOT specify the MDS NID on the mount command line. Only the MGS NID (primary and backup) should be on the mount command line. Sometimes the MGS is on the same node as the MDS, but with DNE there may be many MDS nodes, and they should definitely NOT be listed. This is likely causing the slow mount as the client is trying to contact the MGS on each of the listed NIDs.
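Concretely, with 10.10.10.251@o2ib as the only MGS in this setup, the client mount would look like this (the failover NID in the comment is hypothetical, shown only to illustrate the colon-separated syntax for an MGS failover pair):

```shell
# List only the MGS NID before /fsname -- never the MDS NIDs.
mount -t lustre 10.10.10.251@o2ib:/tempAA /mnt/lustre

# If an MGS failover partner existed (e.g. a hypothetical 10.10.10.252),
# its NID would be colon-separated after the primary:
#   mount -t lustre 10.10.10.251@o2ib:10.10.10.252@o2ib:/tempAA /mnt/lustre
```

The client learns the NIDs of all MDTs and OSTs from the MGS configuration logs after connecting, so listing MDS NIDs on the mount line is never needed.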
I also see that in your example you have an "MDT0000" listed on both MDS1 and MDS2. That is not a valid configuration, as each MDT needs to have a different index. Having multiple MDT0000 devices in the same filesystem would cause severe corruption of the filesystem.
Also, a note here: when I try to mount the fs on the client with
mount -t lustre 10.10.10.251@o2ib:10.10.10.201@o2ib:/tempAA /mnt/lustre
I get this error in the client's dmesg:
[ 1810.256672] LustreError: 2179:0:(mgc_request.c:1576:mgc_apply_recover_logs()) mgc: cannot find uuid by nid 10.10.10.200@o2ib
[ 1810.257342] Lustre: 2179:0:(mgc_request.c:1802:mgc_process_recover_nodemap_log()) MGC10.10.10.251@o2ib: error processing recovery log tempAA-cliir: rc = -2
[ 1810.257396] LustreError: 2179:0:(mgc_request.c:2132:mgc_process_log()) MGC10.10.10.251@o2ib: recover log tempAA-cliir failed, not fatal: rc = -2
[ 1810.267968] Lustre: Mounted tempAA-client
thanks,
Abe
Hi Andreas,
Below is the config for 2 MDS servers, 1 MGS server, and 1 client.
When I mount the client, access to the fs /mnt/lustre is very slow.
Is there something wrong with the way I'm mounting the client?
Do I need to specify the NIDs for the MGS and the 2 MDS servers in the mount command?
I do see an error when I try to mount the client:
[root@client1-221 ~]# mount -t lustre 10.10.10.251@o2ib:10.10.10.201@o2ib:10.10.10.200@o2ib:/tempAA /mnt/lustre
[root@client1-221 ~]#
[ 134.067195] LNet: 1355:0:(o2iblnd.c:943:kiblnd_create_conn()) peer 10.10.10.251@o2ib - queue depth reduced from 128 to 63 to allow for qp creation
[ 134.237519] LustreError: 1935:0:(mgc_request.c:1576:mgc_apply_recover_logs()) mgc: cannot find uuid by nid 10.10.10.200@o2ib
[ 134.239654] Lustre: 1935:0:(mgc_request.c:1802:mgc_process_recover_nodemap_log()) MGC10.10.10.251@o2ib: error processing recovery log tempAA-cliir: rc = -2
[ 134.239726] LustreError: 1935:0:(mgc_request.c:2132:mgc_process_log()) MGC10.10.10.251@o2ib: recover log tempAA-cliir failed, not fatal: rc = -2
[ 134.251092] Lustre: Mounted tempAA-client
[root@client1-221 ~]# ls -l /mnt/lustre
^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C
2 MDS config:
mds 1:
[root@mds-201 ~]#
[root@mds-201 ~]# lctl dl
0 UP osd-zfs tempAA-MDT0000-osd tempAA-MDT0000-osd_UUID 18
1 UP mgc MGC10.10.10.251@o2ib ba5bb4ef-c13e-30b7-318b-ba23b06f65bd 4
2 UP mds MDS MDS_uuid 2
3 UP lod tempAA-MDT0000-mdtlov tempAA-MDT0000-mdtlov_UUID 3
4 UP mdt tempAA-MDT0000 tempAA-MDT0000_UUID 56
5 UP mdd tempAA-MDD0000 tempAA-MDD0000_UUID 3
6 UP qmt tempAA-QMT0000 tempAA-QMT0000_UUID 3
7 UP lwp tempAA-MDT0000-lwp-MDT0000 tempAA-MDT0000-lwp-MDT0000_UUID 4
8 UP osd-zfs tempAA-MDT0001-osd tempAA-MDT0001-osd_UUID 17
9 UP lod tempAA-MDT0001-mdtlov tempAA-MDT0001-mdtlov_UUID 3
10 UP mdt tempAA-MDT0001 tempAA-MDT0001_UUID 34
11 UP mdd tempAA-MDD0001 tempAA-MDD0001_UUID 3
12 UP osp tempAA-MDT0000-osp-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
13 UP lwp tempAA-MDT0000-lwp-MDT0001 tempAA-MDT0000-lwp-MDT0001_UUID 4
14 UP osd-zfs tempAA-MDT0002-osd tempAA-MDT0002-osd_UUID 17
15 UP lod tempAA-MDT0002-mdtlov tempAA-MDT0002-mdtlov_UUID 3
16 UP mdt tempAA-MDT0002 tempAA-MDT0002_UUID 32
17 UP mdd tempAA-MDD0002 tempAA-MDD0002_UUID 3
18 UP osp tempAA-MDT0000-osp-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
19 UP osp tempAA-MDT0001-osp-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
20 UP lwp tempAA-MDT0000-lwp-MDT0002 tempAA-MDT0000-lwp-MDT0002_UUID 4
21 UP osd-zfs tempAA-MDT0003-osd tempAA-MDT0003-osd_UUID 17
22 UP lod tempAA-MDT0003-mdtlov tempAA-MDT0003-mdtlov_UUID 3
23 UP mdt tempAA-MDT0003 tempAA-MDT0003_UUID 32
24 UP mdd tempAA-MDD0003 tempAA-MDD0003_UUID 3
25 UP osp tempAA-MDT0000-osp-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
26 UP osp tempAA-MDT0001-osp-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
27 UP osp tempAA-MDT0002-osp-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
28 UP lwp tempAA-MDT0000-lwp-MDT0003 tempAA-MDT0000-lwp-MDT0003_UUID 4
29 UP osd-zfs tempAA-MDT0004-osd tempAA-MDT0004-osd_UUID 17
30 UP lod tempAA-MDT0004-mdtlov tempAA-MDT0004-mdtlov_UUID 3
31 UP mdt tempAA-MDT0004 tempAA-MDT0004_UUID 32
32 UP mdd tempAA-MDD0004 tempAA-MDD0004_UUID 3
33 UP osp tempAA-MDT0000-osp-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
34 UP osp tempAA-MDT0001-osp-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
35 UP osp tempAA-MDT0002-osp-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
36 UP osp tempAA-MDT0003-osp-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
37 UP lwp tempAA-MDT0000-lwp-MDT0004 tempAA-MDT0000-lwp-MDT0004_UUID 4
38 UP osp tempAA-MDT0004-osp-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
39 UP osp tempAA-MDT0003-osp-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
40 UP osp tempAA-MDT0004-osp-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
41 UP osp tempAA-MDT0002-osp-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
42 UP osp tempAA-MDT0003-osp-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
43 UP osp tempAA-MDT0004-osp-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
44 UP osp tempAA-MDT0001-osp-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
45 UP osp tempAA-MDT0002-osp-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
46 UP osp tempAA-MDT0003-osp-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
47 UP osp tempAA-MDT0004-osp-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
48 UP osp tempAA-OST0005-osc-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
49 UP osp tempAA-OST0006-osc-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
50 UP osp tempAA-OST0005-osc-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
51 UP osp tempAA-OST0006-osc-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
52 UP osp tempAA-OST0005-osc-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
53 UP osp tempAA-OST0006-osc-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
54 UP osp tempAA-OST0005-osc-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
55 UP osp tempAA-OST0006-osc-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
56 UP osp tempAA-OST0005-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
57 UP osp tempAA-OST0006-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
58 UP osp tempAA-OST0007-osc-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
59 UP osp tempAA-OST0007-osc-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
60 UP osp tempAA-OST0007-osc-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
61 UP osp tempAA-OST0007-osc-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
62 UP osp tempAA-OST0007-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
63 UP osp tempAA-OST0008-osc-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
64 UP osp tempAA-OST0008-osc-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
65 UP osp tempAA-OST0008-osc-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
66 UP osp tempAA-OST0008-osc-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
67 UP osp tempAA-OST0008-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
68 UP osp tempAA-OST0001-osc-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
69 UP osp tempAA-OST0002-osc-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
70 UP osp tempAA-OST0003-osc-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
71 UP osp tempAA-OST0001-osc-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
72 UP osp tempAA-OST0002-osc-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
73 UP osp tempAA-OST0003-osc-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
74 UP osp tempAA-OST0001-osc-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
75 UP osp tempAA-OST0002-osc-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
76 UP osp tempAA-OST0003-osc-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
77 UP osp tempAA-OST0001-osc-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
78 UP osp tempAA-OST0002-osc-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
79 UP osp tempAA-OST0003-osc-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
80 UP osp tempAA-OST0001-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
81 UP osp tempAA-OST0002-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
82 UP osp tempAA-OST0003-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
83 UP osp tempAA-OST0004-osc-MDT0004 tempAA-MDT0004-mdtlov_UUID 4
84 UP osp tempAA-OST0004-osc-MDT0003 tempAA-MDT0003-mdtlov_UUID 4
85 UP osp tempAA-OST0004-osc-MDT0002 tempAA-MDT0002-mdtlov_UUID 4
86 UP osp tempAA-OST0004-osc-MDT0001 tempAA-MDT0001-mdtlov_UUID 4
87 UP osp tempAA-OST0004-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
[root@mds-201 ~]# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,size=65200336k,nr_inodes=16300084,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_prio,net_cls)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
configfs on /sys/kernel/config type configfs (rw,relatime)
/dev/mapper/centos-root on / type xfs (rw,relatime,attr2,inode64,noquota)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=35,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=785)
mqueue on /dev/mqueue type mqueue (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)
/dev/sda2 on /boot type xfs (rw,relatime,attr2,inode64,noquota)
/dev/mapper/centos-home on /home type xfs (rw,relatime,attr2,inode64,noquota)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=13042812k,mode=700)
mdtpool/mdt on /mnt/lustre/mdt type lustre (ro,svname=tempAA-MDT0000,mgsnode=10.10.10.251@o2ib:10.10.10.201@o2ib,osd=osd-zfs)
mdtpool1/mdt1 on /mnt/lustre/mdt1 type lustre (ro,svname=tempAA-MDT0001,mgsnode=10.10.10.251@o2ib:10.10.10.201@o2ib,osd=osd-zfs)
mdtpool2/mdt2 on /mnt/lustre/mdt2 type lustre (ro,svname=tempAA-MDT0002,mgsnode=10.10.10.251@o2ib:10.10.10.201@o2ib,osd=osd-zfs)
mdtpool3/mdt3 on /mnt/lustre/mdt3 type lustre (ro,svname=tempAA-MDT0003,mgsnode=10.10.10.251@o2ib:10.10.10.201@o2ib,osd=osd-zfs)
mdtpool4/mdt4 on /mnt/lustre/mdt4 type lustre (ro,svname=tempAA-MDT0004,mgsnode=10.10.10.251@o2ib:10.10.10.201@o2ib,osd=osd-zfs)
[root@mds-201 ~]#
mds 2:
[root@mgs-200 ~]# lctl dl
0 UP osd-zfs tempAA-MDT0000-osd tempAA-MDT0000-osd_UUID 18
1 UP mgc MGC10.10.10.251@o2ib 4dfe3fbf-5953-a8e6-3fb6-9eebff8592e3 4
2 UP mds MDS MDS_uuid 2
3 UP lod tempAA-MDT0000-mdtlov tempAA-MDT0000-mdtlov_UUID 3
4 UP mdt tempAA-MDT0000 tempAA-MDT0000_UUID 2
5 UP mdd tempAA-MDD0000 tempAA-MDD0000_UUID 3
6 UP qmt tempAA-QMT0000 tempAA-QMT0000_UUID 3
7 UP osp tempAA-MDT0001-osp-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
8 UP osp tempAA-MDT0002-osp-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
9 UP osp tempAA-MDT0003-osp-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
10 UP osp tempAA-MDT0004-osp-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
11 UP lwp tempAA-MDT0000-lwp-MDT0000 tempAA-MDT0000-lwp-MDT0000_UUID 4
12 UP osp tempAA-OST0005-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
13 UP osp tempAA-OST0006-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
14 UP osp tempAA-OST0007-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
15 UP osp tempAA-OST0008-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
16 UP osp tempAA-OST0001-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
17 UP osp tempAA-OST0002-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
18 UP osp tempAA-OST0003-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
19 UP osp tempAA-OST0004-osc-MDT0000 tempAA-MDT0000-mdtlov_UUID 4
mgs:
[root@sbb-client1 ~]# lctl dl
0 UP osd-zfs MGS-osd MGS-osd_UUID 4
1 UP mgs MGS MGS 18
2 UP mgc MGC10.10.10.251@o2ib 6f0306ce-d9bf-1556-1288-9800b8b62090 4
[root@sbb-client1 ~]# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,size=32833268k,nr_inodes=8208317,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_prio,net_cls)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
configfs on /sys/kernel/config type configfs (rw,relatime)
/dev/mapper/rhel-root on / type xfs (rw,relatime,attr2,inode64,noquota)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=34,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=17604)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
mqueue on /dev/mqueue type mqueue (rw,relatime)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)
/dev/mapper/rhel-home on /home type xfs (rw,relatime,attr2,inode64,noquota)
/dev/sda2 on /boot type xfs (rw,relatime,attr2,inode64,noquota)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=6569624k,mode=700)
mgspool/mgt on /mnt/lustre/mgt type lustre (ro,svname=MGS,nosvc,mgs,osd=osd-zfs)
[root@sbb-client1 ~]#
client1 config:
[root@client1-221 ~]# lctl dl
0 UP mgc MGC10.10.10.251@o2ib 8d56ab9b-2220-9262-99f0-7558b40523ba 4
1 UP lov tempAA-clilov-ffff88105b505800 05280644-a903-6f5e-abfc-21a149e8384b 3
2 UP lmv tempAA-clilmv-ffff88105b505800 05280644-a903-6f5e-abfc-21a149e8384b 4
3 UP mdc tempAA-MDT0000-mdc-ffff88105b505800 05280644-a903-6f5e-abfc-21a149e8384b 4
4 UP mdc tempAA-MDT0001-mdc-ffff88105b505800 05280644-a903-6f5e-abfc-21a149e8384b 4
5 UP mdc tempAA-MDT0002-mdc-ffff88105b505800 05280644-a903-6f5e-abfc-21a149e8384b 4
6 UP mdc tempAA-MDT0003-mdc-ffff88105b505800 05280644-a903-6f5e-abfc-21a149e8384b 4
7 UP mdc tempAA-MDT0004-mdc-ffff88105b505800 05280644-a903-6f5e-abfc-21a149e8384b 4
8 UP osc tempAA-OST0005-osc-ffff88105b505800 05280644-a903-6f5e-abfc-21a149e8384b 4
9 UP osc tempAA-OST0006-osc-ffff88105b505800 05280644-a903-6f5e-abfc-21a149e8384b 4
10 UP osc tempAA-OST0007-osc-ffff88105b505800 05280644-a903-6f5e-abfc-21a149e8384b 4
11 UP osc tempAA-OST0008-osc-ffff88105b505800 05280644-a903-6f5e-abfc-21a149e8384b 4
12 UP osc tempAA-OST0001-osc-ffff88105b505800 05280644-a903-6f5e-abfc-21a149e8384b 4
13 UP osc tempAA-OST0002-osc-ffff88105b505800 05280644-a903-6f5e-abfc-21a149e8384b 4
14 UP osc tempAA-OST0003-osc-ffff88105b505800 05280644-a903-6f5e-abfc-21a149e8384b 4
15 UP osc tempAA-OST0004-osc-ffff88105b505800 05280644-a903-6f5e-abfc-21a149e8384b 4
[root@client1-221 ~]# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,size=32836108k,nr_inodes=8209027,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_prio,net_cls)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
configfs on /sys/kernel/config type configfs (rw,relatime)
/dev/mapper/centos-root on / type xfs (rw,relatime,attr2,inode64,noquota)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=33,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
mqueue on /dev/mqueue type mqueue (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
/dev/sda2 on /boot type xfs (rw,relatime,attr2,inode64,noquota)
/dev/sda1 on /boot/efi type vfat (rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=winnt,errors=remount-ro)
/dev/mapper/centos-home on /home type xfs (rw,relatime,attr2,inode64,noquota)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=6569420k,mode=700)
10.10.10.251@o2ib:10.10.10.201@o2ib:/tempAA on /mnt/lustre type lustre (rw,lazystatfs)
[root@client1-221 ~]#
thanks,
Abe
Abe, the IP address (or more correctly "Lustre NID") listed on the client mount is the address and failover for the MGS. This will typically be located on MDS0 with MDT0000. It is preferred to have the MGS use a separate device so that it can be failed over independently of MDT0000.
In any case, the clients do not need to change anything for their mount command, since there is only a single MGS for the filesystem. The actual connections to the MDT(s) are handled internally by the Lustre configuration log in the same way as with OSTs.
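As a concrete illustration, the single client mount can be sketched as below, using the MGS NID and failover NID already visible in the client's `mount` output above. The `echo` only prints the command so it can be inspected; actually running it requires root and the Lustre client modules loaded.

```shell
# A single Lustre client mount names only the MGS NID (plus its failover NID),
# no matter how many MDTs/MDSes back the namespace. NIDs are the ones from the
# mount output shown earlier in this ticket.
MGS_NID="10.10.10.251@o2ib"
MGS_FAILOVER_NID="10.10.10.201@o2ib"
FSNAME="tempAA"
MOUNTPOINT="/mnt/lustre"

# Print the mount command for inspection (run it as root on a Lustre client):
echo mount -t lustre "${MGS_NID}:${MGS_FAILOVER_NID}:/${FSNAME}" "${MOUNTPOINT}"
```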
Hi Andreas,
I'm adding a 2nd MDS server with its own MDTs. How do the clients mount the same namespace from two separate servers that have different IP addresses?
e.g:
On the client servers (will they have 2 separate mounts?):
1st MDS server mount:
mount -t lustre 10.10.10.200@o2ib:10.10.10.201@o2ib:/tempAA /mnt/lustre
And the 2nd MDS server mount:
mount -t lustre 10.10.10.200@o2ib:10.10.10.202@o2ib:/tempAA /mnt/lustre
thanks,
Abe
Hi Andreas,
there is definitely a power-supply problem on one of the clients;
I will replace the power supplies tomorrow:
[root@client1-221 ~]#
Message from syslogd@client1-221 at Aug 12 03:49:37 ...
kernel:NMI watchdog: BUG: soft lockup - CPU#13 stuck for 22s! [ldlm_bl_07:5391]
Message from syslogd@client1-221 at Aug 12 03:49:41 ...
kernel:NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [ldlm_bl_01:5003]
Message from syslogd@client1-221 at Aug 12 03:49:41 ...
kernel:NMI watchdog: BUG: soft lockup - CPU#16 stuck for 23s! [ldlm_bl_13:10002]
Message from syslogd@client1-221 at Aug 12 03:49:41 ...
kernel:NMI watchdog: BUG: soft lockup - CPU#25 stuck for 22s! [ldlm_bl_08:5498]
I'm not sure shared SSD drives are the root cause of the degradation; I think ZFS does not allow a pool configuration with shared SSDs.
I wonder if there is a way to check whether the SSDs are shared across the MDS servers.
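One way to check is to compare device serial numbers and WWNs across the servers: if the same physical SSD were visible from two MDS nodes, its identifiers would show up in both listings. A sketch (the `ssh` usage line is commented out and `mds-202` is a hypothetical name for the second MDS; the demo at the end uses canned data instead):

```shell
# Report device IDs seen on more than one host. lsblk's SERIAL and WWN columns
# are standard util-linux; a duplicate line across two hosts' listings means
# the same physical disk is visible on both.
dup_devices() {
    sort | uniq -d
}

# Real usage from an admin node (mds-202 is a hypothetical second MDS name):
#   { ssh mds-201 lsblk -dno SERIAL,WWN; \
#     ssh mds-202 lsblk -dno SERIAL,WWN; } | dup_devices

# Demo with canned data: serial S2 / wwn-b appears on both hosts.
printf 'S1 wwn-a\nS2 wwn-b\n' > /tmp/host1.ids
printf 'S2 wwn-b\nS3 wwn-c\n' > /tmp/host2.ids
cat /tmp/host1.ids /tmp/host2.ids | dup_devices
```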
output of lsblk on the SSD JBOD:
[root@mds-201 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 5.7T 0 disk
├─sda1 8:1 0 1M 0 part
├─sda2 8:2 0 1G 0 part /boot
└─sda3 8:3 0 5.7T 0 part
  ├─centos-root 253:0 0 50G 0 lvm /
  ├─centos-swap 253:1 0 4G 0 lvm [SWAP]
  └─centos-home 253:2 0 5.6T 0 lvm /home
sdb 8:16 0 372.6G 0 disk
├─sdb1 8:17 0 372.6G 0 part
└─sdb9 8:25 0 8M 0 part
sdc 8:32 0 372.6G 0 disk
├─sdc1 8:33 0 372.6G 0 part
└─sdc9 8:41 0 8M 0 part
sdd 8:48 0 745.2G 0 disk
├─sdd1 8:49 0 745.2G 0 part
└─sdd9 8:57 0 8M 0 part
sde 8:64 0 745.2G 0 disk
├─sde1 8:65 0 745.2G 0 part
└─sde9 8:73 0 8M 0 part
sdf 8:80 0 745.2G 0 disk
├─sdf1 8:81 0 745.2G 0 part
└─sdf9 8:89 0 8M 0 part
sdg 8:96 0 745.2G 0 disk
├─sdg1 8:97 0 745.2G 0 part
└─sdg9 8:105 0 8M 0 part
sdh 8:112 0 372.6G 0 disk
├─sdh1 8:113 0 372.6G 0 part
└─sdh9 8:121 0 8M 0 part
sdi 8:128 0 372.6G 0 disk
├─sdi1 8:129 0 372.6G 0 part
└─sdi9 8:137 0 8M 0 part
sdj 8:144 0 372.6G 0 disk
├─sdj1 8:145 0 372.6G 0 part
└─sdj9 8:153 0 8M 0 part
sdl 8:176 0 372.6G 0 disk
├─sdl1 8:177 0 372.6G 0 part
└─sdl9 8:185 0 8M 0 part
sdm 8:192 0 372.6G 0 disk
├─sdm1 8:193 0 372.6G 0 part
└─sdm9 8:201 0 8M 0 part
sdn 8:208 0 372.6G 0 disk
├─sdn1 8:209 0 372.6G 0 part
└─sdn9 8:217 0 8M 0 part
sdo 8:224 0 372.6G 0 disk
├─sdo1 8:225 0 372.6G 0 part
└─sdo9 8:233 0 8M 0 part
sdp 8:240 0 372.6G 0 disk
├─sdp1 8:241 0 372.6G 0 part
└─sdp9 8:249 0 8M 0 part
sdq 65:0 0 372.6G 0 disk
├─sdq1 65:1 0 372.6G 0 part
└─sdq9 65:9 0 8M 0 part
sdr 65:16 0 745.2G 0 disk
├─sdr1 65:17 0 745.2G 0 part
└─sdr9 65:25 0 8M 0 part
sds 65:32 0 745.2G 0 disk
├─sds1 65:33 0 745.2G 0 part
└─sds9 65:41 0 8M 0 part
sdt 65:48 0 745.2G 0 disk
├─sdt1 65:49 0 745.2G 0 part
└─sdt9 65:57 0 8M 0 part
sdu 65:64 0 745.2G 0 disk
├─sdu1 65:65 0 745.2G 0 part
└─sdu9 65:73 0 8M 0 part
sdv 65:80 0 372.6G 0 disk
├─sdv1 65:81 0 372.6G 0 part
└─sdv9 65:89 0 8M 0 part
sdw 65:96 0 372.6G 0 disk
├─sdw1 65:97 0 372.6G 0 part
└─sdw9 65:105 0 8M 0 part
sdx 65:112 0 372.6G 0 disk
├─sdx1 65:113 0 372.6G 0 part
└─sdx9 65:121 0 8M 0 part
sdz 65:144 0 372.6G 0 disk
├─sdz1 65:145 0 372.6G 0 part
└─sdz9 65:153 0 8M 0 part
sdaa 65:160 0 372.6G 0 disk
├─sdaa1 65:161 0 372.6G 0 part
└─sdaa9 65:169 0 8M 0 part
sdab 65:176 0 372.6G 0 disk
├─sdab1 65:177 0 372.6G 0 part
└─sdab9 65:185 0 8M 0 part
sdac 65:192 0 372.6G 0 disk
├─sdac1 65:193 0 372.6G 0 part
└─sdac9 65:201 0 8M 0 part
[root@mds-201 ~]#
[root@mds-201 ~]# zpool status |more
pool: mdtpool
state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-9P
scan: none requested
config:
NAME STATE READ WRITE CKSUM
mdtpool DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
sdd ONLINE 0 0 0
sde DEGRADED 0 0 39 too many errors
errors: No known data errors
pool: mdtpool1
state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-9P
scan: none requested
config:
NAME STATE READ WRITE CKSUM
mdtpool1 DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
sdf DEGRADED 0 0 31 too many errors
sdg ONLINE 0 0 0
errors: No known data errors
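For these specific pools, the "action" line from zpool status translates to invocations like the following (a sketch; the `echo` only prints the commands for inspection, and `sdX` is a placeholder for a spare device, not a disk from this system):

```shell
# Clear the per-device error counters once the underlying cable/PSU problem
# is fixed (zpool status will then report the mirrors as ONLINE again):
echo zpool clear mdtpool sde
echo zpool clear mdtpool1 sdf

# Or, if a disk itself is failing, swap it out of the mirror
# (sdX is a placeholder for the replacement device):
echo zpool replace mdtpool1 sdf sdX
```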
thanks,
Abe