Details
Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: Lustre 2.8.0
Environment: lustre-2.8.0-2.6.32_573.22.1.1chaos.ch5.4.x86_64.x86_64
Clients and servers are all on the same OS and the same Lustre build.
The patch stack on top of the lustre 2.8.0 tag is:
{noformat}
cb25ac6 Target building for RHEL7 under Koji.
675a140 LU-7841 doc: stop using python-docutils
b50a29a LU-7893 osd-zfs: calls dmu_objset_disown() with NULL
67fe716 LU-7198 clio: remove mtime check in vvp_io_fault_start()
80b4633 LLNL-0000 llapi: get OST count from proc
e2717c9 LU-5725 ofd: Expose OFD site_stats through proc
8d9a8f2 LU-4009 osd-zfs: Add tunables to disable sync (DEBUG)
699abe4 LU-8073 build: Eliminate lustre-source binary package
71ee38a LU-8072 build: Restore module debuginfo
7fb8959 LU-7962 build: Support builds w/ weak module ZFS
66579d9 LU-7961 build: Fix ldiskfs source autodetect for CentOS 6
52aa718 LU-7643 build: Remove Linux version string from RPM release field
445b063 LU-5614 build: use %kernel_module_package in rpm spec
f5b8fb1 LU-7699 build: Convert lustre_ver.h.in into a static .h file
333612e LU-7699 build: Eliminate lustre_build_version.h
a49b396 LU-7699 build: Replace version_tag.pl with LUSTRE-VERSION-GEN
6948075 LU-7518 build: Remove the Phi accelerator-specific packaging
ea79df5 New tag 2.8.0-RC5
{noformat}
Description
We are intermittently experiencing a severe metadata performance issue. So far it has occurred with a striped directory being altered by many threads on multiple nodes.
Setup:
- Multiple MDTs, one MDT per MDS
- A directory striped across those MDTs, with a default layout set via lfs setdirstripe -D (see the sketch below)
- Several processes on each of several nodes making metadata changes in that striped directory
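For reference, a directory like this might be created as follows; the stripe count and path here are illustrative placeholders, not the exact values from our runs:
{noformat}
# Create a directory striped across 10 MDTs, then make that striping
# the default (-D) for subdirectories created inside it.
# Path and count are placeholders.
lfs setdirstripe -c 10 /p/lustre/example/striped_dir
lfs setdirstripe -D -c 10 /p/lustre/example/striped_dir
{noformat}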
Symptoms:
- Create rates in the tens of creates/second total across all the MDTs hosting the shards
- getdents() times of >1000 seconds per getdents() call
In one case:
- The directory was striped across 10 MDTs and 16 OSTs.
- mdtest had been run as follows:
{noformat}
srun -N 10 -n 80 mdtest -d /p/lustre/faaland1/mdtest -n 128000 -F
{noformat}
The create rate started out at about 30,000 creates/second total across all 10 MDTs. After some time it dropped to 10-20 creates/second. On a separate node, which mounted the same filesystem but was not running any of the mdtest processes and was entirely idle, I ran ls and observed the very slow getdents() calls.
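One way to observe the per-call timing is with strace; a minimal sketch of such an invocation (the path is a placeholder, and the actual trace filter may have differed):
{noformat}
# -T appends the time spent in each syscall; ls -f lists unsorted,
# so each getdents() batch is visible as it happens.
strace -T -e trace=getdents ls -f /p/lustre/example/striped_dir > /dev/null
{noformat}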
On yet another idle node, I created another directory, striped across the same MDTs, and created 10000 files within it. The create rate was good. Listing that directory produced getdents() times of about 0.003 seconds.
There were no indications of network problems within the nodes at the time, nor before or after our test (this is on catalyst, and the nodes are normally used for compute jobs and monitored 24x7).
In the other case:
- This filesystem has 4 MDTs, each on its own MDS, and 2 OSTs.
- The directory is striped across all 4 MDTs and has -D set.
- The workload involved 10 clients, each running 96 threads. Shell scripts were randomly invoking mkdir, touch, rmdir, or rm (the latter two having first chosen a file or directory to remove); a rough sketch of such a loop appears below.
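A minimal sketch of one worker's loop, assuming a shared striped target directory (the path, naming scheme, and selection logic are placeholders; the real scripts differ):
{noformat}
#!/bin/bash
# Hypothetical reconstruction of one thread of the random metadata workload.
DIR=/p/lustre/example/striped_dir   # shared striped directory (placeholder)
while :; do
    case $((RANDOM % 4)) in
        0) mkdir "$DIR/d.$RANDOM" 2>/dev/null ;;   # create a directory
        1) touch "$DIR/f.$RANDOM" ;;               # create a file
        2) rmdir "$DIR/d.$RANDOM" 2>/dev/null ;;   # remove a dir, if that name exists
        3) rm -f "$DIR/f.$RANDOM" ;;               # remove a file, if that name exists
    esac
done
{noformat}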
Create rate started at about 10,000/sec, concurrent with thousands of unlinks, mkdirs, rmdirs, and stats per second. All those operations slowed to single-digit per-second rates. An ls in that common directory, on a node not running the job, also produced >1000 second getdents() calls.
strace -T output:
{noformat}
getdents(3, /* 1061 entries */, 32768) = 32760 <1887.749809>
getdents(3, /* 1102 entries */, 32768) = 32760 <1990.174707>
getdents(3, /* 1087 entries */, 32768) = 32752 <1994.781547>
getdents(3, /* 1056 entries */, 32768) = 32768 <1907.404333>
brk(0xcf0000) = 0xcf0000 <0.000030>
getdents(3, /* 1091 entries */, 32768) = 32752 <1860.720958>
{noformat}