[LU-2731] Speed up the run time of "stop_services" function in lustre init script Created: 31/Jan/13 Updated: 22/Mar/13 Resolved: 22/Mar/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0, Lustre 2.1.4 |
| Fix Version/s: | Lustre 2.4.0 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Prakash Surya (Inactive) | Assignee: | Emoly Liu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Rank (Obsolete): | 6635 |
| Description |
|
The "stop_services" function in the lustre init script can be sped up by parallelizing the teardown of each service. This can greatly reduce the time it takes to bring down a Lustre server. I wrote a patch to demonstrate this and tested it on a Lustre 2.1 based OSS with up to 32 ldiskfs OSTs. First, the patch to the init script:
diff --git i/lustre/scripts/lustre w/lustre/scripts/lustre
index b97951f..eeb5941 100644
--- i/lustre/scripts/lustre
+++ w/lustre/scripts/lustre
@@ -498,6 +498,7 @@ stop_services ()
{
local labels=$*
local result=0
+ local pids=""
local dir dev label
for label in $labels; do
@@ -512,9 +513,22 @@ stop_services ()
# no error
continue
fi
+
echo "Unmounting $dir"
- umount $dir || result=2
+ umount $dir &
+
+ if [ -z "$pids" ]; then
+ pids="$!"
+ else
+ pids="$pids $!"
+ fi
done
+
+ # wait for all umount processes to complete, report any errors
+ for pid in $pids; do
+ wait $pid || result=2
+ done
+
# double check!
for label in $labels; do
if mountpt_is_active $label; then
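The launch-then-wait pattern used in the patch above can be sketched on its own, outside the init script. This is only an illustration under stated assumptions: `teardown` is a hypothetical stand-in for `umount $dir` (it sleeps for a second and fails for names beginning with `bad`), and `stop_parallel` is an illustrative name that does not exist in the lustre script.

```shell
#!/bin/sh
# Hypothetical stand-in for "umount $dir": takes about one second,
# and fails for any target whose name starts with "bad".
teardown() {
    sleep 1
    case "$1" in
        bad*) return 1 ;;
        *)    return 0 ;;
    esac
}

# Start one background teardown per argument, then wait for each PID.
# Returns 2 if any teardown failed (mirroring "result=2" in the patch),
# otherwise 0.
stop_parallel() {
    local result=0
    local pids=""
    local target pid

    for target in "$@"; do
        teardown "$target" &
        # $! is the PID of the most recently started background job
        pids="$pids $!"
    done

    # "wait PID" returns that job's exit status, so failures are not lost
    for pid in $pids; do
        wait "$pid" || result=2
    done

    return $result
}
```

Calling `stop_parallel ost0 ost1 ost2` should take roughly as long as a single teardown rather than the sum of all three, which matches the flat "stop" timings reported for the patched script.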
The testing I performed shows this patch considerably improves the time it takes to "stop" the OSS. The raw numbers are below.

Here is the timing information I gathered using the time command when running /etc/init.d/lustre start and /etc/init.d/lustre stop:

$ time /etc/init.d/lustre start # (w/o patch)
+-----------+------------+-----------+-----------+
| # of OSTs | real       | user      | sys       |
+-----------+------------+-----------+-----------+
| 1         | 0m 2.184s  | 0m 0.162s | 0m 0.077s |
| 2         | 0m 4.285s  | 0m 0.281s | 0m 0.148s |
| 4         | 0m 8.508s  | 0m 0.500s | 0m 0.302s |
| 8         | 0m 16.961s | 0m 1.017s | 0m 0.568s |
| 16        | 0m 33.884s | 0m 1.964s | 0m 1.176s |
| 32        | 1m 7.744s  | 0m 3.986s | 0m 2.280s |
+-----------+------------+-----------+-----------+

$ time /etc/init.d/lustre stop # (w/o patch)
+-----------+------------+-----------+-----------+
| # of OSTs | real       | user      | sys       |
+-----------+------------+-----------+-----------+
| 1         | 0m 4.758s  | 0m 0.072s | 0m 0.030s |
| 2         | 0m 9.018s  | 0m 0.118s | 0m 0.049s |
| 4         | 0m 18.813s | 0m 0.185s | 0m 0.083s |
| 8         | 0m 37.586s | 0m 0.337s | 0m 0.141s |
| 16        | 1m 16.092s | 0m 0.597s | 0m 0.263s |
| 32        | 2m 37.550s | 0m 1.181s | 0m 0.403s |
+-----------+------------+-----------+-----------+

Here is the timing information gathered the same way as above, but with my patch applied (all else being equal):

$ time /etc/init.d/lustre start # (w/ patch)
+-----------+------------+-----------+-----------+
| # of OSTs | real       | user      | sys       |
+-----------+------------+-----------+-----------+
| 1         | 0m 2.183s  | 0m 0.158s | 0m 0.083s |
| 2         | 0m 4.282s  | 0m 0.274s | 0m 0.153s |
| 4         | 0m 8.519s  | 0m 0.510s | 0m 0.303s |
| 8         | 0m 16.966s | 0m 1.019s | 0m 0.583s |
| 16        | 0m 33.878s | 0m 1.984s | 0m 1.154s |
| 32        | 1m 7.745s  | 0m 3.944s | 0m 2.322s |
+-----------+------------+-----------+-----------+

$ time /etc/init.d/lustre stop # (w/ patch)
+-----------+------------+-----------+-----------+
| # of OSTs | real       | user      | sys       |
+-----------+------------+-----------+-----------+
| 1         | 0m 4.566s  | 0m 0.075s | 0m 0.023s |
| 2         | 0m 4.857s  | 0m 0.105s | 0m 0.070s |
| 4         | 0m 4.777s  | 0m 0.175s | 0m 0.064s |
| 8         | 0m 5.449s  | 0m 0.323s | 0m 0.153s |
| 16        | 0m 5.862s  | 0m 0.606s | 0m 0.208s |
| 32        | 0m 6.307s  | 0m 1.183s | 0m 0.811s |
+-----------+------------+-----------+-----------+

This is a drastic improvement in the time it takes for /etc/init.d/lustre stop to complete as the number of OSTs increases. |
| Comments |
| Comment by Prakash Surya (Inactive) [ 31/Jan/13 ] |
|
Please see: http://review.whamcloud.com/5235 |
| Comment by Peter Jones [ 04/Feb/13 ] |
|
Thanks for the patch, Prakash! Emoly, could you please take care of this patch? Thanks, Peter |
| Comment by Emoly Liu [ 04/Feb/13 ] |
|
OK. |
| Comment by Peter Jones [ 22/Mar/13 ] |
|
Landed for 2.4 |