[LU-11] Lustre 2.x functionality regression: Missing aggregate MDT stats Created: 03/Nov/10 Updated: 12/Nov/10 Resolved: 12/Nov/10 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.0.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Michael MacDonald (Inactive) | Assignee: | Zhenyu Xu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Bugzilla ID: | 21420 |
| Rank (Obsolete): | 10163 |
| Description |
|
LLNL has pointed out that LMT performance at scale (e.g. 20k clients) will greatly suffer if LMT has to read the per-client-export stats in order to recreate the missing aggregate MDT stats. By extension, any other monitoring tools depending on aggregate MDT stats will also be affected by this regression.

----- Forwarded message from Brian Behlendorf <behlendorf1@llnl.gov> -----
Date: Tue, 2 Nov 2010 14:43:49 -0700

Check out bug 21420 comment #40, specifically commit 9eb3d1db in HEAD.

commit 9eb3d1db42d2937daef25950f6527ccb46221f8e
b=21420 Add mds/mgs stats to HEAD
1)remove useless counter from mds and move some definitions
i=andreas |
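To illustrate the scaling concern, here is a minimal sketch of what a monitoring tool would have to do to recreate the aggregate from per-client-export counters. The operation list, the `md_counters` struct, and `aggregate_from_exports()` are hypothetical names invented for this example; they are not taken from LMT or from Lustre.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of metadata operations; not the real Lustre counter list. */
enum md_op { MD_OP_OPEN, MD_OP_CLOSE, MD_OP_GETATTR, MD_OP_COUNT };

/* One set of counters, as a tool might hold them after parsing one stats file. */
struct md_counters {
    uint64_t count[MD_OP_COUNT];
};

/*
 * Rebuild the aggregate by summing every client's counters.
 * Each sample costs nr_exports * MD_OP_COUNT additions, and in practice
 * also one stats file read and parse per client export.
 */
void aggregate_from_exports(const struct md_counters *exports,
                            size_t nr_exports,
                            struct md_counters *total)
{
    for (size_t op = 0; op < MD_OP_COUNT; op++)
        total->count[op] = 0;

    for (size_t i = 0; i < nr_exports; i++)
        for (size_t op = 0; op < MD_OP_COUNT; op++)
            total->count[op] += exports[i].count[op];
}
```

With an aggregate stats file the tool reads one set of counters per sample regardless of client count; with the loop above, the work grows linearly with the number of exports.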
| Comments |
| Comment by Robert Read (Inactive) [ 03/Nov/10 ] |
|
I reopened 21420 and requested that the functionality be restored; however, it looks like the MDS aggregate stats were removed in 2008 in commit 69a3513021212ed1eb8823a50f80853e22e607b3. This patch only removed the unused initialization code. |
| Comment by Robert Read (Inactive) [ 05/Nov/10 ] |
|
I had this chat with Andreas earlier today:

[11:00] adilger: looking at the patch, it _does_ appear that there should be MDT global stats - see mdt_lproc.c::mdt_procfs_init() hunk, and that mdt_counter_incr() is incrementing the obd_stats counter in addition to the per-export counter
[11:00] rread: true, but i couldn't find the stats when i tested this
[11:01] rread: mdt_stats_counter_init is only called for the nid_stats
[11:02] rread: don't we also need to call this with obd_stats somewhere?
[11:04] adilger: the stats init for the obd devices is done as part of the lprocfs_alloc_md_stats() code
[11:05] adilger: I wonder if the stats are being collected, but the MDT obd device itself is not being hooked into lprocfs?
[11:06] rread: the stats file was there, just no stats
[11:58] adilger: sorry, was on another concall... I suspect this is a bug in the MDT device setup due to the half-finished MDS->MDT code reorg
[11:59] adilger: i.e. something foolish like the "old" MDT has an OBD device, and the "new" CMD MDT has a separate MDT device
[12:00] adilger: err, a separate OBD device |
| Comment by Robert Read (Inactive) [ 05/Nov/10 ] |
|
Bobi Jam, please review the comments here and for some context, the most recent ones on 21420. It appears there is just an initialization problem here. |
| Comment by Zhenyu Xu [ 06/Nov/10 ] |
|
Found the root cause: mdt_counter_incr() should act upon obd->md_stats instead of obd->obd_stats. The former records md ops, while the latter records obd ops (such as connect, disconnect). |
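The root cause described above can be modelled with a toy sketch. This is not the Lustre source or the landed patch: the `toy_*` types and functions are hypothetical stand-ins, and the comment that `md_stats` backs the aggregate stats file is an assumption drawn from this thread.

```c
#include <stdint.h>

#define NR_COUNTERS 4

/* Toy stand-in for a stats table; the real lprocfs stats structure differs. */
struct toy_stats {
    uint64_t count[NR_COUNTERS];
};

/* Toy device carrying the two tables named in the comment above. */
struct toy_obd {
    struct toy_stats md_stats;   /* md ops; assumed to back the aggregate stats file */
    struct toy_stats obd_stats;  /* obd ops such as connect and disconnect */
};

/* Toy per-client export with its own counters and a pointer to its device. */
struct toy_export {
    struct toy_obd *obd;
    struct toy_stats nid_stats;
};

/* Reported behaviour: the device-wide increment lands in obd_stats. */
void toy_counter_incr_buggy(struct toy_export *exp, int op)
{
    exp->obd->obd_stats.count[op]++;
    exp->nid_stats.count[op]++;
}

/* Described fix: the device-wide increment lands in md_stats. */
void toy_counter_incr_fixed(struct toy_export *exp, int op)
{
    exp->obd->md_stats.count[op]++;
    exp->nid_stats.count[op]++;
}
```

In this model the buggy variant leaves `md_stats` at zero while the per-export counters still advance, which matches the observation that the stats file existed but contained no stats.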
| Comment by Zhenyu Xu [ 06/Nov/10 ] |
|
I've tried my patch
================ with patch ==================================================
|
| Comment by Zhenyu Xu [ 06/Nov/10 ] |
|
posted patch for review at http://review.whamcloud.com/#change,124 |
| Comment by Zhenyu Xu [ 08/Nov/10 ] |
|
posted patch in bz 21420. |
| Comment by Zhenyu Xu [ 12/Nov/10 ] |
|
Patch (https://bugzilla.lustre.org/attachment.cgi?id=32148) has landed. |