[LU-3624] mds-survey has several bugs Created: 24/Jul/13 Updated: 31/Dec/13 Resolved: 25/Oct/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0, Lustre 2.5.0 |
| Fix Version/s: | Lustre 2.6.0, Lustre 2.5.1 |
| Type: | Bug | Priority: | Major |
| Reporter: | Gregoire Pichon | Assignee: | Minh Diep |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch | ||
| Environment: |
kernel 2.6.32-358.6.2 |
||
| Severity: | 3 |
| Epic: | mds-survey, metadata, test |
| Rank (Obsolete): | 9332 |
| Description |
|
I faced several bugs while running mds-survey tests.
1) Here are the step by step commands executed by mds-survey.
# lctl dl
0 UP osd-ldiskfs fs2-MDT0000-osd fs2-MDT0000-osd_UUID 11
1 UP mgc MGC30.1.0.95@o2ib 21007e3d-b2b2-494c-8c9e-a536f037ee6d 5
2 UP mds MDS MDS_uuid 3
3 UP lod fs2-MDT0000-mdtlov fs2-MDT0000-mdtlov_UUID 4
4 UP mdt fs2-MDT0000 fs2-MDT0000_UUID 15
5 UP mdd fs2-MDD0000 fs2-MDD0000_UUID 4
6 UP qmt fs2-QMT0000 fs2-QMT0000_UUID 4
7 UP lwp fs2-MDT0000-lwp-MDT0000 fs2-MDT0000-lwp-MDT0000_UUID 5
8 UP osp fs2-OST0003-osc-MDT0000 fs2-MDT0000-mdtlov_UUID 5
9 UP osp fs2-OST0002-osc-MDT0000 fs2-MDT0000-mdtlov_UUID 5
10 UP osp fs2-OST0001-osc-MDT0000 fs2-MDT0000-mdtlov_UUID 5
11 UP osp fs2-OST0000-osc-MDT0000 fs2-MDT0000-mdtlov_UUID 5
# modprobe obdecho
# lctl << EOF
> attach echo_client fs2-MDT0000_ecc fs2-MDT0000_ecc_UUID
> setup fs2-MDT0000 mdd
> EOF
# lctl --device 12 test_mkdir /test0
(/homes/pichong/SB/AE4_kernel38/obj/x86_64_bullxlinux6.3/topdir/BUILD/lustre-2.4.0/lustre/include/lustre/lustre_idl.h:683:ostid_set_id()) Bad 18446744073709551615 to set 0:0
2) get_global_stats () {
local rfile=$1
awk < $rfile \
'BEGIN {n = 0;} \
{ n++; \
if (n == 1) { err = $1; ave = $2; min = $3; max = $4} \
else \
{ if ($1 < err) err = $1; \
if ($2 < min) min = $2; \
if ($3 > max) max = $3; \
} \
} \
END { if (n == 0) err = 0; \
printf "%d %f %f %f\n", err, ave, min, max}'
}
should be
get_global_stats () {
local rfile=$1
awk < $rfile \
'BEGIN {n = 0;} \
{ n++; \
if (n == 1) { err = $1; ave = $2; min = $3; max = $4} \
else \
{ if ($1 < err) err = $1; \
ave += $2; \
if ($3 < min) min = $3; \
if ($4 > max) max = $4; \
} \
} \
END { if (n == 0) err = 0; \
printf "%d %f %f %f\n", err, ave/n, min, max}'
}
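The difference between the two versions is easiest to see on a small input. The original compared column 2 (the average) against min and column 3 (the minimum) against max, and never accumulated the average, so the reported average was simply that of the first line. Below is a minimal sketch of the corrected aggregation. The function name aggregate_stats is hypothetical, and the guard for an empty file is an added assumption (the proposed version divides by n even when n is 0); the input layout of "err ave min max" per line is taken from the code above.

```shell
# Hypothetical stand-alone version of the corrected aggregation.
# Each input line is expected to hold: err ave min max
aggregate_stats () {
	awk '
	{
		n++
		if (n == 1) {
			err = $1; ave = $2; min = $3; max = $4
		} else {
			if ($1 < err) err = $1
			ave += $2                 # accumulate, divide by n at the end
			if ($3 < min) min = $3    # column 3 is the per-file minimum
			if ($4 > max) max = $4    # column 4 is the per-file maximum
		}
	}
	END {
		# Assumption: report zeros for an empty file rather than divide by 0.
		if (n == 0) { printf "0 0.000000 0.000000 0.000000\n"; exit }
		printf "%d %f %f %f\n", err, ave / n, min, max
	}' "$1"
}
```

For a file containing the three lines "0 10.0 1.0 20.0", "0 20.0 0.5 30.0" and "0 30.0 2.0 25.0", this prints "0 20.000000 0.500000 30.000000".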
I am going to provide a patch.
3)
[<ffffffffa0171731>] cfs_waitq_timedwait+0x11/0x20 [libcfs]
[<ffffffffa0e74154>] osp_precreate_reserve+0x5c4/0x1ee0 [osp]
[<ffffffffa0e6dc55>] osp_declare_object_create+0x155/0x4f0 [osp]
[<ffffffffa0b7438d>] lod_qos_declare_object_on+0xed/0x480 [lod]
[<ffffffffa0b75f0f>] lod_alloc_rr.clone.2+0x66f/0xde0 [lod]
[<ffffffffa0b77b69>] lod_qos_prep_create+0xfa9/0x1b14 [lod]
[<ffffffffa0b71cab>] lod_declare_striped_object+0x14b/0x880 [lod]
[<ffffffffa0b72df3>] lod_declare_xattr_set+0x273/0x410 [lod]
[<ffffffffa0547700>] mdo_declare_xattr_set.clone.4+0x40/0xe0 [mdd]
[<ffffffffa054a470>] mdd_declare_create+0x4b0/0x860 [mdd]
[<ffffffffa054afb1>] mdd_create+0x791/0x1740 [mdd]
[<ffffffffa0ea8fef>] echo_md_create_internal+0x1cf/0x640 [obdecho]
[<ffffffffa0eb2b43>] echo_md_handler+0x1333/0x1ac0 [obdecho]
[<ffffffffa0eb7257>] echo_client_iocontrol+0x2dc7/0x3b40 [obdecho]
[<ffffffffa060849f>] class_handle_ioctl+0x12ff/0x1ec0 [obdclass]
[<ffffffffa05f02ab>] obd_class_ioctl+0x4b/0x190 [obdclass]
[<ffffffff81181372>] vfs_ioctl+0x22/0xa0
[<ffffffff81181514>] do_vfs_ioctl+0x84/0x580
[<ffffffff81181a91>] sys_ioctl+0x81/0xa0
[<ffffffff81003072>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff |
| Comments |
| Comment by Gregoire Pichon [ 24/Jul/13 ] |
|
About point 3), the issue might not be caused by the "destroy" action. Looking at free/used objects on the OSTs shows that freeing is delayed and lasts several minutes. I ran mds-survey with stripe_count=4 and file_count=100000, and in the meantime displayed the number of objects in use on one of the OSTs, computed as filestotal minus filesfree (the values of /proc/fs/lustre/obdfilter/fs2-OST0000/filestotal and /proc/fs/lustre/obdfilter/fs2-OST0000/filesfree).
15:53:42 fs2-OST0000 filesused=16543
This behavior seems to be a problem when launching several mds-survey runs in a row. |
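The measurement described above can be sketched as a small helper. This is only an illustration: the function name ost_files_used and the choice to pass the obdfilter directory as an argument are assumptions, not part of mds-survey; the file names filestotal and filesfree and the output layout follow the comment above.

```shell
# Hypothetical helper: print the number of objects in use on one OST,
# computed as filestotal - filesfree.
# $1 is the obdfilter directory of the OST, for example
# /proc/fs/lustre/obdfilter/fs2-OST0000 on the system described above.
ost_files_used () {
	dir=$1
	total=$(cat "$dir/filestotal") || return 1
	free=$(cat "$dir/filesfree") || return 1
	echo "$(date +%H:%M:%S) $(basename "$dir") filesused=$((total - free))"
}
```

Calling it in a loop, for example "while sleep 10; do ost_files_used /proc/fs/lustre/obdfilter/fs2-OST0000; done", shows how long the count takes to drop back after a run.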
| Comment by Gregoire Pichon [ 24/Jul/13 ] |
|
Here is a patch for item 1) |
| Comment by Peter Jones [ 25/Jul/13 ] |
|
Bobbie, could you please take care of this one? Thanks, Peter |
| Comment by Gregoire Pichon [ 05/Sep/13 ] |
|
I have posted a patch for item 2) and other issues related to multiple MDT support in mds-survey. |
| Comment by Bobbie Lind (Inactive) [ 11/Sep/13 ] |
|
Patch 1 has landed to master as 5ae2c575ff234c7b1189d2f71d8e5a73509591f3 |
| Comment by Peter Jones [ 25/Sep/13 ] |
|
Minh will help with landing the remaining patch |
| Comment by Minh Diep [ 26/Sep/13 ] |
|
Hi Gregoire, If you still have your system, could you provide the output of mds-survey run after you apply patch 2? - thanks |
| Comment by Gregoire Pichon [ 02/Oct/13 ] |
|
Here is the output of mds-survey run with patch2.
targets="mo89:fs2-MDT0001 mo89:fs2-MDT0000 mo90:fs2-MDT0002 " dir_count=4 file_count=1000000 thrlo=2 thrhi=16 stripe_count=0 tests_str="create lookup destroy" rslt_loc=/tmp/mds-survey/20131002_172138/ /tmp/mds-survey/mds-survey
Wed Oct 2 17:21:41 CEST 2013 /tmp/mds-survey/mds-survey from mo89
mdt 3 file 1000000 dir 4 thr 4 create 47733.54 [ 0.00, 75995.06] lookup 595223.13 [ 565806.08, 645481.12] destroy 48611.00 [ 0.00, 95992.22]
mdt 3 file 1000000 dir 4 thr 8 create 38134.17 [ 0.00, 58996.99] lookup 914477.78 [ 821165.71, 1005936.03] destroy 46286.85 [ 0.00, 87995.25]
mdt 3 file 1000000 dir 4 thr 16 create 39500.31 [ 0.00, 62936.69] lookup 979904.69 [ 912483.69, 1042839.86] destroy 40600.56 [ 0.00, 88994.66]
done! |
| Comment by Gregoire Pichon [ 25/Oct/13 ] |
|
The patch http://review.whamcloud.com/7558 has been merged into master. |
| Comment by Gregoire Pichon [ 25/Oct/13 ] |
|
Peter, since the patch http://review.whamcloud.com/7558 fixes several bugs, do you think it could also be integrated into 2.5.1, the next maintenance release? |
| Comment by Peter Jones [ 25/Oct/13 ] |
|
Yes, this patch will certainly be a candidate for 2.5.1. |