[LU-4109] sanity test_57b failure: 'MDC before 15214812 != after 15214800' Created: 16/Oct/13 Updated: 14/Dec/21 Resolved: 27/Nov/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.0 |
| Fix Version/s: | Lustre 2.6.0, Lustre 2.5.2 |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | Bruno Faccini (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | HSM | ||
| Environment: |
Lustre 2.5.0-RC1, el6 OpenSFS cluster with combined MGS/MDS (c03), single OSS (c04) with two OSTs, archive MGS/MDS (c05), archive OST (c06) with two OSTs, archive OST2 (c07) with two OSTs, eight clients: one agent + client (c08), one robinhood/db + client (c09), and others just running as Lustre clients (c09, c10, c11, c12, c13, c14, c15) |
||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 11065 | ||||||||
| Description |
|
While testing HSM features, acceptance-small was run with test results at: https://maloo.whamcloud.com/test_sessions/4574cfe8-35e1-11e3-b051-52540035b04c

From the test_log:

== sanity test 57b: default LOV EAs are stored inside large inodes ===== 13:06:10 (1381781170)
mcreating 100 files
total: 100 creates in 0.17 seconds: 572.16 creates/second
Filesystem 1K-blocks Used Available Use% Mounted on
mds@o2ib:/scratch 25088052 942228 22886476 4% /lustre/scratch
opening files to create objects/EAs
Filesystem 1K-blocks Used Available Use% Mounted on
mds@o2ib:/scratch 25088052 942228 22886720 4% /lustre/scratch
sanity test_57b: @@@@@@ FAIL: MDC before 15214812 != after 15214800
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:4264:error_noexit()
= /usr/lib64/lustre/tests/test-framework.sh:4291:error()
= /usr/lib64/lustre/tests/sanity.sh:4670:test_57b()
= /usr/lib64/lustre/tests/test-framework.sh:4530:run_one()
= /usr/lib64/lustre/tests/test-framework.sh:4563:run_one_logged()
= /usr/lib64/lustre/tests/test-framework.sh:4433:run_test()
= /usr/lib64/lustre/tests/sanity.sh:4678:main() |
| Comments |
| Comment by Peter Jones [ 16/Oct/13 ] |
|
Bruno, could you please help with this one? Thanks, Peter |
| Comment by Andreas Dilger [ 16/Oct/13 ] |
|
Should be trivial to increase the margin from 8 to 16 or something. Since we create 100 files in the test, we should still be pretty safe. |
| Comment by Bruno Faccini (Inactive) [ 28/Oct/13 ] |
|
Andreas, I would like to better understand what sanity/test_57b() tries to detect. James, do you remember the values of these parameters during your failing test? |
| Comment by James Nunez (Inactive) [ 29/Oct/13 ] |
|
Bruno, James |
| Comment by Andreas Dilger [ 30/Oct/13 ] |
|
Bruno, test_57b is trying to detect if the LOV EA is too large to fit into the MDT inode for some reason (e.g. formatted with too-small inodes, default LOV EA becomes too large, etc). If 100 files are created and 100 blocks are allocated then the test fails. In this case, only a few blocks are allocated (e.g. llog files, ChangeLog, etc), and the test shouldn't fail, so increasing the margin a small amount is fine. |
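The check described above can be sketched as follows. This is a minimal illustration, not the code in sanity.sh: the function name `lov_ea_spilled` is hypothetical, and it assumes the "before" and "after" counters and the margin are all in the same unit (kB is assumed here). The numbers are the ones from the failure message in the Description.

```python
def lov_ea_spilled(free_before_kb: int, free_after_kb: int, margin_kb: int) -> bool:
    """Return True if free space dropped by more than the allowed margin.

    Hypothetical helper: a drop larger than the margin is taken as a sign
    that LOV EAs were allocated outside the MDT inodes.
    """
    return free_before_kb - free_after_kb > margin_kb


# Values from "MDC before 15214812 != after 15214800".
before, after = 15214812, 15214800

print(before - after)                                # 12
print(lov_ea_spilled(before, after, margin_kb=8))    # True: the test fails
print(lov_ea_spilled(before, after, margin_kb=16))   # False: the test passes
```

Under these assumptions, the observed drop of 12 exceeds a margin of 8 but stays inside a margin of 16, which is far below the roughly one block per file expected when the EAs really do not fit.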
| Comment by Bruno Faccini (Inactive) [ 30/Oct/13 ] |
|
OK, it was also my understanding that if the LOV EA cannot fit into the MDT inode, this will need at least 1 additional block per file. So in this case, why not do the reverse test and simply detect whether the number of free blocks decreased by 100+ blocks? |
| Comment by Andreas Dilger [ 30/Oct/13 ] |
|
What if the xattrs are stored more efficiently than 1 per 4kB (e.g. MDT formatted with 1kB blocksize)? I don't mind having a larger margin, but I think the test could start to fail silently if the error margin is too large. I think 32-64kB is large enough to avoid future problems, and smaller than the 400kB that would be seen with 100 4kB blocks or even 400 1kB blocks. |
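The arithmetic behind these bounds can be worked through in a short sketch. The helper name `margin_is_usable` is hypothetical, the 12 kB noise figure is the drop seen in the failure reported in this ticket, and treating all values as kB is an assumption. A margin is considered usable if it tolerates the background noise but is still smaller than the space consumed when every EA spills out of the inode.

```python
NUM_FILES = 100
NOISE_KB = 12  # drop observed in the reported failure (15214812 - 15214800)

# Space consumed if the EAs do not fit in the inodes, per the comment above:
# 100 blocks of 4 kB, or 400 blocks of 1 kB.
SPILL_KB = NUM_FILES * 4
assert SPILL_KB == 400 * 1


def margin_is_usable(margin_kb: int, noise_kb: int, spill_kb: int) -> bool:
    """Hypothetical check: margin absorbs the noise but still catches a spill."""
    return noise_kb <= margin_kb < spill_kb


for margin in (8, 16, 32, 64, 400):
    print(margin, margin_is_usable(margin, NOISE_KB, SPILL_KB))
# 8 False
# 16 True
# 32 True
# 64 True
# 400 False
```

A margin of 8 is too tight for the observed noise, while a margin as large as the spill itself would let the failure pass silently; the 16 to 64 range sits between the two.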
| Comment by Bruno Faccini (Inactive) [ 04/Nov/13 ] |
|
Patch raising the threshold used to detect out-of-inode LOV EA allocation from 8 to 16 is at http://review.whamcloud.com/8156. |
| Comment by Bruno Faccini (Inactive) [ 27/Nov/13 ] |
|
Patch has landed. |
| Comment by James Nunez (Inactive) [ 13/Mar/14 ] |
|
I'm seeing a similar (same?) error again in 2.5.1-RC3 with error "sanity test_57b: @@@@@@ FAIL: MDC before 46667332 != after 46667320". The results are at https://maloo.whamcloud.com/test_sessions/f6c5b8e6-aac8-11e3-a41c-52540035b04c I'm running on the OpenSFS cluster with a similar setup to what was originally reported. It looks like this patch was not backported to b2_5. |
| Comment by James Nunez (Inactive) [ 13/Mar/14 ] |
|
Patch for b2_5 at http://review.whamcloud.com/#/c/9652/ |
| Comment by James Nunez (Inactive) [ 09/Apr/14 ] |
|
http://review.whamcloud.com/#/c/9652/ landed to b2_5. |