[LU-4109] sanity test_57b failure: 'MDC before 15214812 != after 15214800' Created: 16/Oct/13  Updated: 14/Dec/21  Resolved: 27/Nov/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: Lustre 2.6.0, Lustre 2.5.2

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: Bruno Faccini (Inactive)
Resolution: Fixed Votes: 0
Labels: HSM
Environment:

Lustre 2.5.0-RC1, el6

OpenSFS cluster with combined MGS/MDS (c03), single OSS (c04) with two OSTs, archive MGS/MDS (c05), archive OST (c06) with two OSTs, archive OST2 (c07) with two OSTs, eight clients: one agent + client (c08), one robinhood/db + client (c09), and the others running only as Lustre clients (c09, c10, c11, c12, c13, c14, c15)


Issue Links:
Related
is related to LU-9661 sanity test_57b: MDC before 382704 !=... Resolved
Severity: 3
Rank (Obsolete): 11065

 Description   

While testing HSM features, acceptance-small was run with test results at: https://maloo.whamcloud.com/test_sessions/4574cfe8-35e1-11e3-b051-52540035b04c

From the test_log:

== sanity test 57b: default LOV EAs are stored inside large inodes ===== 13:06:10 (1381781170)
mcreating 100 files
total: 100 creates in 0.17 seconds: 572.16 creates/second
Filesystem           1K-blocks      Used Available Use% Mounted on
mds@o2ib:/scratch     25088052    942228  22886476   4% /lustre/scratch
opening files to create objects/EAs
Filesystem           1K-blocks      Used Available Use% Mounted on
mds@o2ib:/scratch     25088052    942228  22886720   4% /lustre/scratch
 sanity test_57b: @@@@@@ FAIL: MDC before 15214812 != after 15214800 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:4264:error_noexit()
  = /usr/lib64/lustre/tests/test-framework.sh:4291:error()
  = /usr/lib64/lustre/tests/sanity.sh:4670:test_57b()
  = /usr/lib64/lustre/tests/test-framework.sh:4530:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:4563:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:4433:run_test()
  = /usr/lib64/lustre/tests/sanity.sh:4678:main()


 Comments   
Comment by Peter Jones [ 16/Oct/13 ]

Bruno

Could you please help with this one?

Thanks

Peter

Comment by Andreas Dilger [ 16/Oct/13 ]

Should be trivial to increase the margin from 8 to 16 or something. Since we create 100 files in the test, we should still be pretty safe.

Comment by Bruno Faccini (Inactive) [ 28/Oct/13 ]

Andreas, I would like to better understand what sanity test_57b is trying to detect.
What could cause this failure? Wide default striping versus the inode size chosen at mke2fs time?
If so, could we anticipate it and compute an appropriate margin? If not, what am I missing or misunderstanding?

James, do you remember the values of these parameters during your failing test?

Comment by James Nunez (Inactive) [ 29/Oct/13 ]

Bruno,
The file system was set up and run with pretty much all default settings. I think the striping for this set of tests was a single stripe with a single OSS with two OSTs. So, default striping, but not wide.

James

Comment by Andreas Dilger [ 30/Oct/13 ]

Bruno, test_57b is trying to detect if the LOV EA is too large to fit into the MDT inode for some reason (e.g. formatted with too-small inodes, default LOV EA becomes too large, etc). If 100 files are created and 100 blocks are allocated then the test fails. In this case, only a few blocks are allocated (e.g. llog files, ChangeLog, etc), and the test shouldn't fail, so increasing the margin a small amount is fine.
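The check described above can be sketched as follows. This is a minimal Python illustration, not the actual sanity.sh code: the function name `space_drop_exceeds_margin` and the exact comparison (fail when free space drops by more than the margin) are assumptions made for illustration. The before/after values are the ones reported in this ticket, and the margins 8 and 16 are the old and proposed values mentioned in the comments.

```python
def space_drop_exceeds_margin(before_kb: int, after_kb: int, margin_kb: int) -> bool:
    """Return True if free space dropped by more than margin_kb (hypothetical check)."""
    return (before_kb - after_kb) > margin_kb


# MDC free-space values (in KB) taken from the failure message in this ticket.
before_kb, after_kb = 15214812, 15214800
drop_kb = before_kb - after_kb

for margin_kb in (8, 16):
    failed = space_drop_exceeds_margin(before_kb, after_kb, margin_kb)
    verdict = "FAIL" if failed else "PASS"
    print(f"drop={drop_kb} KB, margin={margin_kb} KB: {verdict}")
```

Under these assumptions, the 12 KB drop seen here trips a margin of 8 but not a margin of 16.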

Comment by Bruno Faccini (Inactive) [ 30/Oct/13 ]

OK, it was also my understanding that if the LOV EA cannot fit into the MDT inode, at least one additional block is needed per file. In that case, why not do the reverse test and simply detect whether the number of free blocks decreased by 100 or more?

Comment by Andreas Dilger [ 30/Oct/13 ]

What if the xattrs are stored more efficiently than 1 per 4kB (e.g. MDT formatted with 1kB blocksize)? I don't mind having a larger margin, but I think the test could start to fail silently if the error margin is too large. I think 32-64kB is large enough to avoid future problems, and smaller than the 400kB that would be seen with 100 4kB blocks or even 400 1kB blocks.
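The sizing argument above can be worked through with a short calculation. This is a sketch under stated assumptions: each of the 100 files is assumed to need exactly one external xattr block when the LOV EA does not fit in the inode, and the helper names `external_ea_usage_kb` and `margin_detects` are hypothetical. Denser xattr packing, as the comment notes, would lower the usage figures.

```python
FILES = 100  # number of files created by the test, per the ticket


def external_ea_usage_kb(files: int, block_kb: int, blocks_per_file: int = 1) -> int:
    """Space consumed (KB) if every file needs external xattr blocks (assumed model)."""
    return files * block_kb * blocks_per_file


def margin_detects(margin_kb: int, usage_kb: int) -> bool:
    """A margin still catches the problem only if the usage exceeds it."""
    return usage_kb > margin_kb


usage_4k = external_ea_usage_kb(FILES, block_kb=4)  # 4 KB MDT blocks
usage_1k = external_ea_usage_kb(FILES, block_kb=1)  # 1 KB MDT blocks

for margin_kb in (8, 16, 32, 64, 128):
    print(
        f"margin={margin_kb:>3} KB: "
        f"detects 4KB-block case={margin_detects(margin_kb, usage_4k)}, "
        f"detects 1KB-block case={margin_detects(margin_kb, usage_1k)}"
    )
```

With this model, margins up to 64 KB still catch both cases, while a margin of 128 KB would miss the one-block-per-file case on a 1 KB blocksize MDT, which is the kind of silent failure the comment warns about.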

Comment by Bruno Faccini (Inactive) [ 04/Nov/13 ]

A patch raising the threshold used to detect out-of-inode LOV EA allocation from 8 to 16 is at http://review.whamcloud.com/8156.

Comment by Bruno Faccini (Inactive) [ 27/Nov/13 ]

Patch has landed.

Comment by James Nunez (Inactive) [ 13/Mar/14 ]

I'm seeing a similar (same?) error again in 2.5.1-RC3 with error "sanity test_57b: @@@@@@ FAIL: MDC before 46667332 != after 46667320". The results are at https://maloo.whamcloud.com/test_sessions/f6c5b8e6-aac8-11e3-a41c-52540035b04c

I'm running on the OpenSFS cluster with a setup similar to the one originally reported.

It looks like this patch was not backported to b2_5.

Comment by James Nunez (Inactive) [ 13/Mar/14 ]

Patch for b2_5 at http://review.whamcloud.com/#/c/9652/

Comment by James Nunez (Inactive) [ 09/Apr/14 ]

http://review.whamcloud.com/#/c/9652/ landed to b2_5.

Generated at Sat Feb 10 01:39:44 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.