[LU-2857] 2.1.4<->2.4.0 interop: sanity test_76: FAIL: inode slab grew from 11183 to 12183 Created: 25/Feb/13  Updated: 26/Mar/18  Resolved: 28/Oct/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0, Lustre 2.4.0, Lustre 2.1.4, Lustre 1.8.9
Fix Version/s: Lustre 2.1.5

Type: Bug Priority: Major
Reporter: Jian Yu Assignee: WC Triage
Resolution: Fixed Votes: 0
Labels: yuc2
Environment:

Lustre b2_1 client build: http://build.whamcloud.com/job/lustre-b2_1/176
Lustre master server build: http://build.whamcloud.com/job/lustre-master/1269
Distro/Arch: RHEL6.3/x86_64


Issue Links:
Duplicate
is duplicated by LU-2036 sanity.sh test_76: fails with lod/osp... Resolved
Related
is related to LU-10131 Update inode attributes on unlink Resolved
Severity: 3
Rank (Obsolete): 6920

 Description   

Sanity test 76 failed as follows:

== sanity test 76: confirm clients recycle inodes properly ====== 01:56:01 (1361526961)
before inodes: 11183
after inodes: 12183
wait 2 seconds inodes: 12183
wait 4 seconds inodes: 12183
wait 6 seconds inodes: 12183
wait 8 seconds inodes: 12183
wait 10 seconds inodes: 12183
wait 12 seconds inodes: 12183
wait 14 seconds inodes: 12183
wait 16 seconds inodes: 12183
wait 18 seconds inodes: 12183
wait 20 seconds inodes: 12183
wait 22 seconds inodes: 12183
wait 24 seconds inodes: 12183
wait 26 seconds inodes: 12183
wait 28 seconds inodes: 12183
wait 30 seconds inodes: 12183
wait 32 seconds inodes: 12183
 sanity test_76: @@@@@@ FAIL: inode slab grew from 11183 to 12183

Maloo report: https://maloo.whamcloud.com/test_sets/8b6bd050-7d77-11e2-85d0-52540035b04c
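
For context, the check the test is making can be sketched roughly as follows (a minimal reconstruction from the log above, not the verbatim sanity.sh code; num_inodes(), the file count, and the 30-second limit are assumptions, and $DIR stands for a directory on the Lustre mount):

    # count active objects in the Lustre inode slab
    num_inodes() {
        awk '/lustre_inode_cache/ { print $2; exit }' /proc/slabinfo
    }

    before=$(num_inodes)
    echo "before inodes: $before"

    # create and immediately unlink many files so the client has inodes to recycle
    for i in $(seq 1000); do
        touch $DIR/f76.$i && rm -f $DIR/f76.$i
    done

    after=$(num_inodes)
    echo "after inodes: $after"

    # give the client time to drop the unlinked inodes; fail if the slab never shrinks back
    wait=0
    while [ "$after" -gt "$before" ]; do
        sleep 2
        wait=$((wait + 2))
        after=$(num_inodes)
        echo "wait $wait seconds inodes: $after"
        [ "$wait" -gt 30 ] && { echo "FAIL: inode slab grew from $before to $after"; exit 1; }
    done

The failure mode above is that the slab count never returns to the "before" value within the allowed wait.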



 Comments   
Comment by Sarah Liu [ 13/Mar/13 ]

another instance: https://maloo.whamcloud.com/test_sets/82b6ad6c-8903-11e2-b643-52540035b04c

Comment by Jian Yu [ 13/Mar/13 ]

Lustre b2_1 client build: http://build.whamcloud.com/job/lustre-b2_1/186
Lustre master server build: http://build.whamcloud.com/job/lustre-master/1302
Distro/Arch: RHEL6.3/x86_64

The same issue occurred: https://maloo.whamcloud.com/test_sets/c8a1a6e8-8b55-11e2-965f-52540035b04c

Comment by Andreas Dilger [ 18/Mar/13 ]

Another hit on 2013-03-15 08:47:01 UTC:

RC2--PRISTINE-2.6.32-279.14.1.el6.x86_64 (x86_64)
jenkins-arch=x86_64,build_type=server,distro=el6,ib_stack=inkern (x86_64)

https://maloo.whamcloud.com/test_sets/92de9552-8dfd-11e2-81eb-52540035b04c

Comment by Andreas Dilger [ 18/Mar/13 ]

I've pushed http://review.whamcloud.com/5755 to disable this on b2_1.

Alex's comments in LU-2036:

OSP objects are not removed immediately, so ll_d_iput() in the unlink path finds extent locks and does not reset nlink to 0, leaving the inode in the cache.

We need to figure out another way to purge the inode...

On the MDS, the corresponding inode only gets nlink=0 when it is actually being destroyed (orphans keep nlink=1 while they sit on PENDING/). Could this zero nlink, propagated to the client, serve as a signal for ELC on the client?
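
If cached extent locks are indeed what pins these inodes, one way to check by hand on the client (a hedged sketch; lru_size=clear is the same mechanism the test framework's cancel_lru_locks helper uses) is to cancel the OSC LRU locks and then ask the VM to reclaim unused inodes:

    # cancel the client's cached OSC (extent) locks
    lctl set_param -n ldlm.namespaces.*osc*.lru_size=clear
    # reclaim unused dentries and inodes
    echo 2 > /proc/sys/vm/drop_caches
    # re-check the Lustre inode slab
    grep lustre_inode_cache /proc/slabinfo

If the lustre_inode_cache count only drops after the locks are cancelled, that supports the explanation above.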

Comment by Sarah Liu [ 21/Mar/13 ]

Also seen in interop between a 2.3.0 client and a 2.4 server:
https://maloo.whamcloud.com/test_sets/491428e6-8fed-11e2-9b28-52540035b04c

Comment by Jian Yu [ 14/Aug/13 ]

Lustre client build: http://build.whamcloud.com/job/lustre-b1_8/258/ (1.8.9-wc1)
Lustre server build: http://build.whamcloud.com/job/lustre-b2_4/31/

sanity test 76 hit the same failure:
https://maloo.whamcloud.com/test_sets/03a55dac-0478-11e3-90ba-52540035b04c

Comment by Jian Yu [ 04/Sep/13 ]

Lustre client: http://build.whamcloud.com/job/lustre-b2_3/41/ (2.3.0)
Lustre server: http://build.whamcloud.com/job/lustre-b2_4/44/ (2.4.1 RC1)

sanity test 76 hit the same failure:
https://maloo.whamcloud.com/test_sets/cde5bab8-14f3-11e3-9828-52540035b04c

Comment by Jian Yu [ 05/Sep/13 ]

Lustre client build: http://build.whamcloud.com/job/lustre-b1_8/258/ (1.8.9-wc1)
Lustre server build: http://build.whamcloud.com/job/lustre-b2_4/44/ (2.4.1 RC1)

sanity test 76 hit the same failure:
https://maloo.whamcloud.com/test_sets/77da1820-15ad-11e3-87cb-52540035b04c

Comment by Jian Yu [ 19/Dec/13 ]

Lustre client: http://build.whamcloud.com/job/lustre-b2_3/41/ (2.3.0)
Lustre server: http://build.whamcloud.com/job/lustre-b2_4/69/ (2.4.2 RC1)

sanity test 76 hit the same failure:
https://maloo.whamcloud.com/test_sets/3c82ea6e-685e-11e3-a16f-52540035b04c

Comment by James Nunez (Inactive) [ 24/Apr/15 ]

sanity test 76 is still failing on master (pre-2.8.0). Recent result with logs is at https://testing.hpdd.intel.com/test_sets/7a914250-e8a1-11e4-b3c6-5254006e85c2

Comment by parinay v kondekar (Inactive) [ 10/Aug/15 ]

This issue is also seen quite regularly on the Seagate test infrastructure.
Client - 2.4.3, 2.5.1.x6
Server - 2.5.1.x6

Comment by James A Simmons [ 16/Aug/16 ]

Old ticket for unsupported version

Comment by Andreas Dilger [ 16/Aug/16 ]

James, I appreciate that you are closing old tickets, but for this ticket (and maybe others) the "always_except" marking only means the failing test has been excluded from the test list; as James Nunez reported against pre-2.8.0, the test still fails, so this does not mean the bug is fixed.

Either the test needs to be fixed (by a code or test-script change), or, if the decision is that the tested functionality is obsolete, the test should be removed completely.
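
For reference, "disabling" a test this way just adds it to the skip list in the test script, along the lines of (a sketch; the surrounding test numbers vary by branch):

    # lustre/tests/sanity.sh: test 76 added to the list of always-skipped tests
    ALWAYS_EXCEPT="76 $SANITY_EXCEPT"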

Comment by James A Simmons [ 16/Aug/16 ]

I propose that a special tracker ticket be created that covers all the excepted tests. That way these types of tickets will not get lost.

Comment by Peter Jones [ 16/Aug/16 ]

That's what the always_except label is used for at the moment.

Comment by Andreas Dilger [ 28/Oct/17 ]

Per Patrick's comment in LU-2036:

This appears to be fixed by:
https://jira.hpdd.intel.com/browse/LU-10131
https://review.whamcloud.com/#/c/29651/

The test passes with that patch and fails without it.

Comment by Andreas Dilger [ 28/Oct/17 ]

Fixed via LU-10131.
