[LU-12415] conf-sanity test 69 fails with 'OST replacement created too many inodes; X' Created: 10/Jun/19  Updated: 18/Apr/23  Resolved: 27/Jun/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: James Nunez (Inactive) Assignee: WC Triage
Resolution: Duplicate Votes: 0
Labels: zfs
Environment: ZFS

Issue Links:
Duplicate
duplicates LU-12404 conf-sanity test 69 fails with 'creat... Resolved
Related
is related to LU-11760 formatted OST recognition change Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

conf-sanity test_69 fails with 'OST replacement created too many inodes; 96444' in ZFS testing only. Looking at the client test_log, we see:

trevis-14vm12: osc.lustre-OST0000-osc-MDT0000.ost_server_uuid in FULL state after 11 sec
mount lustre on /mnt/lustre.....
Starting client: trevis-14vm9.trevis.whamcloud.com:  -o user_xattr,flock trevis-14vm12@tcp:/lustre /mnt/lustre
CMD: trevis-14vm9.trevis.whamcloud.com mkdir -p /mnt/lustre
CMD: trevis-14vm9.trevis.whamcloud.com mount -t lustre -o user_xattr,flock trevis-14vm12@tcp:/lustre /mnt/lustre
On OST0, 96444 used inodes
 conf-sanity test_69: @@@@@@ FAIL: OST replacement created too many inodes; 96444 

There are no error messages in the console logs that indicate a problem.
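
For comparing the inode counts across several failed runs, a minimal Python sketch like the following could pull the relevant lines out of a saved client test_log. It assumes the log uses the same "On OST<n>, <count> used inodes" and "@@@@@@ FAIL:" formats as the excerpt in this description; the function name and the sample text are illustrative only and are not part of the Lustre test framework.

```python
import re

# Formats assumed from the log excerpt in this ticket; other logs may differ.
USED_RE = re.compile(r"On OST(\d+), (\d+) used inodes")
FAIL_RE = re.compile(r"@@@@@@ FAIL: (.*?)\s*$")


def summarize_test_log(text):
    """Return ([(ost_index, used_inodes), ...], [failure_message, ...])."""
    used = [(int(m.group(1)), int(m.group(2)))
            for m in USED_RE.finditer(text)]
    fails = []
    for line in text.splitlines():
        m = FAIL_RE.search(line)
        if m:
            fails.append(m.group(1))
    return used, fails


if __name__ == "__main__":
    # Sample text copied from the excerpt in this ticket.
    sample = (
        "On OST0, 96444 used inodes\n"
        " conf-sanity test_69: @@@@@@ FAIL: "
        "OST replacement created too many inodes; 96444 \n"
    )
    used, fails = summarize_test_log(sample)
    print(used)
    print(fails)
```

The script only reports what the log already contains; it does not know the limit that test_69 applies, so any pass/fail judgement still comes from the FAIL line itself.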

conf-sanity test 69 started failing with this error message on 2019-05-27 with Lustre version 2.12.53.62. Here are some links to failed conf-sanity test 69 logs:
https://testing.whamcloud.com/test_sets/c388c3a4-8175-11e9-a028-52540065bddc
https://testing.whamcloud.com/test_sets/d4fb934c-8412-11e9-af1f-52540065bddc
https://testing.whamcloud.com/test_sets/7ed3d6f0-86dd-11e9-b8e0-52540065bddc
https://testing.whamcloud.com/test_sets/93bc20f4-8b6a-11e9-9bb5-52540065bddc



 Comments   
Comment by Patrick Farrell (Inactive) [ 10/Jun/19 ]

While the test doesn't give enough output for me to be sure, I suspect the culprit is LU-12396 (as for a lot of ZFS failures).

Comment by Andreas Dilger [ 27/Jun/19 ]

This has the same root cause as LU-12404: the patch from LU-11760 allows too many objects to be created. That is exactly what test_69 is trying to detect, but the test was skipped because it is in the SLOW group (about 10 minutes per test).

Generated at Sat Feb 10 02:52:23 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.