[LU-12404] conf-sanity test 69 fails with 'create file after reformat' Created: 07/Jun/19  Updated: 25/Nov/19  Resolved: 10/Sep/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0, Lustre 2.12.3
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: Sergey Cheremencev
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by LU-12415 conf-sanity test 69 fails with 'OST r... Resolved
Related
is related to LU-8158 conf-sanity test_69: create file afte... Open
is related to LU-11760 formatted OST recognition change Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

conf-sanity test_69 fails with 'create file after reformat'

Looking at the client test log from the failure https://testing.whamcloud.com/test_sets/efc1357e-8895-11e9-8c65-52540065bddc, we see the following error

Starting client: trevis-18vm4.trevis.whamcloud.com:  -o user_xattr,flock trevis-18vm11@tcp:/lustre /mnt/lustre
CMD: trevis-18vm4.trevis.whamcloud.com mkdir -p /mnt/lustre
CMD: trevis-18vm4.trevis.whamcloud.com mount -t lustre -o user_xattr,flock trevis-18vm11@tcp:/lustre /mnt/lustre
touch: cannot touch '/mnt/lustre/d69.conf-sanity/f69.conf-sanity-last': No space left on device
 conf-sanity test_69: @@@@@@ FAIL: create file after reformat 

This looks like LU-8158, but it is happening for non-SLES clients.

Looking at the OST (vm6) console log, we see

[37858.976026] Lustre: DEBUG MARKER: /usr/sbin/lctl mark osc.lustre-OST0000-osc-MDT0000.ost_server_uuid in FULL state after 3 sec
[37859.170354] Lustre: DEBUG MARKER: osc.lustre-OST0000-osc-MDT0000.ost_server_uuid in FULL state after 3 sec
[37862.106450] LustreError: 30205:0:(ofd_dev.c:1709:ofd_create_hdl()) lustre-OST0000: unable to precreate: rc = -28
[37879.498553] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  conf-sanity test_69: @@@@@@ FAIL: create file after reformat 
[37879.685068] Lustre: DEBUG MARKER: conf-sanity test_69: @@@@@@ FAIL: create file after reformat

A different ofd_create_hdl() error is seen in LU-8158, but the root cause could be the same.

We've started seeing this test fail with this ofd_create_hdl() error since 2019-05-27, with Lustre version 2.12.53.62. Here are links to a few of the failed test session logs:
https://testing.whamcloud.com/test_sets/e3585f0a-8148-11e9-a028-52540065bddc
https://testing.whamcloud.com/test_sets/b1e108b8-8135-11e9-b8e0-52540065bddc
https://testing.whamcloud.com/test_sets/5cf8cb50-83de-11e9-a028-52540065bddc
https://testing.whamcloud.com/test_sets/6bd12340-83d2-11e9-af1f-52540065bddc



 Comments   
Comment by Andreas Dilger [ 27/Jun/19 ]

I suspect that this problem is caused by patch https://review.whamcloud.com/33833 which was committed 2019-05-25 and affects exactly the number of objects created after reformat that test_69() is verifying:

commit d07d9c5ed0aa1d6614944c7d1e0ca55cba301dc4
Author:     Sergey Cheremencev <c17829@cray.com>
AuthorDate: Fri Aug 24 17:03:45 2018 +0300
Commit:     Oleg Drokin <green@whamcloud.com>
CommitDate: Sat May 25 04:55:51 2019 +0000

LU-11760 ofd: formatted OST recognition change
    
    Modern system is fast enough to create above
    100 000(5 * OST_MAX_PRECREATE) objects during commit interval.
    Increase the difference between MDS last_used ID
    and OST LAST_ID to 500 000 to avoid gaps after OST failover.

The problem is that if the OST filesystem does not have enough free inodes to store an extra 500k objects at recovery time, and the OST has previously created more objects than this, then the OST will run out of space during this test.
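The arithmetic behind this can be sketched as follows. This is only an illustration: the 100,000 (5 * OST_MAX_PRECREATE) and 500,000 figures come from the LU-11760 commit message above, while the OST inode count is a hypothetical value chosen to show how a small test filesystem can be exhausted.

```python
# Illustrative sketch of the LU-11760 gap change (figures from the commit
# message; the OST inode count below is hypothetical).
OST_MAX_PRECREATE = 20_000            # per-request precreate cap; 5 * this = 100,000
old_gap = 5 * OST_MAX_PRECREATE       # pre-LU-11760 MDS last_used / OST LAST_ID gap
new_gap = 500_000                     # post-LU-11760 gap

# Suppose a small test OST was formatted with ~300,000 free inodes (hypothetical):
ost_free_inodes = 300_000

print(old_gap <= ost_free_inodes)     # old gap fits within the free inodes
print(new_gap <= ost_free_inodes)     # new gap does not: precreate fails, rc = -28
```

Under these assumptions the old gap fits comfortably, but precreating up to the new 500k gap exhausts the OST's inodes, matching the `unable to precreate: rc = -28` (ENOSPC) seen on the OST console.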

Comment by Andreas Dilger [ 28/Jun/19 ]

I put a more detailed comment on how to fix this in LU-11760. Maybe I should have left that ticket closed, and we should track the fix here?

Comment by Sergey Cheremencev [ 28/Jun/19 ]

I suggest leaving this ticket open, reverting https://review.whamcloud.com/#/c/33833/ under LU-12404, and landing the new patch under LU-11760.

Comment by Patrick Farrell (Inactive) [ 10/Sep/19 ]

Fixed under LU-11760.

Generated at Sat Feb 10 02:52:17 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.