[LU-13184] conf-sanity test_112: problem creating f112.conf-sanity.0 on OST0000 Created: 31/Jan/20  Updated: 17/Feb/21  Resolved: 01/Feb/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.14.0
Fix Version/s: Lustre 2.14.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Andreas Dilger
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-13221 conf-sanity test_112: FAIL: MDS start... Reopened
is related to LU-12818 replay-single test_70b and other test... Resolved
is related to LU-13813 conf-sanity test_112: can't put impor... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/e0c5003a-4327-11ea-bffa-52540065bddc

test_112 failed with the following error in the test logs:

lfs setstripe: setstripe error for '/mnt/lustre/f112.conf-sanity.0': Numerical result out of range
problem creating f112.conf-sanity.0 on OST0000
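
Lustre functions report errors as negative errno values. The rc=-34 and rc=-107 in the MDS debug log in this report therefore correspond to ERANGE and ENOTCONN. ERANGE's strerror() text is the "Numerical result out of range" that lfs setstripe prints. The following is a minimal Python sketch for decoding such return codes. It assumes Linux errno numbering; ENOTCONN in particular has a different value on some other platforms. The helper name decode_rc() is hypothetical.

```python
import errno
import os


def decode_rc(rc: int) -> str:
    """Map a kernel-style return code (e.g. rc=-34) to its errno name.

    Hypothetical helper; assumes Linux errno numbering and returns
    "rc=<n>" for codes the errno module does not know.
    """
    code = -rc if rc < 0 else rc
    return errno.errorcode.get(code, f"rc={rc}")


if __name__ == "__main__":
    for rc in (-34, -107):
        # os.strerror() yields the message text user tools print for this errno
        print(rc, decode_rc(rc), os.strerror(-rc))
```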

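It appears that the MDT0000-OST0000 connection has not completed in time, so creating a new file on OST0000 immediately after mounting the filesystem can intermittently fail. osp_statfs() for OST0000 returns -107 (ENOTCONN), and LOD then marks the OST inactive. Object allocation subsequently fails with -34 (ERANGE). The MDS0 debug log reports: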

1580363348.206540:0:25757:0:(mdt_handler.c:1755:mdt_getattr_name_lock()) getattr with lock for [0x200000007:0x1:0x0]/f112.conf-sanity.0, ldlm_rep = ffff9e67e08e51f8
1580363348.214294:0:25757:0:(osp_dev.c:743:osp_statfs()) Process leaving (rc=-107)
1580363348.214295:0:25757:0:(lod_qos.c:135:lod_statfs_and_check()) lustre-OST0000-osc-MDT0000: turns inactive
1580363348.214298:0:25757:0:(osp_dev.c:760:osp_statfs()) lustre-OST0001-osc-MDT0000: 37598 blocks, 35206 free, 31638 avail, 1 reserved mb low, 3 reserved mb high,50000 files, 49732 free files
1580363348.214300:0:25757:0:(osp_dev.c:779:osp_statfs()) Process leaving (rc=0 : 0 : 0)
1580363348.214301:0:25757:0:(lod_qos.c:135:lod_statfs_and_check()) lustre-OST0001-osc-MDT0000: turns inactive
1580363348.214306:0:25757:0:(lod_qos.c:2435:lod_qos_prep_create()) Process leaving via out (rc=-34)
1580363348.214308:0:25757:0:(lod_qos.c:2580:lod_prepare_create()) Process leaving (rc=-34)
1580363348.214312:0:25757:0:(lod_object.c:5594:lod_declare_striped_create()) Process leaving via out (rc=-34)
1580363348.214316:0:25757:0:(lod_object.c:3621:lod_declare_xattr_set()) Process leaving (rc=-34)
1580363348.214317:0:25757:0:(mdd_dir.c:1924:mdd_create_data()) Process leaving via stop (rc=-34)
1580363348.214325:0:25757:0:(mdd_dir.c:1947:mdd_create_data()) Process leaving (rc=-34)
1580363348.214326:0:25757:0:(mdt_open.c:134:mdt_create_data()) Process leaving (rc=-34)
1580363348.214326:0:25757:0:(mdt_open.c:361:mdt_mfd_open()) Process leaving (rc=-34)
1580363348.214327:0:25757:0:(mdt_open.c:640:mdt_finish_open()) Process leaving (rc=-34)

and the client debug log:

1580363348.241716:0:15508:0:(mdc_locks.c:1176:mdc_finish_intent_lock()) D_IT dentry  intent: open status -34 disp 3 rc -34
1580363348.241717:0:15508:0:(mdc_locks.c:1324:mdc_intent_lock()) Process leaving (rc=-34)
1580363348.241721:0:15508:0:(file.c:586:ll_intent_file_open()) lock enqueue: err: -34

whereas MDT0000 does not finish connecting to OST0000 and precreating objects until afterward:

1580363348.461378:0:12583:0:(import.c:1169:ptlrpc_connect_interpret()) connected to replayable target: lustre-OST0000_UUID
1580363348.461379:0:12583:0:(import.c:86:import_set_state_nolock()) ffff9e67d8e4e800 lustre-OST0000_UUID: changing import state from CONNECTING to FULL
1580363348.463625:0:12584:0:(osp_precreate.c:1048:osp_pre_update_msfs()) lustre-OST0000-osc-MDT0000: blocks=37598 free=35139 avail=31572 avail_mb=123 hwm_mb=3 files=50000 ffree=49311 state=0
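
The timestamps make the ordering explicit. The following Python sketch parses debug-log lines in the trimmed form quoted above. It reports the first negative rc= return and the moment the OST0000 import reaches FULL. The line layout it expects is an assumption inferred only from the lines in this ticket: a seconds.microseconds timestamp, then colon-separated fields, then a (file:line:function()) location. It is not taken from a Lustre debug-log specification, and parse(), first_error() and first_full() are hypothetical names. For the quoted MDS lines, it shows osp_statfs() failing about 0.25 s before the import becomes FULL.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumed layout of the trimmed lines quoted in this ticket:
#   <sec.usec>:<field>:<field>:<field>:(<file>:<line>:<function>()) <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d+\.\d+):[^(]*\((?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s*(?P<msg>.*)$"
)
RC_RE = re.compile(r"rc=(-?\d+)")


class Entry(NamedTuple):
    ts: float   # timestamp in seconds (float)
    func: str   # function name from the (file:line:function()) location
    msg: str    # remainder of the line


def parse(lines: Iterable[str]) -> List[Entry]:
    """Return one Entry per line matching the assumed format; skip the rest."""
    out = []
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if m:
            out.append(Entry(float(m["ts"]), m["func"], m["msg"]))
    return out


def first_error(entries: List[Entry]) -> Optional[Entry]:
    """First entry whose message carries a negative rc=... value."""
    for e in entries:
        m = RC_RE.search(e.msg)
        if m and int(m.group(1)) < 0:
            return e
    return None


def first_full(entries: List[Entry], target: str) -> Optional[Entry]:
    """First import state change to FULL that mentions the given target UUID."""
    for e in entries:
        if target in e.msg and "to FULL" in e.msg:
            return e
    return None


if __name__ == "__main__":
    sample = [
        "1580363348.214294:0:25757:0:(osp_dev.c:743:osp_statfs()) Process leaving (rc=-107)",
        "1580363348.214306:0:25757:0:(lod_qos.c:2435:lod_qos_prep_create()) Process leaving via out (rc=-34)",
        "1580363348.461379:0:12583:0:(import.c:86:import_set_state_nolock()) ffff9e67d8e4e800 "
        "lustre-OST0000_UUID: changing import state from CONNECTING to FULL",
    ]
    entries = parse(sample)
    err = first_error(entries)
    full = first_full(entries, "lustre-OST0000_UUID")
    if err and full:
        # 1580363348.461379 - 1580363348.214294 = 0.247085 s on the MDS clock
        print(f"{err.func} failed {full.ts - err.ts:.3f}s before OST0000 import was FULL")
```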

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
conf-sanity test_112 - problem creating f112.conf-sanity.0 on OST0000



 Comments   
Comment by Gerrit Updater [ 31/Jan/20 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37393
Subject: LU-13184 tests: wait for OST startup in test_112
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 707f6a0a418ba90f03e1cd63565393630f3b1599
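
The patch subject describes the fix as waiting for OST startup before test_112 creates its file. The patch itself is not reproduced here. As a hypothetical illustration of that general approach only, a bounded polling loop could look like the following Python sketch. wait_for() and its probe callable are invented for illustration. In a real test, the probe would check something like the MDT's import for OST0000 having reached FULL, the state seen in the MDS log in the description.

```python
import time
from typing import Callable


def wait_for(probe: Callable[[], bool], timeout: float = 30.0, interval: float = 0.5) -> float:
    """Poll probe() until it returns True; return the elapsed seconds.

    Hypothetical helper: raises TimeoutError if the condition never holds,
    so the caller fails loudly instead of racing ahead of the connection.
    """
    start = time.monotonic()
    while True:
        if probe():
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(interval)


if __name__ == "__main__":
    # Simulated import state: CONNECTING for the first two polls, then FULL.
    states = iter(["CONNECTING", "CONNECTING", "FULL"])
    waited = wait_for(lambda: next(states) == "FULL", timeout=5.0, interval=0.1)
    print(f"OST0000 import FULL after {waited:.1f}s; safe to create the file")
```

The timeout matters as much as the wait itself. Without a bound, a connection that never completes would hang the test instead of reporting a clear failure.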

Comment by Andreas Dilger [ 05/Sep/20 ]

+1 on master

Comment by Chris Horn [ 18/Sep/20 ]

+1 on master https://testing.whamcloud.com/test_sets/c43c6211-922f-4421-adb1-0e138e9c5cb4

Comment by Etienne Aujames [ 07/Oct/20 ]

+1 on master https://testing.whamcloud.com/test_sets/4f302e2b-5838-4feb-af4f-7e1cb6a15caf

Comment by Andreas Dilger [ 26/Jan/21 ]

It looks like the test_112 failures reporting "problem creating f112.conf-sanity.0 on OST0000" (marked LU-12818) are being hit exclusively on osd-zfs backends (20 FAIL vs 159 PASS in the past four weeks).

Conversely, on osd-ldiskfs backends, test_112 fails with "import is not in FULL state" (LU-13813) and "MDS start failed" (LU-13221).
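
For scale, 20 failures against 159 passes means about 11% of osd-zfs runs failed over that period. Here is the arithmetic as a trivial Python helper; failure_rate() is a hypothetical name.

```python
def failure_rate(fails: int, passes: int) -> float:
    """Fraction of runs that failed: fails / (fails + passes)."""
    total = fails + passes
    if total == 0:
        raise ValueError("no runs recorded")
    return fails / total


if __name__ == "__main__":
    # osd-zfs counts quoted in the comment: 20 FAIL, 159 PASS
    print(f"{failure_rate(20, 159):.1%}")  # prints 11.2%
```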

Comment by Gerrit Updater [ 01/Feb/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37393/
Subject: LU-13184 tests: wait for OST startup in test_112
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e5080203e1358673ad7518c2b86bc9a5fc654b5f

Comment by Peter Jones [ 01/Feb/21 ]

Landed for 2.14

Generated at Sat Feb 10 02:59:06 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.