[LU-4061] replay-single test_73c: MDS returns error when no objects available Created: 04/Oct/13  Updated: 09/Jan/20  Resolved: 09/Jan/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Maloo Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 10884

 Description   

This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

This issue relates to the following test suite run:
http://maloo.whamcloud.com/test_sets/24f0ed88-2cdb-11e3-8631-52540035b04c
http://maloo.whamcloud.com/test_sets/6818a392-1c4d-11e3-9961-52540035b04c

The sub-test test_73c failed with the following error:

test_73c failed with 3

Info required for matching: replay-single 73c



 Comments   
Comment by Andreas Dilger [ 04/Oct/13 ]

It looks like the MDS timed out trying to create objects on the OSTs (which are very slow because they run ZFS on a single VM disk), and returned an error to multiop, which was doing open(O_CREAT|O_RDWR). The MDS should never return an error to a create request just because all of the OSTs are out of objects. Instead, the MDS should keep trying to create the OST objects indefinitely, and return -EINPROGRESS (instead of the current -EIO) to the client so that the client retries the request without tying up an MDS service thread.

15:14:44:Lustre: DEBUG MARKER: == replay-single test 73c: open(O_CREAT), unlink, replay, reconnect at last_replay, close == 15:13:31 (1380838411)
15:14:44:Lustre: lustre-OST0006-osc-MDT0000: slow creates, last=[0x0:0x1:0x0], next=[0x0:0x1:0x0], reserved=0, syn_changes=1, syn_rpc_in_progress=8, status=0
15:14:44:Lustre: lustre-OST0000-osc-MDT0000: slow creates, last=[0x0:0x1:0x0], next=[0x0:0x1:0x0], reserved=0, syn_changes=0, syn_rpc_in_progress=17, status=0
15:14:44:Lustre: lustre-OST0001-osc-MDT0000: slow creates, last=[0x0:0x1:0x0], next=[0x0:0x1:0x0], reserved=0, syn_changes=17, syn_rpc_in_progress=8, status=0
15:14:44:LNet: Service thread pid 3621 completed after 60.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
15:14:44:Lustre: DEBUG MARKER: /usr/sbin/lctl mark  replay-single test_73c: @@@@@@ FAIL: test_73c failed with 3 
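A minimal C sketch of the retry behaviour proposed in the comment above, assuming a hypothetical in-memory model: fake_ost_pool, fake_mds_create() and client_create_with_retry() are illustrative names, not Lustre functions, and this is not how the Lustre MDS or client actually implement it. The server side returns -EINPROGRESS immediately while no object is available instead of blocking a service thread, and the client resends until the create succeeds or fails with some other error.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of an OST: objects only become available after a
 * number of create attempts have passed (simulating slow precreation). */
struct fake_ost_pool {
    int ticks_until_ready; /* remaining attempts that must wait */
};

/* Hypothetical server-side create handler. Rather than blocking a service
 * thread until objects exist (and eventually failing with -EIO), it returns
 * -EINPROGRESS right away so the caller can retry later. */
static int fake_mds_create(struct fake_ost_pool *pool)
{
    if (pool->ticks_until_ready > 0) {
        pool->ticks_until_ready--; /* background creation makes progress */
        return -EINPROGRESS;
    }
    return 0; /* object allocated */
}

/* Hypothetical client-side wrapper: resend on -EINPROGRESS, return any
 * other result (success or a real error) to the caller. max_attempts only
 * bounds this sketch; the proposal is to keep retrying. */
static int client_create_with_retry(struct fake_ost_pool *pool,
                                    int max_attempts, int *attempts_out)
{
    int rc = -EINPROGRESS;
    int attempt;

    assert(max_attempts > 0);
    for (attempt = 1; attempt <= max_attempts; attempt++) {
        rc = fake_mds_create(pool);
        if (rc != -EINPROGRESS)
            break;
        /* a real client would back off here before resending */
    }
    if (attempts_out)
        *attempts_out = attempt > max_attempts ? max_attempts : attempt;
    return rc;
}
```

The max_attempts bound exists only to keep the sketch testable; the comment suggests retrying indefinitely, and a real client would back off between resends rather than spin.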
Comment by Andreas Dilger [ 09/Jan/20 ]

Close old bug

Generated at Sat Feb 10 01:39:18 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.