Type: Bug
Resolution: Not a Bug
Priority: Minor
Fix Version/s: None
Affects Version/s: Lustre 2.1.1, Lustre 1.8.x (1.8.0 - 1.8.5)
Labels: None
Environment:
https://github.com/chaos/lustre
Client: Lustre 1.8 BGP
Server: 2.1.1-3chaos

Severity: 3
Rank (Obsolete): 6415

A user application on our classified BGP system, which runs a Lustre 1.8 client, is having problems reading from 2.1 servers. We do not yet know exactly which errors, if any, the application gets back from its reads. On the client side, however, we see read requests timing out, lost connections, and EBUSY (-16) errors while reconnecting:

Request ost_read sent 675s ago to 172.18.102.48@tcp1 has timed out (limit 675s)
Connection to ls2-OST029f (at 172.18.102.48@tcp1) was lost; in progress operations using the service will wait for recovery to complete
An error occurred while communicating with 172.18.102.48@tcp1; the ost_connect operation failed with -16
(repeats several times)
Connection restored to ls2-OST029f (at 172.18.102.48@tcp1)

On the server, meanwhile, we see many corresponding events:

Lustre: ls2-OST029f: Client <uuid> reconnecting
Lustre: ls2-OST029f: Client <uuid> refused reconnection, still busy with 2 active RPCs
LustreError: ldlm_lib.c:2614:target_bulk_io()) @@@ build PUT failed: rc -107 ... rc 0/-1
Lustre: ls2-OST029f: Build IO read error with <uuid> ... client will retry: -107
Lustre: ldlm_lib.c:913:target_handle_connect()) ls2-OST029f: connection from <uuid> ...
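
The numeric return codes in the messages above are negated Linux errno values: -16 is EBUSY and -107 is ENOTCONN. As a minimal sketch for triaging such logs, the following pulls the codes out of a log line and names them. The helper `decode_rc`, the `LINUX_ERRNO` table, and the regular expression are illustrative assumptions, not part of Lustre; the table is hardcoded with Linux values so the result does not depend on the platform running the script.

```python
import re

# Linux errno values for the codes seen in this ticket (hardcoded on purpose:
# errno numbering differs between operating systems).
LINUX_ERRNO = {
    16: ("EBUSY", "Device or resource busy"),
    107: ("ENOTCONN", "Transport endpoint is not connected"),
}

# Hypothetical pattern: a negative number following one of the phrases that
# precede return codes in the quoted messages.
_RC = re.compile(r"\b(?:failed with|rc|retry:)\s*(-\d+)")


def decode_rc(line):
    """Return a list of (code, errno_name) pairs found in one log line."""
    found = []
    for match in _RC.finditer(line):
        code = int(match.group(1))
        name, _description = LINUX_ERRNO.get(-code, ("unknown", ""))
        found.append((code, name))
    return found


if __name__ == "__main__":
    sample = "the ost_connect operation failed with -16"
    for code, name in decode_rc(sample):
        print(code, name)
```

Codes outside the small table are reported as "unknown" rather than guessed.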

My understanding is that all of this should be transparent to the application and no error should propagate to user space unless the client is evicted. Is this correct?
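
Since it is not yet known which errors, if any, reach the application, one way to find out is a small user-space probe that reads a file sequentially and records the errno of the first failing call. This is only a sketch: the function name `probe_read` and the default chunk size are assumptions, and it says nothing about how the actual application performs its I/O.

```python
import errno
import os


def probe_read(path, chunk_size=1 << 20):
    """Read `path` sequentially with raw read() calls.

    Returns (bytes_read, error_name), where error_name is None if the
    whole file was read without an error reaching user space.
    """
    total = 0
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        return total, errno.errorcode.get(exc.errno, str(exc.errno))
    try:
        while True:
            try:
                chunk = os.read(fd, chunk_size)
            except OSError as exc:
                # An error propagated to user space: report which one.
                return total, errno.errorcode.get(exc.errno, str(exc.errno))
            if not chunk:  # end of file
                return total, None
            total += len(chunk)
    finally:
        os.close(fd)
```

Running it against a file on the affected OST while the server logs the reconnect messages would show whether a read blocks and then succeeds, or returns an error.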

Trackbacks

Lustre 1.8.x known issues tracker: While testing against Lustre b18 branch, we would hit known bugs which were already reported in Lustre Bugzilla https://bugzilla.lustre.org/. In order to move away from relying on Bugzilla, we would create a JIRA

Assignee: Zhenyu Xu

Reporter: Ned Bass (Inactive)

Votes: 0

Watchers: 3

Created: 13/Apr/12 9:13 PM

Updated: 04/Jun/12 2:58 PM

Resolved: 04/Jun/12 2:58 PM
