[LU-1426] I/O failed after STONITH -- IMP_INVALID Created: 21/May/12  Updated: 15/Mar/14  Resolved: 15/Mar/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 1.8.7
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Roger Spellman (Inactive) Assignee: Yang Sheng
Resolution: Incomplete Votes: 0
Labels: None
Environment:

Lustre servers running Lustre 1.8.7
Lustre clients running Lustre 1.8.4
There are two MDSes, in an active-standby configuration.
There are two OSSes in an active-active configuration.
There are a total of 8 OSTs, 4 on each OSS.
Lustre Network is 10G


Attachments: File client-dmesg.out     File messages    
Severity: 3
Rank (Obsolete): 10099

 Description   

There are 8 clients, each creating files on one OST.
The 10G cable is removed from the first OSS. Within a few minutes, that OSS is killed by STONITH.
All the OSTs were then mounted on the peer OSS.
However, the test on one of the clients failed, with error:

cp: cannot fstat `/mnt/lustre/ost/ost-01/file.002645': Interrupted system call

The test on all the other clients was fine. Here is a bit of the client's dmesg output:

LustreError: 167-0: This client was evicted by lstr96-OST0001; in progress operations using this service will fail.
LustreError: 22101:0:(file.c:995:ll_glimpse_size()) obd_enqueue returned rc -4, returning -EIO
LustreError: 24435:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@ffff810105f98000 x1400324452088101/t0 o4->lstr96-OST0001_UUID@10.7.90.4@tcp:6/4 lens 448/608 e 0 to 1 dl 0 ref 2 fl Rpc:/0/0 rc 0/0
LustreError: 24435:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@ffff8101236a5400 x1400324452088110/t0 o4->lstr96-OST0001_UUID@10.7.90.4@tcp:6/4 lens 448/608 e 0 to 1 dl 0 ref 2 fl Rpc:/0/0 rc 0/0
LustreError: 24435:0:(client.c:858:ptlrpc_import_delay_req()) Skipped 8 previous similar messages

Unfortunately, there are no dates in the dmesg output, and /var/log/messages on the client has nothing in it. The problem occurred on May 21 at 15:09, as can be seen in the log files from the OSS.

I will attach the rest of this log, and the logs from the OSS. Please let me know if you need more info.



 Comments   
Comment by Peter Jones [ 24/May/12 ]

Yangsheng

Could you please look into this one?

Thanks

Peter

Comment by Yang Sheng [ 05/Jun/12 ]

This log message is wrong:

LustreError: 22101:0:(file.c:995:ll_glimpse_size()) obd_enqueue returned rc -4, returning -EIO

        if (rc != 0) {
                CERROR("obd_enqueue returned rc %d, returning -EIO\n", rc);
                RETURN(rc > 0 ? -EIO : rc);
        }

In fact, it returns -4 (-EINTR).
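
To make the mismatch concrete, here is a minimal standalone sketch of that return expression. It is not the Lustre source: glimpse_error_path() and message_matches_return() are hypothetical names used only for illustration, and RETURN() is replaced with a plain return.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for the quoted error path (not Lustre code).
 * Uses the same expression as the quoted RETURN(): only a positive rc
 * is mapped to -EIO; a negative rc is passed through unchanged.
 */
static int glimpse_error_path(int rc)
{
        return rc > 0 ? -EIO : rc;
}

/*
 * Hypothetical helper: returns 1 when the value actually returned is
 * -EIO, i.e. when the "returning -EIO" text in the message is accurate.
 */
static int message_matches_return(int rc)
{
        return glimpse_error_path(rc) == -EIO;
}
```

With rc = -4 the sketch returns -4 (-EINTR on Linux), not -EIO, which is consistent with cp reporting "Interrupted system call" in the description. The message text would only be accurate for a positive rc, or when rc is already -EIO.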

Comment by John Fuchs-Chesney (Inactive) [ 05/Mar/14 ]

Roger – does this require any more work, or can I mark it as resolved/fixed?
Thanks,
~ jfc.

Comment by John Fuchs-Chesney (Inactive) [ 15/Mar/14 ]

Looks like we will not pursue this issue any further.
