[LU-4485] Some error message on lustre client Created: 14/Jan/14 Updated: 27/Feb/14 Resolved: 27/Feb/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 1.8.7 |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major |
| Reporter: | Supporto Lustre Jnet2000 (Inactive) | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None | ||
| Environment: |
operating system redhat 5.7 |
||
| Attachments: |
|
| Rank (Obsolete): | 12276 |
| Description |
|
On a client we first saw a quota problem: "kernel: LustreError: 11-0: an error occurred while communicating with 10.121.13.59@tcp. The ost_write operation failed with -122". After that we got some error messages whose meaning we do not understand. Do you have any suggestions? Regards, Augusto Casciola |
| Comments |
| Comment by Peter Jones [ 14/Jan/14 ] |
|
Niu, could you please advise on this ticket? Thanks, Peter |
| Comment by Niu Yawei (Inactive) [ 14/Jan/14 ] |
|
The message means "write failed with EDQUOT (out of quota)"; it looks like some user is over quota. |
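As a small illustration of how the -122 in the log maps to EDQUOT, the sketch below looks up the errno name with Python's standard library. It assumes it runs on a Linux host, where errno 122 is EDQUOT; the helper name `decode_rc` is hypothetical and not part of Lustre.

```python
import errno
import os
from typing import Tuple


def decode_rc(rc: int) -> Tuple[str, str]:
    """Translate a negative kernel-style return code into (name, message).

    Lustre logs report failures as negative errno values, so the sign is
    dropped before the lookup. Unknown codes are reported as "UNKNOWN".
    """
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # On Linux this names EDQUOT for the -122 seen in the ticket.
    print(decode_rc(-122))
```

The same lookup works for other return codes that appear in Lustre client logs, as long as it is run on the same platform that produced them.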
| Comment by Niu Yawei (Inactive) [ 14/Jan/14 ] |
|
You can use "lfs quota -u uid/gid -v fsname" to check the quota limit and usage for the user. If the user is not over quota, could you upload the syslog from the OSS (10.121.13.59@tcp) so we can see why it returned EDQUOT? |
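A minimal sketch of scripting the quota check suggested above, for example to run it for several users. It assumes the `lfs` utility is on the PATH and that the last argument is the client mount point; the path `/mnt/home` and the function names are hypothetical placeholders, not values from this ticket.

```python
import subprocess
from typing import List


def build_quota_cmd(ident: str, mount_point: str, group: bool = False) -> List[str]:
    """Build the argv for a verbose `lfs quota` query.

    `-u` selects a user and `-g` a group; `-v` asks for per-target detail.
    """
    flag = "-g" if group else "-u"
    return ["lfs", "quota", "-v", flag, str(ident), mount_point]


def show_quota(ident: str, mount_point: str, group: bool = False) -> str:
    """Run the query and return its raw text output (requires a Lustre client)."""
    cmd = build_quota_cmd(ident, mount_point, group)
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    # Hypothetical user and mount point; nothing is executed here.
    print(" ".join(build_quota_cmd("augusto", "/mnt/home")))
```

Keeping the command construction separate from its execution makes the argument order easy to check on a machine without Lustre installed.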
| Comment by Supporto Lustre Jnet2000 (Inactive) [ 14/Jan/14 ] |
|
In the attached file we see these errors after the quota error. Are they related to the quota? "2014-01-07T17:11:40.337576+01:00 osiride-lp-041 kernel: Lustre: 23101:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1436582400258712 sent from home-OST0008-osc-ffff81063fc2f800 to NID 10.121.13.28@tcp 7s ago has timed out (7s prior to deadline)." |
| Comment by Supporto Lustre Jnet2000 (Inactive) [ 14/Jan/14 ] |
|
Sorry for the misunderstanding. We need to know why the client lost its connection to home-OST0008, home-OST0008 and home-OST0006, and what the "failure to allocate a tage" error means. |
| Comment by Niu Yawei (Inactive) [ 15/Jan/14 ] |
|
The "failure to allocate a tage" message means the Lustre logging system couldn't allocate a buffer to store debug messages, so some debug messages will be lost. It won't break the connection between the client and the OSTs. So the client lost its connection to OST0006, OST0007 and OST0008, and you want to know why? |
| Comment by Supporto Lustre Jnet2000 (Inactive) [ 11/Feb/14 ] |
|
Hi, we want to know why the client lost its connection to OST0006, OST0007 and OST0008. Regards |
| Comment by Niu Yawei (Inactive) [ 14/Feb/14 ] |
2014-01-07T17:11:57.274407+01:00 osiride-lp-041 kernel: Lustre: 7567:0:(import.c:517:import_select_connection()) home-OST000a-osc-ffff81063fc2f800: tried all connections, increasing latency to 3s
I suspect it's a network problem, not related to the write failures (-122 EDQUOT error). |
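One way to check a network suspicion like this is to see whether the request timeouts in the client syslog cluster on a few server NIDs. The sketch below parses `ptlrpc_expire_one_request()` lines of the form quoted earlier in this ticket; the regular expression is inferred from that single sample line, so it is an assumption and may not match other Lustre versions. The function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional

# Pattern inferred from the one timeout line quoted in this ticket.
TIMEOUT_RE = re.compile(
    r"ptlrpc_expire_one_request\(\)\).*?"
    r"sent from (?P<device>\S+) to NID (?P<nid>\S+) "
    r"(?P<age>\d+)s ago has timed out"
)

SAMPLE = (
    "2014-01-07T17:11:40.337576+01:00 osiride-lp-041 kernel: Lustre: "
    "23101:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request "
    "x1436582400258712 sent from home-OST0008-osc-ffff81063fc2f800 to NID "
    "10.121.13.28@tcp 7s ago has timed out (7s prior to deadline)."
)


def parse_timeout(line: str) -> Optional[Dict[str, object]]:
    """Return the target, server NID and request age for a timeout line, else None."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    # The OSC device name looks like "<target>-osc-<address>"; keep the target part.
    target = m.group("device").split("-osc-")[0]
    return {"target": target, "nid": m.group("nid"), "age_s": int(m.group("age"))}


def count_by_nid(lines: Iterable[str]) -> Counter:
    """Count timed-out requests per server NID, ignoring unrelated lines."""
    counts: Counter = Counter()
    for line in lines:
        hit = parse_timeout(line)
        if hit is not None:
            counts[hit["nid"]] += 1
    return counts


if __name__ == "__main__":
    print(parse_timeout(SAMPLE))
```

If most timeouts point at the same NID or the same subnet, that supports a network or server-side cause rather than a quota one.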
| Comment by Gabriele Paciucci (Inactive) [ 27/Feb/14 ] |
|
I have talked with the customer and we agreed that this is a network problem. We can close this issue. If similar errors occur again, we can enable the debug daemon to get more information. |
| Comment by Peter Jones [ 27/Feb/14 ] |
|
ok - thanks Gabriele |