[LU-3447] Client RDMA too fragmented: 128/255 src 128/256 dst frags Created: 10/Jun/13 Updated: 15/Mar/14 Resolved: 15/Mar/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Erich Focht | Assignee: | Bruno Faccini (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | client | ||
| Environment: |
Lustre servers running 2.1.5, Lustre clients with 1.8.9. |
||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 8618 |
| Description |
|
During an IOR-like benchmark doing directIO from multiple clients (16, 64), clients get disconnected and evicted. The MPI job dies in misery and some of its processes aren't even killable. We've seen that there was a similar bug a while ago that was marked as solved; it was occurring on lnet routers (https://bugzilla.lustre.org/show_bug.cgi?id=13607). This one is on clients. What can lead to the "RDMA too fragmented" issue? Any hint or suggestion? Client log messages are in the attached file. Regards, |
| Comments |
| Comment by Erich Focht [ 11/Jun/13 ] |
|
Increasing the MTT size on the client nodes seems to solve the problem. For instructions: http://community.mellanox.com/docs/DOC-1120 Having a more meaningful error message would be nice. This bug can be closed. |
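As a rough illustration of what "increasing the MTT size" changes, the sketch below computes the amount of memory an HCA can register from the two MTT-related module parameters. It assumes the mlx4_core driver and its `log_num_mtt` and `log_mtts_per_seg` parameters, and uses the commonly cited relation max registerable memory = 2^log_num_mtt * 2^log_mtts_per_seg * page size. None of this is stated in the ticket itself; the function name and the example values are hypothetical.

```python
import os


def max_registerable_bytes(log_num_mtt, log_mtts_per_seg, page_size=None):
    """Estimate registerable memory from MTT settings.

    Assumption (not from this ticket): the mlx4_core relation
    2**log_num_mtt * 2**log_mtts_per_seg * page_size.
    """
    if page_size is None:
        # Fall back to the page size of the machine running the script.
        page_size = os.sysconf("SC_PAGE_SIZE")
    return (1 << log_num_mtt) * (1 << log_mtts_per_seg) * page_size


if __name__ == "__main__":
    # Hypothetical example values with a 4 KiB page size.
    limit = max_registerable_bytes(20, 3, page_size=4096)
    print(limit // 2**30, "GiB")  # prints "32 GiB"
```

Comparing this estimate with the physical memory of a client node gives a first indication of whether the MTT table is undersized for the node; the values actually in effect would have to be read from the driver on the node in question.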
| Comment by Bruno Faccini (Inactive) [ 12/Jun/13 ] |
|
Hello Eric, |
| Comment by Peter Jones [ 12/Jun/13 ] |
|
Bruno, can you please advise? Thanks, Peter |
| Comment by Erich Focht [ 13/Jun/13 ] |
|
Hi Bruno, is that option available on 1.8.9 as well as on 2.X? Thanks for pointing me to it! It is difficult to do that in the customer's environment if we need to set this on both clients and servers: he has 3-4 Lustre filesystems (not all from us), a mix of versions, and 3.5k clients. But I'll try to find an opportunity to do it and discuss it with the customer. Best regards, |
| Comment by Bruno Faccini (Inactive) [ 14/Jun/13 ] |
|
Hello Eric, |
| Comment by Bruno Faccini (Inactive) [ 12/Jul/13 ] |
|
Hello Eric, |
| Comment by Erich Focht [ 25/Jul/13 ] |
|
Hi Bruno, unfortunately we cannot use the module option there. It is a huge environment with several Lustre setups, and the customer is not willing to switch that option over everywhere, which we'd need to do (as far as I understand) on clients as well as on servers. So we can't switch the clients over selectively. But we will test it as soon as we can on another (upcoming) installation. Regards, |
| Comment by John Fuchs-Chesney (Inactive) [ 08/Mar/14 ] |
|
Erich, |
| Comment by John Fuchs-Chesney (Inactive) [ 15/Mar/14 ] |
|
Customer was able to resolve problem. No more required here. |