[LU-11972] Lustre IB client always hangs when memory size is smaller than 40GB Created: 15/Feb/19 Updated: 01/Apr/19 Resolved: 01/Apr/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Question/Request | Priority: | Minor |
| Reporter: | sebg-crd-pm (Inactive) | Assignee: | Peter Jones |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Environment: |
[VM] |
||
| Attachments: |
|
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
I created a VM with an InfiniBand (IB) interface and mounted the Lustre client successfully. We then tested Lustre client I/O inside the VM:
dd if=/dev/zero of=/mnt/lustre/testfile bs=1M
The client hangs when the VM memory size is smaller than 40GB. The issue cannot be reproduced when the VM memory size is 80GB. We have tested with mlx5_core 4.4-2.0.7 and 4.5-1.0.1.0, and with Lustre 2.10.5 and 2.10.6. The VM syslog prints messages such as the following (see the attached file ib_err.txt):
.....
LustreError: 1854:0:(events.c:199:client_bulk_callback()) event type 1, status -5, desc ffff8bafb2976400
We received the following response from Mellanox:
>>Our RnD reviewed the syslog and checked the code; they give the conclusion below, FYI. I think
>>this issue is related to the upper-level Lustre design; you can open a defect for their community to fix.
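The syslog line above reports `status -5`, which looks like a negative Linux errno (5 is EIO, "I/O error"). To see how often each status occurs in a longer log, something like the following sketch may help. The function name `bulk_callback_summary` is hypothetical (it is not a Lustre tool), and the parsing assumes every relevant line has the same shape as the one quoted in the description.

```shell
#!/bin/sh
# Summarize client_bulk_callback() errors from a syslog stream on stdin.
# Prints one line per distinct status value: "<count> status <value>".
# NOTE: bulk_callback_summary is a hypothetical helper, not part of Lustre.
bulk_callback_summary() {
    # In a basic regular expression, "(" and ")" match literally.
    grep 'client_bulk_callback()' \
        | sed -n 's/.*status \(-\{0,1\}[0-9][0-9]*\).*/\1/p' \
        | sort | uniq -c \
        | awk '{ print $1 " status " $2 }'
}
```

For example, `bulk_callback_summary < ib_err.txt` would print one count per status value found in the attached log.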
|
| Comments |
| Comment by Patrick Farrell (Inactive) [ 15/Feb/19 ] |
|
In order to understand the error, we'd like to get some Lustre debug logs with appropriate tracing. Please run these commands on the client:
lctl set_param debug=+rpctrace; lctl set_param debug=+net
lctl clear
lctl mark "debug start"
dd if=/dev/zero of=/mnt/lustre/testfile bs=1M
lctl mark "debug finish"
lctl set_param debug=-rpctrace; lctl set_param debug=-net
lctl dk > /tmp/log
Please attach the log file to this ticket (you may need to compress it first). This will give us more info to go on. |
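The collection steps requested above could be wrapped in a small script so they run in the same order every time. This is only a sketch: `collect_lustre_debug`, `run`, and the `DRY_RUN` variable are hypothetical names introduced here, and the default paths are the ones from the comment. With `DRY_RUN=1` the commands are printed rather than executed, which allows a review before running them as root on the client.

```shell
#!/bin/sh
# Sketch of a wrapper around the debug-log collection steps.
# NOTE: collect_lustre_debug, run and DRY_RUN are hypothetical names.

run() {
    # Print the command instead of executing it when DRY_RUN=1.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

collect_lustre_debug() {
    testfile=${1:-/mnt/lustre/testfile}   # file written by the dd workload
    logfile=${2:-/tmp/log}                # where the debug log is dumped

    # Enable the extra tracing and start from an empty debug buffer.
    run lctl set_param debug=+rpctrace
    run lctl set_param debug=+net
    run lctl clear
    run lctl mark "debug start"

    # The workload under test; a dd failure must not prevent the log dump.
    run dd if=/dev/zero of="$testfile" bs=1M || true

    run lctl mark "debug finish"
    run lctl set_param debug=-rpctrace
    run lctl set_param debug=-net

    # Dump the kernel debug buffer to the log file.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "lctl dk > $logfile"
    else
        lctl dk > "$logfile"
    fi
}
```

Calling `collect_lustre_debug` with no arguments uses the paths from the comment. As in the original command, `dd` has no `count=` argument, so it writes until the filesystem is full or the process is interrupted.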
| Comment by sebg-crd-pm (Inactive) [ 20/Feb/19 ] |
|
Update:
1. This issue now also happens when the VM memory size is 80GB.
2. It seems easier to reproduce after we add an LNet router node.
3. Debug log attached: dbg1.tgz |
| Comment by sebg-crd-pm (Inactive) [ 20/Feb/19 ] |
| Comment by sebg-crd-pm (Inactive) [ 26/Feb/19 ] |
|
Any suggestions? |
| Comment by sebg-crd-pm (Inactive) [ 01/Apr/19 ] |
|
The issue cannot be reproduced on another server. It looks like a hardware issue, so you can close this ticket. Thanks. |