[LU-849] NFS server not responding when running parallel-scale test_iorfpp Created: 15/Nov/11 Updated: 16/Jan/12 Resolved: 16/Jan/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Sarah Liu | Assignee: | Lai Siyao |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Environment: |
server and client: RHEL6-x86_64 build https://newbuild.whamcloud.com/job/lustre-master/353/ |
||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 6524 |
| Description |
|
When running parallel-scale test_iorfpp over NFS v3, the test hit an "nfs server not responding" error. Lustre: DEBUG MARKER: ---- |
| Comments |
| Comment by Sarah Liu [ 15/Nov/11 ] |
|
NFSv4 has this issue too. |
| Comment by Sarah Liu [ 16/Nov/11 ] |
|
The attached logs are from the Lustre client (NFS server). |
| Comment by Peter Jones [ 16/Nov/11 ] |
|
Lai, could you please comment on this one? Thanks, Peter |
| Comment by Johann Lombardi (Inactive) [ 24/Nov/11 ] |
|
Hm, several nfsd threads seem to be stuck in splice_read (somewhere in cl_page_list_disown(), although it is not clear whether the stack trace is reliable). |
| Comment by Sarah Liu [ 27/Nov/11 ] |
|
sure, will keep you updated. |
| Comment by Sarah Liu [ 28/Nov/11 ] |
|
Debug log of the Lustre client (NFS server). |
| Comment by Johann Lombardi (Inactive) [ 28/Nov/11 ] |
|
ah, this is with the default debug mask. Could you please collect one with the debug mask set to -1? Thanks in advance. |
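For reference, a minimal sketch of how such a full-mask capture is commonly driven with `lctl` on the Lustre client. The helper name `collect_full_debug_log`, the `reproduce` callback, and the log path are hypothetical and not taken from this ticket; only the `lctl set_param debug=-1`, `lctl clear`, and `lctl dk` invocations are standard. It defaults to a dry run so nothing is executed unless requested.

```python
import subprocess
from typing import Callable, List


def collect_full_debug_log(reproduce: Callable[[], None],
                           log_path: str = "/tmp/lustre-debug.log",
                           dry_run: bool = True) -> List[List[str]]:
    """Issue the lctl steps around a reproducer and return the commands used.

    log_path is an illustrative location, not one named in the ticket.
    With dry_run=True the commands are only recorded, never executed.
    """
    issued: List[List[str]] = []

    def issue(cmd: List[str]) -> None:
        issued.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)

    issue(["lctl", "set_param", "debug=-1"])  # enable every debug flag
    issue(["lctl", "clear"])                  # empty the kernel debug buffer
    reproduce()                               # e.g. rerun the failing test
    issue(["lctl", "dk", log_path])           # dump the debug buffer to a file
    return issued
```

Calling `collect_full_debug_log(lambda: None)` returns the three command lines in order without running them; passing `dry_run=False` on a node with Lustre installed would execute them as root.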
| Comment by Sarah Liu [ 29/Nov/11 ] |
|
debug -1 log |
| Comment by Oleg Drokin [ 03/Jan/12 ] |
|
Lai, Jinshan, we need to look into this, as it potentially shows that recent CLIO changes introduced deadlocks in the sendfile path. |
| Comment by Lai Siyao [ 03/Jan/12 ] |
|
nfsd uses the splice read/write interface on new kernels. I ran some splice tests (from LTP) and they passed. I'll do more testing later. |
| Comment by Lai Siyao [ 09/Jan/12 ] |
|
In my local test, I found that it is a stack overflow too. I'll try to decrease the stack size a bit and verify it. |
| Comment by Lai Siyao [ 10/Jan/12 ] |
|
According to Jinshan's comment on |
| Comment by Peter Jones [ 16/Jan/12 ] |
|
Let's track this under LU-861 and then open a new ticket if the problems still exist once those fixes have landed. |