[LU-1322] 1.8 client hang with 1.8.4 server Created: 13/Apr/12 Updated: 22/Feb/13 Resolved: 04/Jan/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 1.8.6, Lustre 1.8.x (1.8.0 - 1.8.5) |
| Fix Version/s: | Lustre 1.8.9 |
| Type: | Bug | Priority: | Major |
| Reporter: | Peng Tao | Assignee: | Keith Mannthey (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | emc, patch | ||
| Environment: |
CentOS 5 with 1.8.6-WC1 clients and 1.8.4 servers. |
||
| Severity: | 3 |
| Epic: | interoperability |
| Rank (Obsolete): | 4028 |
| Description |
|
While running some tests, we found that the client hangs. Further investigation shows that the client is looping forever in ll_readdir_page(). The first ll_dir_entry in the page is correct, but some of the following ll_dir_entry records are entirely zeroed.
crash> struct ll_dir_entry 0xffff8105f5818000
crash> struct ll_dir_entry 0xffff8105f5818a19
After applying the change below, the tests pass smoothly and the debug message is printed frequently.
diff --git a/lustre/llite/dir.c b/lustre/llite/dir.c if (de->lde_inode != 0) {
This may not be the right fix, as I have not figured out why the page is partially zeroed. |
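The hang mechanism described above can be illustrated with a minimal, self-contained sketch. It is not the Lustre code and not the change quoted in this ticket: the struct layout, the `sketch_` names, and the error convention are assumptions chosen for illustration. It shows why a walker that advances by each record's length never makes progress on a zeroed record, and how a length check turns that into an error return.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified ext2-style directory record (not the real ll_dir_entry). */
struct sketch_dir_entry {
    uint32_t inode;     /* 0 marks an unused slot */
    uint16_t rec_len;   /* bytes from this record to the next one */
    uint8_t  name_len;
    uint8_t  file_type;
    /* name bytes would follow here */
};

#define SKETCH_HDR_LEN sizeof(struct sketch_dir_entry)

/* Helper for building a page by hand; writes one record header at off. */
void sketch_put_entry(unsigned char *page, size_t off,
                      uint32_t inode, uint16_t rec_len)
{
    struct sketch_dir_entry de = { inode, rec_len, 0, 0 };
    memcpy(page + off, &de, sizeof(de));
}

/*
 * Count records with a non-zero inode in one page.
 * Returns -1 if a record length is too small or runs past the page.
 * Without this check, a zeroed record (rec_len == 0) would leave
 * off unchanged and the loop would never terminate.
 */
int sketch_count_entries(const unsigned char *page, size_t page_size)
{
    size_t off = 0;
    int live = 0;

    assert(page != NULL);
    while (off + SKETCH_HDR_LEN <= page_size) {
        struct sketch_dir_entry de;

        memcpy(&de, page + off, sizeof(de));
        if (de.rec_len < SKETCH_HDR_LEN || de.rec_len > page_size - off)
            return -1;          /* corrupt or zeroed record */
        if (de.inode != 0)
            live++;
        off += de.rec_len;
    }
    return live;
}
```

In this sketch a partially zeroed page is reported as an error instead of being walked, which sidesteps the hang but, like the change quoted above, does not explain how the page came to be zeroed.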
| Comments |
| Comment by Peng Tao [ 25/Jun/12 ] |
|
This is also reproduced with 1.8.8-WC1 clients. From a generated kernel dump, it appears that there are only three valid dentries in the page, starting at offsets 0, 12, and 24. However, in ll_readdir(), the kernel asks to read a dentry from page offset 90 and therefore reads garbage data. crash> struct ll_dir_entry 0xffff88051bd73000 Although it makes little sense for an application to seek to an arbitrary offset within a directory page, Lustre should handle this situation. |
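One way to picture the situation in this comment is a check that a requested offset lands on a record boundary before it is interpreted as a dentry. The sketch below is only an illustration under assumed names and an assumed 8-byte record header with the length field at byte 4. It is not the patch uploaded for this ticket and does not use any Lustre API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed header layout: inode(4) + rec_len(2) + name_len(1) + type(1). */
#define DEMO_HDR_LEN    8u
#define DEMO_RECLEN_OFF 4u

/* Helper for building a page by hand; stores the record length at off. */
void demo_set_rec_len(unsigned char *page, size_t off, uint16_t len)
{
    memcpy(page + off + DEMO_RECLEN_OFF, &len, sizeof(len));
}

/*
 * Walk the records from the start of the page and report whether
 * want is the start of a record.
 * Returns 1 if it is, 0 if it falls inside a record or past the end,
 * and -1 if a record length is invalid.
 */
int demo_is_record_start(const unsigned char *page, size_t page_size,
                         size_t want)
{
    size_t off = 0;

    assert(page != NULL);
    while (off + DEMO_HDR_LEN <= page_size) {
        uint16_t len;

        if (off == want)
            return 1;
        if (off > want)
            return 0;           /* want is in the middle of a record */
        memcpy(&len, page + off + DEMO_RECLEN_OFF, sizeof(len));
        if (len < DEMO_HDR_LEN || len > page_size - off)
            return -1;          /* corrupt or zeroed record */
        off += len;
    }
    return 0;
}
```

With records starting at offsets 0, 12, and 24, a request for offset 24 is accepted, while a request for offset 90 is rejected because it falls inside the last record.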
| Comment by Peng Tao [ 25/Jun/12 ] |
|
A patch has been uploaded to http://review.whamcloud.com/#change,3181 |
| Comment by Keith Mannthey (Inactive) [ 28/Nov/12 ] |
|
This patch has been properly acked for acceptance, but not much code is being taken into 1.8 at this point. As this is a Major bug, I am sure it is still under consideration. |
| Comment by Keith Mannthey (Inactive) [ 04/Jan/13 ] |
|
This code has been merged. |