[LU-3719] divide error in ldiskfs_mb_normalize_request on MDT Created: 07/Aug/13 Updated: 26/Oct/17 Resolved: 30/Sep/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.5 |
| Fix Version/s: | Lustre 2.5.0, Lustre 2.4.2, Lustre 2.11.0, Lustre 2.10.2 |
| Type: | Bug | Priority: | Major |
| Reporter: | Kit Westneat (Inactive) | Assignee: | Zhenyu Xu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | mn1 | ||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9578 |
| Description |
|
IU ran into an issue on their MDT where it would constantly crash after recovery. We finally got a good core dump and were able to get this bt:

This looks almost identical to

We were able to get the MDT mounted after running e2fsck and then tune2fs -E stripe_width=0,stride=0. Apparently mke2fs had set them based on values from LVM.

I checked the line at ldiskfs_mb_normalize_request+244 in mballoc.c:

It looks like the s_mb_prealloc_table isn't getting fully populated. I inspected it with crash, and that looks to be the case:

0xc00 (3072) was the stripe_width reported by dumpe2fs. It appears that ldiskfs_mb_init attempts to create three entries in the table: stripe * 1, stripe * 2, and stripe * 4. However, ldiskfs_mb_prealloc_table_add can silently fail if the entry value is > (sbi->s_blocks_per_group - 1 - 1 - sbi->s_itb_per_group). This can leave the table size at 3 while one or more entries are zero.

I'm not sure what the best fix is. It seems as if returning an error from ldiskfs_mb_prealloc_table_add and adjusting the table size would be ideal. Alternatively, ldiskfs_mb_normalize_request could check to make sure the table doesn't have a zero, something like:

Thanks. |
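The two fixes proposed in the description can be illustrated with a minimal, standalone C sketch. This is not the ldiskfs code and not the patch that landed: the struct, the function names (`table_add`, `table_init`, `normalize_start`), and the rounding logic are all hypothetical, and it assumes only that the division in the normalize path uses a table entry as its divisor. It shows an add routine that reports failure so the table size counts only populated entries, plus a defensive zero check before dividing.

```c
#include <stddef.h>

#define TABLE_SLOTS 3 /* stripe * 1, * 2, * 4, as described in the report */

/* Hypothetical stand-in for the preallocation table; not the ldiskfs struct. */
struct prealloc_table {
    unsigned long entries[TABLE_SLOTS];
    size_t size; /* number of entries actually populated */
};

/* Add a value, reporting failure instead of silently skipping it. */
static int table_add(struct prealloc_table *t, unsigned long value,
                     unsigned long limit)
{
    if (value == 0 || value > limit || t->size >= TABLE_SLOTS)
        return -1;
    t->entries[t->size++] = value;
    return 0;
}

/* Build the table; size ends up counting only the entries that fit. */
static void table_init(struct prealloc_table *t, unsigned long stripe,
                       unsigned long blocks_per_group,
                       unsigned long itb_per_group)
{
    /* Same bound as quoted in the report, guarded against underflow. */
    unsigned long limit = 0;
    if (blocks_per_group > itb_per_group + 2)
        limit = blocks_per_group - 1 - 1 - itb_per_group;

    t->size = 0;
    for (size_t i = 0; i < TABLE_SLOTS; i++)
        t->entries[i] = 0;

    for (unsigned long mult = 1; mult <= 4; mult *= 2)
        table_add(t, stripe * mult, limit); /* a failed add leaves size unchanged */
}

/*
 * Round an offset down to a multiple of the first table entry that covers
 * the request length. The zero check is the defensive alternative from the
 * report: a zero entry is skipped instead of being used as a divisor.
 */
static unsigned long normalize_start(const struct prealloc_table *t,
                                     unsigned long offset, unsigned long len)
{
    for (size_t i = 0; i < t->size; i++) {
        unsigned long chunk = t->entries[i];
        if (chunk == 0)
            continue;
        if (len <= chunk)
            return (offset / chunk) * chunk;
    }
    return offset; /* no usable entry: leave the request unaligned */
}
```

With a stripe of 3072 and made-up geometry of 32768 blocks per group and 24576 inode table blocks per group (illustrative values only, not IU's), the bound is 8190, so only 3072 and 6144 are stored and the table size is 2 rather than 3. Either safeguard alone would avoid dividing by zero; the sketch includes both to show how they combine.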
| Comments |
| Comment by Peter Jones [ 07/Aug/13 ] |
|
Thanks for the report Kit. |
| Comment by Peter Jones [ 09/Aug/13 ] |
|
Bobijam, could you please advise on this one? Thanks, Peter |
| Comment by Zhenyu Xu [ 12/Aug/13 ] |
|
Patch tracking at http://review.whamcloud.com/7297 |
| Comment by Zhenyu Xu [ 10/Sep/13 ] |
|
Master version: http://review.whamcloud.com/7591 |
| Comment by Peter Jones [ 24/Sep/13 ] |
|
Landed for 2.5.0 |
| Comment by Bob Glossman (Inactive) [ 24/Sep/13 ] |
|
I notice only the 6.4 version of the ldiskfs patch has been changed. Do other versions also need adjustment with similar changes? |
| Comment by James A Simmons [ 24/Sep/13 ] |
|
Yes, the SLES11 platforms need to be updated as well. |
| Comment by Bob Glossman (Inactive) [ 24/Sep/13 ] |
|
Reopened to address other versions of the ldiskfs patch. In particular, similar changes are needed in the sles11 version. |
| Comment by Bob Glossman (Inactive) [ 26/Sep/13 ] |
|
sles11 sp2/sp3 version |
| Comment by Peter Jones [ 30/Sep/13 ] |
|
Landed for 2.5 |
| Comment by Gerrit Updater [ 21/Jun/17 ] |
|
Yang Sheng (yang.sheng@intel.com) uploaded a new patch: https://review.whamcloud.com/27748 |
| Comment by Gerrit Updater [ 19/Jul/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27748/ |
| Comment by Gerrit Updater [ 26/Jul/17 ] |
|
Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/28228 |
| Comment by Gerrit Updater [ 26/Oct/17 ] |
|
John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/28228/ |