[LU-2480] divide error in ldiskfs_mb_normalize_request Created: 12/Dec/12 Updated: 09/Jan/20 Resolved: 09/Jan/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.3.0, Lustre 2.1.1, Lustre 2.1.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Alexander Oltu | Assignee: | Bruno Faccini (Inactive) |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Environment: |
CPU: AMD Opteron(tm) Processor 6204 |
||
| Severity: | 4 |
| Rank (Obsolete): | 5821 |
| Description |
|
We get an OSS crash on any attempt to write data to our Lustre file system. The file system was created from scratch with the version 2.1.3 packages. We have tried all the kernel versions listed in the Environment field. Initially I thought this was a kernel bug fixed in RHEL kernel-2.6.32-279.10.1.el6.
On any write attempt, the OSS crashes with the following console message:
divide error: 0000 [#1] SMP
Pid: 29280, comm: ll_ost_io_127 Not tainted 2.6.32-279.14.1.el6.x86_64 #1 Dell Inc. PowerEdge R715/0C5MMK
This error prevents us from deploying our new Lustre setup. Any help is greatly appreciated. |
| Comments |
| Comment by Bruno Faccini (Inactive) [ 12/Dec/12 ] |
|
Hello, |
| Comment by Alexander Oltu [ 12/Dec/12 ] |
|
No messages prior to the crash. fsck reports no errors: e2fsck -f LABEL=workcmn1-OST0000 |
| Comment by Bruno Faccini (Inactive) [ 12/Dec/12 ] |
|
OK, thanks. By the way, I forgot to ask: was a crash dump taken at the time of the crashes? |
| Comment by Alexander Oltu [ 12/Dec/12 ] |
|
Yes, I have crash dumps; they are around 100 MB. I can make a new crash dump with the Whamcloud kernel so that you have all the debug symbols, and I can provide the vmcore file. Please let me know where I can upload it. By the way, I have tried mounting the OST as ext4, and writing files locally on the OSS works fine. But as soon as I mount it as ldiskfs, it crashes. |
| Comment by Alexander Oltu [ 12/Dec/12 ] |
|
Bruno, I have zeroed the beginning of the OST with dd, reformatted it with additional mkfs options, disabled the max_sectors optimizations, and used mkfsoptions such as stride and stripe_width with smaller values. The OST stopped crashing. I am going to re-enable the optimizations, reformat with proper stride and stripe_width values, and see what happens. |
| Comment by Bruno Faccini (Inactive) [ 12/Dec/12 ] |
|
That was going to be my next question after your previous comment about ext4 mounts being OK but not ldiskfs mounts: how did you format your OSTs? |
| Comment by Alexander Oltu [ 12/Dec/12 ] |
|
I didn't use any mkfsoptions initially because I couldn't find the real block size for our DDN S2A9550. Surprisingly, the default is: So now I am using our DDN cache block size of 1024k and RAID 8+2, with the following options: It is not crashing now, so I will reformat all our OSTs and give it a try. |
| Comment by Bruno Faccini (Inactive) [ 13/Dec/12 ] |
|
Have a look at the "Lustre Operations Manual", chapter "Configuring Storage on a Lustre File System", and adapt it to your DDN S2A9550 OST/LUN design. But to answer your last comment/question: "stride" should be the number of 4K blocks written at a time to each of the disks that make up your RAID6 8+2 OSTs, and "stripe-width" should be the optimal I/O size for your RAID design, also expressed in 4K blocks, so 2048 = 256 x 8 looks OK to me. |
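The arithmetic behind 2048 = 256 x 8 can be sketched in a few lines of Python. This is a minimal illustration, not Lustre tooling: the helper names `raid_geometry` and `mkfs_extended_options` are hypothetical, and the sketch assumes 4 KiB filesystem blocks and that the 1024k DDN cache block size is the per-disk chunk of an 8+2 RAID6 LUN (8 data disks per stripe).

```python
from typing import Tuple

# Hypothetical helpers (not part of Lustre or e2fsprogs).
# Assumptions: 4 KiB filesystem blocks; chunk_kib is the amount of data
# written to ONE disk before moving on to the next disk in the stripe.


def raid_geometry(chunk_kib: int, data_disks: int, block_kib: int = 4) -> Tuple[int, int]:
    """Return (stride, stripe_width), both counted in filesystem blocks."""
    if chunk_kib % block_kib != 0:
        raise ValueError("chunk size must be a multiple of the block size")
    stride = chunk_kib // block_kib       # blocks written to one disk at a time
    stripe_width = stride * data_disks    # blocks in one full stripe (data disks only)
    return stride, stripe_width


def mkfs_extended_options(chunk_kib: int, data_disks: int, block_kib: int = 4) -> str:
    """Format the values as a mke2fs '-E' extended-options argument."""
    stride, stripe_width = raid_geometry(chunk_kib, data_disks, block_kib)
    return f"-E stride={stride},stripe_width={stripe_width}"


if __name__ == "__main__":
    # RAID6 8+2: 10 disks per LUN, 8 of which hold data in each stripe;
    # parity disks are not counted in stripe_width.
    print(mkfs_extended_options(chunk_kib=1024, data_disks=8))
    # prints: -E stride=256,stripe_width=2048
```

The resulting string is the kind of value that would be handed to mkfs.lustre through `--mkfsoptions`; the exact chunk size for a given DDN configuration should still be confirmed against the array documentation rather than taken from this sketch.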
| Comment by Alexander Oltu [ 13/Dec/12 ] |
|
Bruno, thank you for the suggestions! I ended up using stripe_width=2048,stride=256. The only two open questions are:
|
| Comment by Andreas Dilger [ 09/Jan/20 ] |
|
Close old ticket. |