[LU-7186] division by zero in lod_declare_init_size() from HSM release Created: 18/Sep/15  Updated: 13/May/16  Resolved: 28/Oct/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Minor
Reporter: John Hammond Assignee: Alex Zhuravlev
Resolution: Fixed Votes: 0
Labels: HSM

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

If a file is created without striping, truncated to a non-zero size, and archived, then releasing the file causes a division by zero in lod_declare_init_size(). Assume HSM is running:

# export LUSTRE=/root/lustre-release/lustre
# cd /mnt/lustre
# $LUSTRE/tests/mcreate f0
# $LUSTRE/tests/truncate f0 42
# lfs hsm_archive f0
# sleep 5
# lfs hsm_release f0
[592171.190277] divide error: 0000 [#1] SMP
...
[592171.191894] Pid: 13109, comm: mdt_rdpg00_002 Not tainted 2.6.32-431.29.2.el6.lustre.x86_64 #1 Bochs Bochs
[592171.191894] RIP: 0010:[<ffffffffa13ec5e0>]  [<ffffffffa13ec5e0>] lod_declare_striped_object+0x4c0/0x810 [lod]
[592171.191894] RSP: 0000:ffff8800604ef9e0  EFLAGS: 00010246
[592171.191894] RAX: 0000000000000000 RBX: ffff88005b6c2a38 RCX: 0000000000010000
[592171.191894] RDX: 0000000000000000 RSI: 00000000000019ed RDI: 0000000000000000
[592171.191894] RBP: ffff8800604efa30 R08: 0000000000000000 R09: 0000000000000001
[592171.191894] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800685141d8
[592171.191894] R13: ffff8800685142a8 R14: ffff8800603746c0 R15: ffff880061c2bb50
[592171.191894] FS:  0000000000000000(0000) GS:ffff88002c200000(0000) knlGS:0000000000000000
[592171.191894] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[592171.191894] CR2: 0000003d04989708 CR3: 000000005ba62000 CR4: 00000000000006e0
[592171.191894] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[592171.191894] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[592171.191894] Process mdt_rdpg00_002 (pid: 13109, threadinfo ffff8800604ee000, task ffff8800604ec300)
[592171.191894] Stack:
[592171.191894]  ffffffffa12f38a8 ffff880062098058 ffff8800685142a8 ffff88006006c708
[592171.191894] <d> ffff8800604efa30 ffff88005b6c2a38 ffff8800603746c0 ffff8800685142a8
[592171.191894] <d> ffff880061c2bb50 ffff8800685141d8 ffff8800604efa90 ffffffffa13ecb68
[592171.191894] Call Trace:
[592171.191894]  [<ffffffffa13ecb68>] lod_declare_xattr_set+0x238/0x2a0 [lod]
[592171.191894]  [<ffffffffa12e2a3f>] mdd_declare_xattr_set+0x8f/0x260 [mdd]
[592171.191894]  [<ffffffffa12e1934>] ? mdo_xattr_get+0xa4/0x1b0 [mdd]
[592171.191894]  [<ffffffffa12e61e4>] mdd_swap_layouts+0x914/0x1240 [mdd]
[592171.191894]  [<ffffffffa12ee4fa>] ? mdd_trans_stop+0x1a/0x1c [mdd]
[592171.191894]  [<ffffffffa1348d43>] mo_swap_layouts+0x33/0xa0 [mdt]
[592171.191894]  [<ffffffffa134cd4a>] mdt_mfd_close+0x128a/0x1980 [mdt]
[592171.191894]  [<ffffffffa0a5a70d>] ? class_handle_unhash_nolock+0x2d/0x150 [obdclass]
[592171.191894]  [<ffffffffa134d654>] mdt_close_internal+0x214/0x4f0 [mdt]
[592171.191894]  [<ffffffffa134dbea>] mdt_close+0x2ba/0x900 [mdt]
[592171.191894]  [<ffffffffa0d0711f>] tgt_request_handle+0x8cf/0x1300 [ptlrpc]
[592171.191894]  [<ffffffffa0cb1aea>] ptlrpc_main+0xdaa/0x18b0 [ptlrpc]
[592171.191894]  [<ffffffffa0cb0d40>] ? ptlrpc_main+0x0/0x18b0 [ptlrpc]
[592171.191894]  [<ffffffff8109e856>] kthread+0x96/0xa0
[592171.191894]  [<ffffffff8100c30a>] child_rip+0xa/0x20
[592171.191894]  [<ffffffff815562e0>] ? _spin_unlock_irq+0x30/0x40
[592171.191894]  [<ffffffff8100bb10>] ? restore_args+0x0/0x30
[592171.191894]  [<ffffffff8109e7c0>] ? kthread+0x0/0xa0
[592171.191894]  [<ffffffff8100c300>] ? child_rip+0x0/0x20
[592171.191894] Code: 83 e2 01 e9 53 fd ff ff 0f 1f 00 8b 4b 44 48 89 f8 31 d2 0f b7 7b 40 49 c7 84 24 18 01 00 00 08 00 00 00 48 f7 f1 48 89 d6 31 d2 <48> f7 f7 4c 89 f7 48 0f af c1 4c 89 f9 48 01 c6 49 89 b4 24 d0
[592171.191894] RIP  [<ffffffffa13ec5e0>] lod_declare_striped_object+0x4c0/0x810 [lod]

t:~# xd lod_declare_striped_object+0x4c0/0x810 [lod]
lod_declare_init_size
/root/lustre-release/lustre/lod/lod_object.c:3387
lod_declare_striped_object
/root/lustre-release/lustre/lod/lod_object.c:3474

        stripe = ll_do_div64(size, (__u64) lo->ldo_stripenr);

I saw this on 2.7.59-44-g703195a; it is likely present in several other versions.



 Comments   
Comment by Joseph Gmitter (Inactive) [ 18/Sep/15 ]

Hi Alex,
Can you take a look at this?
Thanks.
Joe

Comment by Andreas Dilger [ 18/Sep/15 ]

Looks like the stripe count is zero here:

 
     /* ll_do_div64(a, b) returns a % b, and a = a / b */
     ll_do_div64(size, (__u64) lo->ldo_stripe_size);
     stripe = ll_do_div64(size, (__u64) lo->ldo_stripenr);
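
The arithmetic quoted above can be modeled in userspace to show why a stripeless file trips the second division. This is only a sketch under stated assumptions: `model_do_div64()`, `struct model_layout`, and `model_last_stripe()` are hypothetical stand-ins for `ll_do_div64()` and the `ldo_stripe_size` / `ldo_stripenr` fields, and the zero check illustrates one possible guard, not necessarily what the landed patch does.

```c
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for ll_do_div64(a, b):
 * *a becomes *a / b and the remainder *a % b is returned.
 * The caller must ensure b != 0.
 */
static uint64_t model_do_div64(uint64_t *a, uint64_t b)
{
	uint64_t rem = *a % b;

	*a /= b;
	return rem;
}

/* Hypothetical reduction of the layout fields used by the quoted code. */
struct model_layout {
	uint32_t stripe_size;	/* models lo->ldo_stripe_size */
	uint16_t stripenr;	/* models lo->ldo_stripenr */
};

/*
 * Compute the index of the stripe holding the last byte range of `size`.
 * Returns 0 and stores the index in *stripe on success, or -1 when the
 * layout has no stripes (or no stripe size), in which case nothing is
 * divided and *stripe is left untouched.
 */
static int model_last_stripe(const struct model_layout *lo, uint64_t size,
			     uint64_t *stripe)
{
	/* Assumed guard: a stripeless object has nothing to propagate to. */
	if (lo->stripenr == 0 || lo->stripe_size == 0)
		return -1;

	/* size is now the number of whole stripe-sized chunks. */
	model_do_div64(&size, lo->stripe_size);
	/* The remainder modulo the stripe count selects the stripe. */
	*stripe = model_do_div64(&size, lo->stripenr);
	return 0;
}
```

With a stripe count of 0, as for the file created by mcreate in the reproducer, the guard returns before either division runs; without it, the second call divides by zero.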

Comment by Gerrit Updater [ 07/Oct/15 ]

Alex Zhuravlev (alexey.zhuravlev@intel.com) uploaded a new patch: http://review.whamcloud.com/16743
Subject: LU-7186 lod: do not propagate size if stripeless
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: bff343cc1dd6cf20a1e88aeb6fcdfb3e7046fa4c

Comment by Gerrit Updater [ 28/Oct/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16743/
Subject: LU-7186 lod: do not propagate size if stripeless
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 7a6b48c2f97f165b4449f6283e313cfa33aea5a1

Comment by Joseph Gmitter (Inactive) [ 28/Oct/15 ]

Landed for 2.8
