[LU-8030] hsm: inserting duplicate requests Created: 15/Apr/16  Updated: 01/Jun/16  Resolved: 01/Jun/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.9.0

Type: Bug Priority: Minor
Reporter: Frank Zago (Inactive) Assignee: WC Triage
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

It is possible to insert duplicate HSM requests in the actions list:

# cd /mnt/lustre
# cp /bin/ls .
# cp /bin/ls ls2
# lfs hsm_archive ls2 ls2 ls2 ls2

# cat /proc/fs/lustre/mdt/lustre-MDT0000/hsm/actions
lrh=[type=10680000 len=136 idx=1/2] fid=[0x200000401:0x2:0x0] dfid=[0x200000401:0x2:0x0] compound/cookie=0x571142b4/0x571142af action=ARCHIVE archive#=1 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=WAITING data=[]
lrh=[type=10680000 len=136 idx=1/3] fid=[0x200000401:0x2:0x0] dfid=[0x200000401:0x2:0x0] compound/cookie=0x571142b4/0x571142b0 action=ARCHIVE archive#=1 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=WAITING data=[]
lrh=[type=10680000 len=136 idx=1/4] fid=[0x200000401:0x2:0x0] dfid=[0x200000401:0x2:0x0] compound/cookie=0x571142b4/0x571142b1 action=ARCHIVE archive#=1 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=WAITING data=[]
lrh=[type=10680000 len=136 idx=1/5] fid=[0x200000401:0x2:0x0] dfid=[0x200000401:0x2:0x0] compound/cookie=0x571142b4/0x571142b2 action=ARCHIVE archive#=1 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=WAITING data=[]

The copytool then successfully archives the same file 4 times.
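
One way to spot this condition from the actions list is to count entries per file and action. The following is a minimal sketch, not part of Lustre: the function names are hypothetical, the field layout is taken only from the sample output above, and treating (fid, action) as the duplicate key is an assumption.

```python
import re
from collections import Counter

# Matches the "fid=[...]" field (not "dfid=[...]") and the "action=" field.
# The layout is inferred from the sample output in this ticket.
_ENTRY = re.compile(r"\bfid=\[(?P<fid>[^\]]+)\].*?\baction=(?P<action>\S+)")


def count_hsm_actions(text):
    """Count entries per (fid, action) pair in a dump of the hsm/actions file."""
    counts = Counter()
    for line in text.splitlines():
        match = _ENTRY.search(line)
        if match:
            counts[(match.group("fid"), match.group("action"))] += 1
    return counts


def find_duplicates(text):
    """Return only the (fid, action) pairs that appear more than once."""
    return {key: n for key, n in count_hsm_actions(text).items() if n > 1}
```

Passing the contents of /proc/fs/lustre/mdt/lustre-MDT0000/hsm/actions to find_duplicates() would report any file queued more than once for the same action.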

The bug likely comes from mdt_hsm_add_actions(), which checks each new action in the HAL against the already queued actions, but does not check the new actions in the HAL against each other.
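
The missing check can be illustrated with a simplified model. This is only a sketch of the idea, not the Lustre code and not the patch that later landed: requests are reduced to hypothetical (fid, action) tuples, and filter_new_requests is a made-up name.

```python
from typing import Iterable, List, Set, Tuple

# Simplified stand-in for an HSM action item: (fid, action).
Request = Tuple[str, str]


def filter_new_requests(queued: Set[Request],
                        incoming: Iterable[Request]) -> List[Request]:
    """Drop requests that duplicate a queued request or an earlier one
    in the same incoming batch, keeping the original order."""
    seen = set(queued)  # start from what is already in the actions list
    accepted = []
    for req in incoming:
        if req in seen:
            continue  # duplicate of a queued request or of the batch itself
        seen.add(req)  # remember it so later copies in this batch are caught
        accepted.append(req)
    return accepted
```

In this model, comparing only against queued reproduces the reported behavior: four identical requests in one batch all pass because none is queued yet. Adding each accepted request to seen is what closes that gap.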



 Comments   
Comment by Frank Zago (Inactive) [ 15/Apr/16 ]

Restoring behaves differently. The first call works, and the file is restored once. A second call locks up Lustre, and ls /mnt/lustre hangs too.

# cp /bin/ls ls2
# lfs hsm_archive ls2
# lfs hsm_release ls2
# lfs hsm_restore ls2 ls2 ls2 ls2
# lfs hsm_restore ls2 ls2 ls2 ls2
<hang>

The kernel log then shows the following stack trace:

[  482.344410] INFO: task lfs:8517 blocked for more than 120 seconds.
[  482.346511] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  482.348516] lfs             D ffff88032deb9770     0  8517   2733 0x00000080
[  482.350453]  ffff880627973bd8 0000000000000046 ffff88062b550000 ffff880627973fd8
[  482.352449]  ffff880627973fd8 ffff880627973fd8 ffff88062b550000 ffff8805d5760218
[  482.354414]  ffff8805d5760900 ffff8805d5760908 0000000000000246 ffff88062b550000
[  482.356353] Call Trace:
[  482.357761]  [<ffffffff81662d39>] schedule_preempt_disabled+0x29/0x70
[  482.358795]  [<ffffffff8166075b>] mutex_lock_nested+0x18b/0x4b0
[  482.359860]  [<ffffffffa115799c>] ? ll_layout_refresh+0x1ac/0x290 [lustre]
[  482.360898]  [<ffffffffa115799c>] ? ll_layout_refresh+0x1ac/0x290 [lustre]
[  482.361901]  [<ffffffffa115799c>] ll_layout_refresh+0x1ac/0x290 [lustre]
[  482.362915]  [<ffffffffa119e03f>] vvp_io_init+0x3df/0x4c0 [lustre]
[  482.363967]  [<ffffffffa07382a8>] cl_io_init0.isra.15+0x88/0x160 [obdclass]
[  482.364955]  [<ffffffffa1194915>] ? cl_glimpse_size0+0x125/0x2b0 [lustre]
[  482.365937]  [<ffffffffa0738475>] cl_io_init+0x55/0xd0 [obdclass]
[  482.366901]  [<ffffffffa11949bc>] cl_glimpse_size0+0x1cc/0x2b0 [lustre]
[  482.367852]  [<ffffffffa11556a9>] ll_getattr+0x539/0x7c0 [lustre]
[  482.368770]  [<ffffffff8101cd35>] ? native_sched_clock+0x35/0x80
[  482.369667]  [<ffffffff811ef736>] vfs_getattr+0x46/0x80
[  482.370531]  [<ffffffff811ef865>] vfs_fstatat+0x75/0xc0
[  482.371345]  [<ffffffff811efe21>] SYSC_newlstat+0x31/0x60
[  482.372135]  [<ffffffff81314fd6>] ? lockdep_sys_exit_thunk+0x35/0x67
[  482.372944]  [<ffffffff816650ea>] ? error_exit+0x6a/0xb0
[  482.373727]  [<ffffffff811f00ae>] SyS_newlstat+0xe/0x10
[  482.374493]  [<ffffffff8166d589>] system_call_fastpath+0x16/0x1b
[  482.375181] 2 locks held by lfs/8517:
[  482.375912]  #0:  (&lli->lli_glimpse_sem){......}, at: [<ffffffffa115569f>] ll_getattr+0x52f/0x7c0 [lustre]
[  482.377354]  #1:  (&lli->lli_layout_mutex){......}, at: [<ffffffffa115799c>] ll_layout_refresh+0x1ac/0x290 [lustre]
Comment by Gerrit Updater [ 18/Apr/16 ]

Frank Zago (fzago@cray.com) uploaded a new patch: http://review.whamcloud.com/19635
Subject: LU-8030 hsm: prevent duplicated HSM requests
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c87677412ac3a73dc5638b759d9738f0657ca027

Comment by Gerrit Updater [ 31/May/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19635/
Subject: LU-8030 hsm: prevent duplicated HSM requests
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 62e93ef9a29341091c49d66a1535ea243ea950be

Comment by Joseph Gmitter (Inactive) [ 01/Jun/16 ]

Landed to master for 2.9.0
