[LU-8030] hsm: inserting duplicate requests Created: 15/Apr/16 Updated: 01/Jun/16 Resolved: 01/Jun/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Frank Zago (Inactive) | Assignee: | WC Triage |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
It is possible to insert duplicate HSM requests in the actions list:

# cd /mnt/lustre
# cp /bin/ls .
# cp /bin/ls ls2
# lfs hsm_archive ls2 ls2 ls2 ls2
# cat /proc/fs/lustre/mdt/lustre-MDT0000/hsm/actions
lrh=[type=10680000 len=136 idx=1/2] fid=[0x200000401:0x2:0x0] dfid=[0x200000401:0x2:0x0] compound/cookie=0x571142b4/0x571142af action=ARCHIVE archive#=1 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=WAITING data=[]
lrh=[type=10680000 len=136 idx=1/3] fid=[0x200000401:0x2:0x0] dfid=[0x200000401:0x2:0x0] compound/cookie=0x571142b4/0x571142b0 action=ARCHIVE archive#=1 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=WAITING data=[]
lrh=[type=10680000 len=136 idx=1/4] fid=[0x200000401:0x2:0x0] dfid=[0x200000401:0x2:0x0] compound/cookie=0x571142b4/0x571142b1 action=ARCHIVE archive#=1 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=WAITING data=[]
lrh=[type=10680000 len=136 idx=1/5] fid=[0x200000401:0x2:0x0] dfid=[0x200000401:0x2:0x0] compound/cookie=0x571142b4/0x571142b2 action=ARCHIVE archive#=1 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=WAITING data=[]

The copytool then successfully archives the same file 4 times.

The bug likely comes from mdt_hsm_add_actions() where duplicates are searched for between the already queued actions and the new ones in the HAL, but not between the actions in the HAL itself. |
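The missing check described above can be illustrated with a small model. This is only a sketch of the idea, not the code in mdt_hsm_add_actions() and not the patch that fixed the issue: the `Request` type, the `add_actions` helper, and the choice of (fid, action) as the duplicate key are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical stand-in for one HSM request (not a Lustre structure)."""
    fid: str
    action: str


def add_actions(queued, hal):
    """Return the requests from `hal` that should be enqueued.

    A request is skipped if it duplicates an already queued request OR
    an earlier request in the same batch. The second case is the one
    the report says is not covered.
    Assumption: two requests are duplicates when fid and action match.
    """
    seen = {(r.fid, r.action) for r in queued}
    accepted = []
    for req in hal:
        key = (req.fid, req.action)
        if key in seen:
            continue  # duplicate of a queued or earlier in-batch request
        seen.add(key)  # remember it so later copies in this batch are caught
        accepted.append(req)
    return accepted
```

With a batch modelled on `lfs hsm_archive ls2 ls2 ls2 ls2`, that is four requests carrying the same fid and action, this sketch keeps a single request. Recording each accepted request in `seen` is what extends the comparison from "queued versus new" to "new versus new".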
| Comments |
| Comment by Frank Zago (Inactive) [ 15/Apr/16 ] |
|
Restoring is a bit different. The first call works, and the file is restored once. A second call locks up Lustre; ls /mnt/lustre hangs too.

# cp /bin/ls ls2
# lfs hsm_archive ls2
# lfs hsm_release ls2
# lfs hsm_restore ls2 ls2 ls2 ls2
# lfs hsm_restore ls2 ls2 ls2 ls2
<hang>

The kernel log shows the following stack trace:

[ 482.344410] INFO: task lfs:8517 blocked for more than 120 seconds.
[ 482.346511] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 482.348516] lfs D ffff88032deb9770 0 8517 2733 0x00000080
[ 482.350453] ffff880627973bd8 0000000000000046 ffff88062b550000 ffff880627973fd8
[ 482.352449] ffff880627973fd8 ffff880627973fd8 ffff88062b550000 ffff8805d5760218
[ 482.354414] ffff8805d5760900 ffff8805d5760908 0000000000000246 ffff88062b550000
[ 482.356353] Call Trace:
[ 482.357761] [<ffffffff81662d39>] schedule_preempt_disabled+0x29/0x70
[ 482.358795] [<ffffffff8166075b>] mutex_lock_nested+0x18b/0x4b0
[ 482.359860] [<ffffffffa115799c>] ? ll_layout_refresh+0x1ac/0x290 [lustre]
[ 482.360898] [<ffffffffa115799c>] ? ll_layout_refresh+0x1ac/0x290 [lustre]
[ 482.361901] [<ffffffffa115799c>] ll_layout_refresh+0x1ac/0x290 [lustre]
[ 482.362915] [<ffffffffa119e03f>] vvp_io_init+0x3df/0x4c0 [lustre]
[ 482.363967] [<ffffffffa07382a8>] cl_io_init0.isra.15+0x88/0x160 [obdclass]
[ 482.364955] [<ffffffffa1194915>] ? cl_glimpse_size0+0x125/0x2b0 [lustre]
[ 482.365937] [<ffffffffa0738475>] cl_io_init+0x55/0xd0 [obdclass]
[ 482.366901] [<ffffffffa11949bc>] cl_glimpse_size0+0x1cc/0x2b0 [lustre]
[ 482.367852] [<ffffffffa11556a9>] ll_getattr+0x539/0x7c0 [lustre]
[ 482.368770] [<ffffffff8101cd35>] ? native_sched_clock+0x35/0x80
[ 482.369667] [<ffffffff811ef736>] vfs_getattr+0x46/0x80
[ 482.370531] [<ffffffff811ef865>] vfs_fstatat+0x75/0xc0
[ 482.371345] [<ffffffff811efe21>] SYSC_newlstat+0x31/0x60
[ 482.372135] [<ffffffff81314fd6>] ? lockdep_sys_exit_thunk+0x35/0x67
[ 482.372944] [<ffffffff816650ea>] ? error_exit+0x6a/0xb0
[ 482.373727] [<ffffffff811f00ae>] SyS_newlstat+0xe/0x10
[ 482.374493] [<ffffffff8166d589>] system_call_fastpath+0x16/0x1b
[ 482.375181] 2 locks held by lfs/8517:
[ 482.375912] #0: (&lli->lli_glimpse_sem){......}, at: [<ffffffffa115569f>] ll_getattr+0x52f/0x7c0 [lustre]
[ 482.377354] #1: (&lli->lli_layout_mutex){......}, at: [<ffffffffa115799c>] ll_layout_refresh+0x1ac/0x290 [lustre]
|
| Comment by Gerrit Updater [ 18/Apr/16 ] |
|
Frank Zago (fzago@cray.com) uploaded a new patch: http://review.whamcloud.com/19635 |
| Comment by Gerrit Updater [ 31/May/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19635/ |
| Comment by Joseph Gmitter (Inactive) [ 01/Jun/16 ] |
|
Landed to master for 2.9.0 |