Details

Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: Lustre 2.5.1
Labels:
- patch
Environment:
Centos 6.5, Lustre 2.5.56

Severity:
3
Rank (Obsolete):
14550

Description

When a copytool does not complete a restore properly, the restore is not restarted, and processes get stuck in Lustre modules.

For instance, create a file large enough to take a few seconds to archive and restore, archive it, release it, and then access it:

# dd if=/dev/urandom of=/mnt/lustre/bigf bs=1M count=1000
# lfs hsm_archive /mnt/lustre/bigf
# lfs hsm_release /mnt/lustre/bigf
# sleep 5
# md5sum /mnt/lustre/bigf

During the restore, kill the copytool so that no completion event is sent to the MDS.
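The steps above can be collected into a small script. This is only a sketch: the copytool name (lhsmtool_posix), the use of SIGKILL, and the one-second delay before the kill are assumptions, not details from this report. By default it only prints the commands (DRY_RUN=1); set DRY_RUN=0 on a disposable test client to actually run them.

```shell
# Sketch of the reproduction steps from this report.
# FILE, COPYTOOL and DRY_RUN are illustrative knobs, not part of Lustre.
FILE=${FILE:-/mnt/lustre/bigf}
COPYTOOL=${COPYTOOL:-lhsmtool_posix}   # assumption: the POSIX copytool is in use
DRY_RUN=${DRY_RUN:-1}                  # 1 = print commands only, 0 = execute

# Run a command in the foreground, or just print it in dry-run mode.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Same as run, but start the command in the background.
run_bg() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s &\n' "$*"
    else
        "$@" &
    fi
}

reproduce() {
    run dd if=/dev/urandom of="$FILE" bs=1M count=1000
    run lfs hsm_archive "$FILE"
    run lfs hsm_release "$FILE"
    run sleep 5
    # Reading the released file triggers the restore; keep it in the background
    # so the copytool can be killed while the restore is still running.
    run_bg md5sum "$FILE"
    run sleep 1                        # assumption: the restore has started by now
    run pkill -KILL -x "$COPYTOOL"     # assumption: SIGKILL, matched by process name
}

reproduce
```

Running it with DRY_RUN=1 is a quick way to review the exact command sequence before trying it on a real client.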

Note that the copytool may itself be unkillable at this point, in which case the only fix is to reboot the client running it.

When the copytool restarts, nothing happens. The process trying to read the file stays stuck, apparently forever, at this point in the kernel:

# cat /proc/1675/stack 
[<ffffffffa09dc04c>] ll_layout_refresh+0x25c/0xfe0 [lustre]
[<ffffffffa0a28240>] vvp_io_init+0x340/0x490 [lustre]
[<ffffffffa04dab68>] cl_io_init0+0x98/0x160 [obdclass]
[<ffffffffa04dd794>] cl_io_init+0x64/0xe0 [obdclass]
[<ffffffffa04debfd>] cl_io_rw_init+0x8d/0x200 [obdclass]
[<ffffffffa09cbe38>] ll_file_io_generic+0x208/0x710 [lustre]
[<ffffffffa09ccf8f>] ll_file_aio_read+0x13f/0x2c0 [lustre]
[<ffffffffa09cd27c>] ll_file_read+0x16c/0x2a0 [lustre]
[<ffffffff81189365>] vfs_read+0xb5/0x1a0
[<ffffffff811894a1>] sys_read+0x51/0x90
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

It is also unkillable.
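An unkillable process like this one usually shows up in uninterruptible sleep (state D). The following sketch checks a PID's state and dumps its kernel stack as done above. The helper names proc_state and check_stuck, and the PROC_ROOT override, are hypothetical and exist only for this example.

```shell
# Print the one-letter process state from a /proc/<pid>/status-style file.
proc_state() {
    awk '/^State:/ { print $2; exit }' "$1"
}

# Return 0 and dump the kernel stack if the PID is in uninterruptible sleep (D),
# return 1 otherwise. PROC_ROOT can point at another directory for testing.
check_stuck() {
    pid=$1
    root=${PROC_ROOT:-/proc}
    state=$(proc_state "$root/$pid/status") || return 1
    if [ "$state" = "D" ]; then
        echo "pid $pid is in uninterruptible sleep (D)"
        # Reading the stack file usually requires root.
        cat "$root/$pid/stack" 2>/dev/null
        return 0
    fi
    echo "pid $pid is in state ${state:-unknown}"
    return 1
}
```

For example, "check_stuck 1675" would be the equivalent check for the md5sum process shown above.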

If I issue "lfs hsm_restore bigf.bin", lfs gets stuck too:

# cat /proc/1723/stack 
[<ffffffffa06ad29a>] ptlrpc_set_wait+0x2da/0x860 [ptlrpc]
[<ffffffffa06ad8a7>] ptlrpc_queue_wait+0x87/0x220 [ptlrpc]
[<ffffffffa0868913>] mdc_iocontrol+0x2113/0x27f0 [mdc]
[<ffffffffa0af5265>] obd_iocontrol+0xe5/0x360 [lmv]
[<ffffffffa0b0c145>] lmv_iocontrol+0x1c85/0x2b10 [lmv]
[<ffffffffa09bb235>] obd_iocontrol+0xe5/0x360 [lustre]
[<ffffffffa09c64d7>] ll_dir_ioctl+0x4237/0x5dc0 [lustre]
[<ffffffff8119d802>] vfs_ioctl+0x22/0xa0
[<ffffffff8119d9a4>] do_vfs_ioctl+0x84/0x580
[<ffffffff8119df21>] sys_ioctl+0x81/0xa0
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

However, lfs hsm operations still work on other files.

Attachments

LU-5216_solution.doc
04/Mar/15 12:10 PM
15 kB
gaurav mahajan

Issue Links

is blocking

- LU-10175 DoM:Full support for the LDLM lock convert (Resolved)

is related to

- LU-11284 Full lock convert conflicts with HSM (Open)
- LU-8905 tests: sanity-hsm test_3[3-6] does not use ps correctly (Closed)

People

Assignee: Jean-Baptiste Riaux (Inactive)

Reporter: Frank Zago (Inactive)

Votes: 3

Watchers: 31

Dates

Created: 17/Jun/14 10:14 PM

Updated: 25/Aug/18 7:30 AM