[LU-5228] HSM: posix copytool can (and does) run out of file descriptors Created: 18/Jun/14  Updated: 18/Jun/14

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.1
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Frank Zago (Inactive) Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Centos 6.5
Lustre 2.5.56


Severity: 3
Rank (Obsolete): 14566

 Description   

When archiving a lot of files at once, the posix copytool can run out of file descriptors.

...
lhsmtool_posix[21880]: cannot open '/vsm/tasfs1/32d7/0000/0400/0000/0002/0000/0x200000400:0x32d7:0x0_tmp' for write: Too many open files (24)
lhsmtool_posix[21878]: cannot open '/vsm/tasfs1/32c2/0000/0400/0000/0002/0000/0x200000400:0x32c2:0x0_tmp' for write: Too many open files (24)
lhsmtool_posix[21894]: cannot open '/vsm/tasfs1/32d2/0000/0400/0000/0002/0000/0x200000400:0x32d2:0x0_tmp.lov': Too many open files (24)
lhsmtool_posix[21894]: cannot save file striping info of '/mnt/tas01/.lustre/fid/0x200000400:0x32d2:0x0' in '/vsm/tasfs1/32d2/0000/0400/0000/0002/0000/0x200000400:0x32d2:0x0_tmp': Too many open files (24)
...

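The root cause is that there is no limit on the number of threads created to process requests. Each request gets its own thread, and each thread opens file descriptors, so a large batch of requests exhausts the process's descriptor limit.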
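One possible direction, sketched below, is to cap concurrency with a fixed pool of worker threads that pull requests from a shared counter. This is not the lhsmtool_posix code: `struct pool`, `pool_worker` and `process_requests` are hypothetical names, and the actual copy is replaced by a placeholder.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical shared state for a fixed-size worker pool. */
struct pool {
    pthread_mutex_t lock;
    int next;         /* index of the next unclaimed request */
    int total;        /* number of requests to process */
    int done;         /* requests completed so far */
    int active;       /* workers currently handling a request */
    int peak_active;  /* highest value 'active' ever reached */
};

static void *pool_worker(void *arg)
{
    struct pool *p = arg;

    for (;;) {
        pthread_mutex_lock(&p->lock);
        if (p->next >= p->total) {
            pthread_mutex_unlock(&p->lock);
            break;
        }
        int req = p->next++;
        p->active++;
        if (p->active > p->peak_active)
            p->peak_active = p->active;
        pthread_mutex_unlock(&p->lock);

        (void)req; /* placeholder: the real copy of request 'req' goes here */

        pthread_mutex_lock(&p->lock);
        p->active--;
        p->done++;
        pthread_mutex_unlock(&p->lock);
    }
    return NULL;
}

/* Process 'total' requests with at most 'max_workers' threads.
 * Returns the number of completed requests, or -1 on allocation failure. */
int process_requests(int total, int max_workers, int *peak_out)
{
    struct pool p;
    pthread_t *tids;
    int started = 0, i;

    assert(total >= 0 && max_workers > 0);
    p.next = 0;
    p.total = total;
    p.done = 0;
    p.active = 0;
    p.peak_active = 0;
    pthread_mutex_init(&p.lock, NULL);

    tids = calloc((size_t)max_workers, sizeof(*tids));
    if (tids == NULL) {
        pthread_mutex_destroy(&p.lock);
        return -1;
    }
    for (i = 0; i < max_workers; i++) {
        if (pthread_create(&tids[started], NULL, pool_worker, &p) != 0)
            break; /* continue with the threads that did start */
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    free(tids);
    pthread_mutex_destroy(&p.lock);
    if (peak_out != NULL)
        *peak_out = p.peak_active;
    return p.done;
}
```

With a cap like this, the number of descriptors open at once is bounded by the worker count times the descriptors each copy needs, rather than by the number of queued requests.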

The files that hit the error are also not retried, and the archive request is dropped.
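As an illustration of retrying instead of dropping, the sketch below retries `open()` with a short backoff when it fails with EMFILE or ENFILE, on the assumption that other threads will release descriptors in the meantime. The helper name `open_retry_emfile` and the delay values are hypothetical and do not come from the copytool source.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: open 'path', retrying only when the failure is
 * descriptor exhaustion (EMFILE or ENFILE). Any other error is returned
 * immediately with errno left as set by open(). */
int open_retry_emfile(const char *path, int flags, mode_t mode,
                      int max_attempts)
{
    struct timespec delay = { 0, 10 * 1000 * 1000 }; /* start at 10 ms */
    int saved_errno = EMFILE;
    int attempt;

    if (max_attempts < 1)
        max_attempts = 1;

    for (attempt = 0; attempt < max_attempts; attempt++) {
        int fd = open(path, flags, mode);

        if (fd >= 0)
            return fd;
        if (errno != EMFILE && errno != ENFILE)
            return -1;              /* not a descriptor shortage */

        saved_errno = errno;        /* nanosleep() may overwrite errno */
        nanosleep(&delay, NULL);
        if (delay.tv_nsec < 500 * 1000 * 1000)
            delay.tv_nsec *= 2;     /* exponential backoff, stays below 1 s */
    }
    errno = saved_errno;
    return -1;
}
```

A retry like this only hides the symptom if the thread count stays unbounded, so it would complement a limit on concurrent workers rather than replace it.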

In my test, out of 11418 archive requests, only 11159 were actually archived. The other 259 requests were dropped.

