[LU-4223] conf-sanity test_32c, test_32d: could not find any free loop device Created: 07/Nov/13 Updated: 16/Mar/16 Resolved: 18/Dec/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.6.0, Lustre 2.4.2, Lustre 2.5.1, Lustre 2.8.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Maloo | Assignee: | Di Wang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | dne, mn4, sdsc |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 11491 |
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com>. It relates to the following test suite run: http://maloo.whamcloud.com/test_sets/751ee23a-4106-11e3-a1e8-52540035b04c. The sub-test test_32c failed with the following error:
Info required for matching: conf-sanity 32c |
| Comments |
| Comment by Sarah Liu [ 07/Nov/13 ] |
|
test log:
CMD: wtm-32vm3 tunefs.lustre --quota /tmp/t32/ost
wtm-32vm3: tunefs.lustre: out of loop devices!
wtm-32vm3:
wtm-32vm3: tunefs.lustre FATAL: Loop device setup for /tmp/t32/ost failed: Too many open files
wtm-32vm3: tunefs.lustre: exiting with 24 (Too many open files)
checking for existing Lustre data: found |
| Comment by Andreas Dilger [ 13/Nov/13 ] |
|
Oleg reports that he runs out of loop devices when running conf-sanity repeatedly on the same node. It seems there is a leak in the configuration of the loop devices, either by mount.lustre/unmount, or something else in conf-sanity. It is likely hitting DNE testing more than regular testing just due to DNE configurations using more MDT and OST devices. |
| Comment by Andreas Dilger [ 14/Nov/13 ] |
|
In the test output:
Upgrading from disk2_3-ldiskfs.tar.bz2, created with:
Commit: 2.3.0
Kernel: 2.6.32-279.5.1.el6_lustre.gb16fe80.x86_64
Arch: x86_64
CMD: wtm-32vm3 tunefs.lustre --dryrun /tmp/t32/mdt
:
:
loop: can't delete device /dev/loop3: Device or resource busy
So it looks like the device is being referenced. I've noticed in recent DNE testing that the MDT device did not unmount cleanly, so this should be tested as the possible root cause of the problem. |
| Comment by Di Wang [ 15/Nov/13 ] |
|
Hmm, it seems a few tests did not use "umount -d" when unmounting. http://review.whamcloud.com/8296 |
| Comment by Andreas Dilger [ 15/Nov/13 ] |
|
This appears to be the most common failure preventing review-dne from passing all of its tests (7 of 11 failures in the past three days if one adds |
| Comment by Andreas Dilger [ 15/Nov/13 ] |
|
Di, please use More Actions->Link to link duplicate bugs. Just closing a bug as a duplicate does not link it to the original bug (unlike with Bugzilla). |
| Comment by Andreas Dilger [ 19/Nov/13 ] |
|
This patch should be landed to b2_4 and b2_5. |
| Comment by Jian Yu [ 22/Nov/13 ] |
|
The patch was cherry-picked to Lustre b2_4 branch. |
| Comment by John Hammond [ 26/Nov/13 ] |
|
In mount_utils_ldiskfs.c:is_feature_enabled(), popen() is used to invoke debugfs but fclose() is used to close the FILE * returned from popen(). Hence wait() is not called, debugfs may still be running (and holding the loop device open). This prevents losetup -d from detaching it. |
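The pattern described above can be sketched as follows. This is a minimal illustration, not the Lustre source: the helper name `output_contains()` and its parameters are hypothetical, and the commands passed to it in practice are up to the caller. It only shows the general shape of reading a child's output through `popen()` and then closing the stream with `pclose()`, which is the call POSIX defines to wait for the command to terminate and return its status.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>

/*
 * Hypothetical helper (not taken from mount_utils_ldiskfs.c): run `cmd`
 * through popen(), check whether any output line contains `needle`,
 * and reap the child before returning.
 *
 * Returns 1 if the needle was found, 0 if not, -1 on error.
 * If exit_code is non-NULL it receives the child's exit status,
 * or -1 if the child did not exit normally.
 */
int output_contains(const char *cmd, const char *needle, int *exit_code)
{
	char line[256];
	int found = 0;
	int status;
	FILE *fp;

	if (exit_code != NULL)
		*exit_code = -1;

	fp = popen(cmd, "r");
	if (fp == NULL)
		return -1;

	/* Read all output so the child is never blocked on a full pipe. */
	while (fgets(line, sizeof(line), fp) != NULL) {
		if (strstr(line, needle) != NULL)
			found = 1;
	}

	/*
	 * pclose() waits for the command to terminate, so once it returns
	 * the child no longer holds any file or device open.
	 */
	status = pclose(fp);
	if (status == -1)
		return -1;

	if (exit_code != NULL && WIFEXITED(status))
		*exit_code = WEXITSTATUS(status);

	return found;
}
```

A caller that needs the device to be free afterwards (for example before running losetup -d) would only proceed once this function has returned, since by then the child has been waited for.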
| Comment by Di Wang [ 27/Nov/13 ] |
|
John: I think you are right, and it should use pclose, instead of fclose. Good catch! |
| Comment by Di Wang [ 27/Nov/13 ] |
| Comment by John Hammond [ 27/Nov/13 ] |
|
Hi Di,
|
| Comment by Di Wang [ 27/Nov/13 ] |
|
John, yes, this makes sense. I updated the patch; please have a look. |
| Comment by Andreas Dilger [ 18/Dec/13 ] |
|
This patch was landed on 2013-12-09 but conf-sanity is still reporting this bug for failures: https://maloo.whamcloud.com/test_sets/7d7a27aa-66ae-11e3-93e2-52540035b04c and others. |
| Comment by Andreas Dilger [ 18/Dec/13 ] |
|
False alarm - it is actually I'm closing this bug since it looks like it is not being hit in recent test runs. |
| Comment by Jian Yu [ 04/Jan/14 ] |
|
The above patch was not cherry-picked to Lustre b2_5 branch. The same failure occurred on Lustre b2_5 build #5: Here is the back-ported patch on Lustre b2_5 branch: http://review.whamcloud.com/8723 |
| Comment by Jian Yu [ 11/Jan/14 ] |
|
Landed for Lustre 2.5.1. |
| Comment by Gerrit Updater [ 07/Jan/15 ] |
|
Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/13265 |
| Comment by Gerrit Updater [ 03/Mar/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13265/ |