Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.13.0
    • Severity: 3
    • 9223372036854775807

    Description

      This issue was created by maloo for jianyu <yujian@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/a660d22a-e346-11e9-a0ba-52540065bddc

      test_413b failed with the following error:

      weight diff=-25% must be > 50% ...Fill MDT0 with 45712 files
      weight diff=-25% must be > 50% ...Fill MDT0 with 45670 files
      weight diff=-25% must be > 50% ...Fill MDT0 with 45712 files
      Timeout occurred after 447 mins, last suite running was sanity, restarting cluster to continue tests
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity test_413b - Timeout occurred after 447 mins, last suite running was sanity, restarting cluster to continue tests
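
      The "weight diff ... must be > 50%" lines above come from the test repeatedly filling MDT0 with files until the MDT selection weights diverge past the threshold; in the failing runs the diff stays at -25%, so the loop keeps creating files until the overall timeout fires. A rough sketch of that kind of loop, using lfs df -i free-inode counts as the weight proxy (hypothetical helper, not the actual sanity.sh code; $MOUNT, the batch size, and the 50% threshold are assumptions taken from the error message):

      create_on_mdt0_until_imbalanced() {
              local dir=$MOUNT/d413b.mdt0
              local batch=10000
              local created=0
              local diff=0

              # pin the working directory to MDT index 0
              lfs mkdir -i 0 $dir || return 1

              while [ "$diff" -lt 50 ]; do
                      # fill MDT0 with another batch of empty files
                      local i
                      for i in $(seq $batch); do
                              touch $dir/f.$((created + i)) || return 1
                      done
                      created=$((created + batch))

                      # compare free inodes on MDT0000 vs MDT0001 as a rough weight proxy
                      local free0=$(lfs df -i $MOUNT | awk '/MDT0000/ { print $4 }')
                      local free1=$(lfs df -i $MOUNT | awk '/MDT0001/ { print $4 }')
                      diff=$(( (free1 - free0) * 100 / free1 ))
                      echo "weight diff=$diff% must be > 50% ...Fill MDT0 with $batch files"
              done
      }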


          Activity

            [LU-12831] sanity test_413b timed out

            adilger Andreas Dilger added a comment:

            +1 on b2_12

            https://testing.whamcloud.com/test_sets/3d7196ad-cecb-420d-8a02-a79f02ebc7ca

            [ 9848.724723] Lustre: DEBUG MARKER: == sanity test 409: Large amount of cross-MDTs hard links on the same file ==== 14:53:05 (1589035985)
            [ 9904.040345] LDISKFS-fs error (device dm-16) in ldiskfs_free_blocks:5463: IO failure
            [ 9904.062336] Aborting journal on device dm-16-8.
            [ 9904.063288] LDISKFS-fs error (device dm-16) in ldiskfs_orphan_add:3370: Journal has aborted
            [ 9904.063461] LDISKFS-fs (dm-16): Remounting filesystem read-only
            [ 9904.161794] LustreError: 28847:0:(ofd_dev.c:1804:ofd_destroy_hdl()) lustre-OST0005: error destroying object [0x440000402:0x11e:0x0]: -30
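            The -30 at the end of the ofd_destroy_hdl() line is -EROFS, which matches the read-only remount a few lines earlier: once the journal aborts and ldiskfs flips the OST read-only, later object destroys fail the same way. A quick way to confirm the errno mapping on a test node (generic kernel UAPI headers, not part of the test suite; the header path may vary by distribution):

            # EROFS is errno 30 ("Read-only file system")
            grep -n 'EROFS' /usr/include/asm-generic/errno-base.h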
            

            adilger Andreas Dilger added a comment:

            In recent failures, this doesn't look like a timeout caused by creating too many inodes, but rather a problem on the OST from an earlier test (sanity test_409 in the logs below) that causes the filesystem to be remounted read-only:

            [ 9991.380259] Lustre: DEBUG MARKER: == sanity test 409: Large amount of cross-MDTs hard links on the same file =========================== 23:26:58 (1588807618)
            [10021.751428] LDISKFS-fs error (device dm-17) in ldiskfs_free_blocks:5463: IO failure
            [10021.755577] Aborting journal on device dm-17-8.
            [10021.757692] LDISKFS-fs error (device dm-17): ldiskfs_journal_check_start:56: [10021.757918] LDISKFS-fs (dm-17): Remounting filesystem read-only
            [10021.757922] LDISKFS-fs error (device dm-17) in ldiskfs_reserve_inode_write:5332: Journal has aborted
            [10021.758024] LDISKFS-fs error (device dm-17) in ldiskfs_reserve_inode_write:5332: Journal has aborted
            

            This happened on two separate test runs on b2_12:
            https://testing.whamcloud.com/test_sets/9d1b5d16-9c7e-4138-9ed6-679db5cd60cc
            https://testing.whamcloud.com/test_sets/dab2eaa6-de05-4385-87de-9d9e3ebbcdbc
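
            A quick way to tell whether a given test_413b timeout is this failure mode is to check the OSS console log for the journal abort before the hang; if the read-only remount shows up during an earlier test (409 in the runs above), the timeout is just a symptom. A rough grep sketch (the log path is a placeholder, not the actual Maloo layout):

            # look for the ldiskfs failure that precedes the test_413b hang
            grep -nE 'LDISKFS-fs error|Aborting journal|Remounting filesystem read-only' \
                    /path/to/console.oss.log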


            adilger Andreas Dilger added a comment:

            There should probably be a limit on the number of files created to make the MDTs imbalanced. That limit could be increased for SLOW=yes test runs; see the sketch below.
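
            A minimal sketch of what such a cap could look like inside the test (hypothetical variable names, not an actual sanity.sh patch):

            # cap however many files the test computes it needs to create (nfiles)
            # to imbalance MDT0; allow a larger cap when SLOW=yes
            max_files=20000
            [ "$SLOW" = "yes" ] && max_files=200000
            [ "$nfiles" -gt "$max_files" ] && nfiles=$max_files
            echo "Fill MDT0 with $nfiles files"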


            People

              Assignee: WC Triage
              Reporter: Maloo
              Votes: 0
              Watchers: 2
