[LU-12831] sanity test_413b timed out Created: 04/Oct/19  Updated: 28/Oct/21

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0
Fix Version/s: None

Type: Bug
Priority: Major
Reporter: Maloo
Assignee: WC Triage
Resolution: Unresolved
Votes: 0
Labels: dne, zfs

Issue Links:
Related
is related to LU-14659 sanity test_413a: subdirs shouldn't b... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for jianyu <yujian@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/a660d22a-e346-11e9-a0ba-52540065bddc

test_413b failed with the following error:

weight diff=-25% must be > 50% ...Fill MDT0 with 45712 files
weight diff=-25% must be > 50% ...Fill MDT0 with 45670 files
weight diff=-25% must be > 50% ...Fill MDT0 with 45712 files
Timeout occurred after 447 mins, last suite running was sanity, restarting cluster to continue tests

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_413b - Timeout occurred after 447 mins, last suite running was sanity, restarting cluster to continue tests



 Comments   
Comment by Andreas Dilger [ 04/Oct/19 ]

There should probably be a limit on the number of files that need to be created to make the MDTs imbalanced. That limit could be raised for SLOW=yes tests.
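
The cap suggested above could be sketched roughly as follows. This is an illustration only, not the logic in sanity.sh: the function name, the base limit of 10000 files, and the 5x multiplier for SLOW=yes are all hypothetical values chosen for the example.

```python
def capped_file_count(needed, slow=False, base_limit=10000, slow_factor=5):
    """Return (files_to_create, was_capped).

    needed      -- files the test computed it must create to imbalance the MDTs
    slow        -- True when the run is a SLOW=yes run
    base_limit  -- hypothetical cap for normal runs
    slow_factor -- hypothetical multiplier applied to the cap for slow runs
    """
    limit = base_limit * slow_factor if slow else base_limit
    return min(needed, limit), needed > limit


# 45712 is the file count reported in the failure above.
count, capped = capped_file_count(45712)
if capped:
    print(f"creating only {count} files; test would be skipped or relaxed")
```

With a cap like this, a run that cannot reach the required imbalance within the limit could skip or fail quickly instead of running until the suite timeout.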

Comment by Andreas Dilger [ 07/May/20 ]

In recent failures, this doesn't look like a timeout caused by creating too many inodes, but rather a problem on the OST from an earlier test that causes the filesystem to be remounted read-only:

[ 9991.380259] Lustre: DEBUG MARKER: == sanity test 409: Large amount of cross-MDTs hard links on the same file =========================== 23:26:58 (1588807618)
[10021.751428] LDISKFS-fs error (device dm-17) in ldiskfs_free_blocks:5463: IO failure
[10021.755577] Aborting journal on device dm-17-8.
[10021.757692] LDISKFS-fs error (device dm-17): ldiskfs_journal_check_start:56: [10021.757918] LDISKFS-fs (dm-17): Remounting filesystem read-only
[10021.757922] LDISKFS-fs error (device dm-17) in ldiskfs_reserve_inode_write:5332: Journal has aborted
[10021.758024] LDISKFS-fs error (device dm-17) in ldiskfs_reserve_inode_write:5332: Journal has aborted

This happened on two separate test runs on b2_12:
https://testing.whamcloud.com/test_sets/9d1b5d16-9c7e-4138-9ed6-679db5cd60cc
https://testing.whamcloud.com/test_sets/dab2eaa6-de05-4385-87de-9d9e3ebbcdbc
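
One way to spot this pattern when triaging a console log is to track the most recent DEBUG MARKER line and report it whenever a read-only remount follows. The sketch below assumes the log lines look like the excerpts in this ticket; the function name and regular expressions are illustrative and are not part of any Lustre or Maloo tool.

```python
import re

# Patterns modelled on the log excerpts quoted in this ticket.
MARKER_RE = re.compile(r"DEBUG MARKER: == (\w+) test (\w+):")
REMOUNT_RE = re.compile(r"LDISKFS-fs \(([^)\s]+)\): Remounting filesystem read-only")


def find_readonly_remounts(log_text):
    """Return a list of (last_test_marker, device) for each read-only remount."""
    current_test = None
    hits = []
    for line in log_text.splitlines():
        marker = MARKER_RE.search(line)
        if marker:
            current_test = f"{marker.group(1)} test_{marker.group(2)}"
            continue
        remount = REMOUNT_RE.search(line)
        if remount:
            hits.append((current_test, remount.group(1)))
    return hits


SAMPLE = """\
[ 9848.724723] Lustre: DEBUG MARKER: == sanity test 409: Large amount of cross-MDTs hard links on the same file ==== 14:53:05 (1589035985)
[ 9904.040345] LDISKFS-fs error (device dm-16) in ldiskfs_free_blocks:5463: IO failure
[ 9904.063461] LDISKFS-fs (dm-16): Remounting filesystem read-only
"""

for test, device in find_readonly_remounts(SAMPLE):
    print(f"{device} went read-only after {test}")
```

A hit that names an earlier test, such as test_409 here, suggests the later test_413b failure is a consequence of the read-only OST rather than a problem in test_413b itself.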

Comment by Andreas Dilger [ 11/May/20 ]

+1 on b2_12

https://testing.whamcloud.com/test_sets/3d7196ad-cecb-420d-8a02-a79f02ebc7ca

[ 9848.724723] Lustre: DEBUG MARKER: == sanity test 409: Large amount of cross-MDTs hard links on the same file ==== 14:53:05 (1589035985)
[ 9904.040345] LDISKFS-fs error (device dm-16) in ldiskfs_free_blocks:5463: IO failure
[ 9904.062336] Aborting journal on device dm-16-8.
[ 9904.063288] LDISKFS-fs error (device dm-16) in ldiskfs_orphan_add:3370: Journal has aborted
[ 9904.063461] LDISKFS-fs (dm-16): Remounting filesystem read-only
[ 9904.161794] LustreError: 28847:0:(ofd_dev.c:1804:ofd_destroy_hdl()) lustre-OST0005: error destroying object [0x440000402:0x11e:0x0]: -30