[LU-13230] sanity-benchmark test fsx hangs Created: 10/Feb/20 Updated: 11/Feb/20 Resolved: 11/Feb/20 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | WC Triage |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
sanity-benchmark test_fsx hangs. Looking at the hang at https://testing.whamcloud.com/test_sets/9c812454-4b41-11ea-a1c8-52540065bddc, the last thing seen in the client test_log is

== sanity-benchmark test fsx: fsx ==================================================================== 15:41:36 (1581176496)
debug=0
Using: fsx -c 50 -p 1000 -S 20400 -P /tmp -l 3438416 -N 100000 /mnt/lustre/f0.fsxfile
Chance of close/open is 1 in 50
Seed set to 20400
truncating to largest ever: 0x1240bb
truncating to largest ever: 0x1b0cac
truncating to largest ever: 0x331506
truncating to largest ever: 0x338a0b
truncating to largest ever: 0x3443e1

Looking at the console logs, there are no call traces and too few error messages to explain why the test hangs. Looking at the client1 (vm6) console log, we see

[31869.584851] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanity-benchmark test fsx: fsx ==================================================================== 15:41:36 \(1581176496\)
[31869.821328] Lustre: DEBUG MARKER: == sanity-benchmark test fsx: fsx ==================================================================== 15:41:36 (1581176496)
[31869.901535] Lustre: lfs: using old ioctl(LL_IOC_LOV_GETSTRIPE) on [0x200000401:0x6ac8:0x0], use llapi_layout_get_by_path()
[31869.917359] Lustre: lustre-OST0002-osc-ffff98bc9b52c000: reconnect after 1s idle
[31869.918649] Lustre: Skipped 5 previous similar messages
[31916.662148] Lustre: 24080:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1581176536/real 1581176536] req@ffff98bc805cb600 x1657965685190528/t0(0) o400->lustre-MDT0000-mdc-ffff98bc9b52c000@10.9.5.70@tcp:12/10 lens 224/224 e 0 to 1 dl 1581176543 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[31916.667085] Lustre: lustre-MDT0000-mdc-ffff98bc9b52c000: Connection to lustre-MDT0000 (at 10.9.5.70@tcp) was lost; in progress operations using this service will wait for recovery to complete
[31916.669901] LustreError: 166-1: MGC10.9.5.70@tcp: Connection to MGS (at 10.9.5.70@tcp) was lost; in progress operations using this service will fail
<ConMan> Console [trevis-26vm6] disconnected from <trevis-26:6005> at 02-08 16:14.

Although there are other examples of sanity-benchmark test fsx hanging in the past, there isn’t enough information here to match this hang to past failures. |
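
When comparing this hang against earlier ones, it may help to measure how long after the test marker the first RPC timeout shows up in a console log. The following is a minimal sketch, not part of the Lustre test framework: the function names are hypothetical, the timestamp pattern is assumed from the excerpt above, and the sample lines are abbreviated copies of the client1 console log quoted in this ticket.

```python
import re

# Assumption: console lines start with a kernel timestamp "[seconds.micros] ",
# as in the client1 excerpt quoted in the description.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse_console(lines):
    """Yield (timestamp_seconds, message) for lines carrying a kernel timestamp."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            yield float(match.group(1)), match.group(2)


def seconds_from_marker_to_first(lines, marker, needle):
    """Seconds between the first line containing `marker` and the first
    later line containing `needle`; None if either one is missing."""
    start = None
    for timestamp, message in parse_console(lines):
        if start is None:
            if marker in message:
                start = timestamp
        elif needle in message:
            return timestamp - start
    return None


# Abbreviated lines from the console log in this ticket.
sample = [
    "[31869.584851] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanity-benchmark test fsx: fsx ==",
    "[31869.917359] Lustre: lustre-OST0002-osc-ffff98bc9b52c000: reconnect after 1s idle",
    "[31916.662148] Lustre: 24080:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply",
]

gap = seconds_from_marker_to_first(
    sample, "sanity-benchmark test fsx", "ptlrpc_expire_one_request"
)
print(f"{gap:.1f} s between test marker and first RPC timeout")
```

Lines without a kernel timestamp, such as the ConMan disconnect notice, are skipped, so the same helper can be pointed at a full console log file by passing the open file object as `lines`.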
| Comments |
| Comment by Gerrit Updater [ 10/Feb/20 ] |
|
James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37522 |
| Comment by Andreas Dilger [ 11/Feb/20 ] |
|
There were also failures in sanityn fsx runs due to the landing of patch https://review.whamcloud.com/8201 " |
| Comment by Andreas Dilger [ 11/Feb/20 ] |
|
Never mind - that patch has not yet landed to b2_12. |
| Comment by James Nunez (Inactive) [ 11/Feb/20 ] |
|
Closing this ticket as a duplicate of LU-12234. In this case, look at the console logs for test_iozone and not test_fsx. |