[LU-13232] sanity test 160j fails with 'read changelog failed' Created: 10/Feb/20 Updated: 20/Feb/20 Resolved: 20/Feb/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.14.0 |
| Fix Version/s: | Lustre 2.14.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | James Nunez (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | ppc | ||
| Environment: |
PPC clients |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
sanity test_160j fails with 'read changelog failed' for PPC client testing 100% of the time. Looking at a recent failure at https://testing.whamcloud.com/test_sets/d3720002-4a27-11ea-b69a-52540065bddc, the actual error is a problem with the input to cat:

Registered 1 changelog users: 'cl3'
total: 2 create in 0.00 seconds: 1052.66 ops/second
cat: -: Invalid argument
sanity test_160j: @@@@@@ FAIL: read changelog failed
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:6121:error()
= /usr/lib64/lustre/tests/sanity.sh:14350:test_160j()

The code that is failing in sanity test 160j is

# read changelog
cat <&4 >/dev/null || error "read changelog failed"

Looking at the client1 (vm12) console log, we see

[ 5314.374481] Lustre: DEBUG MARKER: == sanity test 160j: client can be umounted while its chanangelog is being used ===================== 01:24:59 (1581125099)
[ 5314.494530] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre2
[ 5314.506580] Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock trevis-10vm12@tcp:/lustre /mnt/lustre2
[ 5314.555637] Lustre: Mounted lustre-client
[ 5315.555507] Lustre: 10940:0:(llog_cat.c:808:llog_cat_process_common()) lustre-MDT0000-mdc-c0000000b5687800: invalid record in catalog [0x5:0x0:0xa]:0: rc = -22
[ 5315.555690] LustreError: 10940:0:(mdc_changelog.c:295:chlg_load()) lustre-MDT0000-mdc-c0000000b5687800: fail to process llog: rc = -22
[ 5315.600825] Lustre: Unmounted lustre-client
[ 5315.777197] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity test_160j: @@@@@@ FAIL: read changelog failed

sanity test 160j started failing for PPC clients as soon as it was first introduced/landed on 27 SEPT 2019. Logs for more PPC client sanity test 160j failures are at |
| Comments |
| Comment by Andreas Dilger [ 12/Feb/20 ] |
|
This looks like it may be the root cause of many later failures. This test unmounts the client, then fails (likely because of unexpected output), then doesn't remount the client again. All of the later failures are because there is no Lustre client mounted.

== sanity test 160j: client can be umounted while its chanangelog is being used
CMD: trevis-77vm7.trevis.whamcloud.com mount -t lustre -o user_xattr,flock trevis-10vm12@tcp:/lustre /mnt/lustre2
:
cat: -: Invalid argument
sanity test_160j: @@@@@@ FAIL: read changelog failed |
| Comment by Andreas Dilger [ 12/Feb/20 ] |
|
Looking at the test itself, this is pretty clear:
# umount the first lustre mount
umount $MOUNT
The test should register stack_trap calls to undo each change it makes (remount the client, unmount client2, close the file descriptors, etc.) rather than doing this manually at the end of the test, where the cleanup is skipped if the test fails early. I suspect that a simple patch to clean up after this failure will also make many of the following failures go away. |
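To illustrate the cleanup pattern suggested in the comment above, here is a minimal, self-contained sketch. It does not use the real test framework: push_cleanup, run_cleanups, note, read_changelog and demo_test are hypothetical stand-ins invented for this example, and the "mount" and "umount" steps only record text rather than touching any filesystem. The idea is that each change registers its own undo action as soon as it is made, so an early failure still restores the original state, in reverse order.

```shell
#!/bin/sh
# Hypothetical stand-in for a stack_trap-style helper (not the real
# test-framework.sh code): cleanups are prepended so the newest runs first.
CLEANUPS=""
LOG=""

push_cleanup() {
	CLEANUPS="$1${CLEANUPS:+; $CLEANUPS}"
}

run_cleanups() {
	eval "$CLEANUPS"
	CLEANUPS=""
}

# Record an action instead of performing it (illustration only).
note() {
	LOG="${LOG:+$LOG, }$1"
}

# Stand-in for the failing 'cat <&4' read in the test.
read_changelog() {
	return 1
}

demo_test() {
	# Register the undo action right after each change.
	note "mount client2";     push_cleanup 'note "umount client2"'
	note "open changelog fd"; push_cleanup 'note "close changelog fd"'
	note "umount client1";    push_cleanup 'note "remount client1"'

	if ! read_changelog; then
		note "read changelog failed"
		run_cleanups	# still runs on the failure path
		return 1
	fi
	run_cleanups
}

demo_test || true
echo "$LOG"
```

In this sketch the failure path still ends with "remount client1", so later tests would find a mounted client. Whether the landed patch follows this exact ordering is not stated in this ticket.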
| Comment by Gerrit Updater [ 12/Feb/20 ] |
|
James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37550 |
| Comment by Gerrit Updater [ 20/Feb/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37550/ |
| Comment by Peter Jones [ 20/Feb/20 ] |
|
Landed for 2.14 |