Details
-
Bug
-
Resolution: Fixed
-
Minor
-
Lustre 2.14.0
-
PPC clients
-
3
Description
sanity test_160j fails with 'read changelog failed' in 100% of PPC client test runs.
Looking at a recent failure at https://testing.whamcloud.com/test_sets/d3720002-4a27-11ea-b69a-52540065bddc, the actual error is a problem with the input to cat:
Registered 1 changelog users: 'cl3'
total: 2 create in 0.00 seconds: 1052.66 ops/second
cat: -: Invalid argument
 sanity test_160j: @@@@@@ FAIL: read changelog failed
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:6121:error()
  = /usr/lib64/lustre/tests/sanity.sh:14350:test_160j()
The code that is failing in sanity test 160j is:
# read changelog
cat <&4 >/dev/null || error "read changelog failed"
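For context, the failing line reads from file descriptor 4, which the test has presumably opened on the changelog device earlier. The sketch below shows the same pattern with an ordinary temporary file standing in for the device. The function name read_via_fd4 and the temporary file are hypothetical and are not part of sanity.sh. Because cat reads from its standard input here, GNU cat reports any read error under the name "-", which is why the failure appears as "cat: -: Invalid argument".

```shell
#!/bin/bash
# Hypothetical helper: open FILE on descriptor 4 and drain it with cat,
# mirroring the "cat <&4 >/dev/null" line in the test.
read_via_fd4() {
    local file=$1 status

    # Open the file read-only on fd 4 (the real test would use the changelog device).
    exec 4<"$file" || return 1

    # cat sees fd 4 as its stdin, so read errors are reported as "cat: -: ...".
    if cat <&4 >/dev/null; then
        echo "read ok"
        status=0
    else
        echo "read failed"
        status=1
    fi

    # Close fd 4 again.
    exec 4<&-
    return "$status"
}

# Example run against a plain temporary file (an assumption for illustration).
tmp=$(mktemp)
printf 'record1\nrecord2\n' > "$tmp"
read_via_fd4 "$tmp"
rm -f "$tmp"
```

On a regular file the read succeeds; on the PPC client the equivalent read from the changelog device returns an error, so the "|| error" branch fires.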
Looking at the client1 (vm12) console log, we see:
[ 5314.374481] Lustre: DEBUG MARKER: == sanity test 160j: client can be umounted while its chanangelog is being used ===================== 01:24:59 (1581125099)
[ 5314.494530] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre2
[ 5314.506580] Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock trevis-10vm12@tcp:/lustre /mnt/lustre2
[ 5314.555637] Lustre: Mounted lustre-client
[ 5315.555507] Lustre: 10940:0:(llog_cat.c:808:llog_cat_process_common()) lustre-MDT0000-mdc-c0000000b5687800: invalid record in catalog [0x5:0x0:0xa]:0: rc = -22
[ 5315.555690] LustreError: 10940:0:(mdc_changelog.c:295:chlg_load()) lustre-MDT0000-mdc-c0000000b5687800: fail to process llog: rc = -22
[ 5315.600825] Lustre: Unmounted lustre-client
[ 5315.777197] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity test_160j: @@@@@@ FAIL: read changelog failed
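Both error lines in the console log carry rc = -22. On Linux, errno 22 is EINVAL, whose message is "Invalid argument", so this is consistent with the "cat: -: Invalid argument" error in the test output. A small sketch for pulling the return codes out of a saved console log follows; the helper name extract_rc is hypothetical, and the sample input is just the two error lines from the log above.

```shell
#!/bin/bash
# Hypothetical helper: print the distinct negative return codes
# found in a console log read from stdin.
extract_rc() {
    grep -oE 'rc = -[0-9]+' | sort -u
}

# Sample input: the two error lines from the console log in this ticket.
extract_rc <<'EOF'
[ 5315.555507] Lustre: 10940:0:(llog_cat.c:808:llog_cat_process_common()) lustre-MDT0000-mdc-c0000000b5687800: invalid record in catalog [0x5:0x0:0xa]:0: rc = -22
[ 5315.555690] LustreError: 10940:0:(mdc_changelog.c:295:chlg_load()) lustre-MDT0000-mdc-c0000000b5687800: fail to process llog: rc = -22
EOF
```

For the sample input this prints a single line, rc = -22, since both messages report the same code.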
sanity test 160j has failed for PPC clients ever since the test landed on 27 September 2019.
Logs for more PPC client sanity test 160j failures are at:
https://testing.whamcloud.com/test_sets/717d4832-1dba-11ea-80b4-52540065bddc
https://testing.whamcloud.com/test_sets/5e7bd63a-f7af-11e9-b62b-52540065bddc