Details
Bug
Resolution: Fixed
Major
Lustre 2.8.0
None
3
9223372036854775807
Description
During a 24-hour DNE failover test, I found the following on one of the MDTs:
LustreError: 2758:0:(client.c:2869:ptlrpc_replay_interpret()) @@@ status -110, old was 0 req@ffff880feb148cc0 x1507974808149044/t25771723485(25771723485) o1000->lustre-MDT0003-osp-MDT0001@192.168.2.128@o2ib:24/4 lens 248/16576 e 1 to 0 dl 1438129486 ref 2 fl Interpret:R/4/0 rc -110/-110
Lustre: lustre-MDT0003-osp-MDT0001: Connection restored to lustre-MDT0003 (at 192.168.2.128@o2ib)
LustreError: 3117:0:(mdt_open.c:1171:mdt_cross_open()) lustre-MDT0001: [0x240000406:0x167f1:0x0] doesn't exist!: rc = -14
Lustre: DEBUG MARKER: ==== Checking the clients loads BEFORE failover -- failure NOT OK ELAPSED=27221 DURATION=86400 PERIOD=1800
Lustre: DEBUG MARKER: Client load failed on node c05, rc=1
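For readers less familiar with kernel-style return codes, the two negative values in the log above can be decoded as Linux errno numbers. The sketch below is only an illustration: the table is limited to the two codes that appear in this report, and `decode_rc` is a hypothetical helper name, not part of Lustre.

```python
# Linux errno values for the two return codes seen in the MDT log.
# Only these two are listed; this is not a complete errno table.
LINUX_ERRNO = {
    14: ("EFAULT", "Bad address"),
    110: ("ETIMEDOUT", "Connection timed out"),
}


def decode_rc(rc: int) -> str:
    """Translate a negative kernel-style return code into a readable string."""
    name, text = LINUX_ERRNO.get(abs(rc), ("UNKNOWN", "not in this table"))
    return f"rc = {rc}: {name} ({text})"


# The replay request failed with -110, and mdt_cross_open() returned -14.
for rc in (-110, -14):
    print(decode_rc(rc))
```

Read this way, the replay request timed out (-110) and the later cross-MDT open returned EFAULT (-14), whose standard message text is "Bad address".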
Then, on the client side, this causes dbench to fail:
2 7136 0.00 MB/sec execute 191 sec latency 272510.369 ms
2 7136 0.00 MB/sec execute 192 sec latency 273510.512 ms
2 7136 0.00 MB/sec execute 193 sec latency 274510.637 ms
2 7136 0.00 MB/sec execute 194 sec latency 275510.799 ms
2 7136 0.00 MB/sec execute 195 sec latency 276510.916 ms
2 7136 0.00 MB/sec execute 196 sec latency 277511.069 ms
2 7136 0.00 MB/sec execute 197 sec latency 278511.229 ms
2 7136 0.00 MB/sec execute 198 sec latency 279511.387 ms
2 7330 0.00 MB/sec execute 199 sec latency 280182.929 ms
[9431] open ./clients/client1/~dmtmp/EXCEL/RESULTS.XLS failed for handle 11887 (Bad address)
(9432) ERROR: handle 11887 was not found
Child failed with status 1
Then the test fails.
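As a rough aid for reading the dbench output, the progress lines can be parsed to see how fast the reported latency grows. The following is a minimal sketch, assuming the line layout shown in this report; `parse_progress` and `latency_slope` are hypothetical helper names, and the sample is a subset of the lines quoted above.

```python
import re

# Matches dbench progress lines as they appear in this report:
# two integer columns, throughput, elapsed seconds, and latency in ms.
LINE_RE = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+([\d.]+) MB/sec\s+execute\s+(\d+) sec\s+latency\s+([\d.]+) ms"
)


def parse_progress(text):
    """Return (elapsed_seconds, latency_ms) for every progress line found."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            rows.append((int(m.group(4)), float(m.group(5))))
    return rows


def latency_slope(rows):
    """Latency growth in ms per elapsed second between first and last sample."""
    (t0, l0), (t1, l1) = rows[0], rows[-1]
    return (l1 - l0) / (t1 - t0)


SAMPLE = """\
2 7136 0.00 MB/sec execute 191 sec latency 272510.369 ms
2 7136 0.00 MB/sec execute 192 sec latency 273510.512 ms
2 7136 0.00 MB/sec execute 198 sec latency 279511.387 ms
"""

rows = parse_progress(SAMPLE)
print(f"{latency_slope(rows):.1f} ms of added latency per elapsed second")
```

A slope close to 1000 ms per second, together with 0.00 MB/sec throughput, is consistent with a single outstanding operation that stayed blocked for the whole interval, although that reading is an inference from the numbers rather than something the log states.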