Details
-
Bug
-
Resolution: Cannot Reproduce
-
Critical
-
None
-
Lustre 2.8.0
-
lola
build: https://build.hpdd.intel.com/job/lustre-b2_8/8/
-
3
-
9223372036854775807
Description
The error occurred during soak testing of build '20160224' (b2_8 RC2) (see:
https://wiki.hpdd.intel.com/pages/viewpage.action?title=Soak+Testing+on+Lola&spaceKey=Releases#SoakTestingonLola-20150224). DNE is enabled.
MDSes were formatted using ldiskfs, OSTs using zfs. The MDSes are configured in an active-active HA failover configuration.
The error occurred several times while running the mdtest application (1 file per process) on client nodes lola-[33,34] and reads:
JOBID ERROR - MESSAGE 445852 : 201602 25 21:12:01 : Process 1(lola-33.lola.whamcloud.com): FAILED in create_remove_items_helper, unable to remove directory: Input/output error
Lustre Error messages that can be correlated to the event are:
lola-10.log:Feb 25 21:12:01 lola-10 kernel: Lustre: soaked-MDT0003-osp-MDT0005: Connection restored to 192.168.1.109@o2ib10 (at 192.168.1.109@o2ib10)
lola-11.log:Feb 25 21:12:01 lola-11 kernel: LustreError: 11-0: soaked-MDT0003-osp-MDT0006: operation out_update to node 192.168.1.109@o2ib10 failed: rc = -107
lola-11.log:Feb 25 21:12:01 lola-11 kernel: LustreError: 167-0: soaked-MDT0003-osp-MDT0006: This client was evicted by soaked-MDT0003; in progress operations using this service will fail.
lola-11.log:Feb 25 21:12:01 lola-11 kernel: LustreError: 8253:0:(ldlm_resource.c:887:ldlm_resource_complain()) soaked-MDT0003-osp-MDT0006: namespace resource [0x2c000e6a3:0xa8a2:0x0].0x0 (ffff8806fb30dbc0) refcount nonzero (1) after lock cleanup; forcing cleanup.
lola-11.log:Feb 25 21:12:01 lola-11 kernel: LustreError: 8253:0:(ldlm_resource.c:1502:ldlm_resource_dump()) --- Resource: [0x2c000e6a3:0xa8a2:0x0].0x0 (ffff8806fb30dbc0) refcount = 2
lola-11.log:Feb 25 21:12:01 lola-11 kernel: LustreError: 8253:0:(ldlm_resource.c:1505:ldlm_resource_dump()) Granted locks (in reverse order):
lola-11.log:Feb 25 21:12:01 lola-11 kernel: LustreError: 8253:0:(ldlm_resource.c:1508:ldlm_resource_dump()) ### ### ns: soaked-MDT0003-osp-MDT0006 lock: ffff8806c99d0880/0xaacb8c6ebe9816d2 lrc: 2/0,1 mode: EX/EX res: [0x2c000e6a3:0xa8a2:0x0].0x0 bits 0x2 rrc: 2 type: IBT flags: 0x1106401000000 nid: local remote: 0x4af49d2c5913727e expref: -99 pid: 4773 timeout: 0 lvb_type: 0
lola-11.log:Feb 25 21:12:01 lola-11 kernel: LustreError: 8253:0:(ldlm_resource.c:1502:ldlm_resource_dump()) --- Resource: [0x2c000e6a3:0xa92a:0x0].0x0 (ffff88077f835e40) refcount = 2
lola-11.log:Feb 25 21:12:01 lola-11 kernel: LustreError: 8253:0:(ldlm_resource.c:1505:ldlm_resource_dump()) Granted locks (in reverse order):
lola-11.log:Feb 25 21:12:01 lola-11 kernel: LustreError: 8253:0:(ldlm_resource.c:1502:ldlm_resource_dump()) --- Resource: [0x2c000e6a3:0xa433:0x0].0x0 (ffff8806b09aa780) refcount = 2
lola-11.log:Feb 25 21:12:01 lola-11 kernel: LustreError: 8253:0:(ldlm_resource.c:1505:ldlm_resource_dump()) Granted locks (in reverse order):
lola-11.log:Feb 25 21:12:01 lola-11 kernel: LustreError: 8253:0:(ldlm_resource.c:1502:ldlm_resource_dump()) --- Resource: [0x38000dec1:0x14895:0x0].0x0 (ffff8807ff657180) refcount = 2
lola-11.log:Feb 25 21:12:01 lola-11 kernel: LustreError: 8253:0:(ldlm_resource.c:1505:ldlm_resource_dump()) Granted locks (in reverse order):
lola-11.log:Feb 25 21:12:01 lola-11 kernel: LustreError: 8253:0:(ldlm_resource.c:1502:ldlm_resource_dump()) --- Resource: [0x2c000e6a3:0xa8b9:0x0].0x0 (ffff8807d74972c0) refcount = 2
lola-11.log:Feb 25 21:12:01 lola-11 kernel: LustreError: 8253:0:(ldlm_resource.c:1505:ldlm_resource_dump()) Granted locks (in reverse order):
lola-11.log:Feb 25 21:12:01 lola-11 kernel: LustreError: 8253:0:(ldlm_resource.c:1502:ldlm_resource_dump()) --- Resource: [0x2c000e6a3:0xa89b:0x0].0x0 (ffff8807e5c5f2c0) refcount = 2
lola-11.log:Feb 25 21:12:01 lola-11 kernel: LustreError: 8253:0:(ldlm_resource.c:1505:ldlm_resource_dump()) Granted locks (in reverse order):
lola-2.log:Feb 25 21:12:01 lola-2 kernel: Lustre: soaked-OST0000: deleting orphan objects from 0x400000403:1549433 to 0x400000403:1549585
lola-2.log:Feb 25 21:12:01 lola-2 kernel: Lustre: soaked-OST0004: deleting orphan objects from 0x500000405:1544672 to 0x500000405:1544849
lola-2.log:Feb 25 21:12:01 lola-2 kernel: Lustre: soaked-OST0008: deleting orphan objects from 0x600000402:1548216 to 0x600000402:1548417
lola-2.log:Feb 25 21:12:01 lola-2 kernel: Lustre: soaked-OST000c: deleting orphan objects from 0x700000401:1545068 to 0x700000401:1545153
lola-30.log:Feb 25 21:12:01 lola-30 kernel: Lustre: soaked-MDT0003-mdc-ffff88106fa1f800: Connection restored to 192.168.1.109@o2ib10 (at 192.168.1.109@o2ib10)
lola-31.log:Feb 25 21:12:01 lola-31 kernel: Lustre: soaked-MDT0003-mdc-ffff88086597e800: Connection restored to 192.168.1.109@o2ib10 (at 192.168.1.109@o2ib10)
lola-32.log:Feb 25 21:12:01 lola-32 kernel: Lustre: soaked-MDT0003-mdc-ffff88082f4c4000: Connection restored to 192.168.1.109@o2ib10 (at 192.168.1.109@o2ib10)
lola-32.log:Feb 25 21:12:01 lola-32 kernel: Lustre: Skipped 1 previous similar message
lola-33.log:Feb 25 21:12:01 lola-33 kernel: LustreError: 11-0: soaked-MDT0003-mdc-ffff881032461c00: operation mds_reint to node 192.168.1.109@o2ib10 failed: rc = -107
lola-33.log:Feb 25 21:12:01 lola-33 kernel: LustreError: 167-0: soaked-MDT0003-mdc-ffff881032461c00: This client was evicted by soaked-MDT0003; in progress operations using this service will fail.
lola-33.log:Feb 25 21:12:01 lola-33 kernel: LustreError: 157072:0:(lmv_obd.c:1325:lmv_fid_alloc()) Can't alloc new fid, rc -19
lola-33.log:Feb 25 21:12:01 lola-33 kernel: Lustre: soaked-MDT0003-mdc-ffff881032461c00: Connection restored to 192.168.1.109@o2ib10 (at 192.168.1.109@o2ib10)
lola-34.log:Feb 25 21:12:01 lola-34 kernel: Lustre: soaked-MDT0003-mdc-ffff88102fa38000: Connection restored to 192.168.1.109@o2ib10 (at 192.168.1.109@o2ib10)
lola-3.log:Feb 25 21:12:01 lola-3 kernel: Lustre: soaked-OST000d: deleting orphan objects from 0x740000405:1544703 to 0x740000405:1544833
lola-3.log:Feb 25 21:12:01 lola-3 kernel: Lustre: soaked-OST0005: deleting orphan objects from 0x540000403:1536015 to 0x540000403:1536097
lola-3.log:Feb 25 21:12:01 lola-3 kernel: Lustre: soaked-OST0001: deleting orphan objects from 0x440000401:1553755 to 0x440000401:1553873
lola-3.log:Feb 25 21:12:01 lola-3 kernel: Lustre: soaked-OST0009: deleting orphan objects from 0x640000402:1547689 to 0x640000402:1547777
lola-4.log:Feb 25 21:12:01 lola-4 kernel: Lustre: soaked-OST000e: deleting orphan objects from 0x780000403:1542237 to 0x780000403:1542337
lola-4.log:Feb 25 21:12:01 lola-4 kernel: Lustre: soaked-OST000a: deleting orphan objects from 0x6c0000401:1544440 to 0x6c0000401:1544513
lola-4.log:Feb 25 21:12:01 lola-4 kernel: Lustre: soaked-OST0002: deleting orphan objects from 0x480000401:1548270 to 0x480000401:1548385
lola-4.log:Feb 25 21:12:01 lola-4 kernel: Lustre: soaked-OST0006: deleting orphan objects from 0x580000405:1541804 to 0x580000405:1541889
lola-5.log:Feb 25 21:12:01 lola-5 kernel: Lustre: soaked-OST0003: deleting orphan objects from 0x4c0000401:1539783 to 0x4c0000401:1540289
lola-5.log:Feb 25 21:12:01 lola-5 kernel: Lustre: soaked-OST000f: deleting orphan objects from 0x7c0000403:1549006 to 0x7c0000403:1549265
lola-5.log:Feb 25 21:12:01 lola-5 kernel: Lustre: soaked-OST000b: deleting orphan objects from 0x680000401:1548710 to 0x680000401:1548801
lola-5.log:Feb 25 21:12:01 lola-5 kernel: Lustre: soaked-OST0007: deleting orphan objects from 0x5c0000405:1544139 to 0x5c0000405:1544513
lola-8.log:Feb 25 21:12:01 lola-8 kernel: LustreError: 11-0: soaked-MDT0003-osp-MDT0001: operation out_update to node 192.168.1.109@o2ib10 failed: rc = -107
lola-8.log:Feb 25 21:12:01 lola-8 kernel: LustreError: 167-0: soaked-MDT0003-osp-MDT0001: This client was evicted by soaked-MDT0003; in progress operations using this service will fail.
lola-8.log:Feb 25 21:12:01 lola-8 kernel: Lustre: soaked-MDT0003-osp-MDT0000: Connection restored to 192.168.1.109@o2ib10 (at 192.168.1.109@o2ib10)
lola-8.log:Feb 25 21:12:01 lola-8 kernel: Lustre: Skipped 2 previous similar messages
lola-9.log:Feb 25 21:12:01 lola-9 kernel: LustreError: 4484:0:(update_records.c:72:update_records_dump()) master transno = 98785366528 batchid = 73014961802 flags = 0 ops = 4 params = 7
lola-9.log:Feb 25 21:12:01 lola-9 kernel: LustreError: 4484:0:(update_records.c:72:update_records_dump()) master transno = 98785366528 batchid = 73014961809 flags = 0 ops = 4 params = 7
lola-9.log:Feb 25 21:12:01 lola-9 kernel: LustreError: 4484:0:(update_records.c:72:update_records_dump()) master transno = 98785366528 batchid = 73014961816 flags = 0 ops = 4 params = 7
lola-9.log:Feb 25 21:12:01 lola-9 kernel: LustreError: 4484:0:(update_records.c:72:update_records_dump()) master transno = 98785366528 batchid = 73014961822 flags = 0 ops = 4 params = 7
lola-9.log:Feb 25 21:12:01 lola-9 kernel: LustreError: 4484:0:(update_records.c:72:update_records_dump()) master transno = 98785366528 batchid = 73014961830 flags = 0 ops = 4 params = 7
lola-9.log:Feb 25 21:12:01 lola-9 kernel: LustreError: 4484:0:(update_records.c:72:update_records_dump()) master transno = 98785366576 batchid = 94491702660 flags = 0 ops = 53 params = 38
lola-9.log:Feb 25 21:12:01 lola-9 kernel: LustreError: 4484:0:(update_records.c:72:update_records_dump()) master transno = 98785366584 batchid = 94491702661 flags = 0 ops = 53 params = 38
lola-9.log:Feb 25 21:12:01 lola-9 kernel: LustreError: 4484:0:(update_records.c:72:update_records_dump()) master transno = 98785366585 batchid = 94491702662 flags = 0 ops = 53 params = 38
lola-9.log:Feb 25 21:12:01 lola-9 kernel: Lustre: soaked-MDT0003: disconnecting 7 stale clients
lola-9.log:Feb 25 21:12:01 lola-9 kernel: Lustre: 4484:0:(ldlm_lib.c:1586:abort_req_replay_queue()) @@@ aborted: req@ffff88040a497c80 x1527085464220748/t0(98785366530) o101->bf8a1d5c-0dc5-b3c9-6b26-84d56ad880b2@192.168.1.126@o2ib100:585/0 lens 976/0 e 2 to 0 dl 1456463535 ref 1 fl Complete:/4/ffffffff rc 0/-1
lola-9.log:Feb 25 21:12:01 lola-9 kernel: Lustre: 4484:0:(ldlm_lib.c:2011:target_recovery_overseer()) recovery is aborted, evict exports in recovery
lola-9.log:Feb 25 21:12:01 lola-9 kernel: Lustre: soaked-MDT0002: Client 2cb76067-9b42-1736-a64a-e2cc0037f63b (at 192.168.1.132@o2ib100) reconnecting, waiting for 16 clients in recovery for 2:09
lola-9.log:Feb 25 21:12:01 lola-9 kernel: Lustre: Skipped 5 previous similar messages
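The return codes in these messages are negated Linux errno values. As a reading aid, the following Python sketch pulls the negative rc values out of a log line and names them. The helper name decode_rc and the small lookup table are illustrative only and are not part of Lustre or of the soak test framework; the table covers just the codes seen in this ticket.

```python
import re

# Linux errno numbers for the codes that appear in this ticket.
# Lustre kernel messages report them negated (e.g. rc = -107).
LINUX_ERRNO = {
    5: ("EIO", "Input/output error"),
    19: ("ENODEV", "No such device"),
    107: ("ENOTCONN", "Transport endpoint is not connected"),
}

# Matches "rc = -107" as well as "rc -19". The word boundary keeps
# lock-dump fields such as "lrc:" and "rrc:" from matching.
RC_PATTERN = re.compile(r"\brc\s*=?\s*(-\d+)")


def decode_rc(line):
    """Return (rc, name, description) for every negative rc in a log line."""
    decoded = []
    for match in RC_PATTERN.finditer(line):
        rc = int(match.group(1))
        name, text = LINUX_ERRNO.get(-rc, ("UNKNOWN", "not in table"))
        decoded.append((rc, name, text))
    return decoded


if __name__ == "__main__":
    samples = [
        "LustreError: 11-0: soaked-MDT0003-mdc-ffff881032461c00: "
        "operation mds_reint to node 192.168.1.109@o2ib10 failed: rc = -107",
        "LustreError: 157072:0:(lmv_obd.c:1325:lmv_fid_alloc()) "
        "Can't alloc new fid, rc -19",
    ]
    for sample in samples:
        print(decode_rc(sample))
```

Read this way, the out_update and mds_reint failures (rc = -107) are ENOTCONN, the lmv_fid_alloc failure (rc -19) is ENODEV, and the "Input/output error" that mdtest reports corresponds to EIO.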
Immediately before these errors, a restart of the MDS on lola-9 had finished:
mds_restart : 2016-02-25 20:59:00,754 - 2016-02-25 21:11:17,795 lola-9
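To make the timing relation concrete, the following Python sketch parses the restart record above and compares its end time with the syslog timestamp of the errors. The helper name parse_restart_window is hypothetical, the record format is inferred from this single line, and the year 2016 is assumed for the syslog timestamp, which carries no year.

```python
from datetime import datetime

# Timestamp format of the restart record; %f accepts the
# three-digit millisecond field after the comma.
RESTART_FMT = "%Y-%m-%d %H:%M:%S,%f"


def parse_restart_window(line):
    """Parse '<event> : <start> - <end> <host>' into (start, end, host)."""
    _, _, rest = line.partition(" : ")
    start_text, _, tail = rest.partition(" - ")
    end_text, _, host = tail.rpartition(" ")
    start = datetime.strptime(start_text.strip(), RESTART_FMT)
    end = datetime.strptime(end_text.strip(), RESTART_FMT)
    return start, end, host


if __name__ == "__main__":
    record = "mds_restart : 2016-02-25 20:59:00,754 - 2016-02-25 21:11:17,795 lola-9"
    start, end, host = parse_restart_window(record)

    # Syslog lines omit the year; 2016 is assumed from the restart record.
    error_time = datetime.strptime("2016 Feb 25 21:12:01", "%Y %b %d %H:%M:%S")

    print(host, "restart took", (end - start).total_seconds(), "s")
    print("errors logged", (error_time - end).total_seconds(), "s after restart end")
```

With the values from this ticket, the restart took a little over 12 minutes and the errors were logged roughly 43 seconds after it completed.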