Details

    • 3
    • 9223372036854775807

    Description

      Error happens during soak testing of build '20160224' (b2_8 RC2) (see:
      https://wiki.hpdd.intel.com/pages/viewpage.action?title=Soak+Testing+on+Lola& spaceKey=Releases#SoakTestingonLola-20150224). DNE is enabled.
      MDSes had been formatted using ldiskfs, OSTs using zfs. MDSes are configured in active-active HA failover configuration.

      Error happened several times during execution of application mdtest (1 file per process) on client nodes lola-[33,34] and reads as:

       JOBID             ERROR - MESSAGE
      445852 :  201602 25 21:12:01 : Process 1(lola-33.lola.whamcloud.com): FAILED in create_remove_items_helper, unable to remove directory: Input/output error
      

      Lustre Error messages that can be correlated to the event are:

      lola-10.log:Feb 25 21:12:01 lola-10 kernel: Lustre: soaked-MDT0003-osp-MDT0005: Connection restored to 192.168.1.109@o2ib10 (at 192.168.1.109@o2ib10)
      lola-11.log:Feb 25 21:12:01 lola-11 kernel: LustreError: 11-0: soaked-MDT0003-osp-MDT0006: operation out_update to node 192.168.1.109@o2ib10 failed: rc = -107
      lola-11.log:Feb 25 21:12:01 lola-11 kernel: LustreError: 167-0: soaked-MDT0003-osp-MDT0006: This client was evicted by soaked-MDT0003; in progress operations using this service will fail.
      lola-11.log:Feb 25 21:12:01 lola-11 kernel: LustreError: 8253:0:(ldlm_resource.c:887:ldlm_resource_complain()) soaked-MDT0003-osp-MDT0006: namespace resource [0x2c000e6a3:0xa8a2:0x0].0x0 (ffff8806fb30dbc0) refcount nonzero (1) after lock cleanup; forcing cleanup.
      lola-11.log:Feb 25 21:12:01 lola-11 kernel: LustreError: 8253:0:(ldlm_resource.c:1502:ldlm_resource_dump()) --- Resource: [0x2c000e6a3:0xa8a2:0x0].0x0 (ffff8806fb30dbc0) refcount = 2
      lola-11.log:Feb 25 21:12:01 lola-11 kernel: LustreError: 8253:0:(ldlm_resource.c:1505:ldlm_resource_dump()) Granted locks (in reverse order):
      lola-11.log:Feb 25 21:12:01 lola-11 kernel: LustreError: 8253:0:(ldlm_resource.c:1508:ldlm_resource_dump()) ### ### ns: soaked-MDT0003-osp-MDT0006 lock: ffff8806c99d0880/0xaacb8c6ebe9816d2 lrc: 2/0,1 mode: EX/EX res: [0x2c000e6a3:0xa8a2:0x0].0x0 bits 0x2 rrc: 2 type: IBT flags: 0x1106401000000 nid: local remote: 0x4af49d2c5913727e expref: -99 pid: 4773 timeout: 0 lvb_type: 0
      lola-11.log:Feb 25 21:12:01 lola-11 kernel: LustreError: 8253:0:(ldlm_resource.c:1502:ldlm_resource_dump()) --- Resource: [0x2c000e6a3:0xa92a:0x0].0x0 (ffff88077f835e40) refcount = 2
      lola-11.log:Feb 25 21:12:01 lola-11 kernel: LustreError: 8253:0:(ldlm_resource.c:1505:ldlm_resource_dump()) Granted locks (in reverse order):
      lola-11.log:Feb 25 21:12:01 lola-11 kernel: LustreError: 8253:0:(ldlm_resource.c:1502:ldlm_resource_dump()) --- Resource: [0x2c000e6a3:0xa433:0x0].0x0 (ffff8806b09aa780) refcount = 2
      lola-11.log:Feb 25 21:12:01 lola-11 kernel: LustreError: 8253:0:(ldlm_resource.c:1505:ldlm_resource_dump()) Granted locks (in reverse order):
      lola-11.log:Feb 25 21:12:01 lola-11 kernel: LustreError: 8253:0:(ldlm_resource.c:1502:ldlm_resource_dump()) --- Resource: [0x38000dec1:0x14895:0x0].0x0 (ffff8807ff657180) refcount = 2
      lola-11.log:Feb 25 21:12:01 lola-11 kernel: LustreError: 8253:0:(ldlm_resource.c:1505:ldlm_resource_dump()) Granted locks (in reverse order):
      lola-11.log:Feb 25 21:12:01 lola-11 kernel: LustreError: 8253:0:(ldlm_resource.c:1502:ldlm_resource_dump()) --- Resource: [0x2c000e6a3:0xa8b9:0x0].0x0 (ffff8807d74972c0) refcount = 2
      lola-11.log:Feb 25 21:12:01 lola-11 kernel: LustreError: 8253:0:(ldlm_resource.c:1505:ldlm_resource_dump()) Granted locks (in reverse order):
      lola-11.log:Feb 25 21:12:01 lola-11 kernel: LustreError: 8253:0:(ldlm_resource.c:1502:ldlm_resource_dump()) --- Resource: [0x2c000e6a3:0xa89b:0x0].0x0 (ffff8807e5c5f2c0) refcount = 2
      lola-11.log:Feb 25 21:12:01 lola-11 kernel: LustreError: 8253:0:(ldlm_resource.c:1505:ldlm_resource_dump()) Granted locks (in reverse order):
      lola-2.log:Feb 25 21:12:01 lola-2 kernel: Lustre: soaked-OST0000: deleting orphan objects from 0x400000403:1549433 to 0x400000403:1549585
      lola-2.log:Feb 25 21:12:01 lola-2 kernel: Lustre: soaked-OST0004: deleting orphan objects from 0x500000405:1544672 to 0x500000405:1544849
      lola-2.log:Feb 25 21:12:01 lola-2 kernel: Lustre: soaked-OST0008: deleting orphan objects from 0x600000402:1548216 to 0x600000402:1548417
      lola-2.log:Feb 25 21:12:01 lola-2 kernel: Lustre: soaked-OST000c: deleting orphan objects from 0x700000401:1545068 to 0x700000401:1545153
      lola-30.log:Feb 25 21:12:01 lola-30 kernel: Lustre: soaked-MDT0003-mdc-ffff88106fa1f800: Connection restored to 192.168.1.109@o2ib10 (at 192.168.1.109@o2ib10)
      lola-31.log:Feb 25 21:12:01 lola-31 kernel: Lustre: soaked-MDT0003-mdc-ffff88086597e800: Connection restored to 192.168.1.109@o2ib10 (at 192.168.1.109@o2ib10)
      lola-32.log:Feb 25 21:12:01 lola-32 kernel: Lustre: soaked-MDT0003-mdc-ffff88082f4c4000: Connection restored to 192.168.1.109@o2ib10 (at 192.168.1.109@o2ib10)
      lola-32.log:Feb 25 21:12:01 lola-32 kernel: Lustre: Skipped 1 previous similar message
      lola-33.log:Feb 25 21:12:01 lola-33 kernel: LustreError: 11-0: soaked-MDT0003-mdc-ffff881032461c00: operation mds_reint to node 192.168.1.109@o2ib10 failed: rc = -107
      lola-33.log:Feb 25 21:12:01 lola-33 kernel: LustreError: 167-0: soaked-MDT0003-mdc-ffff881032461c00: This client was evicted by soaked-MDT0003; in progress operations using this service will fail.
      lola-33.log:Feb 25 21:12:01 lola-33 kernel: LustreError: 157072:0:(lmv_obd.c:1325:lmv_fid_alloc()) Can't alloc new fid, rc -19
      lola-33.log:Feb 25 21:12:01 lola-33 kernel: Lustre: soaked-MDT0003-mdc-ffff881032461c00: Connection restored to 192.168.1.109@o2ib10 (at 192.168.1.109@o2ib10)
      lola-34.log:Feb 25 21:12:01 lola-34 kernel: Lustre: soaked-MDT0003-mdc-ffff88102fa38000: Connection restored to 192.168.1.109@o2ib10 (at 192.168.1.109@o2ib10)
      lola-3.log:Feb 25 21:12:01 lola-3 kernel: Lustre: soaked-OST000d: deleting orphan objects from 0x740000405:1544703 to 0x740000405:1544833
      lola-3.log:Feb 25 21:12:01 lola-3 kernel: Lustre: soaked-OST0005: deleting orphan objects from 0x540000403:1536015 to 0x540000403:1536097
      lola-3.log:Feb 25 21:12:01 lola-3 kernel: Lustre: soaked-OST0001: deleting orphan objects from 0x440000401:1553755 to 0x440000401:1553873
      lola-3.log:Feb 25 21:12:01 lola-3 kernel: Lustre: soaked-OST0009: deleting orphan objects from 0x640000402:1547689 to 0x640000402:1547777
      lola-4.log:Feb 25 21:12:01 lola-4 kernel: Lustre: soaked-OST000e: deleting orphan objects from 0x780000403:1542237 to 0x780000403:1542337
      lola-4.log:Feb 25 21:12:01 lola-4 kernel: Lustre: soaked-OST000a: deleting orphan objects from 0x6c0000401:1544440 to 0x6c0000401:1544513
      lola-4.log:Feb 25 21:12:01 lola-4 kernel: Lustre: soaked-OST0002: deleting orphan objects from 0x480000401:1548270 to 0x480000401:1548385
      lola-4.log:Feb 25 21:12:01 lola-4 kernel: Lustre: soaked-OST0006: deleting orphan objects from 0x580000405:1541804 to 0x580000405:1541889
      lola-5.log:Feb 25 21:12:01 lola-5 kernel: Lustre: soaked-OST0003: deleting orphan objects from 0x4c0000401:1539783 to 0x4c0000401:1540289
      lola-5.log:Feb 25 21:12:01 lola-5 kernel: Lustre: soaked-OST000f: deleting orphan objects from 0x7c0000403:1549006 to 0x7c0000403:1549265
      lola-5.log:Feb 25 21:12:01 lola-5 kernel: Lustre: soaked-OST000b: deleting orphan objects from 0x680000401:1548710 to 0x680000401:1548801
      lola-5.log:Feb 25 21:12:01 lola-5 kernel: Lustre: soaked-OST0007: deleting orphan objects from 0x5c0000405:1544139 to 0x5c0000405:1544513
      lola-8.log:Feb 25 21:12:01 lola-8 kernel: LustreError: 11-0: soaked-MDT0003-osp-MDT0001: operation out_update to node 192.168.1.109@o2ib10 failed: rc = -107
      lola-8.log:Feb 25 21:12:01 lola-8 kernel: LustreError: 167-0: soaked-MDT0003-osp-MDT0001: This client was evicted by soaked-MDT0003; in progress operations using this service will fail.
      lola-8.log:Feb 25 21:12:01 lola-8 kernel: Lustre: soaked-MDT0003-osp-MDT0000: Connection restored to 192.168.1.109@o2ib10 (at 192.168.1.109@o2ib10)
      lola-8.log:Feb 25 21:12:01 lola-8 kernel: Lustre: Skipped 2 previous similar messages
      lola-9.log:Feb 25 21:12:01 lola-9 kernel: LustreError: 4484:0:(update_records.c:72:update_records_dump()) master transno = 98785366528 batchid = 73014961802 flags = 0 ops = 4 params = 7
      lola-9.log:Feb 25 21:12:01 lola-9 kernel: LustreError: 4484:0:(update_records.c:72:update_records_dump()) master transno = 98785366528 batchid = 73014961809 flags = 0 ops = 4 params = 7
      lola-9.log:Feb 25 21:12:01 lola-9 kernel: LustreError: 4484:0:(update_records.c:72:update_records_dump()) master transno = 98785366528 batchid = 73014961816 flags = 0 ops = 4 params = 7
      lola-9.log:Feb 25 21:12:01 lola-9 kernel: LustreError: 4484:0:(update_records.c:72:update_records_dump()) master transno = 98785366528 batchid = 73014961822 flags = 0 ops = 4 params = 7
      lola-9.log:Feb 25 21:12:01 lola-9 kernel: LustreError: 4484:0:(update_records.c:72:update_records_dump()) master transno = 98785366528 batchid = 73014961830 flags = 0 ops = 4 params = 7
      lola-9.log:Feb 25 21:12:01 lola-9 kernel: LustreError: 4484:0:(update_records.c:72:update_records_dump()) master transno = 98785366576 batchid = 94491702660 flags = 0 ops = 53 params = 38
      lola-9.log:Feb 25 21:12:01 lola-9 kernel: LustreError: 4484:0:(update_records.c:72:update_records_dump()) master transno = 98785366584 batchid = 94491702661 flags = 0 ops = 53 params = 38
      lola-9.log:Feb 25 21:12:01 lola-9 kernel: LustreError: 4484:0:(update_records.c:72:update_records_dump()) master transno = 98785366585 batchid = 94491702662 flags = 0 ops = 53 params = 38
      lola-9.log:Feb 25 21:12:01 lola-9 kernel: Lustre: soaked-MDT0003: disconnecting 7 stale clients
      lola-9.log:Feb 25 21:12:01 lola-9 kernel: Lustre: 4484:0:(ldlm_lib.c:1586:abort_req_replay_queue()) @@@ aborted:  req@ffff88040a497c80 x1527085464220748/t0(98785366530) o101->bf8a1d5c-0dc5-b3c9-6b26-84d56ad880b2@192.168.1.126@o2ib100:585/0 lens 976/0 e 2 to 0 dl 1456463535 ref 1 fl Complete:/4/ffffffff rc 0/-1
      lola-9.log:Feb 25 21:12:01 lola-9 kernel: Lustre: 4484:0:(ldlm_lib.c:2011:target_recovery_overseer()) recovery is aborted, evict exports in recovery
      lola-9.log:Feb 25 21:12:01 lola-9 kernel: Lustre: soaked-MDT0002: Client 2cb76067-9b42-1736-a64a-e2cc0037f63b (at 192.168.1.132@o2ib100) reconnecting, waiting for 16 clients in recovery for 2:09
      lola-9.log:Feb 25 21:12:01 lola-9 kernel: Lustre: Skipped 5 previous similar messages
      

      Immediately before a restart of MDT lola-9 finished

      mds_restart      : 2016-02-25 20:59:00,754 - 2016-02-25 21:11:17,795    lola-9
      

      Attachments

        Activity

          People

            wc-triage WC Triage
            heckes Frank Heckes (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: