Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-5532

LustreError: 77234:0:(ldlm_lockd.c:460:__ldlm_add_waiting_lock()) ### requested timeout 755, more than at_max 600

    XMLWordPrintable

Details

    • Bug
    • Resolution: Not a Bug
    • Major
    • None
    • Lustre 2.4.3
    • None
    • Lustre 2.4.3, RHEL 6.4, kernel 2.6.32-358.23.2.el6.atlas.x86_64
    • 2
    • 15399

    Description

      Lustre OSS reports lustre is unhealthy after issuing the following sequence of messages. Second occurrence in the last 24 hours.

      Aug 21 14:00:23 atlas-oss1c7.ccs.ornl.gov kernel: [3794666.155556] Lustre: atlas1-OST0016: Client 1942a1b8-14c2-1c85-f1cc-f5a627755ef9 (at 10.38.145.2@o2ib4) reconnecting
      Aug 21 14:00:23 atlas-oss1c7.ccs.ornl.gov kernel: [3794666.185134] Lustre: Skipped 6 previous similar messages
      Aug 21 14:02:53 atlas-oss1c7.ccs.ornl.gov kernel: [3794816.201589] LustreError: 137-5: atlas1-OST00a2_UUID: not available for connect from 10.38.145.2@o2ib4 (no target)
      Aug 21 14:12:20 atlas-oss1c7.ccs.ornl.gov kernel: [3795383.544447] Lustre: atlas1-OST0016: Client 1942a1b8-14c2-1c85-f1cc-f5a627755ef9 (at 10.38.145.2@o2ib4) reconnecting
      Aug 21 14:12:20 atlas-oss1c7.ccs.ornl.gov kernel: [3795383.567821] Lustre: Skipped 5 previous similar messages
      Aug 21 14:19:20 atlas-oss1c7.ccs.ornl.gov kernel: [3795803.534475] LustreError: 137-5: atlas1-OST00a2_UUID: not available for connect from 10.38.145.2@o2ib4 (no target)
      Aug 21 14:19:20 atlas-oss1c7.ccs.ornl.gov kernel: [3795803.566134] LustreError: Skipped 3 previous similar messages
      Aug 21 14:26:17 atlas-oss1c7.ccs.ornl.gov kernel: [3796220.710761] Lustre: atlas1-OST01c6: Client 5d5389e1-62ad-c671-5318-48ff669e4a6e (at 10.38.145.2@o2ib4) reconnecting
      Aug 21 14:26:17 atlas-oss1c7.ccs.ornl.gov kernel: [3796220.734900] Lustre: Skipped 11 previous similar messages
      Aug 21 14:31:17 atlas-oss1c7.ccs.ornl.gov kernel: [3796520.850175] LustreError: 137-5: atlas1-OST00a2_UUID: not available for connect from 10.38.145.2@o2ib4 (no target)
      Aug 21 14:31:17 atlas-oss1c7.ccs.ornl.gov kernel: [3796520.878544] LustreError: Skipped 4 previous similar messages
      Aug 21 14:38:14 atlas-oss1c7.ccs.ornl.gov kernel: [3796937.996148] Lustre: atlas1-OST0136: Client 5d5389e1-62ad-c671-5318-48ff669e4a6e (at 10.38.145.2@o2ib4) reconnecting
      Aug 21 14:38:14 atlas-oss1c7.ccs.ornl.gov kernel: [3796938.026732] Lustre: Skipped 6 previous similar messages
      Aug 21 14:44:39 atlas-oss1c7.ccs.ornl.gov kernel: [3797323.165332] LustreError: 33287:0:(ldlm_lockd.c:460:__ldlm_add_waiting_lock()) ### requested timeout 603, more than at_max 600
      Aug 21 14:44:39 atlas-oss1c7.ccs.ornl.gov kernel: [3797323.165334] ns: filter-atlas1-OST0376_UUID lock: ffff88034b20a900/0xecd0a12120f17d4c lrc: 4/0,0 mode: PW/PW res: [0x5c2a0f:0x0:0x0].0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->16383) flags: 0x10020 nid: 10.36.205.208@o2ib remote: 0xc39a22a84c361c61 expref: 26 pid: 14426 timeout: 8090929608 lvb_type: 0
      Aug 21 14:44:39 atlas-oss1c7.ccs.ornl.gov kernel: [3797323.292254] LustreError: 33287:0:(ldlm_lockd.c:460:__ldlm_add_waiting_lock()) Skipped 3 previous similar messages
      Aug 21 14:44:57 atlas-oss1c7.ccs.ornl.gov kernel: [3797340.909561] Lustre: atlas1-OST0136: Slow creates, 128/256 objects created at a rate of 2/s
      Aug 21 14:45:53 atlas-oss1c7.ccs.ornl.gov kernel: [3797397.137591] LustreError: 137-5: atlas1-OST0372_UUID: not available for connect from 10.38.145.2@o2ib4 (no target)
      Aug 21 14:45:53 atlas-oss1c7.ccs.ornl.gov kernel: [3797397.161035] LustreError: Skipped 2 previous similar messages
      Aug 21 14:48:24 atlas-oss1c7.ccs.ornl.gov kernel: [3797548.207840] Lustre: atlas1-OST01c6: Client 1942a1b8-14c2-1c85-f1cc-f5a627755ef9 (at 10.38.145.2@o2ib4) reconnecting
      Aug 21 14:48:24 atlas-oss1c7.ccs.ornl.gov kernel: [3797548.238152] Lustre: Skipped 6 previous similar messages
      Aug 21 14:52:26 atlas-oss1c7.ccs.ornl.gov kernel: [3797790.627280] LustreError: 45633:0:(service.c:3216:ptlrpc_svcpt_health_check()) ost_io: unhealthy - request has been waiting 610s

      Then followed by several messages like this:

      Aug 21 14:54:27 atlas-oss1c7.ccs.ornl.gov kernel: [3797911.436996] Lustre: 33219:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/3), not sending early reply
      Aug 21 14:54:27 atlas-oss1c7.ccs.ornl.gov kernel: [3797911.436998] req@ffff8802f640e400 x1476723491621465/t0(0) o3->6ae34883-e4de-08ed-e3e9-ccdd41e9934d@9719@gni108:0/0 lens 448/432 e 0 to 0 dl 1408647272 ref 2 fl Interpret:/0/0 rc 0/0
      Aug 21 14:54:32 atlas-oss1c7.ccs.ornl.gov kernel: [3797916.390933] Lustre: atlas1-OST0136: Bulk IO read error with 711e8a57-ae3b-7204-47a9-6a996887d00c (at 7253@gni108), client will retry: rc -110
      Aug 21 14:54:32 atlas-oss1c7.ccs.ornl.gov kernel: [3797916.392844] LustreError: 33261:0:(ldlm_lib.c:2702:target_bulk_io()) @@@ timeout on bulk PUT after 0+0s req@ffff8803bab27000 x1476723483171409/t0(0) o3->711e8a57-ae3b-7204-47a9-6a996887d00c@7253@gni108:0/0 lens 448/432 e 0 to 0 dl 1408647272 ref 1 fl Interpret:/0/0 rc 0/0

      Full syslog to follow.

      Attachments

        Activity

          People

            green Oleg Drokin
            hilljjornl Jason Hill (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: