  1. Lustre
  2. LU-7308

LustreError: 16956:0:(ost_handler.c:1764:ost_blocking_ast()) Error -2 syncing data on lock cancel

Details

    • Question/Request
    • Resolution: Cannot Reproduce
    • Minor
    • None
    • Lustre 2.5.1
    • None
    • CentOS release 6.4, lustre client 2.5.1, lustre server 2.6.32-358.23.2.el6_lustre.es52.x86_64

    Description

      We are facing an issue with Lustre clients (compute nodes).

      PBS jobs are being killed because of Lustre errors on the scratch file system. The scratch area is defined in PBS, and when the Lustre error occurs the PBS service shuts down and the jobs are killed.

      The Lustre errors from the messages log of one compute node are shown below. This one is from compute node 34, but the same is happening on other nodes as well.

      ------------------------------------------------------------------------------------------
      OSS/MDS Server Error:

      Oct 11 21:38:03 cn034 kernel: Lustre: Lustre: Build Version: 2.5.1.ddn1-g45c890c-PRISTINE-2.6.32-431.el6.x86_64
      Oct 11 21:38:03 cn034 kernel: LNet: Added LNI 10.20.30.34@o2ib [8/256/0/180]
      Oct 11 21:38:03 cn034 kernel: Lustre: Layout lock feature supported.
      Oct 11 21:38:03 cn034 kernel: LustreError: 11-0: scratch-OST0004-osc-ffff880c05bbbc00: Communicating with 10.20.30.103@o2ib, operation ost_connect failed with -19.
      Oct 11 21:38:03 cn034 kernel: LustreError: 11-0: scratch-OST0003-osc-ffff880c05bbbc00: Communicating with 10.20.30.103@o2ib, operation ost_connect failed with -19.
      Oct 11 21:38:03 cn034 kernel: LustreError: Skipped 1 previous similar message
      -----------------------------------------------------------------------------

      ------------------------------------------------------------------------------
      Compute Nodes Error:

      Oct 11 21:37:59 cn034 modprobe: FATAL: Error inserting padlock_sha (/lib/modules/2.6.32-431.el6.x86_64/kernel/drivers/crypto/padlock-sha.ko): No such device
      Oct 11 21:38:03 cn034 kernel: Lustre: Lustre: Build Version: 2.5.1.ddn1-g45c890c-PRISTINE-2.6.32-431.el6.x86_64
      Oct 11 21:38:03 cn034 kernel: LNet: Added LNI 10.20.30.34@o2ib [8/256/0/180]
      Oct 11 21:38:03 cn034 kernel: Lustre: Layout lock feature supported.
      Oct 11 21:38:03 cn034 kernel: LustreError: 11-0: scratch-OST0004-osc-ffff880c05bbbc00: Communicating with 10.20.30.103@o2ib, operation ost_connect failed with -19.
      Oct 11 21:38:03 cn034 kernel: LustreError: 11-0: scratch-OST0003-osc-ffff880c05bbbc00: Communicating with 10.20.30.103@o2ib, operation ost_connect failed with -19.
      -------------------------------------------------------------------------------------
      Log files from MDS1, MDS2, and the compute nodes are attached. Kindly let me know if you need more details.

      Looking forward to your support on this.

      Thank you
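
The numeric codes in the messages above (-19 in the ost_connect failures, -2 in the issue title) follow the Linux kernel convention of negated errno values, so -19 corresponds to ENODEV and -2 to ENOENT. As a reading aid only, here is a minimal sketch that pulls the code out of a log line and looks up its symbolic name. The helper name decode_lustre_errno and the regular expression are assumptions made for illustration and are not part of Lustre or its tools; the lookup uses the errno table of the machine running the script, which is assumed to be Linux.

```python
import errno
import re

# Matches the numeric code in phrases such as "failed with -19" or "Error -2".
# This pattern is an illustrative assumption based on the two message shapes
# quoted in this issue, not a complete grammar of Lustre console messages.
CODE_RE = re.compile(r"(?:failed with|Error) -(\d+)")


def decode_lustre_errno(line):
    """Return (code, symbolic_name) for a log line, or None if no code is found."""
    match = CODE_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    # errno.errorcode maps local errno numbers to names (Linux values assumed).
    return code, errno.errorcode.get(code, "UNKNOWN")


if __name__ == "__main__":
    samples = [
        "LustreError: 11-0: scratch-OST0004-osc-ffff880c05bbbc00: Communicating "
        "with 10.20.30.103@o2ib, operation ost_connect failed with -19.",
        "LustreError: 16956:0:(ost_handler.c:1764:ost_blocking_ast()) "
        "Error -2 syncing data on lock cancel",
    ]
    for sample in samples:
        print(decode_lustre_errno(sample))
```

The symbolic name only identifies the error class; it does not by itself explain why the connection to 10.20.30.103@o2ib failed.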

      Attachments

        1. lctl.txt
          2 kB
          Amol Thute
        2. mds1-dmesg.txt
          17 kB
          Amol Thute
        3. mds1-messages.txt
          11 kB
          Amol Thute
        4. mds2-dmesg.txt
          16 kB
          Amol Thute
        5. node34-logs.txt
          2 kB
          Amol Thute
        6. node41-dmesg.txt
          3 kB
          Amol Thute
        7. node41-logs.txt
          2 kB
          Amol Thute

            People

              wc-triage WC Triage
              amolthute Amol Thute (Inactive)