[LU-7106] Lustre client fails with error (vvp_io.c:1081:vvp_io_commit_write()) even when there is space on OST and MDT

Details

    • Bug
    • Resolution: Incomplete
    • Major
    • None
    • Lustre 2.7.0
    • None
    • 4

    Description

      Clients get errors when writing to the Lustre server (build 2.7.56).

      Commands such as "cp" return a "No space left on device" error.
      Here are the corresponding logs:

      Sep 4 10:06:16 oasis-dm1 kernel: LustreError: 4891:0:(vvp_io.c:1081:vvp_io_commit_write()) Write page 9477610 of inode ffff8803d965b138 failed -28
      Sep 4 10:06:16 oasis-dm1 kernel: LustreError: 4891:0:(vvp_io.c:1081:vvp_io_commit_write()) Skipped 1 previous similar message
      Sep 4 10:19:39 oasis-dm1 kernel: LustreError: 5492:0:(vvp_io.c:1081:vvp_io_commit_write()) Write page 804864 of inode ffff88049e96abb8 failed -28
      Sep 4 10:19:39 oasis-dm1 kernel: LustreError: 5492:0:(vvp_io.c:1081:vvp_io_commit_write()) Skipped 3 previous similar messages
      Sep 4 10:41:32 oasis-dm1 kernel: LustreError: 7446:0:(vvp_io.c:1081:vvp_io_commit_write()) Write page 8473626 of inode ffff88016af646b8 failed -28
      Sep 4 10:41:32 oasis-dm1 kernel: LustreError: 7446:0:(vvp_io.c:1081:vvp_io_commit_write()) Skipped 6 previous similar messages
      Sep 4 12:00:54 oasis-dm1 kernel: LustreError: 17162:0:(vvp_io.c:1081:vvp_io_commit_write()) Write page 3805354 of inode ffff880940c2cb38 failed -28
      Sep 4 12:00:54 oasis-dm1 kernel: LustreError: 17162:0:(vvp_io.c:1081:vvp_io_commit_write()) Skipped 1 previous similar message
      Sep 4 12:04:37 oasis-dm1 kernel: LustreError: 17541:0:(vvp_io.c:1081:vvp_io_commit_write()) Write page 6265883 of inode ffff880254611138 failed -28
      Sep 4 12:04:37 oasis-dm1 kernel: LustreError: 17541:0:(vvp_io.c:1081:vvp_io_commit_write()) Skipped 1 previous similar message
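
The return code -28 in these messages is the negated Linux errno ENOSPC, which is what "cp" reports as "No space left on device". As a rough aid for triage, the sketch below extracts the page index, inode address and error code from such a line. The function name, the field labels and the 4 KiB page size used to estimate the byte offset are assumptions for illustration, not anything taken from the Lustre sources.

```python
import errno
import os
import re

# Matches the client message format quoted in this ticket.
# The group names are labels chosen for this sketch.
LOG_RE = re.compile(
    r"Write page (?P<page>\d+) of inode (?P<inode>[0-9a-f]+) failed (?P<rc>-?\d+)"
)

PAGE_SIZE = 4096  # assumption: 4 KiB pages on the client


def parse_write_failure(line):
    """Return details of a failed page write, or None if the line does not match."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    code = abs(int(m.group("rc")))  # the kernel logs the negated errno
    page = int(m.group("page"))
    return {
        "page": page,
        "inode": m.group("inode"),
        "errno_name": errno.errorcode.get(code, "UNKNOWN"),
        "message": os.strerror(code),
        "offset_bytes": page * PAGE_SIZE,  # estimated file offset of the page
    }


if __name__ == "__main__":
    sample = (
        "Sep 4 10:06:16 oasis-dm1 kernel: LustreError: 4891:0:"
        "(vvp_io.c:1081:vvp_io_commit_write()) "
        "Write page 9477610 of inode ffff8803d965b138 failed -28"
    )
    print(parse_write_failure(sample))
```

The "Skipped N previous similar messages" lines do not match the pattern and are returned as None, so they can be filtered out when scanning a whole log.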

      The OSTs and MDT are not short of space or inodes (about 16 TB and 10+ million inodes available on average), as checked from the client with:

      grep '[0-9]' /proc/fs/lustre/osc/*/kbytes{free,avail,total}
      grep '[0-9]' /proc/fs/lustre/osc/*/files{free,total}
      grep '[0-9]' /proc/fs/lustre/mdc/*/kbytes{free,avail,total}
      grep '[0-9]' /proc/fs/lustre/mdc/*/files{free,total}
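
The same check can be scripted so that the per-target values are collected in one place. This is only a sketch: it assumes the /proc layout shown in the grep commands above (one directory per target, one integer per counter file), and the function name `read_counters` is made up for this example.

```python
import glob
import os


def read_counters(base, names):
    """Map each target directory under `base` to {counter_name: int value}."""
    result = {}
    for target_dir in sorted(glob.glob(os.path.join(base, "*"))):
        if not os.path.isdir(target_dir):
            continue
        values = {}
        for name in names:
            path = os.path.join(target_dir, name)
            try:
                with open(path) as fh:
                    values[name] = int(fh.read().strip())
            except (OSError, ValueError):
                continue  # counter missing or unreadable on this target
        if values:
            result[os.path.basename(target_dir)] = values
    return result


if __name__ == "__main__":
    space = ["kbytesfree", "kbytesavail", "kbytestotal"]
    inodes = ["filesfree", "filestotal"]
    for base in ("/proc/fs/lustre/osc", "/proc/fs/lustre/mdc"):
        for target, vals in read_counters(base, space + inodes).items():
            print(base, target, vals)
```

On a machine without a mounted Lustre client the directories do not exist and the script prints nothing.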

        Activity

          jfc John Fuchs-Chesney (Inactive) added a comment -

          Haisong,

          I am marking this one as resolved/incomplete. If you would prefer that we do some more work on this issue, just let us know, and provide the information that Yang Sheng has asked for above and we will try to make more progress.

          Many thanks,
          ~ jfc.
          ys Yang Sheng added a comment - - edited

          Hi Haisong,

          Could you please give us a status update for this ticket? Does it still need further work or should we close it?

          Thanks,
          YangSheng

          ys Yang Sheng added a comment -

          Hi, Haisong,

          It looks like this is a ZFS backend, so could you tell us the ZFS version? I was confused by your server kernel version: it has an 'el6' name but a '3.10.73' version number. I would very much appreciate it if you could provide Lustre debug logs captured while the issue occurs, ideally from both server and client.

          Thanks,
          YangSheng

          pjones Peter Jones added a comment -

          Yang Sheng

          Could you please help with this issue?

          Thanks

          Peter


          haisong Haisong Cai (Inactive) added a comment -

          An strace taken from an application writing to an OST:

          read(3, "\305\251z:8\36]?\216\17\273\203v{\267Yo4\207\347g\227{\7#7\37~#\17\26v"..., 4194304) = 4194304
          write(4, "\305\251z:8\36]?\216\17\273\203v{\267Yo4\207\347g\227{\7#7\37~#\17\26v"..., 4194304) = 2985984
          write(4, "[\345&\247\201?\377>\371\205/\0\277~\220\10\206\300\252\221\1,OT \350w\26\355\254\213\301"..., 1208320) = -1 ENOSPC (No space left on device)

          Yet the file's only stripe is on OST index 1, which is just 37% full:

          lfs getstripe NA19240.chrom6.SOLID.bfast.YRI.high_coverage.20100311.bam
          NA19240.chrom6.SOLID.bfast.YRI.high_coverage.20100311.bam
          lmm_stripe_count: 1
          lmm_stripe_size: 1048576
          lmm_pattern: 1
          lmm_layout_gen: 0
          lmm_stripe_offset: 1
          obdidx    objid     objid    group
               1  2862322  0x2bacf2        0

          panda-OST0001_UUID 28497036288 10651272192 17823039488 37% /oasis/scratch/comet[OST:1]
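
The strace shows a short write (2985984 of 4194304 bytes accepted) followed by a retry of the remaining 1208320 bytes that fails with ENOSPC. The sketch below reproduces that calling pattern in user space so the sequence is easier to follow. It is not the traced application's code: `write_all` and the `bytes_written` attribute are hypothetical names used only for illustration.

```python
import os


def write_all(fd, data):
    """Write all of `data` to `fd`, retrying after short writes.

    Returns the number of bytes written. If a later write fails (for
    example with ENOSPC), the OSError propagates with the count of bytes
    already written attached as `bytes_written` (hypothetical attribute).
    """
    view = memoryview(data)
    written = 0
    while written < len(view):
        try:
            n = os.write(fd, view[written:])
        except OSError as exc:
            exc.bytes_written = written
            raise
        if n == 0:
            raise OSError("write returned 0 bytes; giving up")
        written += n
    return written


# Byte counts taken from the strace above: the retry covers exactly the
# part of the 4 MiB buffer that the first write did not accept.
requested, first, retry = 4194304, 2985984, 1208320
assert first + retry == requested

if __name__ == "__main__":
    print("remaining after short write:", requested - first)
```

A caller that catches the OSError can compare `exc.errno` with `errno.ENOSPC` and use `exc.bytes_written` to report how far the copy got before the failure.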

          People

            ys Yang Sheng
            haisong Haisong Cai (Inactive)