Lustre / LU-13588

sigbus sent to mmap writer that is a long way below quota

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Affects Version/s: Lustre 2.10.8, Lustre 2.12.4
    • Environment: centos 7.8, zfs 0.8.3, lustre 2.12.4 on servers
      zfs compression is enabled on OSTs.
      centos 7.8, lustre 2.10.8 on clients
      all x86_64
      group block and inode quotas set and enforcing
    • Severity: 3

    Description

      Hi,

      we've been seeing SIGBUS from a tensorflow build, and possibly other builds and codes, since moving to 2.12.4 on servers. we moved to centos 7.8 on servers and clients at the same time. our previous Lustre version on servers was 2.10.5 (plus many patches) and zfs 0.7.9. the old server lustre versions had no issues with SIGBUS that we know of. we have been running 2.10.8 on clients for about 6 months and that is unchanged.

      after a week or so of narrowing down the issue, we have found a reproducer in a tensorflow build ld step that will reliably SIGBUS, and have also found that this is related to group block quotas.

      the .so file that ld (ld.gold, via collect2) is writing into is initially all nulls and sparse, is about 210M in size, is mmap'd, and probably receives a lot (>600k) of small memcpy/memset writes into the file before it gets a SIGBUS.

      a strace -f -t snippet is

      62275 16:15:47 mmap(NULL, 258627360, PROT_READ|PROT_WRITE, MAP_SHARED, 996, 0) = 0x2b3b15905000
      62275 16:15:48 --- SIGBUS {si_signo=SIGBUS, si_code=BUS_ADRERR, si_addr=0x2b3b23a7cd23} ---
      62275 16:15:48 +++ killed by SIGBUS +++
      

      if that value of si_addr is correct, then it's well within the size of the file, so it doesn't look like ld is writing in the wrong place. ltrace also shows no addresses out of bounds.
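
      for reference, a quick arithmetic check of the numbers in the strace snippet (constants copied straight from it) confirms the faulting address is well inside the mapping:

      ```c
      #include <assert.h>
      #include <stdint.h>
      #include <stdio.h>

      int main(void) {
          /* values copied from the strace snippet above */
          uintptr_t base  = 0x2b3b15905000UL;  /* mmap() return value */
          uintptr_t fault = 0x2b3b23a7cd23UL;  /* si_addr from the SIGBUS */
          size_t    len   = 258627360;         /* length passed to mmap() */

          size_t off = fault - base;
          printf("fault at offset %zu of a %zu byte mapping\n", off, len);
          assert(off < len);  /* faulting address is inside the mapping */
          return 0;
      }
      ```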

      it gets interesting if we change the group quota limit on the account. if there is less than ~9TB of group quota free in the account, then we reliably get a SIGBUS,
      i.e. with a block limit anywhere in the range ->

      # lfs setquota  -g oz997 -B 6000000000 /fred
      

      to

      # lfs setquota  -g oz997 -B 14000000000 /fred
      

      where only about 5TB is actually used in the account ->

      # lfs quota  -g oz997 /fred
      Disk quotas for grp oz997 (gid 10273):
           Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
                /fred 5166964968       0 6000000000       -  936838       0 2000000       -
      

      then we get a SIGBUS ->

       $ /apps/skylake/software/core/gcccore/6.4.0/bin/gcc @bazel-out/k8-py2-opt/bin/tensorflow/python/_pywrap_tensorflow_internal.so-2.params
      collect2: fatal error: ld terminated with signal 7 [Bus error]
      compilation terminated.
      

      but when there is ~9TB free quota, or more ->

      # lfs setquota  -g oz997 -B 14000000000 /fred
      

      then we do not see a SIGBUS and the ld step completes ok.
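
      for reference, the free-quota arithmetic behind the ~9TB boundary (kbyte figures copied from the lfs quota/setquota output above):

      ```c
      #include <assert.h>
      #include <stdio.h>

      int main(void) {
          /* kbytes, copied from the lfs quota / setquota commands above */
          long long used = 5166964968LL;   /* ~5TB currently used by oz997 */
          long long lo   = 6000000000LL;   /* -B limit that reliably SIGBUSes */
          long long hi   = 14000000000LL;  /* -B limit where ld completes ok */

          long long free_lo = lo - used;   /* ~0.8TB free: SIGBUS */
          long long free_hi = hi - used;   /* ~8.8TB free: no SIGBUS */
          printf("free at -B %lld: %lld kbytes\n", lo, free_lo);
          printf("free at -B %lld: %lld kbytes\n", hi, free_hi);
          assert(free_lo == 833035032LL && free_hi == 8833035032LL);
          return 0;
      }
      ```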

      other things to mention

      • we have tried various different (much newer) gcc versions and they all see the same thing.
      • as far as I can tell from the strace and ltrace output of the memcpy/memset calls, all of the addresses being written to are well within the bounds of the file and so should not trigger SIGBUS, i.e. it's probably not a bug in ld.gold.
      • ld.gold is the default linker. if we pick ld.bfd instead, then ld.bfd does ordinary (not mmap'd) i/o to the output .so, and that succeeds with the smallest quota above, so this seems to affect only mmap'd i/o.
      • we've tried a couple of different user and group accounts and the pattern is similar, so I don't think it's anything odd in one account's limits or settings.
      • another user with a much larger quota is also seeing SIGBUS on a build, but that group is within 30T of a 2P quota, so is "close" to over by some measure. I haven't dug into that bug report yet, but I suspect it's the same issue as this one.
      • builds to XFS work ok. I haven't tried XFS with a quota.
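
      for completeness, a minimal stand-in for the ld.gold-style output path (not the actual ld.gold code; the file name is illustrative and the size is scaled down from the ~210M .so): create a sparse output file, mmap it MAP_SHARED, and scatter small memsets into it. on Lustre the SIGBUS would arrive at page-fault time when the client can't get grant/quota for a newly dirtied page; on a local filesystem this completes normally.

      ```c
      #include <fcntl.h>
      #include <stdio.h>
      #include <string.h>
      #include <sys/mman.h>
      #include <unistd.h>

      int main(void) {
          const char *path = "out.so";     /* illustrative output name */
          size_t len = 16u * 1024 * 1024;  /* scaled down from ~210M */

          int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
          if (fd < 0 || ftruncate(fd, (off_t)len) < 0) { perror("setup"); return 1; }

          char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
          if (p == MAP_FAILED) { perror("mmap"); return 1; }

          /* many small scattered stores, like ld.gold's >600k memcpy/memsets;
             each store that dirties a fresh page goes through the page-fault
             path where the SIGBUS would be raised on Lustre */
          for (size_t i = 0; i + 64 <= len; i += 4096 + 123)
              memset(p + i, 0x5a, 64);

          if (msync(p, len, MS_SYNC) < 0) { perror("msync"); return 1; }
          munmap(p, len);
          close(fd);
          unlink(path);                    /* clean up the scratch file */
          puts("completed without SIGBUS");
          return 0;
      }
      ```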

      on the surface this seems similar to LU-13228, but we do not set any soft quotas, and the accounts are many TB away from being over quota. also, our only recent lustre changes have been on the server side, and AFAICT that ticket is a client-side fix.

      as we have a reproducer and a test methodology, we could probably build a 2.12.4 client image and try that if you would find that useful. we weren't planning to move to 2.12.x client in production just yet, but we could try it as an experiment.

      cheers,
      robin

      Attachments

        Activity

          pjones Peter Jones added a comment -

          Good news! Thanks

          scadmin SC Admin added a comment -

          Hi Oleg and Peter,

          we have all clients at 2.12.5 now, and no sign of SIGBUS.

          please close this ticket.
          thanks!

          cheers,
          robin

          pjones Peter Jones added a comment -

          Robin

          2.12.5 is now GA

          Peter

          pjones Peter Jones added a comment -

          Robin

          We're at an advanced stage of release testing on RC1 and so far so good

          Peter

          pjones Peter Jones added a comment -

          scadmin yes 2.12.5 should be out soon - we're aiming to have an RC next week

          scadmin SC Admin added a comment -

          Hi Oleg,

          2.12.4 + the patch in https://review.whamcloud.com/38292 seems to have fixed it. thanks!

          BTW any idea if 2.12.5 is out soon?
          it would be good to have all those fixes as well as this one before we make the jump to 2.12 clients.

          cheers,
          robin

          green Oleg Drokin added a comment -

          no, it's a client only patch

          scadmin SC Admin added a comment -

          Hi Oleg,

          I tried 2.12.4 client (no patches except a build patch for rhel7.8) instead of 2.10.8, and the sigbus issue is still there.

          do I need to apply the patch in https://review.whamcloud.com/38292 to the servers as well as clients?

          cheers,
          robin

          green Oleg Drokin added a comment -

          any chance you can give this patch a try still? https://review.whamcloud.com/38292

          I have seen it failing even in the total absence of quotas, just based on grant dynamics, which is the other way that codepath can get triggered


          People

            Assignee: green Oleg Drokin
            Reporter: scadmin SC Admin
            Votes: 0
            Watchers: 3