Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-6747

Intermittent rc=-EROFS from lod_statfs_and_check

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.8.0
    • Lustre 2.7.0
    • lustre 2.7.54
      spl/zfs 0.6.4.1
      single MDS with one MDT and MGS
      two OSS's with one OST each
    • 3
    • 9223372036854775807

    Description

      Running mdtest I get failures like this:

      06/18/2015 16:22:12: Process 26(zwicky29): FAILED in create_remove_items_helper, unable to create file file.mdtest.26.0 (cwd=/p/lburn/faaland1/zfs-crada/mdtest/2/#test-dir.0/mdtest_tree.26.0): No space left on device
      06/18/2015 16:22:12: Process 19(zwicky22): FAILED in create_remove_items_helper, unable to create file file.mdtest.19.0 (cwd=/p/lburn/faaland1/zfs-crada/mdtest/2/#test-dir.0/mdtest_tree.19.0): No space left on device

      The three servers involved are using the zfs backend and their pools have lots of free space; all are <1% full.

      In the lustre debug log on the MDS, I'm seeing
      lod_qos.c:238:lod_statfs_and_check() return -30 (-LUSTRE_EROFS)
      lod_qos.c:1016:lod_alloc_rr() return -28 (-ENOSPC)

      Many other functions report exiting with -28 as well:

      lod_object.c:2104:lod_declare_xattr_set()
      lod_object.c:3352:lod_declare_striped_object()
      lod_object.c:3384:lod_declare_striped_object()
      lod_object.c:3463:lod_declare_object_create()
      lod_qos.c:1913:lod_qos_prep_create()
      mdd_dir.c:1786:mdd_create_data()
      mdd_dir.c:1807:mdd_create_data()
      mdd_dir.c:1983:mdd_declare_object_create()
      mdd_dir.c:2054:mdd_declare_create()
      mdd_dir.c:2354:mdd_create()
      mdd_object.c:352:mdd_declare_object_create_internal()
      mdt_open.c:1105:mdt_open_by_fid_lock()
      mdt_open.c:1255:mdt_reint_open()
      mdt_open.c:1374:mdt_reint_open()
      mdt_open.c:138:mdt_create_data()
      mdt_open.c:347:mdt_mfd_open()
      mdt_open.c:607:mdt_finish_open()
      mdt_reint.c:1997:mdt_reint_rec()

      I've attached a few thousand lines of debug output from the mds with both debug and debug_subsys set to -1. I can reproduce easily, so I can get debug output with specific subsystems turned off or on.

      Attachments

        Issue Links

          Activity

            [LU-6747] Intermittent rc=-EROFS from lod_statfs_and_check
            pjones Peter Jones added a comment -

            Landed for 2.8

            pjones Peter Jones added a comment - Landed for 2.8

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15346/
            Subject: LU-6747 osd-zfs: initialize obd_statfs in osd_statfs()
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 1ef0185e8c12aa11a4c87c4956b1ba408c0e3d08

            gerrit Gerrit Updater added a comment - Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15346/ Subject: LU-6747 osd-zfs: initialize obd_statfs in osd_statfs() Project: fs/lustre-release Branch: master Current Patch Set: Commit: 1ef0185e8c12aa11a4c87c4956b1ba408c0e3d08

            The patch appears to have resolved the problem; mdtest has been running now for several hours without encountering the issue, and ran for several hours last night as well.

            I'm surprised I didn't see this problem earlier. I ran mdtest in the same manner, against the same SPL/ZFS/Lustre versions, on the same clients and servers, with SPL and ZFS built using the --enable-debug configure option. I didn't observe this problem. I then rebuilt SPL and ZFS without debug, and rebuilt lustre against it, which is when I began to see this.

            ofaaland Olaf Faaland added a comment - The patch appears to have resolved the problem; mdtest has been running now for several hours without encountering the issue, and ran for several hours last night as well. I'm surprised I didn't see this problem earlier. I ran mdtest in the same manner, against the same SPL/ZFS/Lustre versions, on the same clients and servers, with SPL and ZFS built using the --enable-debug configure option. I didn't observe this problem. I then rebuilt SPL and ZFS without debug, and rebuilt lustre against it, which is when I began to see this.
            ofaaland Olaf Faaland added a comment -

            Nathaniel,

            The patch had an error in its usage of offsetof(), which I fixed, but the patch still fails to build on SLES12. It builds fine on RHEL. I looked at the SLES12 build console output but am not able to tell what the problem is. Can you take a look?

            thanks,
            Olaf

            ofaaland Olaf Faaland added a comment - Nathaniel, The patch had an error in its usage of offsetof(), which I fixed, but the patch still fails to build on SLES12. It builds fine on RHEL. I looked at the SLES12 build console output but am not able to tell what the problem is. Can you take a look? thanks, Olaf
            ofaaland Olaf Faaland added a comment -

            Both OSTs are indeed reporting a nonzero state intermittently:

            zwicky-lburn-oss1.out:00002000:00000020:0.0:1434584977.251447:0:12631:0:(ofd_obd.c:841:ofd_statfs()) 176822077 blocks: 176821990 free, 176813342 avail; 5658325578 objects: 5658303680 free; state 9125230e
            zwicky-lburn-oss1.out:00002000:00000020:0.0:1434585002.260232:0:12631:0:(ofd_obd.c:841:ofd_statfs()) 176822077 blocks: 176821990 free, 176821809 avail; 5658327093 objects: 5658303680 free; state 9125230e
            zwicky-lburn-oss1.out:00002000:00000020:9.0:1434585002.890186:0:12554:0:(ofd_obd.c:841:ofd_statfs()) 176822077 blocks: 176821990 free, 176805734 avail; 5658327093 objects: 5658303680 free; state 9125230e
            zwicky-lburn-oss1.out:00002000:00000020:9.0:1434585007.890294:0:12554:0:(ofd_obd.c:841:ofd_statfs()) 176822077 blocks: 176821988 free, 176812034 avail; 5658330734 objects: 5658303616 free; state 9125230e
            zwicky-lburn-oss1.out:00002000:00000020:0.0:1434585017.266203:0:12631:0:(ofd_obd.c:841:ofd_statfs()) 176822077 blocks: 176821992 free, 176821815 avail; 5658321151 objects: 5658303744 free; state 9125230e
            zwicky-lburn-oss1.out:00002000:00000020:10.0:1434585017.891472:0:12649:0:(ofd_obd.c:841:ofd_statfs()) 176822077 blocks: 176821992 free, 176821736 avail; 5658321151 objects: 5658303744 free; state 9125230e
            zwicky-lburn-oss1.out:00002000:00000020:0.0:1434585022.267691:0:12631:0:(ofd_obd.c:841:ofd_statfs()) 176822077 blocks: 176821992 free, 176821815 avail; 5658321151 objects: 5658303744 free; state 9125230e
            zwicky-lburn-oss1.out:00002000:00000020:10.0:1434585022.912333:0:12649:0:(ofd_obd.c:841:ofd_statfs()) 176822077 blocks: 176821992 free, 176821736 avail; 5658321151 objects: 5658303744 free; state 9125230e
            zwicky-lburn-oss1.out:00002000:00000020:0.0:1434585027.269289:0:12631:0:(ofd_obd.c:841:ofd_statfs()) 176822077 blocks: 176821992 free, 176821815 avail; 5658321151 objects: 5658303744 free; state 9125230e
            zwicky-lburn-oss1.out:00002000:00000020:10.0:1434585027.913373:0:12649:0:(ofd_obd.c:841:ofd_statfs()) 176822077 blocks: 176821992 free, 176821736 avail; 5658321151 objects: 5658303744 free; state 9125230e
            zwicky-lburn-oss1.out:00002000:00000020:0.0:1434585032.270956:0:12631:0:(ofd_obd.c:841:ofd_statfs()) 176822077 blocks: 176821992 free, 176821815 avail; 5658321151 objects: 5658303744 free; state 9125230e
            zwicky-lburn-oss1.out:00002000:00000020:10.0:1434585032.913344:0:12649:0:(ofd_obd.c:841:ofd_statfs()) 176822077 blocks: 176821992 free, 176821736 avail; 5658321151 objects: 5658303744 free; state 9125230e
            zwicky-lburn-oss1.out:00002000:00000020:0.0:1434585037.272373:0:12631:0:(ofd_obd.c:841:ofd_statfs()) 176822077 blocks: 176821992 free, 176821815 avail; 5658321151 objects: 5658303744 free; state 9125230e
            zwicky-lburn-oss1.out:00002000:00000020:10.0:1434585037.914346:0:12649:0:(ofd_obd.c:841:ofd_statfs()) 176822077 blocks: 176821992 free, 176821736 avail; 5658321151 objects: 5658303744 free; state 9125230e
            zwicky-lburn-oss1.out:00002000:00000020:0.0:1434585042.273878:0:12631:0:(ofd_obd.c:841:ofd_statfs()) 176822077 blocks: 176821992 free, 176821815 avail; 5658321151 objects: 5658303744 free; state 9125230e
            zwicky-lburn-oss1.out:00002000:00000020:10.0:1434585042.914405:0:12649:0:(ofd_obd.c:841:ofd_statfs()) 176822077 blocks: 176821992 free, 176821736 avail; 5658321151 objects: 5658303744 free; state 9125230e
            zwicky-lburn-oss2.out:00002000:00000020:1.0:1434584984.055391:0:12541:0:(ofd_obd.c:841:ofd_statfs()) 176822073 blocks: 176821934 free, 176821857 avail; 5658316107 objects: 5658301888 free; state c9c4f476
            zwicky-lburn-oss2.out:00002000:00000020:1.0:1434584989.057246:0:12541:0:(ofd_obd.c:841:ofd_statfs()) 176822076 blocks: 176821943 free, 176821866 avail; 5658315946 objects: 5658302176 free; state c9c4f476
            zwicky-lburn-oss2.out:00002000:00000020:1.0:1434585014.066824:0:12541:0:(ofd_obd.c:841:ofd_statfs()) 176822076 blocks: 176821940 free, 176821863 avail; 5658319856 objects: 5658302080 free; state c9c4f476
            zwicky-lburn-oss2.out:00002000:00000020:1.0:1434585019.068752:0:12541:0:(ofd_obd.c:841:ofd_statfs()) 176822076 blocks: 176821940 free, 176821863 avail; 5658319856 objects: 5658302080 free; state c9c4f476
            zwicky-lburn-oss2.out:00002000:00000020:1.0:1434585029.072516:0:12541:0:(ofd_obd.c:841:ofd_statfs()) 176822076 blocks: 176821921 free, 176821844 avail; 5658339248 objects: 5658301472 free; state c9c4f476
            zwicky-lburn-oss2.out:00002000:00000020:8.0:1434585031.588445:0:12764:0:(ofd_obd.c:841:ofd_statfs()) 176822076 blocks: 176821922 free, 176790984 avail; 5658338073 objects: 5658301504 free; state c9c4f476
            zwicky-lburn-oss2.out:00002000:00000020:1.0:1434585034.074483:0:12541:0:(ofd_obd.c:841:ofd_statfs()) 176822076 blocks: 176821924 free, 176795098 avail; 5658334103 objects: 5658301568 free; state c9c4f476
            zwicky-lburn-oss2.out:00002000:00000020:4.0:1434585044.078171:0:12541:0:(ofd_obd.c:841:ofd_statfs()) 176822073 blocks: 176821940 free, 176820539 avail; 4863994422 objects: 4863984771 free; state c9c4f476
            zwicky-lburn-oss2.out:00002000:00000020:4.0:1434585049.080189:0:12541:0:(ofd_obd.c:841:ofd_statfs()) 176822076 blocks: 176821945 free, 176821872 avail; 3106822606 objects: 3106816830 free; state c9c4f476

            I'll try the patch.

            ofaaland Olaf Faaland added a comment - Both OSTs are indeed reporting a nonzero state intermittently: zwicky-lburn-oss1.out:00002000:00000020:0.0:1434584977.251447:0:12631:0:(ofd_obd.c:841:ofd_statfs()) 176822077 blocks: 176821990 free, 176813342 avail; 5658325578 objects: 5658303680 free; state 9125230e zwicky-lburn-oss1.out:00002000:00000020:0.0:1434585002.260232:0:12631:0:(ofd_obd.c:841:ofd_statfs()) 176822077 blocks: 176821990 free, 176821809 avail; 5658327093 objects: 5658303680 free; state 9125230e zwicky-lburn-oss1.out:00002000:00000020:9.0:1434585002.890186:0:12554:0:(ofd_obd.c:841:ofd_statfs()) 176822077 blocks: 176821990 free, 176805734 avail; 5658327093 objects: 5658303680 free; state 9125230e zwicky-lburn-oss1.out:00002000:00000020:9.0:1434585007.890294:0:12554:0:(ofd_obd.c:841:ofd_statfs()) 176822077 blocks: 176821988 free, 176812034 avail; 5658330734 objects: 5658303616 free; state 9125230e zwicky-lburn-oss1.out:00002000:00000020:0.0:1434585017.266203:0:12631:0:(ofd_obd.c:841:ofd_statfs()) 176822077 blocks: 176821992 free, 176821815 avail; 5658321151 objects: 5658303744 free; state 9125230e zwicky-lburn-oss1.out:00002000:00000020:10.0:1434585017.891472:0:12649:0:(ofd_obd.c:841:ofd_statfs()) 176822077 blocks: 176821992 free, 176821736 avail; 5658321151 objects: 5658303744 free; state 9125230e zwicky-lburn-oss1.out:00002000:00000020:0.0:1434585022.267691:0:12631:0:(ofd_obd.c:841:ofd_statfs()) 176822077 blocks: 176821992 free, 176821815 avail; 5658321151 objects: 5658303744 free; state 9125230e zwicky-lburn-oss1.out:00002000:00000020:10.0:1434585022.912333:0:12649:0:(ofd_obd.c:841:ofd_statfs()) 176822077 blocks: 176821992 free, 176821736 avail; 5658321151 objects: 5658303744 free; state 9125230e zwicky-lburn-oss1.out:00002000:00000020:0.0:1434585027.269289:0:12631:0:(ofd_obd.c:841:ofd_statfs()) 176822077 blocks: 176821992 free, 176821815 avail; 5658321151 objects: 5658303744 free; state 9125230e zwicky-lburn-oss1.out:00002000:00000020:10.0:1434585027.913373:0:12649:0:(ofd_obd.c:841:ofd_statfs()) 176822077 blocks: 176821992 free, 176821736 avail; 5658321151 objects: 5658303744 free; state 9125230e zwicky-lburn-oss1.out:00002000:00000020:0.0:1434585032.270956:0:12631:0:(ofd_obd.c:841:ofd_statfs()) 176822077 blocks: 176821992 free, 176821815 avail; 5658321151 objects: 5658303744 free; state 9125230e zwicky-lburn-oss1.out:00002000:00000020:10.0:1434585032.913344:0:12649:0:(ofd_obd.c:841:ofd_statfs()) 176822077 blocks: 176821992 free, 176821736 avail; 5658321151 objects: 5658303744 free; state 9125230e zwicky-lburn-oss1.out:00002000:00000020:0.0:1434585037.272373:0:12631:0:(ofd_obd.c:841:ofd_statfs()) 176822077 blocks: 176821992 free, 176821815 avail; 5658321151 objects: 5658303744 free; state 9125230e zwicky-lburn-oss1.out:00002000:00000020:10.0:1434585037.914346:0:12649:0:(ofd_obd.c:841:ofd_statfs()) 176822077 blocks: 176821992 free, 176821736 avail; 5658321151 objects: 5658303744 free; state 9125230e zwicky-lburn-oss1.out:00002000:00000020:0.0:1434585042.273878:0:12631:0:(ofd_obd.c:841:ofd_statfs()) 176822077 blocks: 176821992 free, 176821815 avail; 5658321151 objects: 5658303744 free; state 9125230e zwicky-lburn-oss1.out:00002000:00000020:10.0:1434585042.914405:0:12649:0:(ofd_obd.c:841:ofd_statfs()) 176822077 blocks: 176821992 free, 176821736 avail; 5658321151 objects: 5658303744 free; state 9125230e zwicky-lburn-oss2.out:00002000:00000020:1.0:1434584984.055391:0:12541:0:(ofd_obd.c:841:ofd_statfs()) 176822073 blocks: 176821934 free, 176821857 avail; 5658316107 objects: 5658301888 free; state c9c4f476 zwicky-lburn-oss2.out:00002000:00000020:1.0:1434584989.057246:0:12541:0:(ofd_obd.c:841:ofd_statfs()) 176822076 blocks: 176821943 free, 176821866 avail; 5658315946 objects: 5658302176 free; state c9c4f476 zwicky-lburn-oss2.out:00002000:00000020:1.0:1434585014.066824:0:12541:0:(ofd_obd.c:841:ofd_statfs()) 176822076 blocks: 176821940 free, 176821863 avail; 5658319856 objects: 5658302080 free; state c9c4f476 zwicky-lburn-oss2.out:00002000:00000020:1.0:1434585019.068752:0:12541:0:(ofd_obd.c:841:ofd_statfs()) 176822076 blocks: 176821940 free, 176821863 avail; 5658319856 objects: 5658302080 free; state c9c4f476 zwicky-lburn-oss2.out:00002000:00000020:1.0:1434585029.072516:0:12541:0:(ofd_obd.c:841:ofd_statfs()) 176822076 blocks: 176821921 free, 176821844 avail; 5658339248 objects: 5658301472 free; state c9c4f476 zwicky-lburn-oss2.out:00002000:00000020:8.0:1434585031.588445:0:12764:0:(ofd_obd.c:841:ofd_statfs()) 176822076 blocks: 176821922 free, 176790984 avail; 5658338073 objects: 5658301504 free; state c9c4f476 zwicky-lburn-oss2.out:00002000:00000020:1.0:1434585034.074483:0:12541:0:(ofd_obd.c:841:ofd_statfs()) 176822076 blocks: 176821924 free, 176795098 avail; 5658334103 objects: 5658301568 free; state c9c4f476 zwicky-lburn-oss2.out:00002000:00000020:4.0:1434585044.078171:0:12541:0:(ofd_obd.c:841:ofd_statfs()) 176822073 blocks: 176821940 free, 176820539 avail; 4863994422 objects: 4863984771 free; state c9c4f476 zwicky-lburn-oss2.out:00002000:00000020:4.0:1434585049.080189:0:12541:0:(ofd_obd.c:841:ofd_statfs()) 176822076 blocks: 176821945 free, 176821872 avail; 3106822606 objects: 3106816830 free; state c9c4f476 I'll try the patch.
            pjones Peter Jones added a comment -

            Assigning to Nathaniel for any follow on questions

            pjones Peter Jones added a comment - Assigning to Nathaniel for any follow on questions

            People

              utopiabound Nathaniel Clark
              ofaaland Olaf Faaland
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: