OI scrub always runs on ldiskfs MDS start up

Details

    Description

      We are running Lustre 2.4.0-21chaos (see http://github.com/chaos/lustre). Most likely of particular interest are these two patches that we are carrying:

      • LU-3934 scrub: detect upgraded from 1.8 correctly
      • LU-3420 scrub: trigger OI scrub properly

      We now find that, at least on the ldiskfs MDS nodes, OI scrub runs on every start up of the MDS. The console message looks something like this:

      2014-01-28 09:27:28 sumom-mds1 login: LustreError: 0-0: lsc-MDT0000: trigger OI scrub by RPC for [0x7e4d2310f09:0x2ddf:0x0], rc = 0 [1]

      Given the frequency of MDS reboots (i.e. often) required lately for other bugs, OI scrub is running far too much.

        Activity

          [LU-4554] OI scrub always runs on ldiskfs MDS start up
          pjones Peter Jones added a comment -

          Landed for 2.5.1 and 2.6


          jamesanunez James Nunez (Inactive) added a comment -

          This patch hasn't landed to master yet, but I created a b2_5 and b2_4 patch at:

          b2_4 - http://review.whamcloud.com/#/c/9140/
          b2_5 - http://review.whamcloud.com/#/c/9139/

          adilger Andreas Dilger added a comment -

          James, Lai is on holiday this week. Could you please cherry-pick this patch to b2_4 and b2_5 once it has landed to master? This can now be done directly in Gerrit. Please also add the "Lustre-change:" and "Lustre-commit:" tags to the commit messages as described on https://wiki.hpdd.intel.com/display/PUB/Commit+Comments
          nedbass Ned Bass (Inactive) added a comment - Patch for master: http://review.whamcloud.com/#/c/9067/

          nedbass Ned Bass (Inactive) added a comment -

          I peeked at the /OI_scrub file while an auto-scrub was running. It showed that bit 0 was set in sf->sf_oi_bitmap. This is wrong, because the OI already exists and OI_scrub has already run to completion several times.

          I think I see the problem in osd_oi_table_open(). Note the format string assumes the OI containers have names like oi.16.0, oi.16.1, and so on. However, for our upgraded filesystems we have only one OI container named oi.16. So osd_oi_open() returns ENOENT and we proceed to set the "recreated" bit in the bitmap.
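The naming mismatch described in the comment above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the Lustre source: `fake_disk`, `fake_oi_open()` and `fake_oi_table_open()` are hypothetical stand-ins, and the real osd_oi_table_open() does considerably more. The model only shows how probing for names of the form oi.16.N on a filesystem that holds a single container named oi.16 yields ENOENT and leaves bit 0 set in the bitmap.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the backing filesystem: the set of OI
 * container names that actually exist on disk. */
struct fake_disk {
    const char **names;
    int count;
};

/* Hypothetical lookup: 0 if the named container exists, -ENOENT if not. */
static int fake_oi_open(const struct fake_disk *disk, const char *name)
{
    for (int i = 0; i < disk->count; i++)
        if (strcmp(disk->names[i], name) == 0)
            return 0;
    return -ENOENT;
}

/* Simplified model of the table-open step: probe oi.16.0 .. oi.16.(n-1)
 * and set bit i in the returned bitmap for every container not found. */
static unsigned long fake_oi_table_open(const struct fake_disk *disk,
                                        int oi_count)
{
    unsigned long bitmap = 0;
    char name[32];

    assert(oi_count > 0 && oi_count <= (int)(8 * sizeof(bitmap)));
    for (int i = 0; i < oi_count; i++) {
        /* The format always appends an index, so a legacy container
         * named plain "oi.16" can never match. */
        snprintf(name, sizeof(name), "oi.16.%d", i);
        if (fake_oi_open(disk, name) == -ENOENT)
            bitmap |= 1UL << i;
    }
    return bitmap;
}
```

In this model, a disk containing only "oi.16" with one container expected returns a bitmap of 0x1 on every call, which corresponds to the scrub being triggered on every start up; a disk containing "oi.16.0" and "oi.16.1" with two containers expected returns 0.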

          nedbass Ned Bass (Inactive) added a comment -

          I also notice osd_fid_lookup() starts the scrub using osd_scrub_start(dev), which only enables the flag SS_AUTO. So even though (ldiskfs_test_bit(osd_oi_fid2idx(dev,fid), sf->sf_oi_bitmap)) is true, (unless I misunderstand something) we would not see the "recreated" flag in oi_scrub.

          nedbass Ned Bass (Inactive) added a comment -

          Fan Yong, the -1 debug log is from a classified system, so I can't send it, but if you have specific questions about it I can look for you. While the scrub was in progress, the flags field only had 'auto'. An example FID from that system that followed the "trigger" path was [0x1a89082ad98:0x4d:0x0].

          People

            jamesanunez James Nunez (Inactive)
            morrone Christopher Morrone (Inactive)