Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • Affects Version/s: Lustre 2.5.3

    Description

      User reported file corruption as shown below. The file is striped across four OSTs with a 1MB stripe size.

      The corruption is 4KB in size, and its end aligns with an OST stripe boundary. The corrupted data comes from a process that ran on the OSS, writing and reading data to the local OSS filesystem.
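      The stripe arithmetic that places an offset on a stripe boundary can be sketched as below. The stripe size and count are from this report; the example offset is purely illustrative, not the actual corrupt offset.

```shell
# Stripe geometry from the report: 4 OSTs, 1MB stripe size.
# The offset below is an illustrative example only.
stripe_size=$((1024 * 1024))
stripe_count=4
offset=$((7 * stripe_size - 4096))        # a 4KB region ending on a stripe boundary
stripe_index=$((offset / stripe_size))    # which 1MB stripe the offset falls in
ost=$((stripe_index % stripe_count))      # which OST (round-robin by stripe index) holds it
echo "stripe $stripe_index, OST index $ost"
```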

      We have a cron job that dumps OST metadata once a day like so:

      /sbin/dumpe2fs /dev/ostdevice > /root/ostdevice.meta 2>/dev/null
      

      The output file is read every 15 minutes, so its inodes are cached on the OSS.
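      A minimal sketch of the consumer side, assuming a hypothetical reader that greps the dump. A stand-in file is used here, since /root/ostdevice.meta exists only on the OSS.

```shell
# The real producer is the daily cron job from above:
#   /sbin/dumpe2fs /dev/ostdevice > /root/ostdevice.meta 2>/dev/null
# Hypothetical 15-minute reader, shown against a synthetic stand-in dump:
meta=$(mktemp)
printf 'Group 66654: (Blocks 2184118272-2184151039)\nGroup 66655: (Blocks 2184151040-2184183807)\n' > "$meta"
groups=$(grep -c '^Group ' "$meta")   # reading keeps the file's pages and inode cached
echo "block groups in dump: $groups"
rm -f "$meta"
```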

        0.1926E-04  0.8636E-05 -0.5430E-05 -0.1747E-04 -0.2318E-04
       -0.2108E-04 -0.1270E-04 -0.1492E-05  0.8965E-05  0.1638E-04
        0.2025E-04  0.2143E-04  0.2111E-04  0.2007E-04  0.1847E-04
        0.1629E-04  0.1384E-04  0.1204E-04  0.1206E-
      ^@^@T^A^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^A^@^@^@^@^@^@^@^@^@^@^@^B^@^@^@^@^@^@^@^@^@^@^@^D^@^@^@^@^@^@^@^@^@^@^@^H^@^@^@^@^@^@^@^@^@^@^@  ^@^@^@^A^@^@^@E~L~G^@   ^@^@^@^A^@^@^@E~L~G^@^L^@^@^@^@^@^@^@^@^@^@^@^L^@^@^@^@^@^@^@^@^@^@^@^L^@^@^@^@^@^@^@^@^@^@^^
      @^L^@^@^@^@^@^@^@^@^@^@^@^L^@^@^@^@^@^@^@^@^@^@k bitmap at 2181038173 (bg #66560 + 93), Inode bitmap at 2181038429 (bg #66560 + 349)
        Inode table at 2181039336-2181039343 (bg #66560 + 1256)
        1780 free blocks, 128 free inodes, 0 directories, 128 unused inodes
        Free blocks: 2184085760-2184086271, 2184094208-2184094451, 2184097792-2184098815
        Free inodes: 8531585-8531712
      Group 66654: (Blocks 2184118272-2184151039) [INODE_UNINIT, ITABLE_ZEROED]
        Checksum 0xdc38, unused inodes 128
        Block bitmap at 2181038174 (bg #66560 + 94), Inode bitmap at 2181038430 (bg #66560 + 350)
        Inode table at 2181039344-2181039351 (bg #66560 + 1264)
        239 free blocks, 128 free inodes, 0 directories, 128 unused inodes
        Free blocks: 2184121088-2184121321, 2184121339-2184121343
        Free inodes: 8531713-8531840
      Group 66655: (Blocks 2184151040-2184183807) [INODE_UNINIT, ITABLE_ZEROED]
        Checksum 0xc8b0, unused inodes 128
        Block bitmap at 2181038175 (bg #66560 + 95), Inode bitmap at 2181038431 (bg #66560 + 351)
        Inode table at 2181039352-2181039359 (bg #66560 + 1272)
        5119 free blocks, 128 free inodes, 0 directories, 128 unused inodes
        Free blocks: 2184151297-2184151551, 2184154624-2184155135, 2184165376-2184166399, 2184167168-2184167423, 2184171520-2184172543, 2184179712-2184181759
        Free inodes: 8531841-8531968
      Group 66656: (Blocks 2184183808-2184216575) [INODE_UNINIT, ITABLE_ZEROED]
        Checksum 0x7ce0, unused inodes 128
        Block bitmap at 2181038176 (bg #66560 + 96), Inode bitmap at 2181038432 (bg #66560 + 352)
        Inode table at 2181039360-2181039367 (bg #66560 + 1280)
        2816 free blocks, 128 free inodes, 0 directories, 128 unused inodes
        Free blocks: 2184184832-2184185855, 2184198144-2184198911, 2184205312-2184206335
        Free inodes: 8531969-8532096
      Group 66657: (Blocks 2184216576-2184249343) [INODE_UNINIT, ITABLE_ZEROED]
        Checksum 0xe3a2, unused inodes 128
        Block bitmap at 2181038177 (bg #66560 + 97), Inode bitmap at 2181038433 (bg #66560 + 353)
        Inode table at 2181039368-2181039375 (bg #66560 + 1288)
        2574 free blocks, 128 free inodes, 0 directories, 128 unused inodes
        Free blocks: 2184217600-2184218623, 2184221416-2184221437, 2184236544-2184237045, 2184237054-2184237055, 2184240128-2184241151
        Free inodes: 8532097-8532224
      Group 66658: (Blocks 2184249344-2184282111) [INODE_UNINIT, ITABLE_ZEROED]
        Checksum 0xe5c6, unused inodes 128
        Block bitmap at 2181038178 (bg #66560 + 98), Inode bitmap at 2181038434 (bg #66560 + 354)
        Inode table at 2181039376-2181039383 (bg #66560 + 1296)
        5426 free blocks, 128 free inodes, 0 directories, 128 unused inodes
        Free blocks: 2184251392-2184251647, 2184252160-2184252407, 2184252413-2184252415, 2184253440-2184254463, 2184255488-2184255743, 2184255924-2184256511, 2184259584-2184260095, 2184260352-2184260602, 2184260608-2184261631, 2184272896-2184273919, 2184276992-2184277229, 2184277246-2184277247
        Free inodes: 8532225-8532352
      Group 66659: (Blocks 2184282112-2184314879) [INODE_UNINIT, ITABLE_ZEROED]
        Checksum 0x16f0, unused inodes 128
        Block bitmap at 2181038179 (bg #66560 + 99), Inode bitmap at 2181038435 (bg #66560 + 355)
        Inode table at 2181039384-2181039391 (bg #66560 + 1304)
        3751 free blocks, 128 free inodes, 0 directories, 128 unused inodes
        Free blocks: 2184288256-2184289279, 2184292355-2184292607, 2184293376-2184294000, 2184294867-2184294911, 2184297216-2184298495, 2184299491-2184299519, 2184302848-2184303094, 2184303870-2184304116, 2184304127
        Free inodes: 8532353-8532480
      Group 66660: (Blocks 2184314880-2184347647) [INODE_UNINIT, ITABLE_ZEROED]
        Checksum 0x7c1a, unused inodes 128
        Block bitmap at 2181038180 (bg #66560 + 100), Inode bitmap at 2181038436 (bg #66560 + 356)
        Inode table at 2181039392-2181039399 (bg #66560 + 1312)
        9197 free blocks, 128 free inodes, 0 directories, 128 unused inodes
        Free blocks: 2184320256-2184321023, 2184322048-2184323071, 2184323585-2184324055, 2184324057, 2184324074-2184324095, 2184324354-2184325119, 2184325632-2184326143, 2184326655-2184355
       .1385E-04
        0.2720E-04  0.3428E-04  0.3470E-04  0.3125E-04  0.2717E-04
        0.2375E-04  0.1968E-04  0.1258E-04  0.1537E-05 -0.1135E-04
       -0.2146E-04
        0.2531E-04  0.2365E-04  0.2503E-04  0.2827E-04  0.2984E-04
        0.2598E-04  0.1521E-04 -0.2827E-06 -0.1534E-04 -0.2416E-04
      

      No errors are logged on the OSS.

    Attachments

    Issue Links

    Activity

            [LU-6925] oss buffer cache corruption
            pjones Peter Jones added a comment -

            As per NASA, the fix worked.


            bzzz Alex Zhuravlev added a comment -

            LU-6768 can happen on an empty filesystem as well; it's just easier to hit when a filesystem is nearly full (blocks are reallocated quickly). I'd think this could be a result of LU-6758, though a truncate is required to hit that. It probably makes sense to trace the application to verify this.
            green Oleg Drokin added a comment -

            Alex, what do you think about this? I imagine quota might cause writes to fail at times too, even if otherwise there's plenty of space?


            mhanafi Mahmoud Hanafi added a comment -

            Could enabling quota enforcement increase the likelihood of hitting this bug?

            jaylan Jay Lan (Inactive) added a comment -

            I posted a request for a b2_5 port of the LU-6768 patch in that LU.
            green Oleg Drokin added a comment -

            Yes, I did mean disk space, since that is what was reported as one of the preconditions in LU-6768, which looks pretty similar to what you seem to have experienced. But I guess it's just something that makes the condition easier to trigger?


            mhanafi Mahmoud Hanafi added a comment -

            "Low space?" Do you mean OST disk space? I don't think we were low on disk space, but there was a large spike in load, and most of the memory was consumed by the page/buffer cache.
            green Oleg Drokin added a comment -

            Hm. This is quite a mystery indeed.

            The OST where this occurred (the one with the corrupted stripe): did it happen to be low on space? There's LU-6768, which I think could lead to what you describe, where a dirty page from the page cache is appropriated.


            mhanafi Mahmoud Hanafi added a comment -

            Sorry, maybe I am not explaining this well. This is a very strange issue....

            The user was running a job on a Lustre client, writing the file to Lustre. The corruption is in the user's file on Lustre, but the data inserted into the user's file is data that is read and written on the local filesystem of the OSS. So somehow, data being read and written on the OSS root filesystem corrupted part of the user's file on Lustre. The corruption was exactly 4KB, and it was at the end of an OST stripe.
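            One way to confirm the exact size of such a corrupt region, assuming a known-good copy of the file is available, is to count differing bytes with cmp. The files below are small synthetic stand-ins, not the user's actual data.

```shell
# Synthetic demo: an 8KB "good" file and a copy with 4 bytes clobbered
# just before offset 4096, standing in for the real 4KB corruption.
good=$(mktemp); bad=$(mktemp)
head -c 8192 /dev/zero > "$good"
cp "$good" "$bad"
printf 'XXXX' | dd of="$bad" bs=1 seek=4092 conv=notrunc 2>/dev/null
# cmp -l prints one line per differing byte; the count is the corrupt size:
diff_bytes=$(( $(cmp -l "$good" "$bad" | wc -l) ))
echo "differing bytes: $diff_bytes"
rm -f "$good" "$bad"
```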
            green Oleg Drokin added a comment -

            I guess I am just confused: if the write target is the local filesystem, then there could not be any "OST stripe boundary" in there?
            Or do you also see corruption in the files on Lustre itself?


            People

              Assignee: green Oleg Drokin
              Reporter: mhanafi Mahmoud Hanafi
              Votes: 0
              Watchers: 5
