Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-14598

Too many FIDs to precreate OST replaced. Wrong IDIF logic.

Details

    • 3
    • 9223372036854775807

    Description

      During testing, job failure was observed

      V-1: Entering mdtest_stat...
      12/01/2020 04:46:28: Process 30(nid00514): FAILED in mdtest_stat, unable to stat file: No such file or directory
      Rank 30 [Tue Dec  1 04:46:28 2020] [c0-1c2s0n2] application called MPI_Abort(MPI_COMM_WORLD, 1) - process 30
      12/01/2020 04:46:29: Process 3(nid00514): FAILED in mdtest_stat, unable to stat file: No such file or directory
      Rank 3 [Tue Dec  1 04:46:29 2020] [c0-1c2s0n2] application called MPI_Abort(MPI_COMM_WORLD, 1) - process 3
      _pmiu_daemon(SIGCHLD): [NID 00514] [c0-1c2s0n2] [Tue Dec  1 04:46:29 2020] PE RANK 3 exit signal Aborted
      [NID 00514] 2020-12-01 04:46:29 Apid 7421368: initiated application termination
      Application 7421368 exit codes: 134
      Application 7421368 resources: utime ~0s, stime ~8s, Rss ~9528, inblocks ~0, outblocks ~0
      Job Script: command stopped at Tue Dec 1 04:46:36 CST 2020
      Job Script: command runtime was 240 seconds
      drwxr-xr-x 3 vers tsttool 4096 Dec  1 04:42 /lus/snx11281/ostest.vers/alsorun.20201130141403.23327.saturn-p4/CL_mdtest_pfl_1s_fo.6.cdR49p.1606819353/CL_mdtest_pfl_1s_fo
      ls: cannot access '/lus/snx11281/ostest.vers/alsorun.20201130141403.23327.saturn-p4/CL_mdtest_pfl_1s_fo.6.cdR49p.1606819353/CL_mdtest_pfl_1s_fo/#test-dir.0/mdtest_tree.0.0/file.mdtest.0.5023': No such file or directory
      ls: cannot access '/lus/snx11281/ostest.vers/alsorun.20201130141403.23327.saturn-p4/CL_mdtest_pfl_1s_fo.6.cdR49p.1606819353/CL_mdtest_pfl_1s_fo/#test-dir.0/mdtest_tree.0.0/file.mdtest.0.4854': No such file or directory
      ...
      Dec  1 04:45:12 snx11281n002 kernel: Lustre: 15664:0:(client.c:2188:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1606819417/real 1606819417]  req@ffff94b0eb7d9680 x1684868787957760/t0(0) o5->snx11281-OST0001-osc-MDT0000@10.10.100.6@o2ib3:28/4 lens 432/432 e 0 to 1 dl 1606819512 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1 job:''
      Dec  1 04:45:12 snx11281n002 kernel: Lustre: snx11281-OST0001-osc-MDT0000: Connection to snx11281-OST0001 (at 10.10.100.6@o2ib3) was lost; in progress operations using this service will wait for recovery to complete
      Dec  1 04:45:12 snx11281n002 kernel: LustreError: 15664:0:(osp_precreate.c:687:osp_precreate_send()) snx11281-OST0001-osc-MDT0000: can't precreate: rc = -11
      Dec  1 04:45:12 snx11281n002 kernel: LustreError: 15664:0:(osp_precreate.c:1358:osp_precreate_thread()) snx11281-OST0001-osc-MDT0000: cannot precreate objects: rc = -11
      

      With a debug patch we have catch vmcore with additional info.

      [ 1401.294579] Lustre: format at ofd_dev.c:1651:ofd_create_hdl doesn't end in newline
      [ 1401.294587] Lustre: snx11281-OST0004: Too many FIDs to precreate OST replaced or reformatted: LFSCK will clean up
      [ 1401.294587] LustreError: 19714:0:(ofd_dev.c:1665:ofd_create_hdl()) ASSERTION( diff <= 5 * OST_MAX_PRECREATE ) failed: snx11281-OST0004: precreate FID 0x0:4395085801 is over 100000 larger than the LAST_ID 0x0:4294967297 diff 100118504
      [ 1401.294589] LustreError: 19714:0:(ofd_dev.c:1665:ofd_create_hdl()) LBUG
      

      Here is a root cause.
      ofd_preprw_write() uses a wrong logic for getting sequence from a fid.
      During recovery ofd_preprw_write took a wrong sequence 0x100040001 from fid, and loaded it for ofd device. After that OST got creation request from MDT0, ofd_create_hdl found loaded IDIF sequence and used it. Since sequence was a new it has a large oid difference with MDT0.

      Attachments

        Issue Links

          Activity

            [LU-14598] Too many FIDs to precreate OST replaced. Wrong IDIF logic.

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49586/
            Subject: LU-14598 tests: skip conf-sanity test_122b in interop
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set:
            Commit: 22d3efff0df2365812ce23ea22c79ce13f9379ab

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49586/ Subject: LU-14598 tests: skip conf-sanity test_122b in interop Project: fs/lustre-release Branch: b2_15 Current Patch Set: Commit: 22d3efff0df2365812ce23ea22c79ce13f9379ab

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49583/
            Subject: LU-14598 tests: skip conf-sanity test_122b in interop
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: a9f83af896f2ce20e9ee430ad371b707c0c140cc

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49583/ Subject: LU-14598 tests: skip conf-sanity test_122b in interop Project: fs/lustre-release Branch: master Current Patch Set: Commit: a9f83af896f2ce20e9ee430ad371b707c0c140cc

            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49586
            Subject: LU-14598 tests: skip conf-sanity test_122b in interop
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set: 1
            Commit: 988f86ee62c39488e47dfad7392f43e7fad02d17

            gerrit Gerrit Updater added a comment - "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49586 Subject: LU-14598 tests: skip conf-sanity test_122b in interop Project: fs/lustre-release Branch: b2_15 Current Patch Set: 1 Commit: 988f86ee62c39488e47dfad7392f43e7fad02d17

            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49583
            Subject: LU-14598 tests: skip conf-sanity test_122b in interop
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 8e15572e836a75bd52e6d249b59b76f340b97de3

            gerrit Gerrit Updater added a comment - "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49583 Subject: LU-14598 tests: skip conf-sanity test_122b in interop Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 8e15572e836a75bd52e6d249b59b76f340b97de3

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/43541/
            Subject: LU-14598 ofd: fix for IDIF sequence at ofd_preprw_write
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: 4a0290c6be47e3ecd8108f78961afb2e54d6fddf

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/43541/ Subject: LU-14598 ofd: fix for IDIF sequence at ofd_preprw_write Project: fs/lustre-release Branch: b2_12 Current Patch Set: Commit: 4a0290c6be47e3ecd8108f78961afb2e54d6fddf

            Etienne AUJAMES (eaujames@ddn.com) uploaded a new patch: https://review.whamcloud.com/43541
            Subject: LU-14598 ofd: fix for IDIF sequence at ofd_preprw_write
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: f9b834eaf9d8706accd9b45f2c5670b8b908c596

            gerrit Gerrit Updater added a comment - Etienne AUJAMES (eaujames@ddn.com) uploaded a new patch: https://review.whamcloud.com/43541 Subject: LU-14598 ofd: fix for IDIF sequence at ofd_preprw_write Project: fs/lustre-release Branch: b2_12 Current Patch Set: 1 Commit: f9b834eaf9d8706accd9b45f2c5670b8b908c596
            pjones Peter Jones added a comment -

            Landed for 2.15

            pjones Peter Jones added a comment - Landed for 2.15

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43248/
            Subject: LU-14598 ofd: fix for IDIF sequence at ofd_preprw_write
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 747fed818be5a4e09281ab1d9fd5b3a13763ab40

            gerrit Gerrit Updater added a comment - Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43248/ Subject: LU-14598 ofd: fix for IDIF sequence at ofd_preprw_write Project: fs/lustre-release Branch: master Current Patch Set: Commit: 747fed818be5a4e09281ab1d9fd5b3a13763ab40

            Related to this issue, I see one more problem,

            ofd_preprw_write() has a loop for creation of missing objects, without limits with 100k like osp has. It could hang a recovery with a small numbers, and break recovery when millions object should be created. May be it is better to return old logic, when a single object is created for this write IO?

            aboyko Alexander Boyko added a comment - Related to this issue, I see one more problem, ofd_preprw_write() has a loop for creation of missing objects, without limits with 100k like osp has. It could hang a recovery with a small numbers, and break recovery when millions object should be created. May be it is better to return old logic, when a single object is created for this write IO?

            @Andreas Dilger, yes, the right sequence should be used - 0x100040000 instead of 0x100040001. I'm adding a bit more info from vmcore/debugfs. It shows that a new sequence was loaded at ofd and new LAST_ID was created.

            At vmcore OST0004 has the next loaded sequences. And it shows a freshly allocated file/sequence with oid 1 for MDT0.

            crash> list ofd_seq.os_list -s ofd_seq.os_oi,os_lastid_obj -H ffffa13a6823f630 -x
            ffffa143f47e5e80
              os_oi = {
                {
                  oi = {
                    oi_id = 0x400000400,
                    oi_seq = 0xd6b439bb
                  },
                  oi_fid = {
                    f_seq = 0x400000400,
                    f_oid = 0xd6b439bb,
                    f_ver = 0x0
                  }
                }
              }
              os_lastid_obj = 0xffffa13d3808ce00
            ffffa139ff995480
              os_oi = {
                {
                  oi = {
                    oi_id = 0x100040001,
                    oi_seq = 0x1
                  },
                  oi_fid = {
                    f_seq = 0x100040001,
                    f_oid = 0x1,
                    f_ver = 0x0
                  }
                }
              }
              os_lastid_obj = 0xffffa13da8c5b700
            crash> osd_object 0xffffa13da8c5b700
            struct osd_object {
              oo_dt = {
                do_lu = {
                  lo_header = 0xffffa13d4ca06d20,
                  lo_dev = 0xffffa13db9aa5000,
                  lo_ops = 0xffffffffc1f3b100 <osd_lu_obj_ops>,
                  lo_linkage = {
                    next = 0xffffa13d4ca06d60,
                    prev = 0xffffa13d4ca06d88
                  },
                  lo_dev_ref = {<No data fields>}
                },
                do_ops = 0xffffffffc1f3b000 <osd_obj_ops>,
                do_body_ops = 0xffffffffc1f3cf40 <osd_body_ops>,
                do_index_ops = 0x0
              },
              oo_inode = 0xffffa13eae1bedc8,
            ....
            crash> inode 0xffffa13eae1bedc8
            struct inode {
              i_mode = 33188,
              i_opflags = 0,
              i_uid = {
                val = 0
              },
              i_gid = {
                val = 0
              },
              i_flags = 128,
              i_acl = 0xffffffffffffffff,
              i_default_acl = 0xffffffffffffffff,
              i_op = 0xffffffffc24b5b40 <ldiskfs_file_inode_operations>,
              i_sb = 0xffffa13f77734000,
              i_mapping = 0xffffa13eae1bef18,
              i_security = 0x0,
              i_ino = 426717,
             

            The creation time of inode 426717 is exact crash time. The sequence loaded at ofd was a new one and LAST_ID was created for it, instead of O/0/LAST_ID. 

            # debugfs /dev/md0
            debugfs 1.45.6.cr1 (14-Aug-2020)
            debugfs:  stat <426717>
            Inode: 426717   Type: regular    Mode:  0644   Flags: 0x84000
            Generation: 1901991170    Version: 0x00000000:00000000
            User:     0   Group:     0   Project:     0   Size: 8
            File ACL: 0
            Links: 1   Blockcount: 8
            Fragment:  Address: 0    Number: 0    Size: 0
             ctime: 0x605a7ccc:a3bdb8b8 -- Tue Mar 23 18:42:04 2021
             atime: 0x605a7ccc:a3bdb8b8 -- Tue Mar 23 18:42:04 2021
             mtime: 0x605a7ccc:a3bdb8b8 -- Tue Mar 23 18:42:04 2021
            crtime: 0x605a7ccc:a3bdb8b8 -- Tue Mar 23 18:42:04 2021
            Size of extra inode fields: 32
            Extended attributes:
              trusted.lma (24) = 08 00 00 00 00 00 00 00 01 00 04 00 01 00 00 00 00 00 00 00 00 00 00 00 
              lma: fid=[0x100040001:0x0:0x0] compat=8 incompat=0
            EXTENTS:
            (0):1193148674
            debugfs:  stat O/0/LAST_ID
            Inode: 96   Type: regular    Mode:  0644   Flags: 0x84000
            Generation: 132363868    Version: 0x00000000:00000000
            User:     0   Group:     0   Project:     0   Size: 8
            File ACL: 0
            Links: 1   Blockcount: 8
            Fragment:  Address: 0    Number: 0    Size: 0
             ctime: 0x5e9f512f:d323ef10 -- Tue Apr 21 15:01:51 2020
             atime: 0x5d016f96:3b027e8c -- Wed Jun 12 16:33:10 2019
             mtime: 0x5d016f96:3b027e8c -- Wed Jun 12 16:33:10 2019
            crtime: 0x5d016f96:3b027e8c -- Wed Jun 12 16:33:10 2019
            Size of extra inode fields: 32
            Extended attributes:
              trusted.lma (24) = 08 00 00 00 00 00 00 00 00 00 04 00 01 00 00 00 00 00 00 00 00 00 00 00 
              lma: fid=[0x100040000:0x0:0x0] compat=8 incompat=0
            EXTENTS:
            (0):36993
            
            aboyko Alexander Boyko added a comment - @Andreas Dilger, yes, the right sequence should be used - 0x100040000 instead of 0x100040001. I'm adding a bit more info from vmcore/debugfs. It shows that a new sequence was loaded at ofd and new LAST_ID was created. At vmcore OST0004 has the next loaded sequences. And it shows a freshly allocated file/sequence with oid 1 for MDT0. crash> list ofd_seq.os_list -s ofd_seq.os_oi,os_lastid_obj -H ffffa13a6823f630 -x ffffa143f47e5e80 os_oi = { { oi = { oi_id = 0x400000400, oi_seq = 0xd6b439bb }, oi_fid = { f_seq = 0x400000400, f_oid = 0xd6b439bb, f_ver = 0x0 } } } os_lastid_obj = 0xffffa13d3808ce00 ffffa139ff995480 os_oi = { { oi = { oi_id = 0x100040001, oi_seq = 0x1 }, oi_fid = { f_seq = 0x100040001, f_oid = 0x1, f_ver = 0x0 } } } os_lastid_obj = 0xffffa13da8c5b700 crash> osd_object 0xffffa13da8c5b700 struct osd_object { oo_dt = { do_lu = { lo_header = 0xffffa13d4ca06d20, lo_dev = 0xffffa13db9aa5000, lo_ops = 0xffffffffc1f3b100 <osd_lu_obj_ops>, lo_linkage = { next = 0xffffa13d4ca06d60, prev = 0xffffa13d4ca06d88 }, lo_dev_ref = {<No data fields>} }, do_ops = 0xffffffffc1f3b000 <osd_obj_ops>, do_body_ops = 0xffffffffc1f3cf40 <osd_body_ops>, do_index_ops = 0x0 }, oo_inode = 0xffffa13eae1bedc8, .... crash> inode 0xffffa13eae1bedc8 struct inode { i_mode = 33188, i_opflags = 0, i_uid = { val = 0 }, i_gid = { val = 0 }, i_flags = 128, i_acl = 0xffffffffffffffff, i_default_acl = 0xffffffffffffffff, i_op = 0xffffffffc24b5b40 <ldiskfs_file_inode_operations>, i_sb = 0xffffa13f77734000, i_mapping = 0xffffa13eae1bef18, i_security = 0x0, i_ino = 426717, The creation time of inode 426717 is exact crash time. The sequence loaded at ofd was a new one and LAST_ID was created for it, instead of O/0/LAST_ID.  # debugfs /dev/md0 debugfs 1.45.6.cr1 (14-Aug-2020) debugfs: stat <426717> Inode: 426717 Type: regular Mode: 0644 Flags: 0x84000 Generation: 1901991170 Version: 0x00000000:00000000 User: 0 Group: 0 Project: 0 Size: 8 File ACL: 0 Links: 1 Blockcount: 8 Fragment: Address: 0 Number: 0 Size: 0 ctime: 0x605a7ccc:a3bdb8b8 -- Tue Mar 23 18:42:04 2021 atime: 0x605a7ccc:a3bdb8b8 -- Tue Mar 23 18:42:04 2021 mtime: 0x605a7ccc:a3bdb8b8 -- Tue Mar 23 18:42:04 2021 crtime: 0x605a7ccc:a3bdb8b8 -- Tue Mar 23 18:42:04 2021 Size of extra inode fields: 32 Extended attributes: trusted.lma (24) = 08 00 00 00 00 00 00 00 01 00 04 00 01 00 00 00 00 00 00 00 00 00 00 00 lma: fid=[0x100040001:0x0:0x0] compat=8 incompat=0 EXTENTS: (0):1193148674 debugfs: stat O/0/LAST_ID Inode: 96 Type: regular Mode: 0644 Flags: 0x84000 Generation: 132363868 Version: 0x00000000:00000000 User: 0 Group: 0 Project: 0 Size: 8 File ACL: 0 Links: 1 Blockcount: 8 Fragment: Address: 0 Number: 0 Size: 0 ctime: 0x5e9f512f:d323ef10 -- Tue Apr 21 15:01:51 2020 atime: 0x5d016f96:3b027e8c -- Wed Jun 12 16:33:10 2019 mtime: 0x5d016f96:3b027e8c -- Wed Jun 12 16:33:10 2019 crtime: 0x5d016f96:3b027e8c -- Wed Jun 12 16:33:10 2019 Size of extra inode fields: 32 Extended attributes: trusted.lma (24) = 08 00 00 00 00 00 00 00 00 00 04 00 01 00 00 00 00 00 00 00 00 00 00 00 lma: fid=[0x100040000:0x0:0x0] compat=8 incompat=0 EXTENTS: (0):36993

            People

              aboyko Alexander Boyko
              aboyko Alexander Boyko
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: