[LU-14598] Too many FIDs to precreate OST replaced. Wrong IDIF logic. Created: 09/Apr/21  Updated: 11/Apr/23  Resolved: 28/Apr/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.14.0, Lustre 2.12.6, Lustre 2.15.0
Fix Version/s: Lustre 2.12.8, Lustre 2.15.0

Type: Bug Priority: Major
Reporter: Alexander Boyko Assignee: Alexander Boyko
Resolution: Fixed Votes: 0
Labels: patch

Issue Links:
Related
is related to LU-9547 LBUG osp_dev.c:755:osp_statfs()) ASSE... Resolved
is related to LU-14658 conf-sanity test_122b: failed to writ... Resolved
is related to LU-16456 Interop conf-sanity test_132: Can not... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

During testing, job failure was observed

V-1: Entering mdtest_stat...
12/01/2020 04:46:28: Process 30(nid00514): FAILED in mdtest_stat, unable to stat file: No such file or directory
Rank 30 [Tue Dec  1 04:46:28 2020] [c0-1c2s0n2] application called MPI_Abort(MPI_COMM_WORLD, 1) - process 30
12/01/2020 04:46:29: Process 3(nid00514): FAILED in mdtest_stat, unable to stat file: No such file or directory
Rank 3 [Tue Dec  1 04:46:29 2020] [c0-1c2s0n2] application called MPI_Abort(MPI_COMM_WORLD, 1) - process 3
_pmiu_daemon(SIGCHLD): [NID 00514] [c0-1c2s0n2] [Tue Dec  1 04:46:29 2020] PE RANK 3 exit signal Aborted
[NID 00514] 2020-12-01 04:46:29 Apid 7421368: initiated application termination
Application 7421368 exit codes: 134
Application 7421368 resources: utime ~0s, stime ~8s, Rss ~9528, inblocks ~0, outblocks ~0
Job Script: command stopped at Tue Dec 1 04:46:36 CST 2020
Job Script: command runtime was 240 seconds
drwxr-xr-x 3 vers tsttool 4096 Dec  1 04:42 /lus/snx11281/ostest.vers/alsorun.20201130141403.23327.saturn-p4/CL_mdtest_pfl_1s_fo.6.cdR49p.1606819353/CL_mdtest_pfl_1s_fo
ls: cannot access '/lus/snx11281/ostest.vers/alsorun.20201130141403.23327.saturn-p4/CL_mdtest_pfl_1s_fo.6.cdR49p.1606819353/CL_mdtest_pfl_1s_fo/#test-dir.0/mdtest_tree.0.0/file.mdtest.0.5023': No such file or directory
ls: cannot access '/lus/snx11281/ostest.vers/alsorun.20201130141403.23327.saturn-p4/CL_mdtest_pfl_1s_fo.6.cdR49p.1606819353/CL_mdtest_pfl_1s_fo/#test-dir.0/mdtest_tree.0.0/file.mdtest.0.4854': No such file or directory
...
Dec  1 04:45:12 snx11281n002 kernel: Lustre: 15664:0:(client.c:2188:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1606819417/real 1606819417]  req@ffff94b0eb7d9680 x1684868787957760/t0(0) o5->snx11281-OST0001-osc-MDT0000@10.10.100.6@o2ib3:28/4 lens 432/432 e 0 to 1 dl 1606819512 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1 job:''
Dec  1 04:45:12 snx11281n002 kernel: Lustre: snx11281-OST0001-osc-MDT0000: Connection to snx11281-OST0001 (at 10.10.100.6@o2ib3) was lost; in progress operations using this service will wait for recovery to complete
Dec  1 04:45:12 snx11281n002 kernel: LustreError: 15664:0:(osp_precreate.c:687:osp_precreate_send()) snx11281-OST0001-osc-MDT0000: can't precreate: rc = -11
Dec  1 04:45:12 snx11281n002 kernel: LustreError: 15664:0:(osp_precreate.c:1358:osp_precreate_thread()) snx11281-OST0001-osc-MDT0000: cannot precreate objects: rc = -11

With a debug patch we have catch vmcore with additional info.

[ 1401.294579] Lustre: format at ofd_dev.c:1651:ofd_create_hdl doesn't end in newline
[ 1401.294587] Lustre: snx11281-OST0004: Too many FIDs to precreate OST replaced or reformatted: LFSCK will clean up
[ 1401.294587] LustreError: 19714:0:(ofd_dev.c:1665:ofd_create_hdl()) ASSERTION( diff <= 5 * OST_MAX_PRECREATE ) failed: snx11281-OST0004: precreate FID 0x0:4395085801 is over 100000 larger than the LAST_ID 0x0:4294967297 diff 100118504
[ 1401.294589] LustreError: 19714:0:(ofd_dev.c:1665:ofd_create_hdl()) LBUG

Here is a root cause.
ofd_preprw_write() uses a wrong logic for getting sequence from a fid.
During recovery ofd_preprw_write took a wrong sequence 0x100040001 from fid, and loaded it for ofd device. After that OST got creation request from MDT0, ofd_create_hdl found loaded IDIF sequence and used it. Since sequence was a new it has a large oid difference with MDT0.



 Comments   
Comment by Gerrit Updater [ 09/Apr/21 ]

Alexander Boyko (alexander.boyko@hpe.com) uploaded a new patch: https://review.whamcloud.com/43248
Subject: LU-14598 ofd: fix for IDIF sequence at ofd_preprw_write
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 1b0024d41bd648b3ae63d85ba8b689eff85355d3

Comment by Andreas Dilger [ 14/Apr/21 ]

Looking at this, I'm trying to understand the details correctly. The "IDIF" sequence actually has 2^48 OIDs for each OST, so the code should not be interpreting "seq" 0x100040001 as a different sequence number from 0x100040000, but rather the high 16-bit value = 0x0001 of the OID for OST0004.

Comment by Alexander Boyko [ 14/Apr/21 ]

@Andreas Dilger, yes, the right sequence should be used - 0x100040000 instead of 0x100040001. I'm adding a bit more info from vmcore/debugfs. It shows that a new sequence was loaded at ofd and new LAST_ID was created.

At vmcore OST0004 has the next loaded sequences. And it shows a freshly allocated file/sequence with oid 1 for MDT0.

crash> list ofd_seq.os_list -s ofd_seq.os_oi,os_lastid_obj -H ffffa13a6823f630 -x
ffffa143f47e5e80
  os_oi = {
    {
      oi = {
        oi_id = 0x400000400,
        oi_seq = 0xd6b439bb
      },
      oi_fid = {
        f_seq = 0x400000400,
        f_oid = 0xd6b439bb,
        f_ver = 0x0
      }
    }
  }
  os_lastid_obj = 0xffffa13d3808ce00
ffffa139ff995480
  os_oi = {
    {
      oi = {
        oi_id = 0x100040001,
        oi_seq = 0x1
      },
      oi_fid = {
        f_seq = 0x100040001,
        f_oid = 0x1,
        f_ver = 0x0
      }
    }
  }
  os_lastid_obj = 0xffffa13da8c5b700
crash> osd_object 0xffffa13da8c5b700
struct osd_object {
  oo_dt = {
    do_lu = {
      lo_header = 0xffffa13d4ca06d20,
      lo_dev = 0xffffa13db9aa5000,
      lo_ops = 0xffffffffc1f3b100 <osd_lu_obj_ops>,
      lo_linkage = {
        next = 0xffffa13d4ca06d60,
        prev = 0xffffa13d4ca06d88
      },
      lo_dev_ref = {<No data fields>}
    },
    do_ops = 0xffffffffc1f3b000 <osd_obj_ops>,
    do_body_ops = 0xffffffffc1f3cf40 <osd_body_ops>,
    do_index_ops = 0x0
  },
  oo_inode = 0xffffa13eae1bedc8,
....
crash> inode 0xffffa13eae1bedc8
struct inode {
  i_mode = 33188,
  i_opflags = 0,
  i_uid = {
    val = 0
  },
  i_gid = {
    val = 0
  },
  i_flags = 128,
  i_acl = 0xffffffffffffffff,
  i_default_acl = 0xffffffffffffffff,
  i_op = 0xffffffffc24b5b40 <ldiskfs_file_inode_operations>,
  i_sb = 0xffffa13f77734000,
  i_mapping = 0xffffa13eae1bef18,
  i_security = 0x0,
  i_ino = 426717,
 

The creation time of inode 426717 is exact crash time. The sequence loaded at ofd was a new one and LAST_ID was created for it, instead of O/0/LAST_ID. 

# debugfs /dev/md0
debugfs 1.45.6.cr1 (14-Aug-2020)
debugfs:  stat <426717>
Inode: 426717   Type: regular    Mode:  0644   Flags: 0x84000
Generation: 1901991170    Version: 0x00000000:00000000
User:     0   Group:     0   Project:     0   Size: 8
File ACL: 0
Links: 1   Blockcount: 8
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x605a7ccc:a3bdb8b8 -- Tue Mar 23 18:42:04 2021
 atime: 0x605a7ccc:a3bdb8b8 -- Tue Mar 23 18:42:04 2021
 mtime: 0x605a7ccc:a3bdb8b8 -- Tue Mar 23 18:42:04 2021
crtime: 0x605a7ccc:a3bdb8b8 -- Tue Mar 23 18:42:04 2021
Size of extra inode fields: 32
Extended attributes:
  trusted.lma (24) = 08 00 00 00 00 00 00 00 01 00 04 00 01 00 00 00 00 00 00 00 00 00 00 00 
  lma: fid=[0x100040001:0x0:0x0] compat=8 incompat=0
EXTENTS:
(0):1193148674
debugfs:  stat O/0/LAST_ID
Inode: 96   Type: regular    Mode:  0644   Flags: 0x84000
Generation: 132363868    Version: 0x00000000:00000000
User:     0   Group:     0   Project:     0   Size: 8
File ACL: 0
Links: 1   Blockcount: 8
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x5e9f512f:d323ef10 -- Tue Apr 21 15:01:51 2020
 atime: 0x5d016f96:3b027e8c -- Wed Jun 12 16:33:10 2019
 mtime: 0x5d016f96:3b027e8c -- Wed Jun 12 16:33:10 2019
crtime: 0x5d016f96:3b027e8c -- Wed Jun 12 16:33:10 2019
Size of extra inode fields: 32
Extended attributes:
  trusted.lma (24) = 08 00 00 00 00 00 00 00 00 00 04 00 01 00 00 00 00 00 00 00 00 00 00 00 
  lma: fid=[0x100040000:0x0:0x0] compat=8 incompat=0
EXTENTS:
(0):36993
Comment by Alexander Boyko [ 14/Apr/21 ]

Related to this issue, I see one more problem,

ofd_preprw_write() has a loop for creation of missing objects, without limits with 100k like osp has. It could hang a recovery with a small numbers, and break recovery when millions object should be created. May be it is better to return old logic, when a single object is created for this write IO?

Comment by Gerrit Updater [ 28/Apr/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43248/
Subject: LU-14598 ofd: fix for IDIF sequence at ofd_preprw_write
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 747fed818be5a4e09281ab1d9fd5b3a13763ab40

Comment by Peter Jones [ 28/Apr/21 ]

Landed for 2.15

Comment by Gerrit Updater [ 04/May/21 ]

Etienne AUJAMES (eaujames@ddn.com) uploaded a new patch: https://review.whamcloud.com/43541
Subject: LU-14598 ofd: fix for IDIF sequence at ofd_preprw_write
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: f9b834eaf9d8706accd9b45f2c5670b8b908c596

Comment by Gerrit Updater [ 14/Nov/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/43541/
Subject: LU-14598 ofd: fix for IDIF sequence at ofd_preprw_write
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 4a0290c6be47e3ecd8108f78961afb2e54d6fddf

Comment by Gerrit Updater [ 09/Jan/23 ]

"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49583
Subject: LU-14598 tests: skip conf-sanity test_122b in interop
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8e15572e836a75bd52e6d249b59b76f340b97de3

Comment by Gerrit Updater [ 10/Jan/23 ]

"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49586
Subject: LU-14598 tests: skip conf-sanity test_122b in interop
Project: fs/lustre-release
Branch: b2_15
Current Patch Set: 1
Commit: 988f86ee62c39488e47dfad7392f43e7fad02d17

Comment by Gerrit Updater [ 19/Jan/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49583/
Subject: LU-14598 tests: skip conf-sanity test_122b in interop
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: a9f83af896f2ce20e9ee430ad371b707c0c140cc

Comment by Gerrit Updater [ 11/Apr/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49586/
Subject: LU-14598 tests: skip conf-sanity test_122b in interop
Project: fs/lustre-release
Branch: b2_15
Current Patch Set:
Commit: 22d3efff0df2365812ce23ea22c79ce13f9379ab

Generated at Sat Feb 10 03:11:07 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.