[LU-13061] osd_fid_lookup()) ASSERTION( fid_is_sane(fid) || fid_is_idif(fid) ) failed: [0x0:0x68:0x0]

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.14.0, Lustre 2.12.4
    • Affects Version/s: Lustre 2.12.3
    • None
    • 3
    • 9223372036854775807

    Description

      If a system has hit LU-12593 and has a corrupt block in the llog file, it may trigger an LASSERT() because of a bad FID found in the uninitialized part of the block. Applying the patch from that ticket afterwards is too late to fix the problem, since the corrupt block is already on disk.

      LustreError: 17438:0:(osd_handler.c:1077:osd_fid_lookup()) ASSERTION( fid_is_sane(fid) || fid_is_idif(fid) ) failed: [0x0:0x68:0x0]
      LustreError: 17438:0:(osd_handler.c:1077:osd_fid_lookup()) LBUG
      Pid: 17438, comm: llog_process_th 3.10.0-1062.1.1.el7_lustre.x86_64
      Call Trace:
       libcfs_call_trace+0x8c/0xc0 [libcfs]
       lbug_with_loc+0x4c/0xa0 [libcfs]
       osd_fid_lookup+0xc8/0x1c60 [osd_ldiskfs]
       osd_object_init+0x61/0x110 [osd_ldiskfs]
       lu_object_start.isra.35+0x8b/0x120 [obdclass]  
       lu_object_find_at+0x1e1/0xa60 [obdclass]
       dt_locate_at+0x1d/0xb0 [obdclass]
       llog_osd_open+0x50e/0xf30 [obdclass]
       llog_open+0x15a/0x3e0 [obdclass]
       osp_sync_init+0x44a/0xe20 [osp]
       osp_init0.isra.19+0x1aed/0x1f60 [osp]
       osp_device_alloc+0x86/0x130 [osp]
       obd_setup+0x119/0x280 [obdclass]
       class_setup+0x2a8/0x840 [obdclass]
       class_process_config+0x1726/0x2830 [obdclass]
       class_config_llog_handler+0x819/0x1520 [obdclass]
       llog_process_thread+0x82f/0x18e0 [obdclass]
       llog_process_thread_daemonize+0x9f/0xe0 [obdclass]
      

      Lustre should avoid ever triggering an LASSERT() on data read from disk or from the network. In this case, it probably makes sense to add a check in llog_osd_open() with fid_is_sane() before it uses the FID, and just return an error rather than crashing.
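
The check suggested above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patch that landed: `struct lu_fid` here only mirrors the [seq:oid:ver] layout discussed in this ticket, `fid_seq_looks_sane()` is a deliberately simplified stand-in for `fid_is_sane()` (it rejects only a NULL pointer or a zero sequence), and `open_catalog_checked()` is a hypothetical name for the caller-side check.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Three-field FID layout as discussed in this ticket: [seq:oid:ver]. */
struct lu_fid {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
};

/*
 * Simplified stand-in for fid_is_sane(): rejects only a NULL pointer or a
 * zero sequence. The real check in Lustre covers more cases.
 */
bool fid_seq_looks_sane(const struct lu_fid *fid)
{
	return fid != NULL && fid->f_seq != 0;
}

/*
 * Hypothetical caller-side check: validate a FID that was read from disk
 * and return an error instead of asserting.
 */
int open_catalog_checked(const struct lu_fid *fid)
{
	if (!fid_seq_looks_sane(fid))
		return -EINVAL;	/* corrupt on-disk data: fail the open, do not crash */

	/* ... real code would go on to locate and open the object here ... */
	return 0;
}
```

With this shape, the FID from the assertion message, [0x0:0x68:0x0], is rejected with -EINVAL, while a FID such as [0x1:0x4ef:0x0] passes the check.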

          Activity

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37185/
            Subject: LU-13061 osp: check catlog FID after reading in
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: 055eab6bd4c29bc961a10824ffa44323cce7640c

            gerrit Gerrit Updater added a comment -

            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37185
            Subject: LU-13061 osp: check catlog FID after reading in
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 0d2e23757ef6663f8b06b935253835937d8ca1f3

            pjones Peter Jones added a comment -

            Landed for 2.14


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36998/
            Subject: LU-13061 osp: check catlog FID after reading in
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 4597fa7d884de0f1a1b030052d4d34983fed6109

            gerrit Gerrit Updater added a comment -

            Hongchao Zhang (hongchao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36998
            Subject: LU-13061 osp: check catlog FID after reading in
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 7a5df8c9c0fa2cab2135e68e305a9a850263d720

            adilger Andreas Dilger added a comment -

            > the OID field is "0x400000068", but is truncated to "0x68" during assigning it to lu_fid.f_oid(32bits).
            > the data seems to be a normal FID (f_seq=0x400000068, f_oid=0, f_ver=0)

            No, the 0x400000068 value is the oi_id field (mapped to f_oid), and the f_seq field is 0x0, which is what triggers the LASSERT. Valid records look like the following, with oi_seq = f_seq = 0x1 = FID_SEQ_LLOG:

            000080 00000000000004ef 0000000000000001
            000090 0000000000000000 0000000000000000

            In this case, the FID is [0x1:0x4ef:0x0] and would map to object O/1/d15/1263 on the MDT.
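
The decoding described in this comment can be sketched in a few lines. This is an illustration under assumptions, not Lustre code: `struct logid_words`, `logid_to_fid()` and `fid_to_path()` are hypothetical names, the two 64-bit words are taken in the order shown in the dump (oi_id first, then oi_seq), and the `O/<seq>/d<oid % 32>/<oid>` path layout is inferred from the single example given in the comment.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Three-field FID layout as discussed in this ticket: [seq:oid:ver]. */
struct lu_fid {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
};

/* Hypothetical view of the first two 64-bit words of a catalog entry. */
struct logid_words {
	uint64_t oi_id;
	uint64_t oi_seq;
};

/* Map the two words to a FID; oi_id is truncated to the 32-bit f_oid. */
struct lu_fid logid_to_fid(struct logid_words w)
{
	struct lu_fid fid = {
		.f_seq = w.oi_seq,
		.f_oid = (uint32_t)w.oi_id,	/* 0x400000068 becomes 0x68 */
		.f_ver = 0,
	};
	return fid;
}

/* Format the object path, assuming 32 hash subdirectories (d0..d31). */
int fid_to_path(struct lu_fid fid, char *buf, size_t len)
{
	return snprintf(buf, len, "O/%" PRIu64 "/d%" PRIu32 "/%" PRIu32,
			fid.f_seq, fid.f_oid % 32, fid.f_oid);
}
```

Feeding the corrupt entry (oi_id = 0x400000068, oi_seq = 0x0) through this mapping yields [0x0:0x68:0x0], the FID in the assertion message, while the valid entry (oi_id = 0x4ef, oi_seq = 0x1) yields [0x1:0x4ef:0x0] and the path O/1/d15/1263.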

            hongchao.zhang Hongchao Zhang added a comment -

            As per the dump of "CATALOGS", the content of the "llog_catid" is:

            000020 0000000400000068 0000000000000000
            000030 0000000000000000 0000000000000000

            The OID field is "0x400000068", but it is truncated to "0x68" when it is assigned to lu_fid.f_oid (32 bits).
            The data seems to be a normal FID (f_seq=0x400000068, f_oid=0, f_ver=0).

            People

              Assignee: hongchao.zhang Hongchao Zhang
              Reporter: adilger Andreas Dilger