Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.13.0
    • Lustre 2.10.5
    • None
    • 1

    Description

      The server keeps crashing with the following error:

      [  981.957669] Lustre: nbp13-OST0008: trigger OI scrub by RPC for the [0x100080000:0x217edd:0x0] with flags 0x4a, rc = 0
      [  981.989579] Lustre: Skipped 11 previous similar messages
      [ 1045.404615] ------------[ cut here ]------------
      [ 1045.418484] kernel BUG at /tmp/rpmbuild-lustre-jlan-ItUrr9b3/BUILD/lustre-2.10.5/ldiskfs/ldiskfs.h:1907!
      [ 1045.446989] invalid opcode: 0000 [#1] SMP 
      [ 1045.459302] Modules linked in: ofd(OE) ost(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) fid(OE) fld(OE) dm_service_time ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) lpfc ib_iser(OE) libiscsi scsi_transport_iscsi crct10dif_generic scsi_transport_fc scsi_tgt rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) bonding ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) sunrpc dm_mirror dm_region_hash dm_log mlx5_ib(OE) ib_core(OE) intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul mgag200 ghash_clmulni_intel i2c_algo_bit ttm dm_multipath aesni_intel drm_kms_helper lrw syscopyarea gf128mul sysfillrect sysimgblt glue_helper fb_sys_fops ablk_helper mlx5_core(OE) mlxfw(OE) tg3 ses cryptd mlx_compat(OE) drm ptp ipmi_si enclosure mei_me i2c_core pps_core hpwdt hpilo ipmi_devintf lpc_ich dm_mod mfd_core mei shpchp pcspkr wmi ipmi_msghandler acpi_power_meter binfmt_misc tcp_bic ip_tables virtio_scsi virtio_ring virtio xfs libcrc32c ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_common sg usb_storage smartpqi(E) crc32c_intel scsi_transport_sas [last unloaded: pps_core]
      [ 1045.776428] CPU: 5 PID: 11348 Comm: lfsck Tainted: G           OE  ------------   3.10.0-693.21.1.el7.20180508.x86_64.lustre2105 #1
      [ 1045.811992] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 06/15/2018
      [ 1045.837624] task: ffff882ddca23f40 ti: ffff882bd280c000 task.ti: ffff882bd280c000
      [ 1045.860117] RIP: 0010:[<ffffffffa10fbd04>]  [<ffffffffa10fbd04>] ldiskfs_rec_len_to_disk.part.9+0x4/0x10 [ldiskfs]
      [ 1045.891259] RSP: 0018:ffff882bd280f980  EFLAGS: 00010207
      [ 1045.907218] RAX: 0000000000000000 RBX: ffff882bd280fb58 RCX: ffff882bd280f994
      [ 1045.928666] RDX: 00000000ffffffac RSI: ffffffffffffff81 RDI: 00000000ffffff81
      [ 1045.950113] RBP: ffff882bd280f980 R08: 00000000ffffff81 R09: ffffffffa10fded0
      [ 1045.971560] R10: ffff88303f803b00 R11: 0000000000ffffff R12: 000000000000003c
      [ 1045.993006] R13: ffff881e2eae7708 R14: ffff881e2eae7690 R15: 0000000000000000
      [ 1046.014452] FS:  0000000000000000(0000) GS:ffff882f7ef40000(0000) knlGS:0000000000000000
      [ 1046.038775] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1046.056039] CR2: 00007ffff20df034 CR3: 0000002ef4268000 CR4: 00000000003607e0
      [ 1046.077485] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 1046.098932] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 1046.120378] Call Trace:
      [ 1046.127717]  [<ffffffffa10fe245>] htree_inlinedir_to_tree+0x445/0x450 [ldiskfs]
      [ 1046.149690]  [<ffffffff8123002e>] ? __generic_file_splice_read+0x4ee/0x5e0
      [ 1046.170356]  [<ffffffff81234cdd>] ? __getblk+0x2d/0x2e0
      [ 1046.186052]  [<ffffffff81234c4c>] ? __find_get_block+0xbc/0x120
      [ 1046.203841]  [<ffffffff81234cdd>] ? __getblk+0x2d/0x2e0
      [ 1046.219541]  [<ffffffffa10cdfa0>] ? __ldiskfs_get_inode_loc+0x110/0x3e0 [ldiskfs]
      [ 1046.242039]  [<ffffffffa10c89ef>] ? ldiskfs_xattr_find_entry+0x9f/0x130 [ldiskfs]
      [ 1046.264536]  [<ffffffffa10c0277>] ldiskfs_htree_fill_tree+0x137/0x2f0 [ldiskfs]
      [ 1046.286507]  [<ffffffff811df826>] ? kmem_cache_alloc_trace+0x1d6/0x200
      [ 1046.306126]  [<ffffffffa10ae5ec>] ldiskfs_readdir+0x61c/0x850 [ldiskfs]
      [ 1046.326012]  [<ffffffffa1147640>] ? osd_declare_ref_del+0x130/0x130 [osd_ldiskfs]
      [ 1046.348507]  [<ffffffff812256b2>] ? generic_getxattr+0x52/0x70
      [ 1046.366036]  [<ffffffffa1145cde>] osd_ldiskfs_it_fill+0xbe/0x260 [osd_ldiskfs]
      [ 1046.387747]  [<ffffffffa1145eb7>] osd_it_ea_load+0x37/0x100 [osd_ldiskfs]
      [ 1046.408158]  [<ffffffffa122808c>] lfsck_open_dir+0x11c/0x3a0 [lfsck]
      [ 1046.427257]  [<ffffffffa1228cb2>] lfsck_master_oit_engine+0x9a2/0x1190 [lfsck]
      [ 1046.448969]  [<ffffffff816946f7>] ? __schedule+0x477/0xa30
      [ 1046.465453]  [<ffffffffa1229d96>] lfsck_master_engine+0x8f6/0x1360 [lfsck]
      [ 1046.486120]  [<ffffffff810c4d40>] ? wake_up_state+0x20/0x20
      [ 1046.502865]  [<ffffffffa12294a0>] ? lfsck_master_oit_engine+0x1190/0x1190 [lfsck]
      [ 1046.525360]  [<ffffffff810b1131>] kthread+0xd1/0xe0
      [ 1046.540011]  [<ffffffff810b1060>] ? insert_kthread_work+0x40/0x40
      [ 1046.558323]  [<ffffffff816a14dd>] ret_from_fork+0x5d/0xb0
      [ 1046.574540]  [<ffffffff810b1060>] ? insert_kthread_work+0x40/0x40
      [ 1046.592852] Code: 44 04 02 48 8d 44 03 c8 48 01 c7 e8 b7 f6 22 e0 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 0b 0f 0b 0f 1f 40 00 55 48 89 e5 <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 85 f6 48 
      [ 1046.650192] RIP  [<ffffffffa10fbd04>] ldiskfs_rec_len_to_disk.part.9+0x4/0x10 [ldiskfs]
      
      

      Attachments

        1. debug-lfsck-nbp15-MDT0000.gz
          60 kB
        2. dumpe2fs.out
          36 kB
        3. nbp13.debug.gz
          24.76 MB
        4. nbp13.lfsck.debug.out1.gz
          297 kB
        5. nbp13.lfsck.debug.out2.gz
          4 kB
        6. oi_scrub.out
          6 kB

        Issue Links

          Activity

            [LU-11584] kernel BUG at ldiskfs.h:1907!

            I was able to find all the inodes with bad LMA and delete them via ldiskfs. So what we have left are files that trigger OI scrub and that report "?" for size/uid/etc. The user has been able to recover all the affected files, so we just need a way to delete the files.

            If we delete the files via ldiskfs, how can we make sure that the objects will be cleaned up?

             

            mhanafi Mahmoud Hanafi added the comment above.

            I still don't understand why the nbp13 log doesn't contain "unsupported incompat LMA feature" message.

            bzzz Alex Zhuravlev added the comment above (edited).

            I'm modifying the test to simulate additional broken LinkEA, going to report results ASAP.

            bzzz Alex Zhuravlev added the comment above.

            Any comments on the output of nbp13.lfsck?

            mhanafi Mahmoud Hanafi added the comment above.

            Li Dongyang (dongyangli@ddn.com) uploaded a new patch: https://review.whamcloud.com/33576
            Subject: LU-11584 e2fsck: check xattr 'system.data' before setting inline_data feature
            Project: tools/e2fsprogs
            Branch: master-lustre
            Current Patch Set: 1
            Commit: 64b71635ffa84a01946199e3cd31b1ee9fd9a15f

            gerrit Gerrit Updater added the comment above.

            Here are the nbp13 lfsck runs.

             nbp13-srv1 ~ # lctl get_param -n  mdd.*.lfsck_namespace
            name: lfsck_namespace
            magic: 0xa0621a0b
            version: 2
            status: completed
            flags: inconsistent
            param: dryrun
            last_completed_time: 1541281433
            time_since_last_completed: 341 seconds
            latest_start_time: 1541281072
            time_since_latest_start: 702 seconds
            last_checkpoint_time: 1541281433
            time_since_last_checkpoint: 341 seconds
            latest_start_position: 77, N/A, N/A
            last_checkpoint_position: 317719759, N/A, N/A
            first_failure_position: 153388517, [0x2000020af:0x39d9:0x0], 0x753a410c57f07b3
            checked_phase1: 30987846
            checked_phase2: 111
            inconsistent_phase1: 2
            inconsistent_phase2: 3
            failed_phase1: 21
            failed_phase2: 3
            directories: 2709152
            dirent_inconsistent: 0
            linkea_inconsistent: 2
            nlinks_inconsistent: 0
            multiple_linked_checked: 5
            multiple_linked_inconsistent: 0
            unknown_inconsistency: 0
            unmatched_pairs_inconsistent: 0
            dangling_inconsistent: 0
            multiple_referenced_inconsistent: 3
            bad_file_type_inconsistent: 0
            lost_dirent_inconsistent: 0
            local_lost_found_scanned: 3
            local_lost_found_moved: 3
            local_lost_found_skipped: 0
            local_lost_found_failed: 0
            striped_dirs_scanned: 0
            striped_dirs_inconsistent: 0
            striped_dirs_failed: 0
            striped_dirs_disabled: 0
            striped_dirs_skipped: 0
            striped_shards_scanned: 0
            striped_shards_inconsistent: 0
            striped_shards_failed: 0
            striped_shards_skipped: 0
            name_hash_inconsistent: 0
            linkea_overflow_inconsistent: 0
            success_count: 3
            run_time_phase1: 362 seconds
            run_time_phase2: 0 seconds
            average_speed_phase1: 85601 items/sec
            average_speed_phase2: 111 objs/sec
            average_speed_total: 85366 items/sec
            real_time_speed_phase1: N/A
            real_time_speed_phase2: N/A
            current_position: N/A
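
The reported phase-1 rate can be cross-checked against the counters. The following is a minimal Python sketch, not part of the ticket. It assumes that average_speed_phase1 is simply checked_phase1 divided by run_time_phase1 with integer truncation, which matches the numbers here but is not taken from the LFSCK source. `parse_lfsck` and `leading_int` are hypothetical helper names.

```python
def parse_lfsck(text):
    """Parse 'key: value' lines from lfsck_namespace output into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep:
            stats[key] = value
    return stats


def leading_int(value):
    """Return the leading integer of a value such as '362 seconds'."""
    return int(value.split()[0])


# Excerpt of the lfsck_namespace output quoted in this comment.
sample = """\
checked_phase1: 30987846
failed_phase1: 21
run_time_phase1: 362 seconds
average_speed_phase1: 85601 items/sec
"""

stats = parse_lfsck(sample)
# Assumed relation: items checked divided by elapsed seconds, truncated.
computed = leading_int(stats["checked_phase1"]) // leading_int(stats["run_time_phase1"])
reported = leading_int(stats["average_speed_phase1"])
print(computed, reported)
```

Both values come out as 85601, so the counters in this dry run are at least internally consistent for phase 1.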
            

            nbp13.lfsck.debug.out2.gz nbp13.lfsck.debug.out1.gz

            mhanafi Mahmoud Hanafi added the comment above.

            I did see something interesting in the debug log... Two of the files that LFSCK complained about were:

            osd_handler.c:6401:osd_dirent_check_repair()) nbp15-MDT0000: the target inode does not recognize the dirent, dir = 237857984/19940587,  name = kplr011027624-2012004120508_llc.fits, ino = 237860402, [0x2000013af:0x8f13:0x0]: rc = -61
            osd_handler.c:6401:osd_dirent_check_repair()) nbp15-MDT0000: the target inode does not recognize the dirent, dir = 238340766/19942571,  name = kplr005385471-2009259160929_llc.fits, ino = 238345690, [0x2000013ae:0x8f1c:0x0]: rc = -61
            

            The filenames both end in "llc.fits" which is the same ASCII string that was corrupting the LMA FID. This is returning "-61 = -ENODATA" which Alex's patch is supposed to do when it finds a corrupted LMA FID, but it doesn't look like it repaired them:

                    rc = osd_get_lma(info, inode, dentry, &info->oti_ost_attrs);
                    if (rc == -ENODATA || !fid_is_sane(&lma->lma_self_fid))
                            lma = NULL;
                    :
                    :
                    if (!fid_is_zero(fid)) {
                            rc = osd_verify_ent_by_linkea(env, inode, pfid, ent->oied_name,
                                                          ent->oied_namelen);
                            if (rc == -ENOENT ||
                                (rc == -ENODATA &&
                                 !(dev->od_scrub.os_scrub.os_file.sf_flags & SF_UPGRADE))) {
                                    /*
                                     * linkEA does not recognize the dirent entry,
                                     * it may because the dirent entry corruption
                                     * and points to other's inode.
                                     */
                                    CDEBUG(D_LFSCK, "%s: the target inode does not "
                                           "recognize the dirent, dir = %lu/%u, "
                                           " name = %.*s, ino = %llu, "
                                           DFID": rc = %d\n", devname, dir->i_ino,
                                           dir->i_generation, ent->oied_namelen,
                                           ent->oied_name, ent->oied_ino, PFID(fid), rc);
                                    *attr |= LUDA_UNKNOWN;
            
                                    GOTO(out, rc = 0);
                            }
            
            

            I'd suspect that this is because the linkEA ("link" xattr which is also stored in the inode) is also missing? It looks like we need to set the SF_UPGRADE flag (maybe renamed to "SF_REBUILD_LMA") if the LMA has been removed (rc = -ENODATA) so that we fall through to the LMA repair code further down? We can't check for the LMAC_INIT_FID flag, since it is stored in the LMA itself, which is missing here.
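
To make the control flow concrete, here is a minimal Python model of only the branch quoted above. It is a sketch, not Lustre code: it ignores the surrounding `fid_is_zero()` test, reduces the scrub flag to a boolean, and uses the hypothetical name `dirent_check_outcome`. The errno values are the Linux ones, which is why the log shows rc = -61.

```python
# Linux errno values; the kernel returns them negated.
ENOENT = 2
ENODATA = 61


def dirent_check_outcome(rc, sf_upgrade):
    """Model the quoted branch of osd_dirent_check_repair().

    rc         -- result of the linkEA verification (0 or a negative errno)
    sf_upgrade -- True if the SF_UPGRADE scrub flag is set
    """
    if rc == -ENOENT or (rc == -ENODATA and not sf_upgrade):
        # Same path as the CDEBUG message: mark the entry unknown and stop.
        return "unknown"
    # Otherwise execution continues to the code further down.
    return "continue"


print(dirent_check_outcome(-ENODATA, sf_upgrade=False))  # the case seen in the log
print(dirent_check_outcome(-ENODATA, sf_upgrade=True))   # with the flag set
```

Under this model, rc = -ENODATA only gets past the early exit when the flag is set, which is the reason for suggesting that the flag be set when the LMA is missing.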

            adilger Andreas Dilger added the comment above.
            nbp13-srv1 ~ # objid=`printf "%i" 0x2155af`
            nbp13-srv1 ~ # debugfs -c -R "stat O/0/d$((objid % 32))/$objid" /dev/mapper/nbp13_1-OST1
            

            FYI, if you have the hex value for the object ID, you could directly use:

            debugfs -c -R "stat O/0/d$((0x2155af % 32))/$((0x2155af))" /dev/mapper/nbp13_1-OST1
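
The same path arithmetic can be written out in Python. This is a minimal sketch that assumes the layout implied by the command above, namely 32 hash subdirectories under O/0 with the decimal object ID as the file name. `ost_object_path` is a hypothetical helper, not a Lustre tool.

```python
def ost_object_path(objid, subdirs=32):
    """Build the ldiskfs path of an OST object from its numeric object ID."""
    return f"O/0/d{objid % subdirs}/{objid}"


print(ost_object_path(0x2155af))  # prints O/0/d15/2184623
print(ost_object_path(0x2155ae))  # prints O/0/d14/2184622
```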
            

            In any case, what is strange is that the object ID being looked up is 0x2155af, but the object that is found reports itself to be 0x2155ae:

            Extended attributes:
              lma: fid=[0x100010000:0x2155ae:0x0] compat=8 incompat=0
            

            Based on the fid2path output, it looks like this object is actually 0x2155ae, so it should be renamed from "/O/0/d15/2184623" to "/O/0/d14/2184622". It isn't clear why OI Scrub is not repairing this automatically.
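
The mismatch can also be confirmed from the raw trusted.lma bytes that debugfs printed for this inode (quoted in the debugfs output elsewhere in this ticket). The following Python sketch assumes the xattr consists of two little-endian 32-bit flag words followed by a FID made of a 64-bit sequence, a 32-bit object ID, and a 32-bit version. That layout is an assumption chosen because it reproduces the decoded "lma:" line, and `decode_lma` is a hypothetical helper.

```python
import struct

# Raw trusted.lma value as printed by debugfs for inode 1673602.
LMA_HEX = "08 00 00 00 00 00 00 00 00 00 01 00 01 00 00 00 ae 55 21 00 00 00 00 00"


def decode_lma(hex_bytes):
    """Decode a 24-byte LMA xattr under the assumed little-endian layout."""
    raw = bytes.fromhex(hex_bytes)
    compat, incompat, seq, oid, ver = struct.unpack("<IIQII", raw)
    return {"compat": compat, "incompat": incompat, "fid": (seq, oid, ver)}


lma = decode_lma(LMA_HEX)
looked_up = 0x2155af  # object ID from the OI scrub messages
seq, oid, ver = lma["fid"]

print("[%#x:%#x:%#x] compat=%d incompat=%d"
      % (seq, oid, ver, lma["compat"], lma["incompat"]))
print("self FID matches looked-up ID:", oid == looked_up)
```

The decoded object ID is 0x2155ae, one less than the ID that triggered the OI scrub, so the comparison reports a mismatch.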

            adilger Andreas Dilger added the comment above.

            I ran lfsck on nbp15, which has the same issues as nbp13. We are planning to reformat it.

             

            debug-lfsck-nbp15-MDT0000.gz

            mhanafi Mahmoud Hanafi added the comment above.

            We haven't run the new code, but here is one more example. Is this a bad LMA on the OST object?

            [325981.396812] Lustre: Skipped 3 previous similar messages
            [326747.450553] Lustre: nbp13-OST0001: trigger OI scrub by RPC for the [0x100010000:0x2155af:0x0] with flags 0x4a, rc = 0
            [326747.482740] Lustre: Skipped 3 previous similar messages
            [327512.978588] Lustre: nbp13-OST0001: trigger OI scrub by RPC for the [0x100010000:0x2155af:0x0] with flags 0x4a, rc = 0
            [327513.010762] Lustre: Skipped 3 previous similar messages
            [328279.688198] Lustre: nbp13-OST0001: trigger OI scrub by RPC for the [0x100010000:0x2155af:0x0] with flags 0x4a, rc = 0
            [328279.720378] Lustre: Skipped 3 previous similar messages
            nbp13-srv1 ~ # objid=`printf "%i" 0x2155af`
            nbp13-srv1 ~ # debugfs -c -R "stat O/0/d$((objid % 32))/$objid" /dev/mapper/nbp13_1-OST1
            debugfs 1.44.3.wc1 (23-July-2018)
            /dev/mapper/nbp13_1-OST1: catastrophic mode - not reading inode or group bitmaps
            Inode: 1673602   Type: regular    Mode:  0666   Flags: 0x80000
            Generation: 2828099384    Version: 0x00000003:005e7593
            User: 30757   Group: 41548   Project:     0   Size: 2180
            File ACL: 0
            Links: 2   Blockcount: 8
            Fragment:  Address: 0    Number: 0    Size: 0
             ctime: 0x5bd11ce7:00000000 -- Wed Oct 24 18:31:19 2018
             atime: 0x5bd11ce8:00000000 -- Wed Oct 24 18:31:20 2018
             mtime: 0x5bd11ce7:00000000 -- Wed Oct 24 18:31:19 2018
            crtime: 0x5bd11c77:03872348 -- Wed Oct 24 18:29:27 2018
            Size of extra inode fields: 32
            Extended attributes:
              trusted.lma (24) = 08 00 00 00 00 00 00 00 00 00 01 00 01 00 00 00 ae 55 21 00 00 00 00 00 
              lma: fid=[0x100010000:0x2155ae:0x0] compat=8 incompat=0
              trusted.fid (44)
              fid: parent=[0x200002101:0x66b8:0x0] stripe=0 stripe_size=1048576 stripe_count=1 component_id=1 component_start=0 component_end=8388608
            EXTENTS:
            (0):3426781548
            
            
            
            tpfe2 ~ # lfs fid2path /nobackupp13 0x200002101:0x66b8:0x0
            /nobackupp13/quarantine/spocops/git/sector/spoc/code/dist/dist/classes/java/main/gov/nasa/tess/dv/outputs/DvAbstractTargetTableData$Builder.class
            
            tpfe2 ~ # ls -l /nobackupp13/quarantine/spocops/git/sector/spoc/code/dist/dist/classes/java/main/gov/nasa/tess/dv/outputs/DvAbstractTargetTableData
            ls: cannot access '/nobackupp13/quarantine/spocops/git/sector/spoc/code/dist/dist/classes/java/main/gov/nasa/tess/dv/outputs/DvAbstractTargetTableData': No such file or directory
            
            mhanafi Mahmoud Hanafi added the comment above.

            Thanks for the update, Andreas and Alex~

            jaylan Jay Lan (Inactive) added the comment above.

            People

              bzzz Alex Zhuravlev
              mhanafi Mahmoud Hanafi