Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-6722

sanity-lfsck test_1a: FAIL: (3) Fail to start LFSCK for namespace!

Details

    • 3
    • 9223372036854775807

    Description

      sanity-lfsck test 1a failed as follows:

      CMD: shadow-5vm12 /usr/sbin/lctl set_param fail_loc=0x1501
      CMD: shadow-5vm12 /usr/sbin/lctl set_param fail_loc=0
      10.1.4.53@tcp:/lustre /mnt/lustre lustre rw,flock,user_xattr 0 0
      CMD: shadow-5vm1.shadow.whamcloud.com grep -c /mnt/lustre' ' /proc/mounts
      Stopping client shadow-5vm1.shadow.whamcloud.com /mnt/lustre (opts:)
      CMD: shadow-5vm1.shadow.whamcloud.com lsof -t /mnt/lustre
      CMD: shadow-5vm1.shadow.whamcloud.com umount  /mnt/lustre 2>&1
      CMD: shadow-5vm12 /usr/sbin/lctl lfsck_start -M lustre-MDT0000 -t namespace -r
      shadow-5vm12: Fail to start LFSCK: Read-only file system
       sanity-lfsck test_1a: @@@@@@ FAIL: (3) Fail to start LFSCK for namespace! 
      

      Maloo report: https://testing.hpdd.intel.com/test_sets/8520310e-108d-11e5-a2d3-5254006e85c2

      Attachments

        Issue Links

          Activity

            [LU-6722] sanity-lfsck test_1a: FAIL: (3) Fail to start LFSCK for namespace!

            The latest MDS corruption is different. The transaction was aborted because the osp_write_local_file() did not declare enough credits for write, the original failure was caused by less credits when delete/destroy. I have open a new ticket LU-7045 for the new issue.

            yong.fan nasf (Inactive) added a comment - The latest MDS corruption is different. The transaction was aborted because the osp_write_local_file() did not declare enough credits for write, the original failure was caused by less credits when delete/destroy. I have open a new ticket LU-7045 for the new issue.
            yujian Jian Yu added a comment -

            On master branch, in the following report:
            https://testing.hpdd.intel.com/sub_tests/ebb78d16-1044-11e5-8fc4-5254006e85c2

            Dmesg on MDS 1 (shadow-15vm8):

            [12947.228164] ------------[ cut here ]------------
            [12947.228652] WARNING: at /var/lib/jenkins/workspace/lustre-master/arch/x86_64/build_type/server/distro/el7/ib_stack/inkernel/BUILD/BUILD/lustre-2.7.54/ldiskfs/ext4_jbd2.c:260 __ldiskfs_handle_dirty_metadata+0x1c2/0x220 [ldiskfs]()
            [12947.230373] Modules linked in: osp(OF) mdd(OF) lod(OF) mdt(OF) lfsck(OF) mgs(OF) mgc(OF) osd_ldiskfs(OF) lquota(OF) fid(OF) fld(OF) ksocklnd(OF) ptlrpc(OF) obdclass(OF) lnet(OF) libcfs(OF) ldiskfs(OF) sha512_generic dm_mod nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd fscache xprtrdma sunrpc ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ppdev pcspkr serio_raw parport_pc virtio_balloon i2c_piix4 parport ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk cirrus syscopyarea sysfillrect sysimgblt drm_kms_helper ttm 8139too virtio_pci 8139cp virtio_ring virtio mii drm ata_piix libata i2c_core floppy [last unloaded: libcfs]
            
            [12947.237641] CPU: 1 PID: 18882 Comm: llog_process_th Tainted: GF       W  O--------------   3.10.0-229.4.2.el7_lustre.g1fee634.x86_64 #1
            [12947.238718] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
            [12947.239582]  0000000000000000 000000005e4d00bc ffff88006d6a77f0 ffffffff816050da
            [12947.240672]  ffff88006d6a7828 ffffffff8106e34b ffff88000083fc98 ffff88007b5877e0
            [12947.241691]  ffff88005c362708 ffffffffa062745c 0000000000000325 ffff88006d6a7838
            [12947.242719] Call Trace:
            [12947.243043]  [<ffffffff816050da>] dump_stack+0x19/0x1b
            [12947.243689]  [<ffffffff8106e34b>] warn_slowpath_common+0x6b/0xb0
            [12947.244632]  [<ffffffff8106e49a>] warn_slowpath_null+0x1a/0x20
            [12947.245577]  [<ffffffffa05d26b2>] __ldiskfs_handle_dirty_metadata+0x1c2/0x220 [ldiskfs]
            [12947.246666]  [<ffffffffa05e30c1>] ldiskfs_getblk+0x131/0x200 [ldiskfs]
            [12947.247508]  [<ffffffffa05e31b7>] ldiskfs_bread+0x27/0xc0 [ldiskfs]
            [12947.248258]  [<ffffffffa0bca919>] osd_ldiskfs_write_record+0x169/0x350 [osd_ldiskfs]
            [12947.248974]  [<ffffffffa0bcabdf>] osd_write+0xdf/0x200 [osd_ldiskfs]
            [12947.249558]  [<ffffffffa0765399>] dt_record_write+0x39/0x120 [obdclass]
            [12947.250194]  [<ffffffffa0eed261>] osp_write_local_file.isra.16+0x284/0x30c [osp]
            [12947.250845]  [<ffffffffa0eed7ca>] osp_last_used_init+0x1e7/0xaa6 [osp]
            [12947.251417]  [<ffffffffa0ecef84>] osp_init0.isra.17+0x1b44/0x2060 [osp]
            [12947.252042]  [<ffffffffa0ecf549>] osp_device_alloc+0xa9/0x1a0 [osp]
            [12947.252606]  [<ffffffffa074b6c4>] obd_setup+0x114/0x2a0 [obdclass]
            [12947.253147]  [<ffffffffa074f430>] class_setup+0x2a0/0x820 [obdclass]
            [12947.253735]  [<ffffffffa0753b0c>] class_process_config+0x224c/0x2e50 [obdclass]
            [12947.254369]  [<ffffffffa0571967>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
            [12947.254979]  [<ffffffffa0754f6f>] class_config_llog_handler+0x85f/0x16e0 [obdclass]
            [12947.255645]  [<ffffffffa071945a>] llog_process_thread+0x7aa/0xe90 [obdclass]
            [12947.256291]  [<ffffffffa071a4f0>] ? llog_backup+0x500/0x500 [obdclass]
            [12947.256883]  [<ffffffffa071a53c>] llog_process_thread_daemonize+0x4c/0x80 [obdclass]
            [12947.257551]  [<ffffffff8109739f>] kthread+0xcf/0xe0
            [12947.258000]  [<ffffffff810972d0>] ? kthread_create_on_node+0x140/0x140
            [12947.258565]  [<ffffffff81614f7c>] ret_from_fork+0x7c/0xb0
            [12947.259066]  [<ffffffff810972d0>] ? kthread_create_on_node+0x140/0x140
            [12947.259689] ---[ end trace ddd376db706c5663 ]---
            [12947.260092] LDISKFS-fs: ldiskfs_getblk:805: aborting transaction: error 28 in __ldiskfs_handle_dirty_metadata
            [12947.260934] LDISKFS-fs error (device dm-0): ldiskfs_getblk:805: inode #132: block 16827: comm llog_process_th: journal_dirty_metadata failed: handle type 0 started at line 967, credits 6/0, errcode -28
            [12947.275334] Aborting journal on device dm-0-8.
            [12947.287404] LDISKFS-fs (dm-0): Remounting filesystem read-only
            [12947.288619] LustreError: 18882:0:(osd_io.c:1645:osd_ldiskfs_write_record()) dm-0: error reading offset 84152 (block 20): rc = -28
            [12947.290466] LDISKFS-fs error (device dm-0) in osd_trans_stop:1052: error 28
            [12947.290556] LustreError: 18458:0:(osd_handler.c:829:osd_trans_commit_cb()) transaction @0xffff8800700b43c0 commit error: 2
            [12947.309035] LustreError: 18882:0:(osd_handler.c:1055:osd_trans_stop()) lustre-MDT0000-osd: failed to stop transaction: rc = -28
            [12947.310070] LustreError: 18882:0:(osp_dev.c:283:osp_init_last_objid()) lustre-OST2917-osc-MDT0000: can't initialize lov_objid: rc = -28
            [12947.311186] LustreError: 18882:0:(osp_dev.c:369:osp_last_used_init()) lustre-OST2917-osc-MDT0000: Can not get ids -28 from old objid!
            [12947.312480] LustreError: 18882:0:(obd_config.c:558:class_setup()) setup lustre-OST2917-osc-MDT0000 failed (-28)
            [12947.313339] LustreError: 18882:0:(obd_config.c:1601:class_config_llog_handler()) MGC10.1.4.179@tcp: cfg command failed: rc = -28
            [12947.314362] Lustre:    cmd=cf003 0:lustre-OST2917-osc-MDT0000  1:lustre-OST2917_UUID  2:10.1.4.178@tcp  
            
            [12947.315420] LustreError: 18477:0:(mgc_request.c:527:do_requeue()) failed processing log: -28
            [12948.576155] LustreError: 18880:0:(tgt_lastrcvd.c:591:tgt_client_new()) lustre-MDT0000: Failed to write client lcd at idx 3, rc -30
            [12953.576890] LustreError: 18480:0:(tgt_lastrcvd.c:591:tgt_client_new()) lustre-MDT0000: Failed to write client lcd at idx 3, rc -30
            
            yujian Jian Yu added a comment - On master branch, in the following report: https://testing.hpdd.intel.com/sub_tests/ebb78d16-1044-11e5-8fc4-5254006e85c2 Dmesg on MDS 1 (shadow-15vm8): [12947.228164] ------------[ cut here ]------------ [12947.228652] WARNING: at /var/lib/jenkins/workspace/lustre-master/arch/x86_64/build_type/server/distro/el7/ib_stack/inkernel/BUILD/BUILD/lustre-2.7.54/ldiskfs/ext4_jbd2.c:260 __ldiskfs_handle_dirty_metadata+0x1c2/0x220 [ldiskfs]() [12947.230373] Modules linked in: osp(OF) mdd(OF) lod(OF) mdt(OF) lfsck(OF) mgs(OF) mgc(OF) osd_ldiskfs(OF) lquota(OF) fid(OF) fld(OF) ksocklnd(OF) ptlrpc(OF) obdclass(OF) lnet(OF) libcfs(OF) ldiskfs(OF) sha512_generic dm_mod nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd fscache xprtrdma sunrpc ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ppdev pcspkr serio_raw parport_pc virtio_balloon i2c_piix4 parport ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk cirrus syscopyarea sysfillrect sysimgblt drm_kms_helper ttm 8139too virtio_pci 8139cp virtio_ring virtio mii drm ata_piix libata i2c_core floppy [last unloaded: libcfs] [12947.237641] CPU: 1 PID: 18882 Comm: llog_process_th Tainted: GF W O-------------- 3.10.0-229.4.2.el7_lustre.g1fee634.x86_64 #1 [12947.238718] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007 [12947.239582] 0000000000000000 000000005e4d00bc ffff88006d6a77f0 ffffffff816050da [12947.240672] ffff88006d6a7828 ffffffff8106e34b ffff88000083fc98 ffff88007b5877e0 [12947.241691] ffff88005c362708 ffffffffa062745c 0000000000000325 ffff88006d6a7838 [12947.242719] Call Trace: [12947.243043] [<ffffffff816050da>] dump_stack+0x19/0x1b [12947.243689] [<ffffffff8106e34b>] warn_slowpath_common+0x6b/0xb0 [12947.244632] [<ffffffff8106e49a>] warn_slowpath_null+0x1a/0x20 [12947.245577] [<ffffffffa05d26b2>] __ldiskfs_handle_dirty_metadata+0x1c2/0x220 [ldiskfs] [12947.246666] [<ffffffffa05e30c1>] ldiskfs_getblk+0x131/0x200 [ldiskfs] [12947.247508] [<ffffffffa05e31b7>] ldiskfs_bread+0x27/0xc0 [ldiskfs] [12947.248258] [<ffffffffa0bca919>] osd_ldiskfs_write_record+0x169/0x350 [osd_ldiskfs] [12947.248974] [<ffffffffa0bcabdf>] osd_write+0xdf/0x200 [osd_ldiskfs] [12947.249558] [<ffffffffa0765399>] dt_record_write+0x39/0x120 [obdclass] [12947.250194] [<ffffffffa0eed261>] osp_write_local_file.isra.16+0x284/0x30c [osp] [12947.250845] [<ffffffffa0eed7ca>] osp_last_used_init+0x1e7/0xaa6 [osp] [12947.251417] [<ffffffffa0ecef84>] osp_init0.isra.17+0x1b44/0x2060 [osp] [12947.252042] [<ffffffffa0ecf549>] osp_device_alloc+0xa9/0x1a0 [osp] [12947.252606] [<ffffffffa074b6c4>] obd_setup+0x114/0x2a0 [obdclass] [12947.253147] [<ffffffffa074f430>] class_setup+0x2a0/0x820 [obdclass] [12947.253735] [<ffffffffa0753b0c>] class_process_config+0x224c/0x2e50 [obdclass] [12947.254369] [<ffffffffa0571967>] ? libcfs_debug_msg+0x57/0x80 [libcfs] [12947.254979] [<ffffffffa0754f6f>] class_config_llog_handler+0x85f/0x16e0 [obdclass] [12947.255645] [<ffffffffa071945a>] llog_process_thread+0x7aa/0xe90 [obdclass] [12947.256291] [<ffffffffa071a4f0>] ? llog_backup+0x500/0x500 [obdclass] [12947.256883] [<ffffffffa071a53c>] llog_process_thread_daemonize+0x4c/0x80 [obdclass] [12947.257551] [<ffffffff8109739f>] kthread+0xcf/0xe0 [12947.258000] [<ffffffff810972d0>] ? kthread_create_on_node+0x140/0x140 [12947.258565] [<ffffffff81614f7c>] ret_from_fork+0x7c/0xb0 [12947.259066] [<ffffffff810972d0>] ? kthread_create_on_node+0x140/0x140 [12947.259689] ---[ end trace ddd376db706c5663 ]--- [12947.260092] LDISKFS-fs: ldiskfs_getblk:805: aborting transaction: error 28 in __ldiskfs_handle_dirty_metadata [12947.260934] LDISKFS-fs error (device dm-0): ldiskfs_getblk:805: inode #132: block 16827: comm llog_process_th: journal_dirty_metadata failed: handle type 0 started at line 967, credits 6/0, errcode -28 [12947.275334] Aborting journal on device dm-0-8. [12947.287404] LDISKFS-fs (dm-0): Remounting filesystem read-only [12947.288619] LustreError: 18882:0:(osd_io.c:1645:osd_ldiskfs_write_record()) dm-0: error reading offset 84152 (block 20): rc = -28 [12947.290466] LDISKFS-fs error (device dm-0) in osd_trans_stop:1052: error 28 [12947.290556] LustreError: 18458:0:(osd_handler.c:829:osd_trans_commit_cb()) transaction @0xffff8800700b43c0 commit error: 2 [12947.309035] LustreError: 18882:0:(osd_handler.c:1055:osd_trans_stop()) lustre-MDT0000-osd: failed to stop transaction: rc = -28 [12947.310070] LustreError: 18882:0:(osp_dev.c:283:osp_init_last_objid()) lustre-OST2917-osc-MDT0000: can't initialize lov_objid: rc = -28 [12947.311186] LustreError: 18882:0:(osp_dev.c:369:osp_last_used_init()) lustre-OST2917-osc-MDT0000: Can not get ids -28 from old objid! [12947.312480] LustreError: 18882:0:(obd_config.c:558:class_setup()) setup lustre-OST2917-osc-MDT0000 failed (-28) [12947.313339] LustreError: 18882:0:(obd_config.c:1601:class_config_llog_handler()) MGC10.1.4.179@tcp: cfg command failed: rc = -28 [12947.314362] Lustre: cmd=cf003 0:lustre-OST2917-osc-MDT0000 1:lustre-OST2917_UUID 2:10.1.4.178@tcp [12947.315420] LustreError: 18477:0:(mgc_request.c:527:do_requeue()) failed processing log: -28 [12948.576155] LustreError: 18880:0:(tgt_lastrcvd.c:591:tgt_client_new()) lustre-MDT0000: Failed to write client lcd at idx 3, rc -30 [12953.576890] LustreError: 18480:0:(tgt_lastrcvd.c:591:tgt_client_new()) lustre-MDT0000: Failed to write client lcd at idx 3, rc -30

            We are going to need a patch for SLES12 now that it is supported.

            simmonsja James A Simmons added a comment - We are going to need a patch for SLES12 now that it is supported.
            pjones Peter Jones added a comment -

            Landed for 2.8

            pjones Peter Jones added a comment - Landed for 2.8

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15334/
            Subject: LU-6722 ldiskfs: declare credits for quota when destroy inode
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: e065d04c3cdcf40ae4d4fd4aee0b2548aca4d045

            gerrit Gerrit Updater added a comment - Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15334/ Subject: LU-6722 ldiskfs: declare credits for quota when destroy inode Project: fs/lustre-release Branch: master Current Patch Set: Commit: e065d04c3cdcf40ae4d4fd4aee0b2548aca4d045

            Andreas Dilger (andreas.dilger@intel.com) merged in patch http://review.whamcloud.com/15401/
            Subject: LU-6722 jbd: double minimum journal size for RHEL7
            Project: tools/e2fsprogs
            Branch: master-lustre
            Current Patch Set:
            Commit: 15d2f58cd493a73f860a4415cac0da48c932e72a

            gerrit Gerrit Updater added a comment - Andreas Dilger (andreas.dilger@intel.com) merged in patch http://review.whamcloud.com/15401/ Subject: LU-6722 jbd: double minimum journal size for RHEL7 Project: tools/e2fsprogs Branch: master-lustre Current Patch Set: Commit: 15d2f58cd493a73f860a4415cac0da48c932e72a
            gerrit Gerrit Updater added a comment - - edited

            nevermind

            gerrit Gerrit Updater added a comment - - edited nevermind
            gerrit Gerrit Updater added a comment - - edited

            Deleted irrelevant comment.

            gerrit Gerrit Updater added a comment - - edited Deleted irrelevant comment.

            Fan Yong,
            there is also a generic issue that RHEL7 only allows transactions 1/2 the size of RHEL6, so in addition to your patch to fix the transaction credits for setxattr, there also needs to be a second patch to double the minimum journal size from 4MB to 8MB when running on kernels 3.10 and later.

            adilger Andreas Dilger added a comment - Fan Yong, there is also a generic issue that RHEL7 only allows transactions 1/2 the size of RHEL6, so in addition to your patch to fix the transaction credits for setxattr, there also needs to be a second patch to double the minimum journal size from 4MB to 8MB when running on kernels 3.10 and later.

            Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/15334
            Subject: LU-6722 ldiskfs: more credits to destroy inode with large EA
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 4f54dc215328cc9dbc2cab1236db84de29afe6ad

            gerrit Gerrit Updater added a comment - Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/15334 Subject: LU-6722 ldiskfs: more credits to destroy inode with large EA Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 4f54dc215328cc9dbc2cab1236db84de29afe6ad
            yong.fan nasf (Inactive) added a comment - - edited

            The http://review.whamcloud.com/10376 mainly affects the the OSD-ldiskfs to complain that someone declare too much credits, but it is not fatal. The root reason for this trouble is inside ldiskfs itself because of not consider quota modification when ldiskfs_xattr_delete_inode().

            yong.fan nasf (Inactive) added a comment - - edited The http://review.whamcloud.com/10376 mainly affects the the OSD-ldiskfs to complain that someone declare too much credits, but it is not fatal. The root reason for this trouble is inside ldiskfs itself because of not consider quota modification when ldiskfs_xattr_delete_inode().

            People

              yong.fan nasf (Inactive)
              yujian Jian Yu
              Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: