Lustre / LU-13955

OST becomes read-only when testing with fio and a file size larger than 4G


Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: None
    • Labels: None
    • Severity: 3

    Description

      While testing the latest master branch with fio, I found that creating a file larger than 4G puts the OST into a read-only state on CentOS 7.

      # cat /etc/redhat-release
      CentOS Linux release 7.6.1810 (Core) 
      
      # mkfs.lustre --fsname=lustre --mdt --mgs --index=0 --reformat /dev/sdb1
      # mkfs.lustre --fsname=lustre --ost --mgsnode=192.168.150.128@tcp --index=0 --reformat /dev/sdb2
      # mount.lustre /dev/sdb1 /mnt/lustre-mds1
      # mount.lustre /dev/sdb2 /mnt/lustre-ost1
      # mount.lustre 192.168.150.128@tcp:/lustre /mnt/lustre
      # df
      Filesystem                  1K-blocks      Used Available Use% Mounted on
      /dev/sda1                    52507040  21610852  28205936  44% /
      devtmpfs                      1917568         0   1917568   0% /dev
      tmpfs                         1930752         0   1930752   0% /dev/shm
      tmpfs                         1930752     11732   1919020   1% /run
      tmpfs                         1930752         0   1930752   0% /sys/fs/cgroup
      .host:/                     488245288 283447468 204797820  59% /mnt/hgfs
      tmpfs                          386152         0    386152   0% /run/user/0
      /dev/sdb1                      159688      1908    143972   2% /mnt/lustre-mds1
      /dev/sdb2                    17839688     46168  16833420   1% /mnt/lustre-ost1
      192.168.150.128@tcp:/lustre  17839688     46168  16833420   1% /mnt/lustre
      
      # lctl get_param version
      version=2.13.55_84_g03e6db5
      
      # mkdir /mnt/lustre/qian
      # fio --name=seqread --directory=/mnt/lustre/qian --filesize=5G --bs=128K --create_only=1 --numjobs=1 --create_serialize=0
      seqread: (g=0): rw=read, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=psync, iodepth=1
      fio-3.1
      Starting 1 process
      seqread: Laying out IO file (1 file / 5120MiB)
      fio: native_fallocate call failed: No space left on device
      fio: pid=10054, err=30/file:filesetup.c:184, func=ftruncate, error=Read-only file system
      
      
      
      
      Run status group 0 (all jobs):
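
The two failures fio reports map to Linux errno values 28 and 30, and the same numbers show up with a negative sign in the server messages. As a reading aid, here is a minimal shell sketch that translates those codes. The function name `errno_name` is hypothetical, and only the two codes that fio itself names in the output above are covered.

```shell
# Hypothetical helper: translate the errno values that fio names in this
# report. A leading minus sign (kernel style, e.g. -28) is accepted.
errno_name() {
    case "${1#-}" in
        28) echo "ENOSPC (No space left on device)" ;;
        30) echo "EROFS (Read-only file system)" ;;
        *)  echo "unknown ($1)" ;;
    esac
}

errno_name 30
errno_name -28
```

The fallocate call fails first with ENOSPC; the later ftruncate fails with EROFS, which is consistent with the file system having already been remounted read-only by then.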
      

      The server logs the following messages:

      [  150.093475] WARNING: CPU: 0 PID: 9940 at /tmp/rpmbuild-lustre-root-t8NmDyeO/BUILD/lustre-2.13.55_84_g03e6db5/ldiskfs/ext4_jbd2.c:266 __ldiskfs_handle_dirty_metadata+0x1c2/0x220 [ldiskfs]
      [  150.093476] Modules linked in: lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) ofd(OE) ost(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) ldiskfs(OE) ipmi_devintf ipmi_msghandler vmhgfs(OE) vmw_vsock_vmci_transport vsock ppdev iosf_mbi crc32_pclmul ghash_clmulni_intel vmw_balloon aesni_intel lrw gf128mul glue_helper ablk_helper cryptd joydev pcspkr sg vmw_vmci i2c_piix4 parport_pc parport ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic sr_mod cdrom crct10dif_pclmul crct10dif_common crc32c_intel serio_raw vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm e1000 nfit drm libnvdimm drm_panel_orientation_quirks mptspi scsi_transport_spi mptscsih mptbase ata_generic
      [  150.093516]  pata_acpi ata_piix libata
      [  150.093521] CPU: 0 PID: 9940 Comm: ll_ost00_002 Kdump: loaded Tainted: G           OE  ------------   3.10.0-957.12.2.el7_lustre.2.12.55_47_gf6497eb.x86_64 #1
      [  150.093523] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
      [  150.093524] Call Trace:
      [  150.093532]  [<ffffffff90963041>] dump_stack+0x19/0x1b
      [  150.093536]  [<ffffffff902976e8>] __warn+0xd8/0x100
      [  150.093539]  [<ffffffff9029782d>] warn_slowpath_null+0x1d/0x20
      [  150.093551]  [<ffffffffc050e862>] __ldiskfs_handle_dirty_metadata+0x1c2/0x220 [ldiskfs]
      [  150.093561]  [<ffffffffc04ec67b>] ldiskfs_mb_mark_diskspace_used+0x2bb/0x510 [ldiskfs]
      [  150.093570]  [<ffffffffc04f0800>] ldiskfs_mb_new_blocks+0x350/0xb20 [ldiskfs]
      [  150.093581]  [<ffffffffc05186c5>] ? __read_extent_tree_block+0x55/0x1e0 [ldiskfs]
      [  150.093585]  [<ffffffff9041d9bb>] ? __kmalloc+0x1eb/0x230
      [  150.093596]  [<ffffffffc0519764>] ? ldiskfs_ext_find_extent+0x134/0x340 [ldiskfs]
      [  150.093606]  [<ffffffffc051dbf6>] ldiskfs_ext_map_blocks+0x4a6/0xf60 [ldiskfs]
      [  150.093610]  [<ffffffff90477fff>] ? has_bh_in_lru+0xf/0x50
      [  150.093620]  [<ffffffffc052286c>] ldiskfs_map_blocks+0x12c/0x6a0 [ldiskfs]
      [  150.093630]  [<ffffffffc0518c0e>] ? ldiskfs_alloc_file_blocks.isra.36+0xbe/0x2f0 [ldiskfs]
      [  150.093639]  [<ffffffffc0518c31>] ldiskfs_alloc_file_blocks.isra.36+0xe1/0x2f0 [ldiskfs]
      [  150.093648]  [<ffffffffc051fff9>] ldiskfs_fallocate+0x809/0x8a0 [ldiskfs]
      [  150.093651]  [<ffffffff904af45a>] ? __dquot_initialize+0x3a/0x240
      [  150.093656]  [<ffffffffc0321a93>] ? jbd2__journal_start+0xf3/0x1f0 [jbd2]
      [  150.093671]  [<ffffffffc0c4da23>] osd_fallocate+0x243/0x530 [osd_ldiskfs]
      [  150.093679]  [<ffffffffc0c2ff65>] ? osd_trans_start+0x235/0x4e0 [osd_ldiskfs]
      [  150.093688]  [<ffffffffc106ce28>] ofd_object_fallocate+0x538/0x780 [ofd]
      [  150.093693]  [<ffffffffc10565b1>] ofd_fallocate_hdl+0x231/0x970 [ofd]
      [  150.093742]  [<ffffffffc09d6dbf>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc]
      [  150.093789]  [<ffffffffc0a3fd0a>] tgt_request_handle+0x96a/0x1700 [ptlrpc]
      [  150.093829]  [<ffffffffc0a1a301>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
      [  150.093838]  [<ffffffffc059402e>] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
      [  150.093873]  [<ffffffffc09e33f6>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
      [  150.093908]  [<ffffffffc09e29ca>] ? ptlrpc_server_handle_req_in+0x92a/0x1100 [ptlrpc]
      [  150.093912]  [<ffffffff902c2df0>] ? wake_up_atomic_t+0x30/0x30
      [  150.093946]  [<ffffffffc09e7f4c>] ptlrpc_main+0xb3c/0x14d0 [ptlrpc]
      [  150.093980]  [<ffffffffc09e7410>] ? ptlrpc_register_service+0xf90/0xf90 [ptlrpc]
      [  150.093983]  [<ffffffff902c1d21>] kthread+0xd1/0xe0
      [  150.093985]  [<ffffffff902c1c50>] ? insert_kthread_work+0x40/0x40
      [  150.093988]  [<ffffffff90975c1d>] ret_from_fork_nospec_begin+0x7/0x21
      [  150.093991]  [<ffffffff902c1c50>] ? insert_kthread_work+0x40/0x40
      [  150.093992] ---[ end trace 92c47b4354741217 ]---
      [  150.093995] LDISKFS-fs: ldiskfs_mb_mark_diskspace_used:3450: aborting transaction: error 28 in __ldiskfs_handle_dirty_metadata
      [  150.094045] LDISKFS: jbd2_journal_dirty_metadata failed: handle type 0 started at line 1919, credits 41/0, errcode -28
      [  150.094087] LDISKFS-fs warning (device sdb2): ldiskfs_mb_new_blocks:5077: Updating bitmap error: [err -28] [pa ffff8d59008f6068] [phy 1441792] [logic 1146880] [len 32768] [free 32768] [error 1] [inode 233]
      [  150.094526] Quota error (device sdb2): qtree_write_dquot: dquota write failed
      [  150.094552] LDISKFS-fs error (device sdb2) in ldiskfs_write_dquot:5495: error 28
      [  150.094886] Aborting journal on device sdb2-8.
      [  150.095175] LDISKFS-fs (sdb2): Remounting filesystem read-only
      [  150.095200] LDISKFS-fs error (device sdb2) in ldiskfs_reserve_inode_write:5313: Journal has aborted
      [  150.095515] LDISKFS-fs error (device sdb2) in ldiskfs_alloc_file_blocks:4760: error 28
      [  150.095852] LDISKFS-fs error (device sdb2) in osd_trans_stop:2029: error 28
      [  150.095958] LustreError: 9933:0:(osd_handler.c:1728:osd_trans_commit_cb()) transaction @0xffff8d590a96a200 commit error: 2
      [  150.096084] LustreError: 9940:0:(osd_handler.c:2032:osd_trans_stop()) lustre-OST0000: failed to stop transaction: rc = -28
      [  152.806430] LustreError: 9940:0:(ofd_dev.c:1818:ofd_destroy_hdl()) lustre-OST0000: error destroying object [0x100000000:0x2:0x0]: -30
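
To check whether a server has hit the same condition, one option is to scan the kernel log for the "Remounting filesystem read-only" line shown above. The following is a rough sketch, not an official Lustre tool; the function name `find_ro_remounts` is hypothetical, and the pattern is taken from the message format in this log only.

```shell
# Hypothetical helper: print the devices that ldiskfs reported remounting
# read-only. Reads the named log files, or stdin when no file is given.
find_ro_remounts() {
    # Matches lines such as:
    #   [  150.095175] LDISKFS-fs (sdb2): Remounting filesystem read-only
    sed -n 's/.*LDISKFS-fs (\([^)]*\)): Remounting filesystem read-only.*/\1/p' "$@" | sort -u
}
```

For example, `dmesg | find_ro_remounts` on the OSS from this report would be expected to print `sdb2`; an empty result means no such message is in the log.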
      

       


            People

              Assignee: WC Triage (wc-triage)
              Reporter: Qian Yingjin (qian_wc)