Details
- Type: Bug
- Resolution: Duplicate
- Priority: Minor
Description
While testing the latest master branch with fio, I found that when the file size is larger than 4G, the OST is forced into a read-only state on CentOS 7.
# cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)

# mkfs.lustre --fsname=lustre --mdt --mgs --index=0 --reformat /dev/sdb1
# mkfs.lustre --fsname=lustre --ost --mgsnode=192.168.150.128@tcp --index=0 --reformat /dev/sdb2
# mount.lustre /dev/sdb1 /mnt/lustre-mds1
# mount.lustre /dev/sdb2 /mnt/lustre-ost1
# mount.lustre 192.168.150.128@tcp:/lustre /mnt/lustre
# df
Filesystem                   1K-blocks      Used Available Use% Mounted on
/dev/sda1                     52507040  21610852  28205936  44% /
devtmpfs                       1917568         0   1917568   0% /dev
tmpfs                          1930752         0   1930752   0% /dev/shm
tmpfs                          1930752     11732   1919020   1% /run
tmpfs                          1930752         0   1930752   0% /sys/fs/cgroup
.host:/                      488245288 283447468 204797820  59% /mnt/hgfs
tmpfs                           386152         0    386152   0% /run/user/0
/dev/sdb1                       159688      1908    143972   2% /mnt/lustre-mds1
/dev/sdb2                     17839688     46168  16833420   1% /mnt/lustre-ost1
192.168.150.128@tcp:/lustre   17839688     46168  16833420   1% /mnt/lustre
# lctl get_param version
version=2.13.55_84_g03e6db5
# mkdir /mnt/lustre/qian
# fio --name=seqread --directory=/mnt/lustre/qian --filesize=5G --bs=128K --create_only=1 --numjobs=1 --create_serialize=0
seqread: (g=0): rw=read, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=psync, iodepth=1
fio-3.1
Starting 1 process
seqread: Laying out IO file (1 file / 5120MiB)
fio: native_fallocate call failed: No space left on device
fio: pid=10054, err=30/file:filesetup.c:184, func=ftruncate, error=Read-only file system

Run status group 0 (all jobs):
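For reference, the failing step is the file preallocation itself, not the reads. The following is a minimal standalone reproducer sketch of my own (not part of the original report) that issues the same kind of >4 GiB fallocate(2) request that fio makes through its native_fallocate call; the file path /mnt/lustre/qian/prealloc is hypothetical.

#define _GNU_SOURCE             /* for fallocate(2) */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        /* Hypothetical test file on the client mount from the report. */
        const char *path = "/mnt/lustre/qian/prealloc";
        off_t len = 5LL << 30;  /* 5 GiB, i.e. larger than the 4 GiB threshold */
        int fd = open(path, O_CREAT | O_WRONLY, 0644);

        if (fd < 0) {
                perror("open");
                return 1;
        }
        /* Same preallocation request fio issues as "native_fallocate". */
        if (fallocate(fd, 0, 0, len) < 0) {
                perror("fallocate");
                close(fd);
                return 1;
        }
        close(fd);
        return 0;
}

Building this with gcc and running it against the client mount should, under that assumption, drive the OST through the same osd_fallocate()/ldiskfs_fallocate() path shown in the server trace below.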
The server dumped the following messages:
[ 150.093475] WARNING: CPU: 0 PID: 9940 at /tmp/rpmbuild-lustre-root-t8NmDyeO/BUILD/lustre-2.13.55_84_g03e6db5/ldiskfs/ext4_jbd2.c:266 __ldiskfs_handle_dirty_metadata+0x1c2/0x220 [ldiskfs]
[ 150.093476] Modules linked in: lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) ofd(OE) ost(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) ldiskfs(OE) ipmi_devintf ipmi_msghandler vmhgfs(OE) vmw_vsock_vmci_transport vsock ppdev iosf_mbi crc32_pclmul ghash_clmulni_intel vmw_balloon aesni_intel lrw gf128mul glue_helper ablk_helper cryptd joydev pcspkr sg vmw_vmci i2c_piix4 parport_pc parport ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic sr_mod cdrom crct10dif_pclmul crct10dif_common crc32c_intel serio_raw vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm e1000 nfit drm libnvdimm drm_panel_orientation_quirks mptspi scsi_transport_spi mptscsih mptbase ata_generic
[ 150.093516] pata_acpi ata_piix libata
[ 150.093521] CPU: 0 PID: 9940 Comm: ll_ost00_002 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.12.2.el7_lustre.2.12.55_47_gf6497eb.x86_64 #1
[ 150.093523] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
[ 150.093524] Call Trace:
[ 150.093532] [<ffffffff90963041>] dump_stack+0x19/0x1b
[ 150.093536] [<ffffffff902976e8>] __warn+0xd8/0x100
[ 150.093539] [<ffffffff9029782d>] warn_slowpath_null+0x1d/0x20
[ 150.093551] [<ffffffffc050e862>] __ldiskfs_handle_dirty_metadata+0x1c2/0x220 [ldiskfs]
[ 150.093561] [<ffffffffc04ec67b>] ldiskfs_mb_mark_diskspace_used+0x2bb/0x510 [ldiskfs]
[ 150.093570] [<ffffffffc04f0800>] ldiskfs_mb_new_blocks+0x350/0xb20 [ldiskfs]
[ 150.093581] [<ffffffffc05186c5>] ? __read_extent_tree_block+0x55/0x1e0 [ldiskfs]
[ 150.093585] [<ffffffff9041d9bb>] ? __kmalloc+0x1eb/0x230
[ 150.093596] [<ffffffffc0519764>] ? ldiskfs_ext_find_extent+0x134/0x340 [ldiskfs]
[ 150.093606] [<ffffffffc051dbf6>] ldiskfs_ext_map_blocks+0x4a6/0xf60 [ldiskfs]
[ 150.093610] [<ffffffff90477fff>] ? has_bh_in_lru+0xf/0x50
[ 150.093620] [<ffffffffc052286c>] ldiskfs_map_blocks+0x12c/0x6a0 [ldiskfs]
[ 150.093630] [<ffffffffc0518c0e>] ? ldiskfs_alloc_file_blocks.isra.36+0xbe/0x2f0 [ldiskfs]
[ 150.093639] [<ffffffffc0518c31>] ldiskfs_alloc_file_blocks.isra.36+0xe1/0x2f0 [ldiskfs]
[ 150.093648] [<ffffffffc051fff9>] ldiskfs_fallocate+0x809/0x8a0 [ldiskfs]
[ 150.093651] [<ffffffff904af45a>] ? __dquot_initialize+0x3a/0x240
[ 150.093656] [<ffffffffc0321a93>] ? jbd2__journal_start+0xf3/0x1f0 [jbd2]
[ 150.093671] [<ffffffffc0c4da23>] osd_fallocate+0x243/0x530 [osd_ldiskfs]
[ 150.093679] [<ffffffffc0c2ff65>] ? osd_trans_start+0x235/0x4e0 [osd_ldiskfs]
[ 150.093688] [<ffffffffc106ce28>] ofd_object_fallocate+0x538/0x780 [ofd]
[ 150.093693] [<ffffffffc10565b1>] ofd_fallocate_hdl+0x231/0x970 [ofd]
[ 150.093742] [<ffffffffc09d6dbf>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc]
[ 150.093789] [<ffffffffc0a3fd0a>] tgt_request_handle+0x96a/0x1700 [ptlrpc]
[ 150.093829] [<ffffffffc0a1a301>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
[ 150.093838] [<ffffffffc059402e>] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
[ 150.093873] [<ffffffffc09e33f6>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
[ 150.093908] [<ffffffffc09e29ca>] ? ptlrpc_server_handle_req_in+0x92a/0x1100 [ptlrpc]
[ 150.093912] [<ffffffff902c2df0>] ? wake_up_atomic_t+0x30/0x30
[ 150.093946] [<ffffffffc09e7f4c>] ptlrpc_main+0xb3c/0x14d0 [ptlrpc]
[ 150.093980] [<ffffffffc09e7410>] ? ptlrpc_register_service+0xf90/0xf90 [ptlrpc]
[ 150.093983] [<ffffffff902c1d21>] kthread+0xd1/0xe0
[ 150.093985] [<ffffffff902c1c50>] ? insert_kthread_work+0x40/0x40
[ 150.093988] [<ffffffff90975c1d>] ret_from_fork_nospec_begin+0x7/0x21
[ 150.093991] [<ffffffff902c1c50>] ? insert_kthread_work+0x40/0x40
[ 150.093992] ---[ end trace 92c47b4354741217 ]---
[ 150.093995] LDISKFS-fs: ldiskfs_mb_mark_diskspace_used:3450: aborting transaction: error 28 in __ldiskfs_handle_dirty_metadata
[ 150.094045] LDISKFS: jbd2_journal_dirty_metadata failed: handle type 0 started at line 1919, credits 41/0, errcode -28
[ 150.094087] LDISKFS-fs warning (device sdb2): ldiskfs_mb_new_blocks:5077: Updating bitmap error: [err -28] [pa ffff8d59008f6068] [phy 1441792] [logic 1146880] [len 32768] [free 32768] [error 1] [inode 233]
[ 150.094526] Quota error (device sdb2): qtree_write_dquot: dquota write failed
[ 150.094552] LDISKFS-fs error (device sdb2) in ldiskfs_write_dquot:5495: error 28
[ 150.094886] Aborting journal on device sdb2-8.
[ 150.095175] LDISKFS-fs (sdb2): Remounting filesystem read-only
[ 150.095200] LDISKFS-fs error (device sdb2) in ldiskfs_reserve_inode_write:5313: Journal has aborted
[ 150.095515] LDISKFS-fs error (device sdb2) in ldiskfs_alloc_file_blocks:4760: error 28
[ 150.095852] LDISKFS-fs error (device sdb2) in osd_trans_stop:2029: error 28
[ 150.095958] LustreError: 9933:0:(osd_handler.c:1728:osd_trans_commit_cb()) transaction @0xffff8d590a96a200 commit error: 2
[ 150.096084] LustreError: 9940:0:(osd_handler.c:2032:osd_trans_stop()) lustre-OST0000: failed to stop transaction: rc = -28
[ 152.806430] LustreError: 9940:0:(ofd_dev.c:1818:ofd_destroy_hdl()) lustre-OST0000: error destroying object [0x100000000:0x2:0x0]: -30
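For context, the two error codes that recur in this log are standard Linux errno values: 28 is ENOSPC (the error that aborts the journal in __ldiskfs_handle_dirty_metadata) and 30 is EROFS (what callers see once sdb2 has been remounted read-only). A trivial sketch that prints the mapping:

#include <stdio.h>
#include <string.h>

int main(void)
{
        /* errno 28 (ENOSPC) aborts the journal; errno 30 (EROFS) is what
         * clients get back after the filesystem goes read-only. */
        printf("28: %s\n", strerror(28));   /* No space left on device */
        printf("30: %s\n", strerror(30));   /* Read-only file system */
        return 0;
}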
Issue Links
- is related to LU-13765: ldiskfs_mb_mark_diskspace_used:3472: aborting transaction: error 28 in __ldiskfs_handle_dirty_metadata (Resolved)