[LU-13955] OST become readonly when test using fio with file size larger than 4G Created: 11/Sep/20 Updated: 11/Sep/20 Resolved: 11/Sep/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Qian Yingjin | Assignee: | WC Triage |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
While testing the latest master branch with fio, I found that creating a file larger than 4G puts the OST into a read-only state on CentOS 7.

cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)
# mkfs.lustre --fsname=lustre --mdt --mgs --index=0 --reformat /dev/sdb1
# mkfs.lustre --fsname=lustre --ost --mgsnode=192.168.150.128@tcp --index=0 --reformat /dev/sdb2
# mount.lustre /dev/sdb1 /mnt/lustre-mds1
# mount.lustre /dev/sdb2 /mnt/lustre-ost1
# mount.lustre 192.168.150.128@tcp:/lustre /mnt/lustre
# df
Filesystem                  1K-blocks      Used Available Use% Mounted on
/dev/sda1                    52507040  21610852  28205936  44% /
devtmpfs                      1917568         0   1917568   0% /dev
tmpfs                         1930752         0   1930752   0% /dev/shm
tmpfs                         1930752     11732   1919020   1% /run
tmpfs                         1930752         0   1930752   0% /sys/fs/cgroup
.host:/                     488245288 283447468 204797820  59% /mnt/hgfs
tmpfs                          386152         0    386152   0% /run/user/0
/dev/sdb1                      159688      1908    143972   2% /mnt/lustre-mds1
/dev/sdb2                    17839688     46168  16833420   1% /mnt/lustre-ost1
192.168.150.128@tcp:/lustre  17839688     46168  16833420   1% /mnt/lustre
# lctl get_param version
version=2.13.55_84_g03e6db5

mkdir /mnt/lustre/qian
fio --name=seqread --directory=/mnt/lustre/qian --filesize=5G --bs=128K --create_only=1 --numjobs=1 --create_serialize=0
seqread: (g=0): rw=read, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=psync, iodepth=1
fio-3.1
Starting 1 process
seqread: Laying out IO file (1 file / 5120MiB)
fio: native_fallocate call failed: No space left on device
fio: pid=10054, err=30/file:filesetup.c:184, func=ftruncate, error=Read-only file system
Run status group 0 (all jobs):

The server logs the following messages:

[ 150.093475] WARNING: CPU: 0 PID: 9940 at /tmp/rpmbuild-lustre-root-t8NmDyeO/BUILD/lustre-2.13.55_84_g03e6db5/ldiskfs/ext4_jbd2.c:266 __ldiskfs_handle_dirty_metadata+0x1c2/0x220 [ldiskfs]
[ 150.093476] Modules linked in: lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) ofd(OE) ost(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) ldiskfs(OE) ipmi_devintf ipmi_msghandler vmhgfs(OE) vmw_vsock_vmci_transport vsock ppdev iosf_mbi crc32_pclmul ghash_clmulni_intel vmw_balloon aesni_intel lrw gf128mul glue_helper ablk_helper cryptd joydev pcspkr sg vmw_vmci i2c_piix4 parport_pc parport ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic sr_mod cdrom crct10dif_pclmul crct10dif_common crc32c_intel serio_raw vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm e1000 nfit drm libnvdimm drm_panel_orientation_quirks mptspi scsi_transport_spi mptscsih mptbase ata_generic
[ 150.093516] pata_acpi ata_piix libata
[ 150.093521] CPU: 0 PID: 9940 Comm: ll_ost00_002 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.12.2.el7_lustre.2.12.55_47_gf6497eb.x86_64 #1
[ 150.093523] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
[ 150.093524] Call Trace:
[ 150.093532] [<ffffffff90963041>] dump_stack+0x19/0x1b
[ 150.093536] [<ffffffff902976e8>] __warn+0xd8/0x100
[ 150.093539] [<ffffffff9029782d>] warn_slowpath_null+0x1d/0x20
[ 150.093551] [<ffffffffc050e862>] __ldiskfs_handle_dirty_metadata+0x1c2/0x220 [ldiskfs]
[ 150.093561] [<ffffffffc04ec67b>] ldiskfs_mb_mark_diskspace_used+0x2bb/0x510 [ldiskfs]
[ 150.093570] [<ffffffffc04f0800>] ldiskfs_mb_new_blocks+0x350/0xb20 [ldiskfs]
[ 150.093581] [<ffffffffc05186c5>] ? __read_extent_tree_block+0x55/0x1e0 [ldiskfs]
[ 150.093585] [<ffffffff9041d9bb>] ? __kmalloc+0x1eb/0x230
[ 150.093596] [<ffffffffc0519764>] ? ldiskfs_ext_find_extent+0x134/0x340 [ldiskfs]
[ 150.093606] [<ffffffffc051dbf6>] ldiskfs_ext_map_blocks+0x4a6/0xf60 [ldiskfs]
[ 150.093610] [<ffffffff90477fff>] ? has_bh_in_lru+0xf/0x50
[ 150.093620] [<ffffffffc052286c>] ldiskfs_map_blocks+0x12c/0x6a0 [ldiskfs]
[ 150.093630] [<ffffffffc0518c0e>] ? ldiskfs_alloc_file_blocks.isra.36+0xbe/0x2f0 [ldiskfs]
[ 150.093639] [<ffffffffc0518c31>] ldiskfs_alloc_file_blocks.isra.36+0xe1/0x2f0 [ldiskfs]
[ 150.093648] [<ffffffffc051fff9>] ldiskfs_fallocate+0x809/0x8a0 [ldiskfs]
[ 150.093651] [<ffffffff904af45a>] ? __dquot_initialize+0x3a/0x240
[ 150.093656] [<ffffffffc0321a93>] ? jbd2__journal_start+0xf3/0x1f0 [jbd2]
[ 150.093671] [<ffffffffc0c4da23>] osd_fallocate+0x243/0x530 [osd_ldiskfs]
[ 150.093679] [<ffffffffc0c2ff65>] ? osd_trans_start+0x235/0x4e0 [osd_ldiskfs]
[ 150.093688] [<ffffffffc106ce28>] ofd_object_fallocate+0x538/0x780 [ofd]
[ 150.093693] [<ffffffffc10565b1>] ofd_fallocate_hdl+0x231/0x970 [ofd]
[ 150.093742] [<ffffffffc09d6dbf>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc]
[ 150.093789] [<ffffffffc0a3fd0a>] tgt_request_handle+0x96a/0x1700 [ptlrpc]
[ 150.093829] [<ffffffffc0a1a301>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
[ 150.093838] [<ffffffffc059402e>] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
[ 150.093873] [<ffffffffc09e33f6>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
[ 150.093908] [<ffffffffc09e29ca>] ? ptlrpc_server_handle_req_in+0x92a/0x1100 [ptlrpc]
[ 150.093912] [<ffffffff902c2df0>] ? wake_up_atomic_t+0x30/0x30
[ 150.093946] [<ffffffffc09e7f4c>] ptlrpc_main+0xb3c/0x14d0 [ptlrpc]
[ 150.093980] [<ffffffffc09e7410>] ? ptlrpc_register_service+0xf90/0xf90 [ptlrpc]
[ 150.093983] [<ffffffff902c1d21>] kthread+0xd1/0xe0
[ 150.093985] [<ffffffff902c1c50>] ? insert_kthread_work+0x40/0x40
[ 150.093988] [<ffffffff90975c1d>] ret_from_fork_nospec_begin+0x7/0x21
[ 150.093991] [<ffffffff902c1c50>] ? insert_kthread_work+0x40/0x40
[ 150.093992] ---[ end trace 92c47b4354741217 ]---
[ 150.093995] LDISKFS-fs: ldiskfs_mb_mark_diskspace_used:3450: aborting transaction: error 28 in __ldiskfs_handle_dirty_metadata
[ 150.094045] LDISKFS: jbd2_journal_dirty_metadata failed: handle type 0 started at line 1919, credits 41/0, errcode -28
[ 150.094087] LDISKFS-fs warning (device sdb2): ldiskfs_mb_new_blocks:5077: Updating bitmap error: [err -28] [pa ffff8d59008f6068] [phy 1441792] [logic 1146880] [len 32768] [free 32768] [error 1] [inode 233]
[ 150.094526] Quota error (device sdb2): qtree_write_dquot: dquota write failed
[ 150.094552] LDISKFS-fs error (device sdb2) in ldiskfs_write_dquot:5495: error 28
[ 150.094886] Aborting journal on device sdb2-8.
[ 150.095175] LDISKFS-fs (sdb2): Remounting filesystem read-only
[ 150.095200] LDISKFS-fs error (device sdb2) in ldiskfs_reserve_inode_write:5313: Journal has aborted
[ 150.095515] LDISKFS-fs error (device sdb2) in ldiskfs_alloc_file_blocks:4760: error 28
[ 150.095852] LDISKFS-fs error (device sdb2) in osd_trans_stop:2029: error 28
[ 150.095958] LustreError: 9933:0:(osd_handler.c:1728:osd_trans_commit_cb()) transaction @0xffff8d590a96a200 commit error: 2
[ 150.096084] LustreError: 9940:0:(osd_handler.c:2032:osd_trans_stop()) lustre-OST0000: failed to stop transaction: rc = -28
[ 152.806430] LustreError: 9940:0:(ofd_dev.c:1818:ofd_destroy_hdl()) lustre-OST0000: error destroying object [0x100000000:0x2:0x0]: -30
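
For readers matching the numeric codes in the messages above to the text fio printed, here is a minimal Python sketch that maps an errno value to its symbolic name. The helper name `describe_errno` is hypothetical and not part of Lustre or fio. It assumes the values shown as "error 28", "rc = -28", "err=30" and "-30" are standard Linux errno numbers, which kernel code conventionally prints negated.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME (message)' for an errno value.

    Kernel and Lustre log lines usually print errno values negated
    (for example -28), so the sign is ignored here.
    """
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return f"{name} ({os.strerror(number)})"


if __name__ == "__main__":
    # Codes copied from the fio output and server log in this description.
    for code in (28, -28, 30, -30):
        print(code, "->", describe_errno(code))
```

On Linux, 28 resolves to ENOSPC and 30 to EROFS, which lines up with the "No space left on device" and "Read-only file system" messages reported by fio.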
|
| Comments |
| Comment by Qian Yingjin [ 11/Sep/20 ] |
|
This bug can be easily reproduced with fallocate:
fallocate -l 5G /mnt/lustre/tfile
fallocate: fallocate failed: No space left on device
[root@qian ~]# fallocate -l 4G /mnt/lustre/tfile
fallocate: fallocate failed: Read-only file system

[ 8421.788181] WARNING: CPU: 0 PID: 90292 at /tmp/rpmbuild-lustre-root-t8NmDyeO/BUILD/lustre-2.13.55_84_g03e6db5/ldiskfs/ext4_jbd2.c:266 __ldiskfs_handle_dirty_metadata+0x1c2/0x220 [ldiskfs]
[ 8421.788182] Modules linked in: lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) ofd(OE) ost(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) ldiskfs(OE) ipmi_devintf ipmi_msghandler vmhgfs(OE) ppdev iosf_mbi crc32_pclmul ghash_clmulni_intel vmw_vsock_vmci_transport vsock vmw_balloon aesni_intel lrw gf128mul glue_helper ablk_helper cryptd joydev pcspkr sg parport_pc parport vmw_vmci i2c_piix4 ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic sr_mod cdrom crct10dif_pclmul crct10dif_common crc32c_intel ata_generic serio_raw pata_acpi mptspi scsi_transport_spi e1000 mptscsih mptbase vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix drm_panel_orientation_quirks
[ 8421.788212] libata nfit libnvdimm
[ 8421.788216] CPU: 0 PID: 90292 Comm: ll_ost00_000 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.12.2.el7_lustre.2.12.55_47_gf6497eb.x86_64 #1
[ 8421.788217] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
[ 8421.788218] Call Trace:
[ 8421.788286] [<ffffffffaaf63041>] dump_stack+0x19/0x1b
[ 8421.788291] [<ffffffffaa8976e8>] __warn+0xd8/0x100
[ 8421.788295] [<ffffffffaa89782d>] warn_slowpath_null+0x1d/0x20
[ 8421.788304] [<ffffffffc045e862>] __ldiskfs_handle_dirty_metadata+0x1c2/0x220 [ldiskfs]
[ 8421.788311] [<ffffffffc043c67b>] ldiskfs_mb_mark_diskspace_used+0x2bb/0x510 [ldiskfs]
[ 8421.788317] [<ffffffffc0440800>] ldiskfs_mb_new_blocks+0x350/0xb20 [ldiskfs]
[ 8421.788324] [<ffffffffc04686c5>] ? __read_extent_tree_block+0x55/0x1e0 [ldiskfs]
[ 8421.788327] [<ffffffffaaa1d9bb>] ? __kmalloc+0x1eb/0x230
[ 8421.788335] [<ffffffffc0469764>] ? ldiskfs_ext_find_extent+0x134/0x340 [ldiskfs]
[ 8421.788341] [<ffffffffc046dbf6>] ldiskfs_ext_map_blocks+0x4a6/0xf60 [ldiskfs]
[ 8421.788344] [<ffffffffaaa77fff>] ? has_bh_in_lru+0xf/0x50
[ 8421.788351] [<ffffffffc047286c>] ldiskfs_map_blocks+0x12c/0x6a0 [ldiskfs]
[ 8421.788358] [<ffffffffc0468c0e>] ? ldiskfs_alloc_file_blocks.isra.36+0xbe/0x2f0 [ldiskfs]
[ 8421.788363] [<ffffffffc0468c31>] ldiskfs_alloc_file_blocks.isra.36+0xe1/0x2f0 [ldiskfs]
[ 8421.788369] [<ffffffffc046fff9>] ldiskfs_fallocate+0x809/0x8a0 [ldiskfs]
[ 8421.788372] [<ffffffffaaaaf45a>] ? __dquot_initialize+0x3a/0x240
[ 8421.788377] [<ffffffffc025ba93>] ? jbd2__journal_start+0xf3/0x1f0 [jbd2]
[ 8421.788391] [<ffffffffc0b9da23>] osd_fallocate+0x243/0x530 [osd_ldiskfs]
[ 8421.788434] [<ffffffffc0b7ff65>] ? osd_trans_start+0x235/0x4e0 [osd_ldiskfs]
[ 8421.788441] [<ffffffffc0fbce28>] ofd_object_fallocate+0x538/0x780 [ofd]
[ 8421.788445] [<ffffffffc0fa65b1>] ofd_fallocate_hdl+0x231/0x970 [ofd]
[ 8421.788478] [<ffffffffc0926dbf>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc]
[ 8421.788511] [<ffffffffc098fd0a>] tgt_request_handle+0x96a/0x1700 [ptlrpc]
[ 8421.788539] [<ffffffffc096a301>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
[ 8421.788546] [<ffffffffc04e402e>] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
[ 8421.788570] [<ffffffffc09333f6>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
[ 8421.788594] [<ffffffffc09329ca>] ? ptlrpc_server_handle_req_in+0x92a/0x1100 [ptlrpc]
[ 8421.788654] [<ffffffffaa8c2df0>] ? wake_up_atomic_t+0x30/0x30
[ 8421.788682] [<ffffffffc0937f4c>] ptlrpc_main+0xb3c/0x14d0 [ptlrpc]
[ 8421.788706] [<ffffffffc0937410>] ? ptlrpc_register_service+0xf90/0xf90 [ptlrpc]
[ 8421.788708] [<ffffffffaa8c1d21>] kthread+0xd1/0xe0
[ 8421.788711] [<ffffffffaa8c1c50>] ? insert_kthread_work+0x40/0x40
[ 8421.788714] [<ffffffffaaf75c1d>] ret_from_fork_nospec_begin+0x7/0x21
[ 8421.788715] [<ffffffffaa8c1c50>] ? insert_kthread_work+0x40/0x40
[ 8421.788716] ---[ end trace ed0258569624c37e ]---
[ 8421.788719] LDISKFS-fs: ldiskfs_mb_mark_diskspace_used:3450: aborting transaction: error 28 in __ldiskfs_handle_dirty_metadata
[ 8421.788821] LDISKFS: jbd2_journal_dirty_metadata failed: handle type 0 started at line 1919, credits 41/0, errcode -28
[ 8421.788893] LDISKFS-fs warning (device sdb2): ldiskfs_mb_new_blocks:5077: Updating bitmap error: [err -28] [pa ffff954669f92000] [phy 1441792] [logic 1146880] [len 32768] [free 32768] [error 1] [inode 233]
[ 8421.789150] Quota error (device sdb2): qtree_write_dquot: dquota write failed
[ 8421.789172] LDISKFS-fs error (device sdb2) in ldiskfs_write_dquot:5495: error 28
[ 8421.789453] Aborting journal on device sdb2-8.
[ 8421.789758] LDISKFS-fs (sdb2): Remounting filesystem read-only
[ 8421.789780] LDISKFS-fs error (device sdb2) in ldiskfs_reserve_inode_write:5313: Journal has aborted
[ 8421.791221] LDISKFS-fs error (device sdb2) in ldiskfs_alloc_file_blocks:4760: error 28
[ 8421.791447] LDISKFS-fs error (device sdb2) in osd_trans_stop:2029: error 28
[ 8421.791519] LustreError: 90286:0:(osd_handler.c:1728:osd_trans_commit_cb()) transaction @0xffff954774c39d00 commit error: 2
[ 8421.791724] LustreError: 90292:0:(osd_handler.c:2032:osd_trans_stop()) lustre-OST0000: failed to stop transaction: rc = -28
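
If a scripted reproducer is more convenient than the fallocate command, the same preallocation request could be sketched in Python as below. The helper name `try_fallocate` is hypothetical. The sketch uses `os.posix_fallocate`, which wraps posix_fallocate(3) rather than calling fallocate(2) directly as the fallocate utility does, so it is an assumption that it exercises the same server-side path on a Lustre client. It has not been run against the setup in this ticket.

```python
import errno
import os


def try_fallocate(path: str, size: int) -> str:
    """Preallocate `size` bytes for `path`.

    Returns "ok" on success, or the symbolic errno name (for example
    "ENOSPC" or "EROFS") if the preallocation fails.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.posix_fallocate(fd, 0, size)
        return "ok"
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        os.close(fd)
```

Calling `try_fallocate("/mnt/lustre/tfile", 5 * 1024**3)` would mirror the 5G request shown in this comment; the function reports the failure as an errno name instead of raising.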
|
| Comment by Wang Shilong (Inactive) [ 11/Sep/20 ] |
|
This looks like a duplicate of |