[LU-8786] Terrible i/o performance of a test application doing repeatable writes and truncates Created: 01/Nov/16  Updated: 04/Dec/18  Resolved: 04/Dec/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Alexander Zarochentsev Assignee: WC Triage
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates LU-10048 osd-ldiskfs to truncate outside of ma... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

A customer application benchmark required for acceptance testing shows terrible I/O performance. A simple test case was created to mimic the customer application's behavior. This test never completes when run with a walltime limit of 30 minutes (the job is killed once the walltime is exceeded); the same test run against the /tmp filesystem on the client node completes within a few seconds.
The test completes in a few seconds on Lustre 2.1.

      program iotest
      PARAMETER (NA=6400)
      dimension IA(NA)
      call init (IA,NA)
      call sleep(1)
      open (unit=22, file='time.step')
      do i=1,na
      call my_write(iA,NA,i)
      end do
      STOP
      END
      SUBROUTINE my_write(IA,NA,I)
      dimension ia(na)
      kt=ia(i)
      WRITE ( 22, '(1x, i8)' )   kt
      REWIND (22)
      return
      end
      subroutine init(IA,NA)
      dimension iA(na)
      do i=1,NA
      ia(i)=i
      end do
      return
      end
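
For readers who do not use Fortran, the access pattern of the test above can be approximated in C. This is only a sketch under stated assumptions: it assumes that the Fortran runtime truncates the file at the current position when REWIND follows a sequential WRITE, and the helper name write_truncate_loop and the path passed to it are hypothetical. Each pass writes one 10-byte record at offset 0 and then truncates the file to that same length, so every truncate is a truncate to the current size.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the ticket): repeat "write one record at
 * offset 0, then truncate the file at the end of that record" n times.
 * Returns the final file size in bytes, or -1 on error.
 */
long write_truncate_loop(const char *path, int n)
{
        char rec[32];
        struct stat st;
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

        if (fd < 0)
                return -1;

        for (int i = 1; i <= n; i++) {
                /* Same layout as the '(1x, i8)' format: blank, 8-wide int, newline. */
                int len = snprintf(rec, sizeof(rec), " %8d\n", i);

                assert(len > 0 && (size_t)len < sizeof(rec));

                /* The file is already exactly len bytes long after the write,
                 * so the truncate below never changes the size. */
                if (pwrite(fd, rec, (size_t)len, 0) != len ||
                    ftruncate(fd, len) != 0) {
                        close(fd);
                        return -1;
                }
        }

        if (fstat(fd, &st) != 0) {
                close(fd);
                return -1;
        }
        if (close(fd) != 0)
                return -1;
        return (long)st.st_size;
}
```

Calling write_truncate_loop() with 6400 iterations on a file in a Lustre mount and again on a file under /tmp should give a rough comparison similar to the one described above, provided the assumption about REWIND holds for the compiler in use.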


 Comments   
Comment by Gerrit Updater [ 01/Nov/16 ]

Alexander Zarochentsev (alexander.zarochentsev@seagate.com) uploaded a new patch: http://review.whamcloud.com/23502
Subject: LU-8786 osd: unnecessary truncate in osd_punch()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: fcac693a8f722758c45961a97fa17b75556c0b4d

Comment by Alexander Zarochentsev [ 01/Nov/16 ]

The patch restores a truncate optimisation that was lost in the obdfilter->ofd rewrite. The presence of that optimisation in the older code explains why the test performs well on Lustre 2.1.

Comment by Alexander Zarochentsev [ 04/Dec/18 ]

The optimization is already included in LU-10048: osd: async truncate:

@@ -1937,49 +1949,51 @@ static int osd_punch(const struct lu_env *env, struct dt_object *dt,
        oh = container_of(th, struct osd_thandle, ot_super);
        LASSERT(oh->ot_handle->h_transaction != NULL);
 
-       osd_trans_exec_op(env, th, OSD_OT_PUNCH);
+       /* we used to skip truncate to current size to
+        * optimize truncates on OST. with DoM we can
+        * get attr_set to set specific size (MDS_REINT)
+        * and then get truncate RPC which essentially
+        * would be skipped. this is bad.. so, disable
+        * this optimization on MDS till the client stop
+        * to sent MDS_REINT (LU-11033) -bzzz */
+       if (osd->od_is_ost && i_size_read(inode) == start)
+               RETURN(0);
 
-       tid = oh->ot_handle->h_transaction->t_tid;
+       osd_trans_exec_op(env, th, OSD_OT_PUNCH);
 
        spin_lock(&inode->i_lock);
+       if (i_size_read(inode) < start)
+               grow = true;
        i_size_write(inode, start);
        spin_unlock(&inode->i_lock);
        ll_truncate_pagecache(inode, start);
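
The effect of the early return in the diff above can be modeled in user space. The following is a minimal sketch, not the Lustre implementation: struct toy_object and toy_punch() are hypothetical stand-ins for the inode and osd_punch(), and the punches counter stands for the costly truncate path that the check avoids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for an object; NOT the real osd_device or inode structures. */
struct toy_object {
        uint64_t size;     /* current object size in bytes */
        bool     is_ost;   /* models osd->od_is_ost from the diff */
        unsigned punches;  /* how many times the costly path ran */
};

/*
 * Models the shape of the check added to osd_punch(): on an OST, a truncate
 * to the current size is skipped. Returns true if the costly path ran.
 */
bool toy_punch(struct toy_object *obj, uint64_t start)
{
        assert(obj != NULL);

        /* Skip only on OSTs; on an MDS the truncate is still carried out. */
        if (obj->is_ost && obj->size == start)
                return false;

        obj->size = start;
        obj->punches++;
        return true;
}
```

With this model, the write-and-truncate loop from the test case never reaches the costly path on an OST, because each truncate targets the size the object already has.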