[LU-8284] i_size updates from BRW writes are not atomic Created: 15/Jun/16  Updated: 13/Sep/16  Resolved: 13/Sep/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.9.0
Fix Version/s: Lustre 2.9.0

Type: Bug Priority: Major
Reporter: Andrew Perepechko Assignee: Zhenyu Xu
Resolution: Fixed Votes: 0
Labels: patch

Issue Links:
Duplicate
Related
Severity: 3

Description

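There is a race window in osd_write_commit() between the i_size_read() check and the subsequent i_size_write() call: the check and the update are not performed atomically, so concurrent BRW writes can race on the size update. A test case showing the issue and a fix will be uploaded shortly.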
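
The check-then-update pattern described above can be illustrated with a minimal userspace sketch. This is not the osd-ldiskfs code: struct fake_inode, extend_size_racy(), extend_size_locked() and run_writers() are hypothetical names invented for illustration, and a pthread mutex stands in for whatever lock the real fix uses. The sketch only shows why the comparison and the store need to happen under one lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for an inode; not the kernel's struct inode. */
struct fake_inode {
    pthread_mutex_t lock;   /* plays the role of a per-inode lock */
    int64_t size;           /* plays the role of i_size */
};

/* Racy pattern: the check and the store are separate steps, so two
 * concurrent callers can both pass the check and the smaller value
 * can be stored last. Shown for contrast; only safe single-threaded. */
void extend_size_racy(struct fake_inode *ino, int64_t new_end)
{
    if (new_end > ino->size)
        ino->size = new_end;
}

/* Locked pattern: check and store form one critical section, so the
 * size can only grow, regardless of how writers interleave. */
void extend_size_locked(struct fake_inode *ino, int64_t new_end)
{
    pthread_mutex_lock(&ino->lock);
    if (new_end > ino->size)
        ino->size = new_end;
    pthread_mutex_unlock(&ino->lock);
}

struct writer_arg {
    struct fake_inode *ino;
    int64_t new_end;
};

static void *writer(void *p)
{
    struct writer_arg *a = p;

    extend_size_locked(a->ino, a->new_end);
    return NULL;
}

#define MAX_WRITERS 64

/* Start n concurrent writers, where writer i ends at (i + 1) MiB,
 * and return the final size once all of them have finished. */
int64_t run_writers(int n)
{
    struct fake_inode ino = { PTHREAD_MUTEX_INITIALIZER, 0 };
    pthread_t tids[MAX_WRITERS];
    struct writer_arg args[MAX_WRITERS];
    int i;

    assert(n > 0 && n <= MAX_WRITERS);
    for (i = 0; i < n; i++) {
        int rc;

        args[i].ino = &ino;
        args[i].new_end = (int64_t)(i + 1) << 20;
        rc = pthread_create(&tids[i], NULL, writer, &args[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (i = 0; i < n; i++)
        pthread_join(tids[i], NULL);
    return ino.size;
}
```

With the locked variant, run_writers(8) is expected to return 8 MiB whatever the thread interleaving, because the largest end offset always wins. Replacing extend_size_locked() with extend_size_racy() in writer() would reintroduce the window described in this ticket.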



Comments
Comment by Gerrit Updater [ 15/Jun/16 ]

Andrew Perepechko (andrew.perepechko@seagate.com) uploaded a new patch: http://review.whamcloud.com/20815
Subject: LU-8284 tests: i_size updates from BRW writes are not atomic
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a621d7a64378b1765c33e95b8cfc4dacb9afeec0

Comment by Andrew Perepechko [ 15/Jun/16 ]

The test case cannot be properly integrated with the fix, because doing so would require sleeping under a spinlock or some other meaningless operation purely for the sake of having a test case.

The test case is uploaded only to demonstrate that the current code has an issue; it is not intended to be landed.

Comment by Gerrit Updater [ 15/Jun/16 ]

Andrew Perepechko (andrew.perepechko@seagate.com) uploaded a new patch: http://review.whamcloud.com/20816
Subject: LU-8284 osd-ldiskfs: i_size updates from BRW should be atomic
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b4ae906fb5021736384b98c211882577a77bb665

Comment by Andrew Perepechko [ 15/Jun/16 ]

This is how the issue is reproduced locally with the above test case:

[root@panda-testbox tests]# REFORMAT=yes ONLY=258 bash sanity.sh
Logging to shared log directory: /tmp/test_logs/1466023647
Client: Lustre version: 2.8.54_60_g2a55f34
MDS: Lustre version: 2.8.54_60_g2a55f34
OSS: Lustre version: 2.8.54_60_g2a55f34
Stopping clients: panda-testbox /mnt/lustre (opts:)
Stopping clients: panda-testbox /mnt/lustre2 (opts:)
Loading modules from /mnt/nfs/xyratex/lustre-release/lustre
detected 4 online CPUs by sysfs
Force libcfs to create 2 CPU partitions
debug=vfstrace rpctrace dlmtrace neterror ha config                   ioctl super lfsck
subsystem_debug=all -lnet -lnd -pinger
gss/krb5 is not supported
Formatting mgs, mds, osts
Format mds1: /dev/mapper/vg_livecd-mdt
Format ost1: /dev/mapper/vg_livecd-ost1
Format ost2: /dev/mapper/vg_livecd-ost2
Format ost3: /dev/mapper/vg_livecd-ost3
Checking servers environments
Checking clients panda-testbox environments
Loading modules from /mnt/nfs/xyratex/lustre-release/lustre
detected 4 online CPUs by sysfs
Force libcfs to create 2 CPU partitions
debug=vfstrace rpctrace dlmtrace neterror ha config                   ioctl super lfsck
subsystem_debug=all -lnet -lnd -pinger
gss/krb5 is not supported
Setup mgs, mdt, osts
Starting mds1:   /dev/mapper/vg_livecd-mdt /mnt/lustre-mds1
Started lustre-MDT0000
Starting ost1:   /dev/mapper/vg_livecd-ost1 /mnt/lustre-ost1
Started lustre-OST0000
Starting ost2:   /dev/mapper/vg_livecd-ost2 /mnt/lustre-ost2
Started lustre-OST0001
Starting ost3:   /dev/mapper/vg_livecd-ost3 /mnt/lustre-ost3
Started lustre-OST0002
Starting client: panda-testbox:  -o user_xattr,flock panda-testbox@tcp:/lustre /mnt/lustre
Starting client panda-testbox:  -o user_xattr,flock panda-testbox@tcp:/lustre /mnt/lustre
Started clients panda-testbox: 
panda-testbox@tcp:/lustre on /mnt/lustre type lustre (rw,user_xattr,flock)
Using TIMEOUT=20
seting jobstats to procname_uid
Setting lustre.sys.jobid_var from disable to procname_uid
warning: 'lctl conf_param' is deprecated, use 'lctl set_param -P' instead
Waiting 90 secs for update
Updated after 3s: wanted 'procname_uid' got 'procname_uid'
disable quota as required
osd-ldiskfs.track_declares_assert=1
running as uid/gid/euid/egid 500/500/500/500, groups:
 [touch] [/mnt/lustre/d0_runas_test/f10215]
excepting tests: 76 101g 42a 42b 42c 42d 45 68b
skipping tests SLOW=no: 24D 27m 64b 68 71 115 300o
preparing for tests involving mounts
mke2fs 1.42.13.wc4 (28-Nov-2015)

debug=-1
resend_count is set to 4 4 4
resend_count is set to 4 4 4
resend_count is set to 4 4 4
resend_count is set to 4 4 4
resend_count is set to 4 4 4


== sanity test 258: i_size updates from BRW should be atomic == 00:48:16 (1466023696)
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 1.09159 s, 1.9 MB/s
fail_loc=0x80000237
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.387469 s, 5.4 MB/s
cmp: EOF on /mnt/lustre/f258.sanity
 sanity test_258: @@@@@@ FAIL: files differ 
  Trace dump:
  = /mnt/nfs/xyratex/lustre-release/lustre/tests/test-framework.sh:4780:error()
  = sanity.sh:13995:test_258()
  = /mnt/nfs/xyratex/lustre-release/lustre/tests/test-framework.sh:5045:run_one()
  = /mnt/nfs/xyratex/lustre-release/lustre/tests/test-framework.sh:5084:run_one_logged()
  = /mnt/nfs/xyratex/lustre-release/lustre/tests/test-framework.sh:4882:run_test()
  = sanity.sh:14000:main()
Dumping lctl log to /tmp/test_logs/1466023647/sanity.test_258.*.1466023700.log
Dumping logs only on local client.
Resetting fail_loc on all nodes...done.
FAIL 258 (6s)
== sanity test complete, duration 55 sec == 00:48:22 (1466023702)
sanity: FAIL: test_258 files differ
Stopping clients: panda-testbox /mnt/lustre (opts:-f)
Stopping client panda-testbox /mnt/lustre opts:-f
Stopping clients: panda-testbox /mnt/lustre2 (opts:-f)
Stopping /mnt/lustre-mds1 (opts:-f) on panda-testbox
Stopping /mnt/lustre-ost1 (opts:-f) on panda-testbox
Stopping /mnt/lustre-ost2 (opts:-f) on panda-testbox
Stopping /mnt/lustre-ost3 (opts:-f) on panda-testbox
waited 0 for 10 ST ost OSS OSS_uuid 0
modules unloaded.

Comment by Gerrit Updater [ 11/Jul/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20816/
Subject: LU-8284 osd-ldiskfs: i_size updates from BRW should be atomic
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 98f2f46d51a4d4bdb12df2cf7b4f2b2c4fbbd735

Comment by Joseph Gmitter (Inactive) [ 13/Jul/16 ]

Patch landed to master for 2.9.0

Comment by Gerrit Updater [ 24/Aug/16 ]

Bobi Jam (bobijam@hotmail.com) uploaded a new patch: http://review.whamcloud.com/22103
Subject: LU-8284 osd-ldisksf: need lock around i_size update
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 9ba5338b93333476623054dfde0bc824f8325297

Comment by Gerrit Updater [ 13/Sep/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/22103/
Subject: LU-8284 osd-ldisksf: need lock around i_size update
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: b172cd5bc80d2560056794cd640a95303ad42405

Comment by Peter Jones [ 13/Sep/16 ]

Landed for 2.9
