[LU-3392] filter_do_bio()) ASSERTION(rw == OBD_BRW_READ) Created: 24/May/13  Updated: 03/Feb/17  Resolved: 28/Sep/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.2
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Wojciech Turek (Inactive) Assignee: WC Triage
Resolution: Incomplete Votes: 1
Labels: None
Environment:

servers: RHEL6 2.6.32-220.17.1.el6_lustre.x86_64 Lustre-2.1.2
clients: RHEL6 2.6.32-358.6.2.el6.x86_64 Lustre-2.1.5 patchless


Issue Links:
Duplicate
Severity: 3
Rank (Obsolete): 8396

 Description   

We have experienced an LBUG today on one of our OSS servers which I have not seen before.

LustreError: 16066:0:(filter_io_26.c:344:filter_do_bio()) ASSERTION(rw == OBD_BRW_READ) failed
LustreError: 16066:0:(filter_io_26.c:344:filter_do_bio()) LBUG
Pid: 16066, comm: ll_ost_io_126

After rebooting that OSS, the same LBUG is triggered as soon as the OSTs finish recovery and start servicing their data. Has anyone seen this before?

Our environment:
servers: RHEL6 2.6.32-220.17.1.el6_lustre.x86_64 Lustre-2.1.2
clients: RHEL6 2.6.32-358.6.2.el6.x86_64 Lustre-2.1.5 patchless



 Comments   
Comment by Wojciech Turek (Inactive) [ 29/May/13 ]

The problem seems to be caused by one particular OST. I identified that OST by unmounting OSTs, then ran fsck on it, which found some errors. After fixing them I was able to mount the OST and the LBUG did not recur. I cannot explain how the corruption of the inode size crept in in the first place, which is concerning.

fsck from util-linux-ng 2.17.2
e2fsck 1.42.7.wc1 (12-Apr-2013)
lustre1-OST0003: recovering journal
Pass 1: Checking inodes, blocks, and sizes
Inode 9289789, i_size is 17592186044416, should be 17592184995840. Fix<y>? yes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (447184151, counted=447797885).
Fix<y>? yes
Free inodes count wrong (16879337, counted=16879523).
Fix<y>? yes

lustre1-OST0003: ***** FILE SYSTEM WAS MODIFIED *****

6008797 inodes used (26.25%, out of 22888320)
455104 non-contiguous files (0.4%)
32 non-contiguous directories (0.0%)

# of inodes with ind/dind/tind blocks: 0/0/0
Extent depth histogram: 5919318/89344/127
5411611248 blocks used (92.36%, out of 5859409133)
0 bad blocks
1726 large files

6008751 regular files
37 directories
0 character device files
0 block device files
0 fifos
0 links
0 symbolic links (0 fast symbolic links)
0 sockets
------------
6008788 files
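
A quick arithmetic check of the two i_size values in the Pass 1 message, as a minimal sketch. The variable and function names are illustrative only. It shows that the reported size is exactly 2^44 bytes (16 TiB) and that the corrected size is 1 MiB smaller; whether that boundary relates to a file-size limit in the backing filesystem is an assumption that this ticket does not confirm.

```python
# Values copied from the e2fsck Pass 1 message for inode 9289789.
reported_i_size = 17592186044416
corrected_i_size = 17592184995840

TIB = 2**40
MIB = 2**20


def describe(size_bytes):
    """Return (whole TiB, remaining bytes) for a byte count."""
    return divmod(size_bytes, TIB)


# The reported i_size sits exactly on a 16 TiB boundary.
print(describe(reported_i_size))
print(reported_i_size == 2**44)

# The size e2fsck restored is 1 MiB below that boundary.
print((reported_i_size - corrected_i_size) // MIB)
```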

Comment by Wojciech Turek (Inactive) [ 03/Jun/13 ]

This LBUG has hit us again, and I found that fsck alone does not actually fix it, but aborting recovery does. After being hit by this LBUG, one needs to restart the OSS, fsck the OST to fix the i_size error, and then mount the OST with the abort recovery option. If we do not abort recovery, the LBUG hits again and the i_size is corrupted again. We found that after recovering the filesystem, one client would not come back. This is most likely the client that creates the problem in the first place. It looks like a serious bug, as it seems that a client operation can bring down the server. The next step for us is to update the server side to 2.1.5 and see if we still see this problem.
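
The workaround sequence above can be sketched as a dry run that only prints the commands to run after the OSS has been restarted. The device path and mount point are hypothetical placeholders, and the `-f -y` flags for e2fsck are an assumption (the ticket does not say which flags were used); `abort_recov` is the Lustre mount option for aborting recovery.

```python
import shlex


def recovery_commands(device, mount_point):
    """Build the post-reboot command sequence (dry run, nothing is executed)."""
    return [
        # Force a full check and answer "yes" to fixes such as the i_size one.
        ["e2fsck", "-f", "-y", device],
        # Mount the OST while skipping client recovery.
        ["mount", "-t", "lustre", "-o", "abort_recov", device, mount_point],
    ]


# Hypothetical device and mount point for the affected OST.
for cmd in recovery_commands("/dev/mapper/ost0003", "/mnt/lustre1-ost0003"):
    print(shlex.join(cmd))
```

Printing rather than executing keeps the order of operations reviewable before anything touches the OST.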

Comment by John Fuchs-Chesney (Inactive) [ 28/Sep/15 ]

Marking this as resolved/incomplete.

If this is still a live issue on a newer release, just let us know and we'll move the ticket to the correct Project.

Thanks,
~ jfc.

Comment by Gerrit Updater [ 24/Nov/16 ]

Wang Shilong (wshilong@ddn.com) uploaded a new patch: http://review.whamcloud.com/23931
Subject: LU-3392 obdfilter: handle large file writting gracefully
Project: fs/lustre-release
Branch: b2_1
Current Patch Set: 1
Commit: d03ea264ec0aa273e3e91ef81070a66adeced965

Comment by Gerrit Updater [ 24/Nov/16 ]

Wang Shilong (wshilong@ddn.com) uploaded a new patch: http://review.whamcloud.com/23938
Subject: LU-3392 osd-ldiskfs: handle large file writting gracefully
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 622fbed8fd4203487c78ec24bdda2bbd0a9ded07
