[LU-4351] sanity test_54c: can't find an ext2 filesystem on dev loop3 Created: 05/Dec/13  Updated: 21/Jul/14  Resolved: 03/Jun/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: Lustre 2.6.0, Lustre 2.5.2

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Bob Glossman (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

lustre-master build #1791 ldiskfs
client is running SLES11 SP3


Issue Links:
Duplicate
duplicates LU-5032 Mount of loopdev in test 54c fails Resolved
Severity: 3
Rank (Obsolete): 11919

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/0ac2cba4-5de4-11e3-8f3c-52540035b04c.

The sub-test test_54c failed with the following error:

error mounting /mnt/lustre/loop54c on /mnt/lustre/d54c

client dmesg shows:

[ 2564.211169] LustreError: 9888:0:(rw.c:128:ll_cl_init()) lustre: [0x200001b70:0xeff1:0x0] no active IO, please file a ticket.
[ 2564.211172] Pid: 9888, comm: loop3
[ 2564.211173] 
[ 2564.211174] Call Trace:
[ 2564.211186]  [<ffffffff81004935>] dump_trace+0x75/0x310
[ 2564.211210]  [<ffffffffa05c282a>] libcfs_debug_dumpstack+0x4a/0x70 [libcfs]
[ 2564.211250]  [<ffffffffa0becb8e>] ll_cl_init+0x22e/0x330 [lustre]
[ 2564.211280]  [<ffffffffa0c0901a>] ll_write_begin+0x8a/0x5d0 [lustre]
[ 2564.211305]  [<ffffffffa02cb50d>] do_lo_send_aops+0xad/0x1b0 [loop]
[ 2564.211310]  [<ffffffffa02cb7a0>] do_bio_filebacked+0x190/0x280 [loop]
[ 2564.211314]  [<ffffffffa02cb952>] loop_thread+0xc2/0x250 [loop]
[ 2564.211319]  [<ffffffff81082306>] kthread+0x96/0xa0
[ 2564.211325]  [<ffffffff81467864>] kernel_thread_helper+0x4/0x10
[ 2564.211328] 
[ 2564.211335] Buffer I/O error on device loop3, logical block 0
[ 2564.211336] lost page write due to I/O error on loop3
[ 2564.211345] Pid: 9888, comm: loop3
[ 2564.211346] 
[ 2564.211346] Call Trace:
[ 2564.211350]  [<ffffffff81004935>] dump_trace+0x75/0x310
[ 2564.211360]  [<ffffffffa05c282a>] libcfs_debug_dumpstack+0x4a/0x70 [libcfs]
[ 2564.211377]  [<ffffffffa0becb8e>] ll_cl_init+0x22e/0x330 [lustre]
[ 2564.211403]  [<ffffffffa0c0901a>] ll_write_begin+0x8a/0x5d0 [lustre]
[ 2564.211421]  [<ffffffffa02cb50d>] do_lo_send_aops+0xad/0x1b0 [loop]
[ 2564.211426]  [<ffffffffa02cb7a0>] do_bio_filebacked+0x190/0x280 [loop]
[ 2564.211430]  [<ffffffffa02cb952>] loop_thread+0xc2/0x250 [loop]
[ 2564.211434]  [<ffffffff81082306>] kthread+0x96/0xa0
[ 2564.211445]  [<ffffffff81467864>] kernel_thread_helper+0x4/0x10
[ 2564.237136] EXT2-fs (loop3): error: can't find an ext2 filesystem on dev loop3.
[ 2568.722796] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity test_54c: @@@@@@ FAIL: error mounting \/mnt\/lustre\/loop54c on \/mnt\/lustre\/d54c 


 Comments   
Comment by Jinshan Xiong (Inactive) [ 06/Dec/13 ]

too bad, this kernel is too old which still calls write_being() and write_end() to write pages.

Comment by Oleg Drokin [ 09/Dec/13 ]

what do you mean the kernel is OLD? It's 3.4 or something like that, definitely very new compared to rhel6.x we run.

Comment by Jian Yu [ 10/Jan/14 ]

Lustre build: http://build.whamcloud.com/job/lustre-reviews/20841/
Distro/Arch: SLES11SP3/x86_64 (both server and client, kernel version: 3.0.101-0.8)

The same failure occurred:
https://maloo.whamcloud.com/test_sets/7e296c20-7986-11e3-a27b-52540035b04c

Comment by Bob Glossman (Inactive) [ 14/Apr/14 ]

This is actually a distinct SLES11 specific bug and not a dup. The underlying cause is the quite variant implementation of kernel loop devices in SLES11. It differs from the implementation of similar devices in both earlier and later kernel versions in all other distros yet examined, including much later kernel.org versions.

The basic problem is that the SLES implementation uses address space ops while all the other linux'es use only regular file I/O ops in their write code for kernel loop devices. This doesn't play nicely with embedded assumptions in lustre code.

Since the problem is expected to go away in later kernel versions, even later SLES kernels, the plan is to just skip the test in current SLES clients as it will always fail there.

Comment by Bob Glossman (Inactive) [ 14/Apr/14 ]

http://review.whamcloud.com/9955

Comment by Bob Glossman (Inactive) [ 24/Apr/14 ]

in b2_5
http://review.whamcloud.com/10085

Comment by Patrick Farrell (Inactive) [ 08/May/14 ]

I'm a bit puzzled:
Cray has been running sanity on 2.x Lustre+SLES11SP3 for some time now. We only see this test failure when testing master, not 2.5 or 2.4.
So it seem strange to say the fault lies completely with SLES.

I'm not saying SLES doesn't have a bug in it, but wasn't there a change made on the Lustre side that caused this failure to start happening? Have you actually seen this test failure with b2_5?

Comment by Bob Glossman (Inactive) [ 08/May/14 ]

yes, this problem has been reproduced in various branches. not just seen in master. It's possible that the SLES11 flaw in their kernel device implementation only cut in with some specific kernel version over the long life of SLES11. I only looked at recent 3.0.x versions, didn't check back through all of SLES11 history.

It's also possible that the lustre sensitivity to the use of address space ops called directly came about due to recent changes in lustre, like clio or unified target. Even if that's true we aren't likely to back out such new features due to a very SLES11 specific incompatibility.

Just as an aside have already confirmed that the current SLES12 beta doesn't have this flaw in its loop devices. Nor does any other linux kernel we've been able to examine.

Comment by Patrick Farrell (Inactive) [ 08/May/14 ]

Thanks, Bob. That clarifies.

Well, if this use case turns out to be something we/our customers truly care about - I doubt it will, but... - Cray will continue looking in to it to try to figure out a fix and pass on whatever we find.

Comment by Dmitry Eremin (Inactive) [ 12/May/14 ]

I still have the error on Xeon Phi:

[12215.497838] LustreError: 33755:0:(rw.c:128:ll_cl_init()) lustre25: [0x200000bd1:0xbfda:0x0] no active IO, please file a ticket.
[12215.497865] Pid: 33755, comm: loop3
[12215.497871]
[12215.497874] Call Trace:
[12215.497934]  [<ffffffffa01615f9>] libcfs_debug_dumpstack+0x5b/0x6d [libcfs]
[12215.497991]  [<ffffffffa066e177>] ll_cl_init+0x1e1/0x30a [lustre]
[12215.498043]  [<ffffffffa0684745>] ll_write_begin+0x120/0x609 [lustre]
[12215.498068]  [<ffffffff8109f2de>] pagecache_write_begin+0x1c/0x1e
[12215.498091]  [<ffffffff81204c8f>] do_lo_send_aops+0x9c/0x17a
[12215.498110]  [<ffffffff81022c8c>] ? set_next_entity+0x45/0x99
[12215.498127]  [<ffffffff81028ae7>] ? pick_next_task_fair+0xaf/0xbf
[12215.498145]  [<ffffffff81204600>] do_bio_filebacked+0xfd/0x27f
[12215.498161]  [<ffffffff81204bf3>] ? do_lo_send_aops+0x0/0x17a
[12215.498178]  [<ffffffff81204bbd>] loop_thread+0x1db/0x211
[12215.498194]  [<ffffffff8104dba0>] ? autoremove_wake_function+0x0/0x38
[12215.498211]  [<ffffffff812049e2>] ? loop_thread+0x0/0x211
[12215.498229]  [<ffffffff8104d611>] kthread+0x84/0x8c
[12215.498247]  [<ffffffff81003814>] kernel_thread_helper+0x4/0x10
[12215.498264]  [<ffffffff8104d58d>] ? kthread+0x0/0x8c
[12215.498278]  [<ffffffff81003810>] ? kernel_thread_helper+0x0/0x10
[12215.498286]
[12215.498321] Buffer I/O error on device loop3, logical block 0
[12215.498333] lost page write due to I/O error on loop3
[12215.498416] Pid: 33755, comm: loop3
[12215.498423]
[12215.498426] Call Trace:
[12215.498459]  [<ffffffffa01615f9>] libcfs_debug_dumpstack+0x5b/0x6d [libcfs]
[12215.498508]  [<ffffffffa066e177>] ll_cl_init+0x1e1/0x30a [lustre]
[12215.498560]  [<ffffffffa0684745>] ll_write_begin+0x120/0x609 [lustre]
[12215.498577]  [<ffffffff8109f2de>] pagecache_write_begin+0x1c/0x1e
[12215.498594]  [<ffffffff81204c8f>] do_lo_send_aops+0x9c/0x17a
[12215.498615]  [<ffffffff81351ed8>] ? __slab_free+0x78/0xd8
[12215.498630]  [<ffffffff810a323b>] ? mempool_free_slab+0x17/0x19
[12215.498647]  [<ffffffff81204600>] do_bio_filebacked+0xfd/0x27f
[12215.498663]  [<ffffffff81204bf3>] ? do_lo_send_aops+0x0/0x17a
[12215.498685]  [<ffffffff8110605f>] ? bio_free+0x3d/0x54
[12215.498700]  [<ffffffff8110608b>] ? bio_fs_destructor+0x15/0x17
[12215.498716]  [<ffffffff81105cc3>] ? bio_put+0x2b/0x2d
[12215.498731]  [<ffffffff81204bbd>] loop_thread+0x1db/0x211
[12215.498745]  [<ffffffff8104dba0>] ? autoremove_wake_function+0x0/0x38
[12215.498762]  [<ffffffff812049e2>] ? loop_thread+0x0/0x211
[12215.498778]  [<ffffffff8104d611>] kthread+0x84/0x8c
[12215.498794]  [<ffffffff81003814>] kernel_thread_helper+0x4/0x10
[12215.498811]  [<ffffffff8104d58d>] ? kthread+0x0/0x8c
[12215.498825]  [<ffffffff81003810>] ? kernel_thread_helper+0x0/0x10
[12215.498833]
[12215.498862] Buffer I/O error on device loop3, logical block 1
[12215.498874] lost page write due to I/O error on loop3
[12215.498950] Pid: 33755, comm: loop3
...
Comment by Bob Glossman (Inactive) [ 12/May/14 ]

Dmitry,
Are you using a SLES kernel on the Phi? if so then this is expected behavior. If not then it may be an entirely different problem.

Comment by Dmitry Eremin (Inactive) [ 12/May/14 ]

No, it's upstream kernel 2.6.38 with patches for Xeon Phi.

Comment by Bob Glossman (Inactive) [ 12/May/14 ]

I see your call stack has do_lo_send_aops in it. That says this is likely the same bug that's in the SLES11 kernel. Does any of your Phi specific patches apply in drivers/block/loop.c? If not then this bug probably exists in 2.6.38 upstream kernel source you are using. Previously had only seen this internal routine in SLES 11 (3.0.x) kernels. Not seen in el6 (2.6.32-xxx) kernels or fc20, rhel7, sles12.

As far as I know there is no workaround.

Comment by Bob Glossman (Inactive) [ 13/May/14 ]

I have been researching the use of address space ops in the kernel loop device code a bit. As far as I can tell it's been part of the mainline upstream linux kernel for a very long time until the following commit in 2011:

author	Christoph Hellwig <hch@infradead.org>	2011-10-17 10:57:20 (GMT)
committer	Jens Axboe <axboe@kernel.dk>	2011-10-17 10:57:20 (GMT)
commit	456be1484ffc72a24bdb4200b5847c4fa90139d9 (patch)
tree	570f0818bd6cfa245ab23d0121853b7b1e5a649b /drivers/block/loop.c
parent	8bc03e8f3a334e09e89a7dffb486ee97a5ce84ae (diff)
loop: remove the incorrect write_begin/write_end shortcut
Currently the loop device tries to call directly into write_begin/write_end
instead of going through ->write if it can.  This is a fairly nasty shortcut
as write_begin and write_end are only callbacks for the generic write code
and expect to be called with filesystem specific locks held.

This code currently causes various issues for clustered filesystems as it
doesn't take the required cluster locks, and it also causes issues for XFS
as it doesn't properly lock against the swapext ioctl as called by the
defragmentation tools.  This in case causes data corruption if
defragmentation hits a busy loop device in the wrong time window, as
reported by RH QA.

The reason why we have this shortcut is that it saves a data copy when
doing a transformation on the loop device, which is the technical term
for using cryptoloop (or an XOR transformation).  Given that cryptoloop
has been deprecated in favour of dm-crypt my opinion is that we should
simply drop this shortcut instead of finding complicated ways to to
introduce a formal interface for this shortcut.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

I suspect the reason we don't see it in el6 is that yanking it out was one of the RH customizations of their distro kernel.
I suspect the reason it's in SLES11 is the branch their kernel is derived from is too old to include the upstream kernel commit.

Comment by Bob Glossman (Inactive) [ 22/May/14 ]

Recent bug report submitted upstream to SuSE and marked fixed suggests this bug will very likely disappear after the next sles11sp3 kernel update.

Comment by Patrick Farrell (Inactive) [ 22/May/14 ]

Bob: Are you able to provide a link or other info about that bug report?

Thanks

Comment by Bob Glossman (Inactive) [ 22/May/14 ]

Patrick,
It's https://bugzilla.novell.com/show_bug.cgi?id=878123. Don't know if that's generally accessible.

Comment by Bob Glossman (Inactive) [ 22/May/14 ]

btw, the fix isn't in the latest update to 3.0.101-0.29. I expect it in the next one.

Comment by Patrick Farrell (Inactive) [ 22/May/14 ]

Thanks, Bob - It doesn't seem to be, but Cray has support contracts with SUSE, so someone around here should be able to get me in.

Much appreciated.

Comment by Jodi Levi (Inactive) [ 03/Jun/14 ]

Patches have landed.

Comment by Bob Glossman (Inactive) [ 21/Jul/14 ]

fixed in sles11sp3 as well in the most recent upstream kernel update from SuSE to version 3.0.101-0.35

Generated at Sat Feb 10 01:41:56 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.