[LU-4000] Fix build failure on ppc64 w/ 64k pages Created: 24/Sep/13  Updated: 10/Jan/17  Resolved: 10/Jan/17

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.10.0

Type: Bug
Priority: Major
Reporter: Jeff Mahoney
Assignee: James A Simmons
Resolution: Won't Do
Votes: 0
Labels: patch

Issue Links:
Related
is related to LU-6585 Virtual block device (lloop) Closed
Severity: 3
Rank (Obsolete): 10710

 Description   

lloop fails to build on ppc64 with 64k pages because the block layer API limits the logical block size to values representable in an unsigned short, and a 64k (65536-byte) page size exceeds the 16-bit maximum of 65535.

The logical block size shouldn't be set to the page size since that will force any file systems on that loop device to also require a 64k page size.
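The size limit described above can be illustrated in plain user-space C. This is only a sketch, not the lloop or block layer code: the helper names fits_in_ushort, truncate_to_ushort and checked_block_size are hypothetical, and the runtime assert merely stands in for whatever compile-time check the real build uses.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true when `size` can be stored in an unsigned short
 * without loss, i.e. when it would be a representable logical block size. */
static bool fits_in_ushort(unsigned long size)
{
    return size <= USHRT_MAX;
}

/* Hypothetical helper: what an unchecked store would keep. Conversion to an
 * unsigned type is modulo (USHRT_MAX + 1), so with a 16-bit unsigned short
 * a 64k value becomes 0. */
static unsigned short truncate_to_ushort(unsigned long size)
{
    return (unsigned short)size;
}

/* Hypothetical helper: refuse sizes that do not fit. The assert is a
 * runtime stand-in for a build-time assertion. */
static unsigned short checked_block_size(unsigned long size)
{
    assert(fits_in_ushort(size));
    return (unsigned short)size;
}
```

With a 16-bit unsigned short, fits_in_ushort(4096) holds while fits_in_ushort(65536) does not, which is why a 4k page size builds and a 64k page size does not.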



 Comments   
Comment by Jeff Mahoney [ 24/Sep/13 ]

Fix here: http://review.whamcloud.com/7745

Comment by Peter Jones [ 24/Sep/13 ]

Thanks Jeff!

Minh

Could you please take care of this patch?

Thanks

Peter

Comment by Jodi Levi (Inactive) [ 27/Sep/13 ]

Patch landed to Master.

Comment by Jinshan Xiong (Inactive) [ 25/Oct/13 ]

This patch broke the loop device, as follows:

-----------[ cut here ]-----------
kernel BUG at /root/lustre/lustre/llite/lloop.c:226!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/virtual/block/lloop0/dev
CPU 4
Modules linked in: llite_lloop(U) lustre(U) ofd(U) osp(U) lod(U) ost(U) mdt(U) osd_ldiskfs(U) fsfilt_ldiskfs(U) ldiskfs(U) mdd(U) mgs(U) lquota(U) lfsck(U) obdecho(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) libcfs(U) jbd2 sha512_generic sha256_generic crc32c_intel nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core e1000e microcode serio_raw i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support ioatdma dca i7core_edac edac_core shpchp ext3 jbd mbcache sd_mod crc_t10dif pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]

Pid: 20050, comm: lloop0 Not tainted 2.6.32-358.18.1.el6_lustre.ga0a1066.x86_64 #1 Supermicro X8DTT-H/X8DTT-H
RIP: 0010:[<ffffffffa02ad4ca>] [<ffffffffa02ad4ca>] loop_thread+0x71a/0x860 [llite_lloop]
RSP: 0018:ffff8807d282fdf0 EFLAGS: 00010206
RAX: 0000000000000000 RBX: ffff8807b8c20000 RCX: 0000000000000000
RDX: ffff880638efd370 RSI: 0000000000000000 RDI: 0000000000000002
RBP: ffff8807d282fee0 R08: 0000000000000400 R09: ffff8807c1a0a138
R10: ffff880638efd300 R11: 0000000000000000 R12: ffff880638efd300
R13: ffff8807b8c201f0 R14: ffff8807b8c209f0 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff880045680000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000034b161d9d0 CR3: 000000082f466000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process lloop0 (pid: 20050, threadinfo ffff8807d282e000, task ffff88082ee60080)
Stack:
0000000000000000 ffff8807d282fe80 ffff8805fc19be18 ffff8807c1a0a138
<d> ffff88082ee60080 ffff8807d282fe98 ffff8807b8c20098 ffff8807b8c20048
<d> ffff8807b8c201c8 00000000b7ec5c60 ffff8807b8c20060 ffff8807b8c200a0
Call Trace:
[<ffffffff81096da0>] ? autoremove_wake_function+0x0/0x40
[<ffffffffa02acdb0>] ? loop_thread+0x0/0x860 [llite_lloop]
[<ffffffff81096a36>] kthread+0x96/0xa0
[<ffffffff8100c0ca>] child_rip+0xa/0x20
[<ffffffff810969a0>] ? kthread+0x0/0xa0
[<ffffffff8100c0c0>] ? child_rip+0x0/0x20
Code: 60 e9 2a a0 48 c7 05 d2 14 00 00 90 e9 2a a0 c7 05 c0 14 00 00 00 04 00 00 31 c0 8b 13 e8 af 22 94 00 e9 02 fa ff ff 0f 0b eb fe <0f> 0b 0f 1f 40 00 eb fa 48 c7 c7 00 e3 2a a0 48 c7 c2 ba da 2a
RIP [<ffffffffa02ad4ca>] loop_thread+0x71a/0x860 [llite_lloop]
RSP <ffff8807d282fdf0>

We have to revert this patch or make a new fix.

Comment by Jeff Mahoney [ 28/Oct/13 ]

Ah, ok. It looks like my fix was incomplete. Those assertions essentially back up the ones made by the build assertion my patch removed. My initial analysis of where the logical block size is used missed the directio case. I expect that's where these split bios are coming from.

So, those two BUG_ONs need to be removed to avoid the Oops. The original fix should probably be changed to use min(PAGE_SIZE, 32768) to keep the original lloop performance, at least until the size of the logical_block_size queue limit variable is increased (if that happens).
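The min(PAGE_SIZE, 32768) suggestion can be sketched as a small clamp in user-space C. This is an illustration under assumptions, not the patch itself: the macro LLOOP_MAX_LOGICAL_BLOCK_SIZE and the function lloop_logical_block_size are hypothetical names, and the page size is passed as a parameter instead of using the kernel's PAGE_SIZE. 32768 is the largest power of two that fits in a 16-bit unsigned short.

```c
#include <assert.h>

/* Hypothetical name for the cap proposed in the comment above. */
#define LLOOP_MAX_LOGICAL_BLOCK_SIZE 32768UL

/* Hypothetical helper: pick the logical block size for a given page size.
 * Pages up to 32k keep the page size (the original lloop behaviour);
 * larger pages, such as 64k on ppc64, are capped at 32k. */
static unsigned short lloop_logical_block_size(unsigned long page_size)
{
    unsigned long size;

    assert(page_size > 0);
    size = page_size < LLOOP_MAX_LOGICAL_BLOCK_SIZE
               ? page_size
               : LLOOP_MAX_LOGICAL_BLOCK_SIZE;
    return (unsigned short)size;
}
```

Under this sketch a 4k page yields a 4096-byte logical block size and a 64k page yields 32768, so the result always fits in the queue limit field.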

Comment by Jeff Mahoney [ 29/Oct/13 ]

Updated but untested fix here: http://review.whamcloud.com/8096

Comment by James A Simmons [ 10/Jan/17 ]

The llite_lloop back device is no longer supported so this can be closed.

Generated at Sat Feb 10 01:38:46 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.