[LU-599] 1.8<->2.1 interop: Oops: RIP: mdt:mdt_identity_parse_downcall+0x4dc/0x5d0 Created: 16/Aug/11  Updated: 19/Aug/11  Resolved: 19/Aug/11

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.0, Lustre 1.8.6
Fix Version/s: Lustre 2.1.0

Type: Bug Priority: Blocker
Reporter: Jian Yu Assignee: nasf (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

Lustre Clients:
Tag: 1.8.6-wc1
Distro/Arch: RHEL5/x86_64 (kernel version: 2.6.18-238.12.1.el5)
Build: http://newbuild.whamcloud.com/job/lustre-b1_8/100/arch=x86_64,build_type=client,distro=el5,ib_stack=ofa/
Network: IB (OFED 1.5.3.1)

Lustre Servers:
Branch: master
Distro/Arch: RHEL5/x86_64 (kernel version: 2.6.18-238.19.1.el5_lustre.gd4ea36c)
Build: http://newbuild.whamcloud.com/job/lustre-master/257/arch=x86_64,build_type=server,distro=el5,ib_stack=ofa/
Network: IB (OFED 1.5.3.1)


Severity: 3
Rank (Obsolete): 4911

 Description   

While running the sanity test, the MDS crashed as follows:

Lustre: DEBUG MARKER: -----============= acceptance-small: sanity ============----- Tue Aug 16 05:32:37 PDT 2011
Lustre: DEBUG MARKER: Using TIMEOUT=20
Lustre: 13601:0:(debug.c:323:libcfs_debug_str2mask()) You are trying to use a numerical value for the mask - this will be deprecated in a future release.
Lustre: 13601:0:(debug.c:323:libcfs_debug_str2mask()) Skipped 1 previous similar message
Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
 [<ffffffff88de82ac>] :mdt:mdt_identity_parse_downcall+0x4dc/0x5d0
PGD 21a22f067 PUD 31df09067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /block/sdb/queue/max_sectors_kb
CPU 13
Modules linked in: cmm(U) osd_ldiskfs(FU) mdt(U) mdd(U) mds(U) fsfilt_ldiskfs(FU) exportfs(U) mgs(U) mgc(U) lustre(U) lov(U) osc(U) lquota(U) mdc(U) fid(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) ldiskfs(U) jbd2(U) crc16(U) nfs(U) fscache(U) nfs_acl(U) mlx4_ib(U) mlx4_core(U) autofs4(U) hidp(U) rfcomm(U) l2cap(U) bluetooth(U) lockd(U) sunrpc(U) cpufreq_ondemand(U) powernow_k8(U) freq_table(U) mperf(U) be2iscsi(U) iscsi_tcp(U) bnx2i(U) cnic(U) uio(U) iw_cxgb3(U) cxgb3(U) libiscsi_tcp(U) libiscsi2(U) scsi_transport_iscsi2(U) scsi_transport_iscsi(U) rds(U) ib_sdp(U) ib_ipoib(U) ipoib_helper(U) rdma_ucm(U) rdma_cm(U) ib_ucm(U) ib_uverbs(U) ib_umad(U) ib_cm(U) iw_cm(U) ib_addr(U) ipv6(U) xfrm_nalgo(U) crypto_api(U) ib_sa(U) ib_mad(U) ib_core(U) loop(U) dm_mirror(U) dm_multipath(U) scsi_dh(U) video(U) backlight(U) sbs(U) power_meter(U) i2c_ec(U) dell_wmi(U) wmi(U) button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) shpchp(U) igb(U) sg(U) 8021q(U) k10temp(U) hwmon(U) i2c_piix4(U) tpm_tis(U) i2c_core(U) dca(U) tpm(U) pcspkr(U) tpm_bios(U) amd64_edac_mod(U) serio_raw(U) edac_mc(U) dm_raid45(U) dm_message(U) dm_region_hash(U) dm_log(U) dm_mod(U) dm_mem_cache(U) ahci(U) libata(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
Pid: 13607, comm: l_getidentity Tainted: GF     2.6.18-238.19.1.el5_lustre.gd4ea36c #1
RIP: 0010:[<ffffffff88de82ac>]  [<ffffffff88de82ac>] :mdt:mdt_identity_parse_downcall+0x4dc/0x5d0
RSP: 0018:ffff8100bd67dd58  EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff81031e4b5000 RCX: ffff8100be340f88
RDX: 00000000000001f4 RSI: 00000000000001f4 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000001f4
R13: ffff8100d6dc6000 R14: 0000000000000000 R15: ffff8100be340f40
FS:  00002b5557b966e0(0000) GS:ffff810223a892c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 000000031e97b000 CR4: 00000000000006e0
Process l_getidentity (pid: 13607, threadinfo ffff8100bd67c000, task ffff8103238ac080)
Stack:  ffff810103f93b80 ffffffff8002cc10 ffff810103f93b80 ffff81011fe92d20
 ffff8100bd67dea8 ffffffff8000eb94 ffff8100bd67dea8 ffff810000000000
 00000000ffffffe9 ffff8100be340f40 ffff8100be340f58 00000000000001f4
Call Trace:
 [<ffffffff8002cc10>] mntput_no_expire+0x19/0x89
 [<ffffffff8000eb94>] link_path_walk+0xa6/0xb2
 [<ffffffff8882b1dc>] :libcfs:upcall_cache_downcall+0x4ec/0x720
 [<ffffffff88dece69>] :mdt:lprocfs_wr_identity_info+0x659/0x720
 [<ffffffff8001ebe5>] __dentry_open+0x101/0x1dc
 [<ffffffff888df11b>] :obdclass:lprocfs_fops_write+0x6b/0xc0
 [<ffffffff80016b48>] vfs_write+0xce/0x174
 [<ffffffff80017415>] sys_write+0x45/0x6e
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

Code: 41 8b 06 41 b9 cf 00 00 00 89 54 24 28 89 74 24 20 49 c7 c0
RIP  [<ffffffff88de82ac>] :mdt:mdt_identity_parse_downcall+0x4dc/0x5d0
 RSP <ffff8100bd67dd58>
CR2: 0000000000000000
 <0>Kernel panic - not syncing: Fatal exception

Maloo report: https://maloo.whamcloud.com/test_sets/f85b8234-c804-11e0-8d02-52540025f9af



 Comments   
Comment by Jian Yu [ 16/Aug/11 ]

This issue blocks all of the tests which need to be run by non-root users.

Comment by Peter Jones [ 16/Aug/11 ]

Hongchao will look into this one

Comment by Oleg Drokin [ 16/Aug/11 ]

I also looked at it a bit and I cannot reproduce any problems when I try to access 2.1 servers from 1.8 client as non-root.
I only did simple accesses like touch, cat and such, though.
If anything more complicated is required, please specify the exact reproduction steps.

Comment by nasf (Inactive) [ 17/Aug/11 ]

If the target user belongs to only one group, we set only "identity->mi_gid" and leave "identity->mi_ginfo" unset, which saves a memory allocation and unnecessary group processing.

In this code path we fail to check whether "identity->mi_ginfo" is valid and access "identity->mi_ginfo->ngroups" directly, which causes the invalid memory access.
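
The following is a minimal sketch of the kind of guard described above. It is not the actual patch. The struct and function names (group_info_sketch, md_identity_sketch, identity_ngroups) are hypothetical, simplified stand-ins for the kernel structures, chosen only to illustrate checking "mi_ginfo" before reading "ngroups".

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's group information structure. */
struct group_info_sketch {
    int ngroups;
};

/* Hypothetical stand-in for the MDT identity entry. */
struct md_identity_sketch {
    unsigned int mi_gid;
    /* Assumed to be NULL when the user belongs to a single group. */
    struct group_info_sketch *mi_ginfo;
};

/*
 * Return the number of supplementary groups for an identity.
 * A NULL mi_ginfo is treated as "no supplementary groups" rather
 * than being dereferenced.
 */
static int identity_ngroups(const struct md_identity_sketch *identity)
{
    if (identity == NULL || identity->mi_ginfo == NULL)
        return 0;

    return identity->mi_ginfo->ngroups;
}
```

With this shape, a caller that only wants to log or iterate over the groups can use identity_ngroups(identity) for both the single-group and multi-group cases without touching a NULL pointer.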

Comment by nasf (Inactive) [ 17/Aug/11 ]

The patch is available:

http://review.whamcloud.com/#change,1252

Comment by Build Master (Inactive) [ 18/Aug/11 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #263
LU-599 verify "mi_ginfo" before accessing its member

Oleg Drokin : 2d86bd1e70106afaff5200ba819303b16c2e587d
Files :

  • lustre/mdt/mdt_identity.c
Comment by Build Master (Inactive) [ 18/Aug/11 ]

Integrated in lustre-master » i686,client,el6,inkernel #263
LU-599 verify "mi_ginfo" before accessing its member

Oleg Drokin : 2d86bd1e70106afaff5200ba819303b16c2e587d
Files :

  • lustre/mdt/mdt_identity.c
Comment by Build Master (Inactive) [ 18/Aug/11 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #263
LU-599 verify "mi_ginfo" before accessing its member

Oleg Drokin : 2d86bd1e70106afaff5200ba819303b16c2e587d
Files :

  • lustre/mdt/mdt_identity.c
Comment by Build Master (Inactive) [ 18/Aug/11 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #263
LU-599 verify "mi_ginfo" before accessing its member

Oleg Drokin : 2d86bd1e70106afaff5200ba819303b16c2e587d
Files :

  • lustre/mdt/mdt_identity.c
Comment by Build Master (Inactive) [ 18/Aug/11 ]

Integrated in lustre-master » x86_64,client,sles11,inkernel #263
LU-599 verify "mi_ginfo" before accessing its member

Oleg Drokin : 2d86bd1e70106afaff5200ba819303b16c2e587d
Files :

  • lustre/mdt/mdt_identity.c
Comment by Build Master (Inactive) [ 18/Aug/11 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #263
LU-599 verify "mi_ginfo" before accessing its member

Oleg Drokin : 2d86bd1e70106afaff5200ba819303b16c2e587d
Files :

  • lustre/mdt/mdt_identity.c
Comment by Build Master (Inactive) [ 18/Aug/11 ]

Integrated in lustre-master » x86_64,client,el5,ofa #263
LU-599 verify "mi_ginfo" before accessing its member

Oleg Drokin : 2d86bd1e70106afaff5200ba819303b16c2e587d
Files :

  • lustre/mdt/mdt_identity.c
Comment by Build Master (Inactive) [ 18/Aug/11 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #263
LU-599 verify "mi_ginfo" before accessing its member

Oleg Drokin : 2d86bd1e70106afaff5200ba819303b16c2e587d
Files :

  • lustre/mdt/mdt_identity.c
Comment by Build Master (Inactive) [ 18/Aug/11 ]

Integrated in lustre-master » i686,client,el5,ofa #263
LU-599 verify "mi_ginfo" before accessing its member

Oleg Drokin : 2d86bd1e70106afaff5200ba819303b16c2e587d
Files :

  • lustre/mdt/mdt_identity.c
Comment by Build Master (Inactive) [ 18/Aug/11 ]

Integrated in lustre-master » x86_64,server,el5,ofa #263
LU-599 verify "mi_ginfo" before accessing its member

Oleg Drokin : 2d86bd1e70106afaff5200ba819303b16c2e587d
Files :

  • lustre/mdt/mdt_identity.c
Comment by Build Master (Inactive) [ 18/Aug/11 ]

Integrated in lustre-master » i686,server,el5,inkernel #263
LU-599 verify "mi_ginfo" before accessing its member

Oleg Drokin : 2d86bd1e70106afaff5200ba819303b16c2e587d
Files :

  • lustre/mdt/mdt_identity.c
Comment by Build Master (Inactive) [ 18/Aug/11 ]

Integrated in lustre-master » i686,server,el6,inkernel #263
LU-599 verify "mi_ginfo" before accessing its member

Oleg Drokin : 2d86bd1e70106afaff5200ba819303b16c2e587d
Files :

  • lustre/mdt/mdt_identity.c
Comment by Build Master (Inactive) [ 18/Aug/11 ]

Integrated in lustre-master » i686,client,el5,inkernel #263
LU-599 verify "mi_ginfo" before accessing its member

Oleg Drokin : 2d86bd1e70106afaff5200ba819303b16c2e587d
Files :

  • lustre/mdt/mdt_identity.c
Comment by Build Master (Inactive) [ 18/Aug/11 ]

Integrated in lustre-master » i686,server,el5,ofa #263
LU-599 verify "mi_ginfo" before accessing its member

Oleg Drokin : 2d86bd1e70106afaff5200ba819303b16c2e587d
Files :

  • lustre/mdt/mdt_identity.c
Comment by nasf (Inactive) [ 19/Aug/11 ]

The patch has landed on the Lustre-2.1 candidate.

Generated at Sat Feb 10 01:08:36 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.