Description
Mounting a Lustre 2.1.6 file system from a 1.8.9 client over lnet@tcp (10GbE) caused an immediate deadlock of the client.
The core dump shows a kernel BUG in SELinux (security/selinux/ss/services.c:625) hit right after the mount completed.
Pertinent core dump dmesg output:
---cut---
<5>Registering the id_resolver key type
<5>FS-Cache: Netfs 'nfs' registered for caching
<7>SELinux: initialized (dev 0:1b, type nfs), uses genfs_contexts
<7>SELinux: initialized (dev 0:1c, type nfs), uses genfs_contexts
<7>SELinux: initialized (dev autofs, type autofs), uses genfs_contexts
<7>SELinux: initialized (dev autofs, type autofs), uses genfs_contexts
<7>SELinux: initialized (dev autofs, type autofs), uses genfs_contexts
<7>SELinux: initialized (dev autofs, type autofs), uses genfs_contexts
<6>Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
<7>SELinux: initialized (dev nfsd, type nfsd), uses genfs_contexts
<4>NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
<6>NFSD: starting 90-second grace period
<6>Lustre: Build Version: jenkins-wc1--PRISTINE-2.6.32-279.19.1.el6.x86_64
<6>Lustre: Added LNI 192.52.98.54@tcp [8/256/0/180]
<6>Lustre: Accept secure, port 988
<6>Lustre: Lustre Client File System; http://www.lustre.org/
<4>Lustre: MGC192.52.98.30@tcp: Reactivating import
<4>Lustre: Client hpfs-eg3-client(ffff880335b66800) mount complete
<7>SELinux: initialized (dev lustre, type lustre), uses xattr
<5>Bridge firewalling registered
<6>device virbr0-nic entered promiscuous mode
<6>virbr0: starting userspace STP failed, starting kernel STP
<6>ip_tables: (C) 2000-2006 Netfilter Core Team
<4>nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
<6>Ebtables v2.0 registered
<6>ip6_tables: (C) 2000-2006 Netfilter Core Team
<7>SELinux: initialized (dev mqueue, type mqueue), uses transition SIDs
<7>SELinux: initialized (dev proc, type proc), uses genfs_contexts
<7>SELinux: initialized (dev mqueue, type mqueue), uses transition SIDs
<7>SELinux: initialized (dev proc, type proc), uses genfs_contexts
<7>SELinux: initialized (dev 0:22, type nfs4), uses genfs_contexts
<7>SELinux: initialized (dev 0:23, type nfs4), uses genfs_contexts
<7>SELinux: initialized (dev 0:24, type nfs4), uses genfs_contexts
<6>fuse init (API version 7.13)
<7>SELinux: initialized (dev fuse, type fuse), uses genfs_contexts
<4>Lustre: MGC192.52.98.142@tcp: Reactivating import
<4>Lustre: Server MGS version (2.1.6.0) is much newer than client version (1.8.9)
<4>Lustre: 9039:0:(obd_config.c:1127:class_config_llog_handler()) skipping 'lmv' config: cmd=cf001,clilmv:lmv
<3>LustreError: 11-0: an error occurred while communicating with 192.52.98.142@tcp. The mds_connect operation failed with -16
<3>LustreError: 11-0: an error occurred while communicating with 192.52.98.142@tcp. The mds_connect operation failed with -16
<3>LustreError: 11-0: an error occurred while communicating with 192.52.98.142@tcp. The mds_connect operation failed with -16
<3>LustreError: 11-0: an error occurred while communicating with 192.52.98.142@tcp. The mds_connect operation failed with -16
<3>LustreError: 11-0: an error occurred while communicating with 192.52.98.142@tcp. The mds_connect operation failed with -16
<3>LustreError: 11-0: an error occurred while communicating with 192.52.98.142@tcp. The mds_connect operation failed with -16
<3>LustreError: 11-0: an error occurred while communicating with 192.52.98.142@tcp. The mds_connect operation failed with -16
<3>LustreError: Skipped 1 previous similar message
<3>LustreError: 11-0: an error occurred while communicating with 192.52.98.142@tcp. The mds_connect operation failed with -16
<3>LustreError: Skipped 2 previous similar messages
<3>LustreError: 11-0: an error occurred while communicating with 192.52.98.142@tcp. The mds_connect operation failed with -16
<3>LustreError: Skipped 1 previous similar message
<4>Lustre: Server hpfs2eg3-MDT0000_UUID version (2.1.6.0) is much newer than client version (1.8.9)
<6>Lustre: client supports 64-bits dir hash/offset!
<4>Lustre: Client hpfs2eg3-client(ffff880635520000) mount complete
<7>SELinux: initialized (dev lustre, type lustre), uses xattr
<4>------------[ cut here ]------------
<2>kernel BUG at security/selinux/ss/services.c:625!
<4>invalid opcode: 0000 [#1] SMP
<4>last sysfs file: /sys/kernel/mm/ksm/run
<4>CPU 4
<4>Modules linked in: fuse ip6_tables ebtable_nat ebtables xt_state ipt_MASQUERADE ipt_REJECT iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack xt_CHECKSUM iptable_mangle nf_defrag_ipv4 iptable_filter ip_tables bridge stp llc mgc(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) nfsd exportfs autofs4 nfs lockd fscache nfs_acl auth_rpcgss sunrpc ext3 jbd tcp_bic vhost_net macvtap macvlan tun kvm_intel kvm uinput nvidia(P)(U) sg microcode serio_raw i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support e1000e ixgbe mdio snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc ioatdma dca i7core_edac edac_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom mpt2sas scsi_transport_sas raid_class pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
<4>
<4>Pid: 7217, comm: gvfsd-trash Tainted: P --------------- 2.6.32-279.19.1.el6.centos.plus.x86_64 #1 Supermicro X8DTL/X8DTL
<4>RIP: 0010:[<ffffffff8122a86b>] [<ffffffff8122a86b>] context_struct_compute_av+0x40b/0x420
<4>RSP: 0018:ffff8806366a5c08 EFLAGS: 00010246
<4>RAX: 0000000000000000 RBX: ffff8806366a5d98 RCX: 0000000000000100
<4>RDX: 0000000000000f3c RSI: 00000000ffffffff RDI: 0000000000000010
<4>RBP: ffff8806366a5c88 R08: 0000000000013570 R09: ffff8806366a5d98
<4>R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000007
<4>R13: ffff880324960948 R14: 0000000000000769 R15: 00000000000007cb
<4>FS: 00007fadb81f17a0(0000) GS:ffff880028280000(0000) knlGS:0000000000000000
<4>CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>CR2: 000000000199e428 CR3: 0000000636fdd000 CR4: 00000000000006e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process gvfsd-trash (pid: 7217, threadinfo ffff8806366a4000, task ffff880634921500)
<4>Stack:
<4> ffff880637fc5380 0007ffff00000007 ffff88062b2a0548 ffff880324960948
<4><d> ffff8806319fa060 ffff88031c981c00 0000000000000000 0000000000000000
<4><d> 00070007366a5d28 0000000001002fce ffff8806366a5c68 ffff8806366a5d98
<4>Call Trace:
<4> [<ffffffff8122ad45>] security_compute_av+0xf5/0x2c0
<4> [<ffffffff812142be>] avc_has_perm_noaudit+0x14e/0x470
<4> [<ffffffffa124850b>] ? __ll_inode_revalidate_it+0x15b/0x640 [lustre]
<4> [<ffffffff8121462b>] avc_has_perm+0x4b/0x90
<4> [<ffffffff812163f4>] inode_has_perm+0x54/0xa0
<4> [<ffffffff812164b2>] selinux_inode_permission+0x72/0xb0
<4> [<ffffffff8120dc5f>] security_inode_permission+0x1f/0x30
<4> [<ffffffff81182b5f>] inode_permission+0xaf/0xd0
<4> [<ffffffff811b81c2>] sys_inotify_add_watch+0xa2/0x450
<4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
<4>Code: ff ff ff e8 48 00 e4 ff 85 c0 0f 84 34 ff ff ff 0f b7 75 8e 48 c7 c7 70 7b 7b 81 31 c0 e8 e7 5d 2c 00 e9 1d ff ff ff 0f 0b eb fe <0f> 0b 0f 1f 00 eb fb 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00
<1>RIP [<ffffffff8122a86b>] context_struct_compute_av+0x40b/0x420
<4> RSP <ffff8806366a5c08>
---end---
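For anyone searching a similar dump: dmesg text recovered from a core dump is often pasted as one unbroken run, and the `<N>` log-level prefixes are enough to split it back into individual messages. The following is only a rough sketch; the function name `split_dmesg` is made up for illustration, and it assumes every message starts with a single-digit priority marker `<0>` through `<7>`.

```python
import re

# Zero-width lookahead: split *before* each "<N>" priority marker without
# consuming it. Markers such as "<d>" or addresses like "<ffffffff8122a86b>"
# do not match, so they stay inside their message.
PRIORITY_BOUNDARY = re.compile(r"(?=<[0-7]>)")


def split_dmesg(flat):
    """Split a flattened dmesg dump into one string per kernel message."""
    parts = PRIORITY_BOUNDARY.split(flat)
    return [part.strip() for part in parts if part.strip()]


sample = (
    "<4>Lustre: Client hpfs2eg3-client(ffff880635520000) mount complete "
    "<2>kernel BUG at security/selinux/ss/services.c:625! "
    "<4><d> ffff8806319fa060 ffff88031c981c00"
)
for message in split_dmesg(sample):
    print(message)
```

Splitting on empty matches requires Python 3.7 or later.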
The issue was resolved by disabling SELinux on the Lustre client node (selinux=0 on the kernel command line).
Because disabling SELinux resolves the issue, this ticket is filed only so that the resolution is searchable.
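To confirm after a reboot that the parameter actually took effect, the booted command line can be checked. This is a minimal sketch, assuming a Linux client where `/proc/cmdline` is readable; the helper names are hypothetical and not part of Lustre or any SELinux tooling.

```python
def selinux_disabled_on_cmdline(cmdline):
    """Return True if the kernel command line contains the exact token selinux=0."""
    return "selinux=0" in cmdline.split()


def read_kernel_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as handle:
        return handle.read()


# Usage on the client node:
#   print(selinux_disabled_on_cmdline(read_kernel_cmdline()))
```

The check matches whole tokens only, so a value such as `selinux=1` is not mistaken for the disabled setting.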