[LU-7590] After downgrading system from master RHEL7 to 2.5.5 RHEL6.6, cannot access /mnt/lustre: Permission denied Created: 21/Dec/15 Updated: 12/Jan/18 Resolved: 12/Jan/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Sarah Liu | Assignee: | Dmitry Eremin (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: | before upgrade: 2.5.5 RHEL6.6 ldiskfs |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
after clean downgrade from master/#3264 RHEL7 to 2.5.5 RHEL6.6, cannot access /mnt/lustre. This issue also happened with zfs.

Client console:

[root@onyx-27 ~]# mount
/dev/sda1 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
nfsd on /proc/fs/nfsd type nfsd (rw)
onyx-25:/lustre on /mnt/lustre type lustre (rw,user_xattr)
[root@onyx-27 ~]# pwd
/root
[root@onyx-27 ~]# ls /mnt/lustre
LustreError: 11808:0:(mdc_locks.c:918:mdc_enqueue()) ldlm_cli_enqueue: -13
LustreError: 11808:0:(file.c:3128:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200000007:0x1:0x0] error: rc = -13
ls: cannot access /mnt/lustre: Permission denied
[root@onyx-27 ~]#

MDS dmesg:

alg: No test for crc32 (crc32-pclmul)
Lustre: Lustre: Build Version: 2.5.5-RC2--PRISTINE-2.6.32-504.23.4.el6_lustre.x86_64
LNet: Added LNI 10.2.4.47@tcp [8/256/0/180]
LNet: Accept secure, port 988
LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. quota=on. Opts:
Lustre: MGC10.2.4.47@tcp: Connection restored to MGS (at 0@lo)
Lustre: lustre-MDT0000: used disk, loading
LustreError: 11-0: lustre-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11.
LustreError: 11876:0:(lfsck_namespace.c:154:lfsck_namespace_load()) lustre-MDT0000-o: fail to load lfsck_namespace, expected = 256, rc = 4
Lustre: lustre-MDT0000-lwp-MDT0000: Connection restored to lustre-MDT0000 (at 0@lo)
Lustre: 11949:0:(client.c:1943:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1450734129/real 1450734129] req@ffff88082d934c00 x1521204989001764/t0(0) o8->lustre-OST0000-osc-MDT0000@10.2.4.56@tcp:28/4 lens 400/544 e 0 to 1 dl 1450734134 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 11949:0:(client.c:1943:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1450734154/real 1450734154] req@ffff88082d934400 x1521204989001888/t0(0) o8->lustre-OST0000-osc-MDT0000@10.2.4.56@tcp:28/4 lens 400/544 e 0 to 1 dl 1450734164 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 11949:0:(client.c:1943:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1450734169/real 1450734169] req@ffff88082d473000 x1521204989001916/t0(0) o8->lustre-OST0000-osc-MDT0000@10.2.4.56@tcp:28/4 lens 400/544 e 0 to 1 dl 1450734184 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: lustre-OST0000-osc-MDT0000: Connection restored to lustre-OST0000 (at 10.2.4.56@tcp)
LustreError: 12091:0:(mdt_identity.c:136:mdt_identity_do_upcall()) lustre-MDT0000: error invoking upcall /sbin/l_getidentity lustre-MDT0000 0: rc -2; check /proc/fs/lustre/mdt/lustre-MDT0000/identity_upcall, time 312us
[root@onyx-25 ~]# |
| Comments |
| Comment by Sarah Liu [ 22/Dec/15 ] |
|
I ran the exact same test, but upgrading from 2.5.5 RHEL6.6 to master RHEL6.7 and then downgrading; I didn't hit this problem. |
| Comment by Andreas Dilger [ 22/Dec/15 ] |
|
The -13 = -EACCES ("Permission denied"), which is what would be expected if /sbin/l_getidentity wasn't found (in the old days this returned -EIDRM, which was much easier to diagnose, but also confused users):

LustreError: 12091:0:(mdt_identity.c:136:mdt_identity_do_upcall()) lustre-MDT0000: error invoking upcall /sbin/l_getidentity lustre-MDT0000 0: rc -2; check /proc/fs/lustre/mdt/lustre-MDT0000/identity_upcall, time 312us

The -2 = -ENOENT, which means /sbin/l_getidentity couldn't be found for some reason? The other error is about revalidating FID [0x200000007:0x1:0x0], which is FID_SEQ_ROOT, but I suspect that is just printed because it got an error when checking permissions on the root directory. |
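A quick way to confirm this diagnosis on the MDS is a minimal check (a sketch, assuming the MDT name lustre-MDT0000 from the logs above): compare the upcall path stored in the configuration with where the binary actually lives:

# show the upcall path recorded in the MDT configuration
cat /proc/fs/lustre/mdt/lustre-MDT0000/identity_upcall
# see which of the two candidate locations exists on this node
ls -l /sbin/l_getidentity /usr/sbin/l_getidentity

If the first command prints /sbin/l_getidentity but only /usr/sbin/l_getidentity exists (as on RHEL 6.x, where /sbin is not a symlink to /usr/sbin), the upcall fails with rc -2 (-ENOENT) exactly as in the dmesg output above. |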
| Comment by Joseph Gmitter (Inactive) [ 23/Dec/15 ] |
|
Hi Dmitry, |
| Comment by Sarah Liu [ 30/Dec/15 ] |
|
Before downgrading from master RHEL7.1 to the lower Lustre version, according to the comment mentioned in:

[root@onyx-25 ~]# mount -t lustre -o abort_recovery /dev/sdb1 /mnt/mds1
[ 3437.966195] LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache
[ 3438.206440] Lustre: MGS: Connection restored to MGC10.2.4.47@tcp_0 (at 0@lo)
[ 3438.217099] Lustre: Skipped 4 previous similar messages
[ 3439.039274] LustreError: 10157:0:(mdt_handler.c:5603:mdt_iocontrol()) lustre-MDT0000: Aborting recovery for device
[root@onyx-25 ~]#
[ 3443.513614] Lustre: 3306:0:(client.c:1994:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1451508192/real 1451508192] req@ffff8804026ac800 x1522015822538812/t0(0) o8->lustre-OST0000-osc-MDT0000@10.2.4.56@tcp:28/4 lens 520/544 e 0 to 1 dl 1451508197 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[ 3443.554336] Lustre: lustre-MDT0000: Connection restored to 10.2.4.47@tcp (at 0@lo)
[root@onyx-25 ~]# mount|grep lustre
/dev/sdb1 on /mnt/mds1 type lustre (ro)
[root@onyx-25 ~]# rpm -qa|grep lustre
lustre-osd-ldiskfs-mount-2.7.64-3.10.0_229.20.1.el7_lustre.x86_64.x86_64
lustre-osd-ldiskfs-2.7.64-3.10.0_229.20.1.el7_lustre.x86_64.x86_64
lustre-iokit-2.7.64-3.10.0_229.20.1.el7_lustre.x86_64.x86_64
lustre-modules-2.7.64-3.10.0_229.20.1.el7_lustre.x86_64.x86_64
lustre-2.7.64-3.10.0_229.20.1.el7_lustre.x86_64.x86_64
lustre-tests-2.7.64-3.10.0_229.20.1.el7_lustre.x86_64.x86_64
kernel-3.10.0-229.20.1.el7_lustre.x86_64
[root@onyx-25 ~]# |
| Comment by Dmitry Eremin (Inactive) [ 29/Jan/16 ] |
|
I think this happens because mkfs.lustre on RHEL7 used the wrong path to l_getidentity. There is a symbolic link /sbin -> /usr/sbin on RHEL7.x, but this is not true for RHEL6.x. Originally --param=mdt.identity_upcall=/sbin/l_getidentity was specified or auto-discovered. This parameter should be changed, or a correct link to this binary should be provided. |
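Either fix can be applied on the downgraded RHEL6 MDS; a minimal sketch, not taken verbatim from this ticket, assuming the default fsname "lustre" and the MDT name lustre-MDT0000 from the logs:

# Option 1: make the recorded path valid again by recreating the RHEL7-style link
ln -s /usr/sbin/l_getidentity /sbin/l_getidentity
# Option 2: repoint the upcall at the real path, temporarily on the MDS ...
lctl set_param mdt.lustre-MDT0000.identity_upcall=/usr/sbin/l_getidentity
# ... and persistently, run on the MGS
lctl conf_param lustre-MDT0000.mdt.identity_upcall=/usr/sbin/l_getidentity |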
| Comment by Dmitry Eremin (Inactive) [ 29/Jan/16 ] |
|
If the Lustre FS was created by the llmount.sh script then we will have this issue. In that script, if the L_GETIDENTITY environment variable is undefined, it is set with `which l_getidentity`; on RHEL 7.x the which utility returns /sbin/l_getidentity instead of the /usr/sbin/l_getidentity it returns on RHEL 6.x. As a workaround you can run the llmount.sh script with the L_GETIDENTITY environment variable set explicitly. For example:

L_GETIDENTITY=/usr/sbin/l_getidentity llmount.sh |
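A distro-independent variant of the same workaround (an untested sketch, assuming l_getidentity is somewhere in the PATH) is to canonicalize the path before formatting, so the upcall path recorded in the configuration is a real file on both RHEL 6.x and RHEL 7.x:

# readlink -f resolves any /sbin -> /usr/sbin symlink in the path which returns
L_GETIDENTITY=$(readlink -f "$(which l_getidentity)") llmount.sh |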
| Comment by Sarah Liu [ 29/Jan/16 ] |
|
Thank you for the suggestion, I will try the workaround. |
| Comment by Dmitry Eremin (Inactive) [ 12/Jan/18 ] |
|
I hope this workaround works. If you disagree, please reopen this ticket. |