Lustre / LU-3857

panic in lprocfs_remove_nolock+0x3b/0x100

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.6.0, Lustre 2.5.1
    • Affects Version/s: Lustre 2.4.0, Lustre 2.5.0, Lustre 2.6.0
    • Labels: None
    • Environment: b2_4 branch - 2.6.32-358.2.1 kernel
    • Severity: 3
    • Rank: 10008

    Description

      While testing mount problems with Lustre 2.4, I found a panic during shutdown.

      [root@rhel6-64 utils]# ../utils/mkfs.lustre --reformat --mgs --mdt --fsname=wwtest --mgsnode=rhel6-64.shadowland@tcp --index=0 --quiet --backfstype=ldiskfs --param sys.timeout=300 --param lov.stripesize=1048576 --param lov.stripecount=1 --device-size=0 --verbose /dev/sdb1 
      
         Permanent disk data:
      Target:     wwtest:MDT0000
      Index:      0
      Lustre FS:  wwtest
      Mount type: ldiskfs
      Flags:      0x65
                    (MDT MGS first_time update )
      Persistent mount opts: user_xattr,errors=remount-ro
      Parameters: mgsnode=192.168.69.5@tcp sys.timeout=300 lov.stripesize=1048576 lov.stripecount=1
      
      device size = 2048MB
      formatting backing filesystem ldiskfs on /dev/sdb1
      	target name  wwtest:MDT0000
      	4k blocks     524540
      	options        -J size=80 -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F
      mkfs_cmd = mke2fs -j -b 4096 -L wwtest:MDT0000  -J size=80 -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F /dev/sdb1 524540
      Writing CONFIGS/mountdata
      [root@rhel6-64 utils]# ../utils/mkfs.lustre --reformat --mgs --mdt --fsname=wwtest --mgsnode=rhel6-64.shadowland@tcp --index=0 --quiet --backfstype=ldiskfs --param sys.timeout=300 --param lov.stripesize=1048576 --param lov.stripecount=1 --device-size=0 --verbose /dev/sdb1 ^C
      [root@rhel6-64 utils]# mount -t lustre -o nosvc,abort_recov,svname=rhel6-64.shadowland /dev/sdb1 /mnt/mdt
      [root@rhel6-64 utils]# umount -f /mnt/mdt
      Timeout, server not responding.
      

      console backtrace

      general protection fault: 0000 [#1] SMP 
      last sysfs file: /sys/devices/pci0000:00/0000:00:15.0/0000:03:00.0/host2/target2:0:1/2:0:1:0/block/sdb/queue/max_sectors_kb
      CPU 7 
      Modules linked in: ofd osp lod ost mdt osd_ldiskfs fsfilt_ldiskfs ldiskfs exportfs mdd mgs lquota jbd mgc fid fld ptlrpc obdclass lvfs ksocklnd lnet sha512_generic sha256_generic crc32c_intel libcfs nfs lockd auth_rpcgss nfs_acl sunrpc cachefiles fscache(T) ib_ipoib ib_cm ipv6 ib_uverbs ib_umad mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core dm_mirror dm_region_hash dm_log dm_mod ppdev vmware_balloon parport_pc parport vmxnet3 i2c_piix4 i2c_core sg shpchp ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom vmw_pvscsi pata_acpi ata_generic ata_piix [last unloaded: lmv]
      
      Pid: 10144, comm: umount Tainted: G           ---------------  T 2.6.32-358.2.1.el6 #0 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
      RIP: 0010:[<ffffffffa056aacb>]  [<ffffffffa056aacb>] lprocfs_remove_nolock+0x3b/0x100 [obdclass]
      RSP: 0018:ffff880101e21a58  EFLAGS: 00010202
      RAX: ffff88011954c100 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
      RDX: 0000000000000001 RSI: ffffffffa056acdd RDI: ffff880138742580
      RBP: ffff880101e21a88 R08: 0000000000000000 R09: 0000000000000001
      R10: 0000000000000000 R11: 0a6e776f64747568 R12: 6b6b6b6b6b6b6b6b
      R13: ffff8801387426d8 R14: 6b6b6b6b6b6b6b6b R15: 0000000000000000
      FS:  00007f77cb076740(0000) GS:ffff88002cc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 00007f77ca7961b0 CR3: 0000000102102000 CR4: 00000000000407e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process umount (pid: 10144, threadinfo ffff880101e20000, task ffff88011954c100)
      Stack:
       ffffffffa056acdd ffff8801387426d8 ffff880138742580 ffff880101e21b28
      <d> ffff8801387426d8 ffff880101e21b28 ffff880101e21aa8 ffffffffa056ace5
      <d> ffff8801387424f0 ffff8801387424f0 ffff880101e21ad8 ffffffffa0a18cb0
      Call Trace:
       [<ffffffffa056acdd>] ? lprocfs_remove+0x1d/0x40 [obdclass]
       [<ffffffffa056ace5>] lprocfs_remove+0x25/0x40 [obdclass]
       [<ffffffffa0a18cb0>] qsd_fini+0x80/0x460 [lquota]
       [<ffffffffa0bbd708>] osd_shutdown+0x38/0xe0 [osd_ldiskfs]
       [<ffffffffa0bc52b9>] osd_device_fini+0x129/0x190 [osd_ldiskfs]
       [<ffffffffa058b437>] class_cleanup+0x577/0xda0 [obdclass]
       [<ffffffffa0560f6c>] ? class_name2dev+0x7c/0xe0 [obdclass]
       [<ffffffffa058cd1c>] class_process_config+0x10bc/0x1c80 [obdclass]
       [<ffffffffa03f5e58>] ? libcfs_log_return+0x28/0x40 [libcfs]
       [<ffffffffa0586741>] ? lustre_cfg_new+0x391/0x7e0 [obdclass]
       [<ffffffffa058da59>] class_manual_cleanup+0x179/0x6e0 [obdclass]
       [<ffffffffa03f5e58>] ? libcfs_log_return+0x28/0x40 [libcfs]
       [<ffffffffa0bc6404>] osd_obd_disconnect+0x174/0x1e0 [osd_ldiskfs]
       [<ffffffffa058fa8e>] lustre_put_lsi+0x17e/0xe20 [obdclass]
       [<ffffffffa05982a8>] lustre_common_put_super+0x5d8/0xc20 [obdclass]
       [<ffffffffa05c12ba>] server_put_super+0x1ca/0xe60 [obdclass]
       [<ffffffff811b136a>] ? invalidate_inodes+0xfa/0x180
       [<ffffffff81196afb>] generic_shutdown_super+0x5b/0xe0
       [<ffffffff81196be6>] kill_anon_super+0x16/0x60
       [<ffffffff8119737f>] ? deactivate_super+0x4f/0x80
       [<ffffffffa058f8b6>] lustre_kill_super+0x36/0x60 [obdclass]
       [<ffffffff81197387>] deactivate_super+0x57/0x80
       [<ffffffff811b5acf>] mntput_no_expire+0xbf/0x110
       [<ffffffff811b654b>] sys_umount+0x7b/0x3a0
       [<ffffffff81531b48>] ? lockdep_sys_exit_thunk+0x35/0x67
       [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      Code: 1f 44 00 00 48 8b 1f 48 85 db 74 51 48 c7 07 00 00 00 00 4c 8b 73 48 4d 85 f6 75 0f e9 9a 00 00 00 0f 1f 80 00 00 00 00 4c 89 e3 <4c> 8b 63 50 4d 85 e4 75 f4 4c 8b 6b 08 4c 8b 63 48 4c 89 ef e8 
      RIP  [<ffffffffa056aacb>] lprocfs_remove_nolock+0x3b/0x100 [obdclass]
      

    Attachments

    Issue Links

    Activity

            pjones Peter Jones added a comment -

            Landed for 2.6

            di.wang Di Wang added a comment - http://review.whamcloud.com/8506
            di.wang Di Wang added a comment -

            It seems that in osd_device_fini, osd_procfs_fini is called first to destroy all of the proc entries for the OSD, but later osd_shutdown/qsd_fini does it again, which causes the LBUG.

            di.wang Di Wang added a comment - - edited

            Hmm, I also hit this problem in my local testing on the master branch (conf-sanity test 17 with 4 MDTs and 2 MDSes).

            Lustre: DEBUG MARKER: == conf-sanity test 17: Verify failed mds_postsetup won't fail assertion (2936) (should return errs) == 19:57:05 (1384487825)
            LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. quota=on. Opts: 
            LDISKFS-fs (sdb2): mounted filesystem with ordered data mode. quota=on. Opts: 
            LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 172.16.151.128@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
            LustreError: Skipped 52 previous similar messages
            LustreError: 11-0: lustre-MDT0001-osp-MDT0002: Communicating with 172.16.151.128@tcp, operation obd_ping failed with -107.
            LustreError: Skipped 97 previous similar messages
            LustreError: 166-1: MGC172.16.151.30@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail
            LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. quota=on. Opts: 
            LustreError: 15c-8: MGC172.16.151.30@tcp: The configuration from log 'lustre-MDT0000' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
            LustreError: 10584:0:(obd_mount_server.c:1256:server_start_targets()) failed to start server lustre-MDT0000: -2
            LustreError: 10584:0:(obd_mount_server.c:1731:server_fill_super()) Unable to start targets: -2
            LustreError: 10584:0:(obd_mount_server.c:847:lustre_disconnect_lwp()) lustre-MDT0000-lwp-MDT0000: Can't end config log lustre-client.
            LustreError: 10584:0:(obd_mount_server.c:1425:server_put_super()) lustre-MDT0000: failed to disconnect lwp. (rc=-2)
            LustreError: 10584:0:(obd_mount_server.c:1455:server_put_super()) no obd lustre-MDT0000
            general protection fault: 0000 [#1] SMP 
            last sysfs file: /sys/devices/pci0000:00/0000:00:10.0/host0/target0:0:1/0:0:1:0/block/sdb/queue/scheduler
            CPU 1 
            Modules linked in: lustre(U) ofd(U) osp(U) lod(U) ost(U) mdt(U) mdd(U) mgs(U) osd_ldiskfs(U) ldiskfs(U) lquota(U) lfsck(U) jbd obdecho(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc(U) obdclass(U) ksocklnd(U) lnet(U) sha512_generic sha256_generic crc32c_intel libcfs(U) nfsd exportfs autofs4 nfs lockd fscache nfs_acl auth_rpcgss sunrpc ipv6 zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate dm_mirror dm_region_hash dm_log ppdev snd_ens1371 snd_rawmidi snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc e1000 parport_pc parport vmware_balloon sg i2c_piix4 i2c_core shpchp ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix mptspi mptscsih mptbase scsi_transport_spi dm_mod [last unloaded: speedstep_lib]
            
            Pid: 10584, comm: mount.lustre Tainted: P           ---------------    2.6.32 #3 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
            RIP: 0010:[<ffffffffa081ca23>]  [<ffffffffa081ca23>] lprocfs_remove_nolock+0x43/0x110 [obdclass]
            RSP: 0018:ffff88006e0578b8  EFLAGS: 00010202
            RAX: ffffffffa089b020 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
            RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800658251a8
            RBP: ffff88006e0578e8 R08: 616974696e69203a R09: 44535120676e6974
            R10: 732044535120676e R11: 0a6e776f64747568 R12: 6b6b6b6b6b6b6b6b
            R13: ffff8800658252b0 R14: 6b6b6b6b6b6b6b6b R15: ffff8800771f6de0
            FS:  00007f9a2e488700(0000) GS:ffff88000a240000(0000) knlGS:0000000000000000
            CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
            CR2: 00007fc3ae5b9000 CR3: 000000006f953000 CR4: 00000000000006e0
            DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
            DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
            Process mount.lustre (pid: 10584, threadinfo ffff88006e056000, task ffff880036d9c380)
            Stack:
             ffff880000000010 ffff88006e057918 ffff8800658251a8 ffff88006e057988
            <d> ffff8800658252b0 ffff88006e057988 ffff88006e057908 ffffffffa081cc45
            <d> ffff88006e057908 ffff880065825118 ffff88006e057938 ffffffffa0b23dd0
            Call Trace:
             [<ffffffffa081cc45>] lprocfs_remove+0x25/0x40 [obdclass]
             [<ffffffffa0b23dd0>] qsd_fini+0x80/0x450 [lquota]
             [<ffffffffa0d2ec78>] osd_shutdown+0x38/0xe0 [osd_ldiskfs]
             [<ffffffffa0d36bf9>] osd_device_fini+0x129/0x190 [osd_ldiskfs]
             [<ffffffffa0834913>] class_cleanup+0x573/0xd30 [obdclass]
             [<ffffffffa081233c>] ? class_name2dev+0x7c/0xe0 [obdclass]
             [<ffffffffa083663a>] class_process_config+0x156a/0x1ad0 [obdclass]
             [<ffffffffa06be9b8>] ? libcfs_log_return+0x28/0x40 [libcfs]
             [<ffffffffa082f202>] ? lustre_cfg_new+0x312/0x6e0 [obdclass]
             [<ffffffffa0836d19>] class_manual_cleanup+0x179/0x6e0 [obdclass]
             [<ffffffffa06be9b8>] ? libcfs_log_return+0x28/0x40 [libcfs]
             [<ffffffffa0d378b4>] osd_obd_disconnect+0x174/0x1e0 [osd_ldiskfs]
             [<ffffffffa083926b>] lustre_put_lsi+0x1ab/0xeb0 [obdclass]
             [<ffffffffa08414d8>] lustre_common_put_super+0x5c8/0xbe0 [obdclass]
             [<ffffffffa087081d>] server_put_super+0x1bd/0xed0 [obdclass]
             [<ffffffffa0871bbb>] server_fill_super+0x68b/0x1630 [obdclass]
             [<ffffffffa0840bb8>] lustre_fill_super+0x1d8/0x530 [obdclass]
             [<ffffffffa08409e0>] ? lustre_fill_super+0x0/0x530 [obdclass]
             [<ffffffff8117df8f>] get_sb_nodev+0x5f/0xa0
             [<ffffffffa0838bc5>] lustre_get_sb+0x25/0x30 [obdclass]
             [<ffffffff8117dbeb>] vfs_kern_mount+0x7b/0x1b0
             [<ffffffff8117dd92>] do_kern_mount+0x52/0x130
             [<ffffffff8119c642>] do_mount+0x2d2/0x8c0
             [<ffffffff8119ccc0>] sys_mount+0x90/0xe0
             [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
            Code: c7 07 00 00 00 00 48 85 db 74 52 48 81 fb 00 f0 ff ff 77 49 4c 8b 73 48 4d 85 f6 75 0e e9 99 00 00 00 66 0f 1f 44 00 00 4c 89 e3 <4c> 8b 63 50 4d 85 e4 75 f4 4c 8b 6b 08 4c 8b 63 48 4c 89 ef e8 
            RIP  [<ffffffffa081ca23>] lprocfs_remove_nolock+0x43/0x110 [obdclass]
             RSP <ffff88006e0578b8>
            ---[ end trace d6b25d1e4af65ad1 ]---
            

            shadow Alexey Lyashkov added a comment -

            Hit again today.

            Lustre: 3905:0:(client.c:1789:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 0/real 0]  req@ffff88020b8d9148 x1447772636058308/t0(0) o101->MGC192.168.69.5@tcp@0@lo:26/25 lens 328/344 e 0 to 1 dl 0 ref 2 fl Rpc:EX/0/ffffffff rc -108/-1
            LustreError: 6244:0:(ldlm_resource.c:809:ldlm_resource_complain()) MGC192.168.69.5@tcp: namespace resource [0x65727473756c:0x0:0x0].0 (ffff88023401ab00) refcount nonzero (1) after lock cleanup; forcing cleanup.
            LustreError: 6244:0:(ldlm_resource.c:1448:ldlm_resource_dump()) --- Resource: [0x65727473756c:0x0:0x0].0 (ffff88023401ab00) refcount = 2
            Lustre: 3905:0:(client.c:1789:ptlrpc_expire_one_request()) Skipped 1 previous similar message
            LustreError: 15c-8: MGC192.168.69.5@tcp: The configuration from log 'lustre-MDT0000' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
            Lustre: MGS: Client f0a9a982-4623-afea-e5fb-240161f63073 (at 0@lo) reconnecting
            Lustre: MGS: Client f0a9a982-4623-afea-e5fb-240161f63073 (at 0@lo) refused reconnection, still busy with 1 active RPCs
            LustreError: 6244:0:(obd_mount_server.c:1257:server_start_targets()) failed to start server lustre-MDT0000: -5
            LustreError: 6244:0:(obd_mount_server.c:1732:server_fill_super()) Unable to start targets: -5
            LustreError: 6244:0:(obd_mount_server.c:848:lustre_disconnect_lwp()) lustre-MDT0000-lwp-MDT0000: Can't end config log lustre-client.
            LustreError: 6244:0:(obd_mount_server.c:1426:server_put_super()) lustre-MDT0000: failed to disconnect lwp. (rc=-2)
            LustreError: 6244:0:(obd_mount_server.c:1456:server_put_super()) no obd lustre-MDT0000
            general protection fault: 0000 [#1] SMP 
            last sysfs file: /sys/devices/system/cpu/possible
            CPU 0 
            Modules linked in: lustre ofd osp lod ost mdt osd_ldiskfs fsfilt_ldiskfs ldiskfs exportfs mdd mgs lquota lfsck jbd obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass lvfs ksocklnd lnet sha512_generic sha256_generic crc32c_intel libcfs nfs lockd auth_rpcgss nfs_acl sunrpc cachefiles fscache(T) ib_ipoib ib_cm ipv6 ib_uverbs ib_umad mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core dm_mirror dm_region_hash dm_log dm_mod ppdev parport_pc parport vmware_balloon microcode vmxnet3 i2c_piix4 i2c_core sg shpchp ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom vmw_pvscsi pata_acpi ata_generic ata_piix [last unloaded: scsi_wait_scan]
            
            Pid: 6244, comm: mount.lustre Tainted: G           ---------------  T 2.6.32-358.18.1-lustre #0 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
            RIP: 0010:[<ffffffffa0562013>]  [<ffffffffa0562013>] lprocfs_remove_nolock+0x43/0x110 [obdclass]
            
            

            People

              Assignee: di.wang Di Wang
              Reporter: shadow Alexey Lyashkov
              Votes: 0
              Watchers: 3
