can't mount more than 13 lustre filesystems on client.

Details

    • Bug
    • Resolution: Unresolved
    • Critical
    • None
    • Lustre 2.12.5
    • None
    • 2

    Description

      When we try to mount more than 12 filesystems on a client, the mount fails with the error below. If the count is less than 12, this filesystem mounts just fine.

       [1471948.869092] ------------[ cut here ]------------
      [1471948.874143] WARNING: CPU: 14 PID: 59764 at ../fs/sysfs/dir.c:31 sysfs_warn_dup+0x53/0x70
      [1471948.882661] Modules linked in: tcp_diag(E) udp_diag(E) raw_diag(E) inet_diag(E) unix_diag(E) iptable_nat(E) nf_nat_ipv4(E) nf_nat(E) binfmt_misc(E) fuse(E) beegfs(OEN) mgc(OEN) rpcsec_gss_krb5(E) auth_rpcgss(E) lustre(OEN) lmv(OEN) mdc(OEN) fid(OEN) osc(OEN) lov(OEN) fld(OEN) nfsv4(E) dns_resolver(E) ko2iblnd(OEN) ptlrpc(OEN) obdclass(OEN) lnet(OEN) nfsv3(E) nfs_acl(E) nfs(E) lockd(E) grace(E) fscache(E) libcfs(OEN) rdma_ucm(OEX) ib_ucm(OEX) rdma_cm(OEX) iw_cm(OEX) configfs(E) ib_ipoib(OEX) ib_cm(OEX) ib_uverbs(OEX) ib_umad(OEX) bonding(E) iscsi_ibft(E) iscsi_boot_sysfs(E) nf_log_ipv6(E) nf_log_common(E) xt_LOG(E) nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ip6table_filter(E) ip6_tables(E) xt_tcpudp(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) xt_conntrack(E) iptable_filter(E) xt_CT(E) nf_conntrack(E) libcrc32c(E)
      [1471948.954397]  iptable_raw(E) ip_tables(E) x_tables(E) mlx4_ib(OEX) ib_core(OEX) csiostor(E) tcp_bic(EN) intel_rapl(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) ipmi_ssif(E) mlx4_core(OEX) kvm_intel(E) kvm(E) cxgb4(E) iTCO_wdt(E) iTCO_vendor_support(E) ipmi_si(E) mlx_compat(OEX) irqbypass(E) mei_me(E) crc32_pclmul(E) mei(E) ghash_clmulni_intel(E) pcc_cpufreq(E) ipmi_devintf(E) igb(E) devlink(E) ptp(E) acpi_cpufreq(E) pcbc(E) ioatdma(E) aesni_intel(E) ipmi_msghandler(E) aes_x86_64(E) pcspkr(E) crypto_simd(E) glue_helper(E) cryptd(E) i2c_i801(E) scsi_transport_fc(E) pps_core(E) lpc_ich(E) dca(E) mfd_core(E) sunrpc(E) ext4(E) crc16(E) jbd2(E) mbcache(E) sd_mod(E) sr_mod(E) cdrom(E) mgag200(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) isci(EX) sysimgblt(E) fb_sys_fops(E)
      [1471949.025613]  ahci(E) libsas(E) ttm(E) libahci(E) scsi_transport_sas(E) drm(E) crc32c_intel(E) serio_raw(E) drm_panel_orientation_quirks(E) libata(E) wmi(E) button(E) hwperf(OEX) numatools(OEX) xpmem(OEX) gru(OEX) xvma(OEX) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) autofs4(E)
      [1471949.054598] Supported: No, Unreleased kernel
      [1471949.059305] CPU: 14 PID: 59764 Comm: llog_process_th Tainted: G           OE      4.12.14-122.60.1.20210209-nasa #1 SLE12-SP5 (unreleased)
      [1471949.072154] Hardware name: SGI.COM C1104-RP7/X9DRW-3LN4F+/X9DRW-3TF+, BIOS 3.00 09/12/2013
      [1471949.080839] task: ffff9ab3ae438400 task.stack: ffffb8f38bed4000
      [1471949.087188] RIP: 0010:sysfs_warn_dup+0x53/0x70
      [1471949.092062] RSP: 0018:ffffb8f38bed7990 EFLAGS: 00010296
      [1471949.097714] RAX: 0000000000000038 RBX: ffff9ab476e76000 RCX: 0000000000000000
      [1471949.105271] RDX: 0000000000000001 RSI: ffff9ab75f397948 RDI: ffff9ab75f397948
      [1471949.112829] RBP: ffffb8f38bed79e1 R08: 0000000000000000 R09: 000000000000782b
      [1471949.120387] R10: 0000000000000000 R11: ffffb8f38bed7708 R12: ffff9aa8a4af4780
      [1471949.127953] R13: 0000000000000001 R14: ffffffffffffffef R15: 0000000000000000
      [1471949.135509] FS:  0000000000000000(0000) GS:ffff9ab75f380000(0000) knlGS:0000000000000000
      [1471949.144020] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [1471949.150192] CR2: 00002aaaaaaf0000 CR3: 00000007c700a004 CR4: 00000000000606e0
      [1471949.157749] Call Trace:
      [1471949.160634]  sysfs_do_create_link_sd.isra.2+0xa3/0xb0
      [1471949.166117]  device_add+0x54d/0x610
      [1471949.170046]  ? kstrdup+0x45/0x50
      [1471949.173705]  device_create_groups_vargs+0xd0/0x100
      [1471949.178923]  device_create_with_groups+0x36/0x40
      [1471949.183970]  ? snprintf+0x39/0x40
      [1471949.187725]  misc_register+0xe1/0x180
      [1471949.191832]  mdc_changelog_cdev_init+0x440/0x560 [mdc]
      [1471949.197405]  mdc_setup+0x24e/0x5a0 [mdc]
      [1471949.201765]  mdc_device_alloc+0xa5/0x230 [mdc]
      [1471949.206675]  obd_setup+0xb8/0x230 [obdclass]
      [1471949.211398]  class_setup+0x299/0x800 [obdclass]
      [1471949.216374]  class_process_config+0x1a3e/0x2790 [obdclass]
      [1471949.222294]  ? vsnprintf+0x3f8/0x510
      [1471949.226321]  ? libcfs_debug_msg+0x47/0x50 [libcfs]
      [1471949.231543]  ? __kmalloc+0x15c/0x210
      [1471949.235570]  class_config_llog_handler+0x3ac/0x11c0 [obdclass]
      [1471949.241845]  llog_process_thread+0x7ba/0x1880 [obdclass]
      [1471949.247587]  ? __switch_to_asm+0x35/0x70
      [1471949.251944]  ? __switch_to_asm+0x41/0x70
      [1471949.256321]  ? keys_fill+0xf0/0x180 [obdclass]
      [1471949.261215]  llog_process_thread_daemonize+0x8b/0xb0 [obdclass]
      [1471949.267562]  kthread+0xf6/0x130
      [1471949.271156]  ? llog_backup+0x4e0/0x4e0 [obdclass]
      [1471949.276288]  ? kthread_bind+0x10/0x10
      [1471949.280379]  ret_from_fork+0x35/0x40
      [1471949.284391] Code: 48 89 c3 74 12 b9 00 10 00 00 48 89 c2 31 f6 4c 89 e7 e8 31 ca ff ff 48 89 ea 48 89 de 48 c7 c7 a0 d4 e6 85 31 c0 e8 03 c6 ed ff <0f> 0b 48 89 df 5b 5d 41 5c e9 1f c3 f4 ff 66 66 2e 0f 1f 84 00 
      [1471949.303675] ---[ end trace ca2a379fedb3af70 ]---
      [1471949.308840] LustreError: 59764:0:(mdc_request.c:2775:mdc_setup()) nbp19-MDT0001-mdc-ffff9aaadd82b000: failed to setup changelog char device: rc = -17
      [1471949.324642] LustreError: 59764:0:(obd_config.c:559:class_setup()) setup nbp19-MDT0001-mdc-ffff9aaadd82b000 failed (-17)
      [1471949.335935] LustreError: 59764:0:(obd_config.c:1835:class_config_llog_handler()) MGC10.151.27.174@o2ib: cfg command failed: rc = -17
      [1471949.348304] Lustre:    cmd=cf003 0:nbp19-MDT0001-mdc  1:nbp19-MDT0001_UUID  2:10.151.27.175@o2ib
      [1471949.359727] LustreError: 15c-8: MGC10.151.27.174@o2ib: The configuration from log 'nbp19-client' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
      [1471949.679857] LustreError: 59760:0:(obd_config.c:610:class_cleanup()) Device 1249 not setup
      [1471949.688542] Lustre: Unmounted nbp19-client
      [1471949.695063] LustreError: 59760:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount  (-17)
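
The `rc = -17` in the LustreError lines above is a negated Linux errno value. As a reading aid only, here is a minimal Python sketch that decodes such a return code. The helper name `describe_rc` is hypothetical and not part of Lustre.

```python
import errno
import os


def describe_rc(rc: int) -> str:
    """Turn a negative kernel-style return code into 'NAME (message)'."""
    code = abs(rc)
    # errno.errorcode maps numeric errno values to their symbolic names.
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"


print(describe_rc(-17))
```

On Linux, errno 17 is `EEXIST`, which matches the `sysfs_warn_dup` warning at the top of the trace: the device registration failed because an entry with that name already existed.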
      

          Activity

            [LU-14523] can't mount more than 13 lustre filesystems on client.
            pjones Peter Jones added a comment -

            b2_12 port pushed here - https://review.whamcloud.com/42087


            mhanafi Mahmoud Hanafi added a comment -

            We need a backport for 2.12.5 and 2.12.6.

            adilger Andreas Dilger added a comment -

            56 MDTs is just the upper limit; the actual limit depends on how many other "misc char devices" are already registered with the kernel.
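
To see how many misc char devices a client already has registered, one option is to count the entries in `/proc/misc`, which lists one `<minor> <name>` pair per line. The following is a minimal sketch under that assumption. The function names are hypothetical, and it counts every misc device on the node, not only the Lustre ones.

```python
from pathlib import Path


def parse_misc(text: str) -> list:
    """Parse /proc/misc-style text into (minor, name) tuples."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        # Keep only well-formed '<minor> <name>' lines.
        if len(parts) == 2 and parts[0].isdigit():
            entries.append((int(parts[0]), parts[1]))
    return entries


def count_misc_devices(path: str = "/proc/misc") -> int:
    """Count the misc char devices currently registered on this node."""
    return len(parse_misc(Path(path).read_text()))


if Path("/proc/misc").exists():
    print(count_misc_devices())
```

Running this before and after mounting a filesystem would show how many misc devices that mount adds on the client.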

            mhanafi Mahmoud Hanafi added a comment -

            We are at 37 MDTs; mounting the last filesystem would put us at 53. That is less than the 56 MDTs.
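
The numbers in this comment can be checked with a short sketch. The three constants come from the comments in this ticket. The `headroom` helper and its `extra_misc_devices` parameter are hypothetical; they model the assumption that each additional misc char device registered on the node lowers the effective limit by one.

```python
UPPER_LIMIT_MDTS = 56        # upper limit quoted in this ticket
MDTS_MOUNTED_NOW = 37        # MDTs mounted before the last filesystem
MDTS_AFTER_LAST_MOUNT = 53   # MDTs once the last filesystem is mounted


def headroom(limit: int, in_use: int, extra_misc_devices: int = 0) -> int:
    """Slots left after in_use MDTs and any extra misc devices (assumed model)."""
    return limit - in_use - extra_misc_devices


# MDTs contributed by the last filesystem.
new_fs_mdts = MDTS_AFTER_LAST_MOUNT - MDTS_MOUNTED_NOW
print(new_fs_mdts)  # 16

# Slots left if nothing else consumes misc devices.
print(headroom(UPPER_LIMIT_MDTS, MDTS_AFTER_LAST_MOUNT))  # 3
```

Under this model the margin is only three devices, so a handful of additional misc char devices on the client would be enough to push the mount over the limit.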

            adilger Andreas Dilger added a comment -

            I suspect this is a duplicate of LU-12506, with the important change being patch: https://review.whamcloud.com/37759 "LU-12506 changelog: support large number of MDT".

            While you may not have a single filesystem with more than 56 MDTs, I'd guess that in total the 13 filesystems have 4 or more MDTs each?

            People

              pjones Peter Jones
              mhanafi Mahmoud Hanafi