Lustre / LU-935

Crash lquota:dquot_create_oqaq+0x28f/0x510


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Affects Version/s: Lustre 1.8.x (1.8.0 - 1.8.5)

    Description

      The Lustre infrastructure is based on two HP blade servers with
      Hitachi shared storage. The first server, osiride-lp-030, hosts the
      MDS, MGS, and OST0/1/2; the second, osiride-lp-031, hosts OST3/4.
      These services are clustered with Red Hat Cluster Suite.
      The Lustre infrastructure crashes daily, and we see the following
      dumps in the log:

      Dec 9 11:27:08 osiride-lp-030 kernel: BUG: soft lockup - CPU#8 stuck for 10s! [ll_mdt_06:21936]
      Dec 9 11:27:08 osiride-lp-030 kernel: CPU 8:
      Dec 9 11:27:08 osiride-lp-030 kernel: Modules linked in: obdfilter(U) ost(U) mds(U) fsfilt_ldiskfs(U) mgs(U) mgc(U) ldiskfs(U) crc16(U) lock_dlm(U) gfs2(U)
      dlm(U) configfs(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ksocklnd(U) ptlrpc(U) obdclass(U) lvfs(U) lnet(U) libcfs(U) bonding(U) ipv6(U) xfrm_nalgo(U) cryp
      to_api(U) video(U) backlight(U) sbs(U) power_meter(U) hwmon(U) i2c_ec(U) i2c_core(U) dell_wmi(U) wmi(U) button(U) battery(U) asus_acpi(U) acpi_memhotplug(U)
      ac(U) dm_round_robin(U) dm_multipath(U) scsi_dh(U) parport_pc(U) lp(U) parport(U) joydev(U) bnx2x(U) sg(U) amd64_edac_mod(U) shpchp(U) bnx2(U) serio_raw(U) t
      g3(U) pcspkr(U) edac_mc(U) hpilo(U) dm_raid45(U) dm_message(U) dm_region_hash(U) dm_mem_cache(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) dm_log(U) dm_mod(U) u
      sb_storage(U) qla2xxx(U) scsi_transport_fc(U) cciss(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
      Dec 9 11:27:08 osiride-lp-030 kernel: Pid: 21936, comm: ll_mdt_06 Tainted: G 2.6.18-194.17.1.el5_lustre.20110315140510 #1
      Dec 9 11:27:08 osiride-lp-030 kernel: RIP: 0010:[<ffffffff8882a270>] [<ffffffff8882a270>] :lquota:dquot_create_oqaq+0x2b0/0x510
      Dec 9 11:27:08 osiride-lp-030 kernel: RSP: 0018:ffff8104484e3ac0 EFLAGS: 00000246
      Dec 9 11:27:08 osiride-lp-030 kernel: RAX: 0000000000000000 RBX: ffff81041eee3ef0 RCX: 000000000000000c
      Dec 9 11:27:08 osiride-lp-030 kernel: RDX: 0000000000000000 RSI: 0000000000001400 RDI: 0000000000001400
      Dec 9 11:27:08 osiride-lp-030 kernel: RBP: 0000000000000004 R08: 000000000000000c R09: 0000000001000000
      Dec 9 11:27:08 osiride-lp-030 kernel: R10: 000000000000000c R11: 0000000000500000 R12: ffffffffffffffff
      Dec 9 11:27:08 osiride-lp-030 kernel: R13: 003fffffffffffff R14: 0000000000000282 R15: ffff81041eee3f00
      Dec 9 11:27:08 osiride-lp-030 kernel: FS: 00002b6411676230(0000) GS:ffff81010fc954c0(0000) knlGS:00000000f6cf2b90
      Dec 9 11:27:08 osiride-lp-030 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      Dec 9 11:27:08 osiride-lp-030 kernel: CR2: 00000000f6140000 CR3: 0000000000201000 CR4: 00000000000006e0
      Dec 9 11:27:08 osiride-lp-030 kernel:
      Dec 9 11:27:08 osiride-lp-030 kernel: Call Trace:
      Dec 9 11:27:08 osiride-lp-030 kernel: [<ffffffff8882ad69>] :lquota:lustre_dqget+0x679/0x7e0
      Dec 9 11:27:08 osiride-lp-030 kernel: [<ffffffff8882b086>] :lquota:init_oqaq+0x56/0x1c0
      Dec 9 11:27:08 osiride-lp-030 kernel: [<ffffffff8883285e>] :lquota:mds_set_dqblk+0x8de/0x2010
      Dec 9 11:27:08 osiride-lp-030 kernel: [<ffffffff88732fd3>] :ptlrpc:ptl_send_buf+0x3f3/0x5b0
      Dec 9 11:27:08 osiride-lp-030 kernel: [<ffffffff8873b94a>] :ptlrpc:lustre_pack_reply_flags+0x86a/0x950
      Dec 9 11:27:08 osiride-lp-030 kernel: [<ffffffff80150d56>] __next_cpu+0x19/0x28
      Dec 9 11:27:08 osiride-lp-030 kernel: [<ffffffff88823e9a>] :lquota:mds_quota_ctl+0x16a/0x3c0
      Dec 9 11:27:08 osiride-lp-030 kernel: [<ffffffff8873ba59>] :ptlrpc:lustre_pack_reply+0x29/0xb0
      Dec 9 11:27:08 osiride-lp-030 kernel: [<ffffffff88afe78f>] :mds:mds_handle+0x3d7f/0x4d10
      Dec 9 11:27:08 osiride-lp-030 kernel: [<ffffffff800767ae>] smp_send_reschedule+0x4e/0x53
      Dec 9 11:27:08 osiride-lp-030 kernel: [<ffffffff8008c92d>] enqueue_task+0x41/0x56
      Dec 9 11:27:08 osiride-lp-030 kernel: [<ffffffff8873da35>] :ptlrpc:lustre_msg_get_conn_cnt+0x35/0xf0
      Dec 9 11:27:08 osiride-lp-030 kernel: [<ffffffff887473b9>] :ptlrpc:ptlrpc_server_handle_request+0x989/0xe00
      Dec 9 11:27:08 osiride-lp-030 kernel: [<ffffffff88747b15>] :ptlrpc:ptlrpc_wait_event+0x2e5/0x310
      Dec 9 11:27:08 osiride-lp-030 kernel: [<ffffffff8008b3bd>] __wake_up_common+0x3e/0x68
      Dec 9 11:27:08 osiride-lp-030 kernel: [<ffffffff88748ac8>] :ptlrpc:ptlrpc_main+0xf88/0x1150
      Dec 9 11:27:08 osiride-lp-030 kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
      Dec 9 11:27:08 osiride-lp-030 kernel: [<ffffffff8008c92d>] enqueue_task+0x41/0x56
      Dec 9 11:27:08 osiride-lp-030 kernel: [<ffffffff8873da35>] :ptlrpc:lustre_msg_get_conn_cnt+0x35/0xf0
      Dec 9 11:27:08 osiride-lp-030 kernel: [<ffffffff887473b9>] :ptlrpc:ptlrpc_server_handle_request+0x989/0xe00
      Dec 9 11:27:08 osiride-lp-030 kernel: [<ffffffff88747b15>] :ptlrpc:ptlrpc_wait_event+0x2e5/0x310
      Dec 9 11:27:08 osiride-lp-030 kernel: [<ffffffff8008b3bd>] __wake_up_common+0x3e/0x68
      Dec 9 11:27:08 osiride-lp-030 kernel: [<ffffffff88748ac8>] :ptlrpc:ptlrpc_main+0xf88/0x1150
      Dec 9 11:27:08 osiride-lp-030 kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
      Dec 9 11:27:08 osiride-lp-030 kernel: [<ffffffff88747b40>] :ptlrpc:ptlrpc_main+0x0/0x1150
      Dec 9 11:27:08 osiride-lp-030 kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
      Dec 9 11:27:08 osiride-lp-030 kernel:
      Dec 9 11:27:15 osiride-lp-030 kernel: Lustre: Service thread pid 23639 was inactive for 218.00s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.

      This saturates the server's resources, and the clients are unable to
      access the filesystem.
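
      The soft lockup above means a kernel thread spun on one CPU without
      yielding, inside the quota-unit adjustment path
      (lquota:dquot_create_oqaq). A minimal sketch of this failure class,
      assuming a qunit-shrinking loop as the culprit: the function name,
      constants, and loop below are illustrative assumptions, not the
      actual lquota code. An unbounded halving loop spins forever when the
      limit is 0; a floor bound makes it terminate.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Hypothetical sketch of the bug class: shrink a per-ID quota unit
       * (qunit) until it fits under the remaining limit.  Without the
       * min_qunit guard, limit == 0 would make the loop spin forever on
       * one CPU, and the watchdog would log "soft lockup - CPU#N stuck". */
      static uint64_t shrink_qunit(uint64_t qunit, uint64_t limit)
      {
          const uint64_t min_qunit = 1024;   /* assumed lower bound */

          /* Halve the unit, but never below the floor, so the loop
           * always terminates even for limit == 0. */
          while (qunit > limit && qunit > min_qunit)
              qunit >>= 1;
          return qunit;
      }
      ```

      With the guard, shrink_qunit(1ULL << 20, 0) bottoms out at 1024
      instead of looping; the real fix in the Lustre quota code may differ.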

      Regards


      People

        Assignee: Niu Yawei (niu) (Inactive)
        Reporter: Supporto Lustre Jnet2000 (lustre.support) (Inactive)
        Votes: 0
        Watchers: 5
