
[LU-8527] Lustre 2.8 server crashed on bring up during large scale test shot

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.9.0
    • Affects Version/s: Lustre 2.8.0
    • Labels: None
    • Environment: RHEL6.7 servers running the latest Lustre 2.8, with clients also on the latest Lustre 2.8.
    • Severity: 3

    Description

      In our first attempt to test Lustre 2.8 servers at scale, the MDS crashed at bring-up. The backtrace is:

      <2>[ 8011.527071] kernel BUG at fs/jbd2/transaction.c:1028!
      <4>[ 8011.532938] invalid opcode: 0000 1 SMP
      <4>[ 8011.537761] last sysfs file: /sys/devices/virtual/block/dm-7/uevent
      <4>[ 8011.544986] CPU 5
      <4>[ 8011.547040] Modules linked in: osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgc(U) osd_ldiskfs(U) ldiskfs(U) mbcache jbd2 lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ptlrpc(U) obdclass(U) ko2iblnd(U) lnet(U) sha512_generic crc32c_intel libcfs(U) mpt2sas mptctl mptbase dell_rbu autofs4 8021q garp stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_REJECT xt_comment nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib ib_sa ib_mad ib_core ib_addr dm_mirror dm_region_hash dm_log dm_round_robin scsi_dh_rdac dm_multipath dm_mod sg ipmi_devintf sd_mod crc_t10dif iTCO_wdt iTCO_vendor_support microcode wmi power_meter acpi_ipmi ipmi_si ipmi_msghandler dcdbas mpt3sas scsi_transport_sas raid_class shpchp sb_edac edac_core lpc_ich mfd_core ahci ipv6 nfs lockd fscache auth_rpcgss nfs_acl sunrpc tg3 mlx4_en ptp pps_core mlx4_core [last unloaded: scsi_wait_scan]
      <4>[ 8011.646238]
      <4>[ 8011.648140] Pid: 30211, comm: mdt01_503 Not tainted 2.6.32-573.26.1.el6.dne2.x86_64 #1 Dell Inc. PowerEdge R630/0CNCJW
      <4>[ 8011.660578] RIP: 0010:[<ffffffffa0c9d92d>] [<ffffffffa0c9d92d>] jbd2_journal_dirty_metadata+0x10d/0x150 [jbd2]
      <4>[ 8011.672316] RSP: 0018:ffff883e6751f580 EFLAGS: 00010246
      <4>[ 8011.678473] RAX: ffff883e3a8be380 RBX: ffff883ce650e3d8 RCX: ffff883e9a7656e0
      <4>[ 8011.686676] RDX: 0000000000000000 RSI: ffff883e9a7656e0 RDI: ffff883ce650e3d8
      <4>[ 8011.694873] RBP: ffff883e6751f5a0 R08: c010000000000000 R09: 0000000000000000
      <4>[ 8011.703072] R10: 0000000000000001 R11: 0000000000000008 R12: ffff883df92c8ba8
      <4>[ 8011.716541] R13: ffff883e9a7656e0 R14: ffff884023ca1800 R15: 0000000000000058
      <4>[ 8011.724738] FS: 0000000000000000(0000) GS:ffff88018fca0000(0000) knlGS:0000000000000000
      <4>[ 8011.734220] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      <4>[ 8011.740863] CR2: 00007fc59a79b380 CR3: 0000000001a8d000 CR4: 00000000001407e0
      <4>[ 8011.749059] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      <4>[ 8011.757263] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      <4>[ 8011.765460] Process mdt01_503 (pid: 30211, threadinfo ffff883e6751c000, task ffff883e6750c040)
      <4>[ 8011.775518] Stack:
      <4>[ 8011.777983] ffff883ce650e3d8 ffffffffa0d827b0 ffff883e9a7656e0 0000000000000000
      <4>[ 8011.786313] <d> ffff883e6751f5e0 ffffffffa0cc0fab ffff883e6751f5e0 ffffffffa0ccf968
      <4>[ 8011.795363] <d> 0000000000000000 0000000000000058 0000000000000058 ffff883e9a7656e0
      <4>[ 8011.804839] Call Trace:
      <4>[ 8011.807798] [<ffffffffa0cc0fab>] __ldiskfs_handle_dirty_metadata+0x7b/0x100 [ldiskfs]
      <4>[ 8011.817097] [<ffffffffa0ccf968>] ? ldiskfs_bread+0x18/0x80 [ldiskfs]
      <4>[ 8011.824517] [<ffffffffa0d6635c>] osd_ldiskfs_write_record+0xec/0x340 [osd_ldiskfs]
      <4>[ 8011.833519] [<ffffffffa0d690a3>] osd_write+0x183/0x5b0 [osd_ldiskfs]
      <4>[ 8011.840965] [<ffffffffa05f8b4d>] dt_record_write+0x3d/0x130 [obdclass]
      <4>[ 8011.848680] [<ffffffffa05bb44f>] llog_osd_write_rec+0xb6f/0x1ad0 [obdclass]
      <4>[ 8011.856787] [<ffffffffa0d7ba0b>] ? dynlock_unlock+0x16b/0x1d0 [osd_ldiskfs]
      <4>[ 8011.864894] [<ffffffffa05a9426>] llog_write_rec+0xb6/0x270 [obdclass]
      <4>[ 8011.872419] [<ffffffffa05b2893>] llog_cat_add_rec+0x1c3/0x7b0 [obdclass]
      <4>[ 8011.880234] [<ffffffffa05a9239>] llog_add+0x89/0x1c0 [obdclass]
      <4>[ 8011.887176] [<ffffffffa108641d>] osp_sync_add_rec+0x26d/0x9b0 [osp]
      <4>[ 8011.894501] [<ffffffffa1086c07>] osp_sync_add+0x77/0x80 [osp]
      <4>[ 8011.901248] [<ffffffffa0fb502e>] ? lod_sub_get_thandle+0x24e/0x3c0 [lod]
      <4>[ 8011.909060] [<ffffffffa1077823>] osp_object_destroy+0x173/0x230 [osp]
      <4>[ 8011.916582] [<ffffffffa0fb85ad>] lod_sub_object_destroy+0x1fd/0x440 [lod]
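
      For context, the BUG at fs/jbd2/transaction.c:1028 in this RHEL6 kernel appears to be the journal-credit check near the top of jbd2_journal_dirty_metadata(). The sketch below is a paraphrase of that era's jbd2 code, not a verbatim quote of the RHEL source:

      int jbd2_journal_dirty_metadata(handle_t *handle, struct buffer_head *bh)
      {
              struct journal_head *jh = bh2jh(bh);

              /* (abort checks, locking and list handling omitted in this sketch) */
              if (jh->b_modified == 0) {
                      /* first modification of this buffer under this handle:
                       * charge one of the credits reserved when the handle was
                       * started, and BUG if none are left */
                      jh->b_modified = 1;
                      J_ASSERT_JH(jh, handle->h_buffer_credits > 0);
                      handle->h_buffer_credits--;
              }
              /* ... */
              return 0;
      }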

Attachments

Issue Links

Activity


gerrit Gerrit Updater added a comment -

Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/22726
Subject: LU-8527 obdclass: declare more credits for llog append
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c38efe8101d4bc977ceb9f07b041ac56795dd90d

yong.fan nasf (Inactive) added a comment -

It is suspected that concurrent llog append operations, such as concurrently unlinking large striped (large EA) files, can fill up an llog file in a very short time. Because the llog may be shared at the same time by many concurrent modifications, in some extreme cases an unlink of a large striped file may not have declared credits for creating a new llog file; then, during the real llog append that records the striped OST-objects, it finds the current llog is full and a new llog file must be created, but because the related credits were not declared, the lower layer may hit trouble.

I am not sure whether this is really the case. But since it is reproducible at the customer site, I will make a patch for verification.
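
For illustration only, a rough sketch of the direction such a "declare more credits" fix could take; all function names below are hypothetical stand-ins, not the actual fs/lustre-release code:

      /*
       * Hypothetical sketch: when declaring an llog append, pessimistically
       * also reserve credits for creating a replacement plain llog and for
       * updating the catalog, in case concurrent appends fill the current
       * llog before the real write runs.
       */
      static int llog_declare_append_sketch(const struct lu_env *env,
                                            struct llog_handle *loghandle,
                                            struct llog_rec_hdr *rec,
                                            struct thandle *th)
      {
              int rc;

              /* credits for writing the record into the current llog file */
              rc = sketch_declare_write_rec(env, loghandle, rec, th);
              if (rc != 0)
                      return rc;

              /* credits for creating a new llog if the current one is full */
              rc = sketch_declare_new_llog(env, loghandle, th);
              if (rc != 0)
                      return rc;

              /* credits for linking the new llog into the catalog */
              return sketch_declare_catalog_update(env, loghandle, th);
      }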

yong.fan nasf (Inactive) added a comment -

Lustre: 18235:0:(osd_handler.c:1272:osd_trans_dump_creds()) attr_set: 2/2/1, xattr_set: 1011/92/0

The "xattr_set" numbers look abnormal: in which case can oti_declare_ops[xattr_set] be so much larger than oti_declare_ops_cred[xattr_set]?
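
If the osd_trans_dump_creds() output is read as "ops/credits/used", which is an assumption based on the counter names quoted above rather than the 2.8 source, each triplet breaks down as follows:

      /*
       * Assumed reading of each "op: A/B/C" triplet in the
       * osd_trans_dump_creds() output (not verified against Lustre 2.8):
       */
      struct declare_counters_sketch {
              unsigned int ops;      /* A: oti_declare_ops[op], operations declared */
              unsigned int credits;  /* B: oti_declare_ops_cred[op], journal credits declared */
              unsigned int used;     /* C: oti_declare_ops_used[op], credits consumed so far */
      };
      /* e.g. "xattr_set: 1011/92/0": 1011 declared xattr_set operations backed
       * by only 92 declared credits and none used yet, hence the question. */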

adilger Andreas Dilger added a comment -

My first guess would be to take a look at the ldiskfs xattr inode (wide striping) patch. It may be that it is declaring too many credits in some cases.

bzzz Alex Zhuravlev added a comment -

Lustre: 18235:0:(osd_handler.c:1272:osd_trans_dump_creds()) attr_set: 2/2/1, xattr_set: 1011/92/0
Lustre: 18235:0:(osd_handler.c:1282:osd_trans_dump_creds()) write: 5043/43356/0, punch: 0/0/0, quota 4/52/0
Lustre: 18235:0:(osd_handler.c:1289:osd_trans_dump_creds()) insert: 1009/17152/0, delete: 2/9/1
Lustre: 18235:0:(osd_handler.c:1296:osd_trans_dump_creds()) ref_add: 1/1/0, ref_del: 2/2/1

The transaction declared at least 2+92+43356+52+17152+9+2+2 = 60667 credits, while it consumed only 1+1+1 = 3. Probably a bit more in reality, since we don't track everything, but still far less than 60667. It looks like a false assertion, though I don't understand the root cause yet.

bzzz Alex Zhuravlev added a comment -

I'm trying to come up with a debugging patch.

simmonsja James A Simmons added a comment -

Any updates?
yujian Jian Yu added a comment -

Hi Alex,

It's Lustre b2_8_fe branch with commit 71badf2 on the tip.

bzzz Alex Zhuravlev added a comment -

Please tell us the latest commit for the branch you're using.

simmonsja James A Simmons added a comment -

The first crash happened while running IOR against a large-stripe single shared file. After the first crash this bug came back at random times.

adilger Andreas Dilger added a comment -

It seems possible that this is related to creating or deleting a wide-striped file. I found LU-3102, which may be related.

People

  Assignee: bobijam Zhenyu Xu
  Reporter: simmonsja James A Simmons
  Votes: 0
  Watchers: 11
