Lustre / LU-8364

During OSS failover testing with quotas enabled, the OSS node crashed on 2 of 4 failovers


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.10.0
    • Affects Version/s: Lustre 2.8.0
    • Labels: None
    • Severity: 3

    Description

      Console logs:
      ===========
      Jan 10 14:50:47 snx11000n004 XYRAID(snx11000n004_md1-jnlr)[11815]: INFO: snx11000n004_md1-jnlr stop exit : 0
      Jan 10 14:50:48 snx11000n004 kernel: [340395.094979] __ratelimit: 1047 callbacks suppressed
      Jan 10 14:50:48 snx11000n004 kernel: [340395.099992] Write to readonly device md139 (0x90008b) bi_flags: f000000000000001, bi_vcnt: 1, bi_idx: 0, bi->size: 4096, bi_cnt: 2, bi_private: ffff8802fede3678
      Jan 10 14:50:48 snx11000n004 kernel: [340395.114672] Write to readonly device md139 (0x90008b) bi_flags: f000000000000001, bi_vcnt: 1, bi_idx: 0, bi->size: 4096, bi_cnt: 2, bi_private: ffff8802fede3748
      Jan 10 14:50:48 snx11000n004 kernel: [340395.129363] Write to readonly device md139 (0x90008b) bi_flags: f000000000000001, bi_vcnt: 1, bi_idx: 0, bi->size: 4096, bi_cnt: 2, bi_private: ffff8802fede3748
      Jan 10 14:50:48 snx11000n004 kernel: [340395.144056] Write to readonly device md5 (0x900005) bi_flags: f000000000000001, bi_vcnt: 1, bi_idx: 0, bi->size: 4096, bi_cnt: 2, bi_private: ffff88049c2fb610
      Jan 10 14:50:48 snx11000n004 kernel: [340395.176736] LDISKFS-fs error (device md5): ldiskfs_mb_release_inode_pa: pa free mismatch: [pa ffff88066edde7b8] [phy 16646156] [logic 267] [len 117] [free 115] [error 0] [inode 117886005] [freed 117]
      Jan 10 14:50:48 snx11000n004 kernel: [340395.194885] Aborting journal on device md139.
      Jan 10 14:50:48 snx11000n004 kernel: [340395.199450] Write to readonly device md139 (0x90008b) bi_flags: f000000000000001, bi_vcnt: 1, bi_idx: 0, bi->size: 4096, bi_cnt: 2, bi_private: ffff880594188540
      Jan 10 14:50:48 snx11000n004 kernel: [340395.214136] LDISKFS-fs (md5): Remounting filesystem read-only
      Jan 10 14:50:48 snx11000n004 kernel: [340395.220182] Write to readonly device md5 (0x900005) bi_flags: f000000000000001, bi_vcnt: 1, bi_idx: 0, bi->size: 4096, bi_cnt: 2, bi_private: ffff880596d40a88
      Jan 10 14:50:48 snx11000n004 kernel: [340395.234671] LDISKFS-fs error (device md5): ldiskfs_mb_release_inode_pa: free 117, pa_free 115
      Jan 10 14:50:48 snx11000n004 kernel: [340395.243558] ----------[ cut here ]----------
      Jan 10 14:50:48 snx11000n004 kernel: [340395.248362] kernel BUG at /builddir/build/BUILD/lustre-ldiskfs-3.3.0.x2/ldiskfs/mballoc.c:3799!
      Jan 10 14:50:49 snx11000n004 kernel: [340395.256674] Write to readonly device md143 (0x90008f) bi_flags: f000000000000001, bi_vcnt: 1, bi_idx: 0, bi->size: 4096, bi_cnt: 2, bi_private: ffff8802fede3bc0
      Jan 10 14:50:49 snx11000n004 kernel: [340395.256679] Write to readonly device md143 (0x90008f) bi_flags: f000000000000001, bi_vcnt: 1, bi_idx: 0, bi->size: 4096, bi_cnt: 2, bi_private: ffff8802fede30c8
      Jan 10 14:50:49 snx11000n004 kernel: [340395.256695] Write to readonly device md143 (0x90008f) bi_flags: f000000000000001, bi_vcnt: 1, bi_idx: 0, bi->size: 4096, bi_cnt: 2, bi_private: ffff8802fede30c8
      Jan 10 14:50:49 snx11000n004 kernel: [340395.256712] Write to readonly device md7 (0x900007) bi_flags: f000000000000001, bi_vcnt: 1, bi_idx: 0, bi->size: 4096, bi_cnt: 2, bi_private: ffff880483f645a8
      Jan 10 14:50:49 snx11000n004 kernel: [340395.317126] invalid opcode: 0000 [#1] SMP
      Jan 10 14:50:49 snx11000n004 kernel: [340395.321485] last sysfs file: /sys/devices/virtual/block/md131/uevent
      Jan 10 14:50:49 snx11000n004 kernel: [340395.328034] CPU 0
      Jan 10 14:50:49 snx11000n004 kernel: [340395.424861]
      Jan 10 14:50:49 snx11000n004 kernel: [340395.520504] Pid: 11724, comm: umount Tainted: P W ---------------- 2.6.32-131.21.1.el6.lustre.3021.x86_64 #1 CS6000AC
      Jan 10 14:50:49 snx11000n004 kernel: [340395.532218] RIP: 0010:[<ffffffffa0921ab6>] [<ffffffffa0921ab6>] ldiskfs_mb_release_inode_pa+0x346/0x360 [ldiskfs]
      Jan 10 14:50:49 snx11000n004 kernel: [340395.542875] RSP: 0018:ffff8805de375a58 EFLAGS: 00010202
      Jan 10 14:50:49 snx11000n004 kernel: [340395.548364] RAX: 0000000000000073 RBX: 0000000000000075 RCX: ffff8807d3b0bc00
      Jan 10 14:50:49 snx11000n004 kernel: [340395.555743] RDX: 0000000000000000 RSI: 0000000000000046 RDI: ffff8806ebf95f00
      Jan 10 14:50:49 snx11000n004 kernel: [340395.563142] RBP: ffff8805de375b08 R08: 0000000000000000 R09: 0000000000000080
      Jan 10 14:50:49 snx11000n004 kernel: [340395.570521] R10: 0000000000000001 R11: 0000000000000000 R12: ffff880324a63490
      Jan 10 14:50:49 snx11000n004 kernel: [340395.577903] R13: ffff880596f74408 R14: 0000000000000082 R15: ffff88066edde7b8
      Jan 10 14:50:49 snx11000n004 kernel: [340395.585286] FS: 00007f58836fe740(0000) GS:ffff880044600000(0000) knlGS:0000000000000000
      Jan 10 14:50:49 snx11000n004 kernel: [340395.593619] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      Jan 10 14:50:49 snx11000n004 kernel: [340395.599539] CR2: 00007f6553fd90a0 CR3: 0000000779cc0000 CR4: 00000000000406f0
      Jan 10 14:50:49 snx11000n004 kernel: [340395.606922] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      Jan 10 14:50:49 snx11000n004 kernel: [340395.614299] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Jan 10 14:50:49 snx11000n004 kernel: [340395.621678] Process umount (pid: 11724, threadinfo ffff8805de374000, task ffff8806c96e80c0)
      Jan 10 14:50:49 snx11000n004 kernel: [340395.630271] Stack:
      Jan 10 14:50:49 snx11000n004 kernel: [340395.632464] ffff880500000075 0000000000000073 ffff880500000000 000000000706cc35
      Jan 10 14:50:49 snx11000n004 kernel: [340395.639970] <0> 0000000000000075 000000000000007c ffff8805de375a98 ffffffff811a4066
      Jan 10 14:50:49 snx11000n004 kernel: [340395.648046] <0> ffff8807d3b0bc00 ffff8807a0f1a800 ffff88066edde7b8 0000000000fe0000
      Jan 10 14:50:49 snx11000n004 kernel: [340395.656371] Call Trace:
      Jan 10 14:50:49 snx11000n004 kernel: [340395.659000] [<ffffffff811a4066>] ? __wait_on_buffer+0x26/0x30
      Jan 10 14:50:49 snx11000n004 kernel: [340395.665024] [<ffffffffa092556e>] ldiskfs_discard_preallocations+0x1fe/0x490 [ldiskfs]
      Jan 10 14:50:49 snx11000n004 kernel: [340395.673193] [<ffffffffa093e1c6>] ldiskfs_clear_inode+0x16/0x50 [ldiskfs]
      Jan 10 14:50:49 snx11000n004 kernel: [340395.680168] [<ffffffff8118ceaf>] clear_inode+0x8f/0x110
      Jan 10 14:50:49 snx11000n004 kernel: [340395.685655] [<ffffffff8118cf70>] dispose_list+0x40/0x120
      Jan 10 14:50:49 snx11000n004 kernel: [340395.691236] [<ffffffff8118d41a>] invalidate_inodes+0xea/0x190
      Jan 10 14:50:49 snx11000n004 kernel: [340395.697249] [<ffffffff81174f2c>] generic_shutdown_super+0x4c/0xe0
      Jan 10 14:50:49 snx11000n004 kernel: [340395.703603] [<ffffffff81174ff1>] kill_block_super+0x31/0x50
      Jan 10 14:50:49 snx11000n004 kernel: [340395.709455] [<ffffffff811760a0>] deactivate_super+0x70/0x90
      Jan 10 14:50:49 snx11000n004 kernel: [340395.715291] [<ffffffff811915af>] mntput_no_expire+0xbf/0x110
      Jan 10 14:50:49 snx11000n004 kernel: [340395.721253] [<ffffffffa10eb9c4>] unlock_mntput+0x64/0x70 [obdclass]
      Jan 10 14:50:49 snx11000n004 kernel: [340395.727818] [<ffffffffa10f3ae3>] server_put_super+0x433/0x13e0 [obdclass]
      Jan 10 14:50:49 snx11000n004 kernel: [340395.734875] [<ffffffff8108e120>] ? autoremove_wake_function+0x0/0x40
      Jan 10 14:50:49 snx11000n004 kernel: [340395.741494] [<ffffffff8118d426>] ? invalidate_inodes+0xf6/0x190
      Jan 10 14:50:49 snx11000n004 kernel: [340395.747672] [<ffffffff81174f3b>] generic_shutdown_super+0x5b/0xe0
      Jan 10 14:50:49 snx11000n004 kernel: [340395.754054] [<ffffffff81175026>] kill_anon_super+0x16/0x60
      Jan 10 14:50:49 snx11000n004 kernel: [340395.759856] [<ffffffffa10ea166>] lustre_kill_super+0x36/0x60 [obdclass]
      Jan 10 14:50:49 snx11000n004 kernel: [340395.766760] [<ffffffff811760a0>] deactivate_super+0x70/0x90
      Jan 10 14:50:49 snx11000n004 kernel: [340395.772612] [<ffffffff811915af>] mntput_no_expire+0xbf/0x110
      Jan 10 14:50:49 snx11000n004 kernel: [340395.778555] [<ffffffff811919db>] sys_umount+0x7b/0x3a0
      Jan 10 14:50:49 snx11000n004 kernel: [340395.783971] [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
      The same crash occurred in 2 of 4 failover attempts. Logs are attached (kern, message, conman); the crash dump will be uploaded to the FTP server.
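
When triaging repeats of this crash, it can help to pull the mismatch values out of the console log automatically. The following is a minimal sketch, not part of the original report: the helper names `parse_mismatch` and `parse_registers` are hypothetical, and the two sample lines are copied from the log above with timestamps trimmed. It extracts `free` and `pa_free` from the `ldiskfs_mb_release_inode_pa` error line and decodes the register dump, which shows that RAX and RBX carry the same two numbers (0x73 = 115, 0x75 = 117). Which variable each register actually holds is not confirmed by the log.

```python
import re

# Two lines copied from the console log above (timestamps trimmed).
LOG = """\
LDISKFS-fs error (device md5): ldiskfs_mb_release_inode_pa: free 117, pa_free 115
RAX: 0000000000000073 RBX: 0000000000000075 RCX: ffff8807d3b0bc00
"""


def parse_mismatch(text):
    """Return (free, pa_free) from the 'free N, pa_free M' error line, or None."""
    m = re.search(r"ldiskfs_mb_release_inode_pa: free (\d+), pa_free (\d+)", text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def parse_registers(text):
    """Return {register name: integer value} from an oops register dump."""
    pairs = re.findall(r"\b(R[A-Z0-9]{2}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


free, pa_free = parse_mismatch(LOG)
regs = parse_registers(LOG)

# Blocks counted as free vs. the count recorded in the preallocation.
print(free, pa_free, free - pa_free)   # 117 115 2
# Register values decoded from hex; they match the two counts above.
print(regs["RAX"], regs["RBX"])        # 115 117
```

Running the same helpers over the attached kern log from each failover attempt would show whether the size of the mismatch is the same every time.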

          People

            Assignee: Yang Sheng (ys)
            Reporter: Lokesh Nagappa Jaliminche (lokesh.jaliminche)