Lustre / LU-4452

Lustre 1.8.8 client causes kernel panic

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 1.8.9
    • Labels: None
    • Environment: CentOS 6.4
    • Severity: 3
    • Rank (Obsolete): 12207

    Description

      When 2 of the 32 OSTs became unhealthy (their LUNs went offline and the Lustre server reported I/O errors and refused service), accessing files striped across those OSTs caused a client kernel panic. The client log from the crash follows:


      Jan 4 01:12:43 trestles-2-17.sdsc.edu: kernel: LustreError: 3595:0:(osc_request.c:1652:osc_brw_redo_request()) @@@ redo for recoverable error 5 req@ffff881023c69c00 x1454016746327144/t0 o3->puma-OST0000_UUID@172.25.33.113@tcp:6/4 lens 448/592 e 0 to 1 dl 1388826770 ref 2 fl Interpret:R/0/0 rc -5/-5
      Jan 4 01:12:43 trestles-2-17.sdsc.edu: kernel: LustreError: 3595:0:(osc_request.c:1652:osc_brw_redo_request()) Skipped 2 previous similar messages
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: LustreError: 3595:0:(osc_request.c:2330:brw_interpret()) puma-OST0000-osc-ffff880c2515e400: too many resent retries for object: 23302047, rc = -5.
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: LustreError: 3595:0:(osc_request.c:2357:brw_interpret()) ASSERTION(!(aa->aa_oa->o_valid & OBD_MD_FLHANDLE)) failed
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: LustreError: 3595:0:(osc_request.c:2357:brw_interpret()) LBUG
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: Pid: 3595, comm: ptlrpcd
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel:
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: Call Trace:
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: BUG: unable to handle kernel NULL pointer dereference at (null)
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: IP: [<(null)>] (null)
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: PGD 0
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: Oops: 0010 [#1] SMP
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: last sysfs file: /sys/devices/system/node/node7/meminfo
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: CPU 30
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: Modules linked in: mgc(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ptlrpc(U) nfs lockd fscache auth_rpcgss nfs_acl limic(U) knem(U) autofs4 ksocklnd(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) sunrpc ipmi_devintf ipt_REJECT iptable_filter ip_tables rdma_ucm(U) ib_ucm(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ib_cm(U) ipv6 ib_uverbs(U) ib_umad(U) mlx4_vnic(U) mlx4_en(U) mlx4_ib(U) ib_sa(U) ib_mad(U) ib_core(U) mlx4_core(U) compat(U) tcp_htcp igb dca ptp pps_core microcode sg serio_raw k10temp amd64_edac_mod edac_core edac_mce_amd i2c_piix4 i2c_core shpchp ext4 jbd2 mbcache sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel:
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: Pid: 3595, comm: ptlrpcd Tainted: G W --------------- 2.6.32-358.23.2.el6.x86_64 #1 Supermicro H8QG6/H8QG6
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: RIP: 0010:[<0000000000000000>] [<(null)>] (null)
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: RSP: 0018:ffff8807d27bdb48 EFLAGS: 00010246
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: RAX: ffff8807d27bdbac RBX: ffff8807d27bdba0 RCX: ffffffffa0366260
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: RDX: ffff8807d27bdbe0 RSI: ffff8807d27bdba0 RDI: ffff8807d27bc000
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: RBP: ffff8807d27bdbe0 R08: 0000000000000000 R09: 0000000000000000
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: R10: 0000000000000003 R11: 0000000000000000 R12: 000000000000cbe0
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: R13: ffffffffa0366260 R14: 0000000000000000 R15: ffff880e2f483fc0
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: FS: 00002b844dc5ed80(0000) GS:ffff880e2f480000(0000) knlGS:0000000000000000
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: CR2: 0000000000000000 CR3: 0000000001a85000 CR4: 00000000000007e0
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: Process ptlrpcd (pid: 3595, threadinfo ffff8807d27bc000, task ffff88082474e040)
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: Stack:
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: ffffffff8100e4a0 ffff8807d27bdbac ffff88082474e040 ffffffffa0699f78
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: <d> 00000000a069a9a8 ffff8807d27bc000 ffff8807d27bdfd8 ffff8807d27bc000
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: <d> 000000000000001e ffff880e2f480000 ffff8807d27bdbe0 ffff8807d27bdbb0
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: Call Trace:
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: [<ffffffff8100e4a0>] ? dump_trace+0x190/0x3b0
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: [<ffffffffa035a835>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: [<ffffffffa035ae65>] lbug_with_loc+0x75/0xe0 [libcfs]
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: [<ffffffffa03635d6>] libcfs_assertion_failed+0x66/0x70 [libcfs]
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: [<ffffffffa06903ff>] brw_interpret+0xcff/0xe90 [osc]
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: [<ffffffffa04b6a9a>] ptlrpc_check_set+0x24a/0x16b0 [ptlrpc]
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: [<ffffffff81081b5b>] ? try_to_del_timer_sync+0x7b/0xe0
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: [<ffffffff81081be2>] ? del_timer_sync+0x22/0x30
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: [<ffffffffa04ed7ad>] ptlrpcd_check+0x18d/0x270 [ptlrpc]
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: [<ffffffffa04eda50>] ptlrpcd+0x160/0x270 [ptlrpc]
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: [<ffffffff81063990>] ? default_wake_function+0x0/0x20
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: [<ffffffff8100c0ca>] child_rip+0xa/0x20
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: [<ffffffffa04ed8f0>] ? ptlrpcd+0x0/0x270 [ptlrpc]
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: Code: Bad RIP value.
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: RIP [<(null)>] (null)
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: RSP <ffff8807d27bdb48>
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: CR2: 0000000000000000
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: ---[ end trace e64f567342ffc045 ]---
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: Kernel panic - not syncing: Fatal exception
      Jan 4 01:12:55 trestles-2-17.sdsc.edu: kernel: Pid: 3595, comm: ptlrpcd Tainted: G D W --------------- 2.6.32-358.23.2.el6.x86_64 #1
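
      When triaging a dump like the one above, it can help to pull out just the LustreError records and see which one is the fatal assertion. The following is a minimal sketch, not a Lustre tool: the function name `parse_lustre_errors`, the two regular expressions, and the `SAMPLE` list are hypothetical, and the prefix layout (`pid:n:(file:line:function())`) is inferred only from the messages in this report. It also maps a negative `rc` value to its errno name, so `-5` is reported as `EIO`.

```python
import errno
import re

# Assumed layout, inferred from the log lines in this report:
#   LustreError: <pid>:<n>:(<file>:<line>:<function>()) <message>
LUSTRE_ERROR = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) (?P<msg>.*)"
)
# Return codes appear as "rc -5/-5" or "rc = -5." in the messages above.
RC = re.compile(r"\brc (?:= )?(-\d+)")


def parse_lustre_errors(lines):
    """Return one dict per LustreError line; other lines are skipped."""
    events = []
    for raw in lines:
        m = LUSTRE_ERROR.search(raw)
        if m is None:
            continue
        msg = m.group("msg").strip()
        rc_match = RC.search(msg)
        rc = int(rc_match.group(1)) if rc_match else None
        events.append({
            "pid": int(m.group("pid")),
            "file": m.group("file"),
            "line": int(m.group("line")),
            "func": m.group("func"),
            "rc": rc,
            # errno.errorcode maps 5 -> "EIO"; unknown codes give None.
            "errno_name": errno.errorcode.get(-rc) if rc is not None else None,
            # A failed ASSERTION or an LBUG marks the fatal record.
            "fatal": msg.startswith("ASSERTION(") or msg == "LBUG",
            "msg": msg,
        })
    return events


# Shortened copies of lines from the dump above, for illustration only.
SAMPLE = [
    "kernel: LustreError: 3595:0:(osc_request.c:2330:brw_interpret()) "
    "puma-OST0000-osc-ffff880c2515e400: too many resent retries for "
    "object: 23302047, rc = -5.",
    "kernel: LustreError: 3595:0:(osc_request.c:2357:brw_interpret()) "
    "ASSERTION(!(aa->aa_oa->o_valid & OBD_MD_FLHANDLE)) failed",
    "kernel: LustreError: 3595:0:(osc_request.c:2357:brw_interpret()) LBUG",
    "kernel: Call Trace:",
]

if __name__ == "__main__":
    for event in parse_lustre_errors(SAMPLE):
        print(event["file"], event["line"], event["func"],
              event["errno_name"], "FATAL" if event["fatal"] else "")
```

      Run against the full log, a filter like this should single out `osc_request.c:2357` in `brw_interpret()` as the point where the client asserted, preceded by the `-5` (EIO) retries against puma-OST0000.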

            People

              Assignee: Zhenyu Xu (bobijam)
              Reporter: Haisong Cai (haisong) (Inactive)