Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Lustre 2.4.0
    • Lustre 2.4.0
    • RHEL6.2 the Sequoia MDS
    • 3
    • Orion
    • 5228

    Description

      Observed while running a current build of master (2.3.53).

      BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
      IP: [<ffffffffa06ad93e>] seq_server_alloc_meta+0x51e/0x700 [fid]
      PGD 0 
      Oops: 0000 [#1] SMP
      last sysfs file: /sys/devices/pci0000:80/0000:80:02.2/0000:83:00.0/host7/port-7:0/expander-7:0/port-7:0:13/end_device-7:0:13/target7:0:17/7:0:17:0/timeout
      CPU 9 
      
      Pid: 33477, comm: mdt_mdss_0003 Tainted: P        W  ----------------   2.6.32-220.23.1.1chaos.ch5.x86_64 #1 appro 2620x-in/S2600GZ
      RIP: 0010:[<ffffffffa06ad93e>]  [<ffffffffa06ad93e>] seq_server_alloc_meta+0x51e/0x700 [fid]
      RSP: 0018:ffff881fa8007ca0  EFLAGS: 00010246
      RAX: 0000000000000000 RBX: 0000000200003e98 RCX: 0000000200004280
      RDX: 00000000000003e8 RSI: ffff881faadb40c0 RDI: ffff880fcc9e9500
      RBP: ffff881fa8007ce0 R08: 0000000000000000 R09: ffff881e0ee63e00
      R10: 0000000000000009 R11: ffffffffa09e2090 R12: ffff881e0ee63fe8
      R13: ffff881faadb4130 R14: ffff881faadb40c0 R15: ffff880fcc9e9500
      FS:  00007ffff7fdc700(0000) GS:ffff881078820000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000000010 CR3: 0000000001a85000 CR4: 00000000000406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 
      Process mdt_mdss_0003 (pid: 33477, threadinfo ffff881fa8006000, task ffff8820150b8080)
      Stack:
       ffff881fa8007cb0 ffff881a5019a400 ffff880fcc9e9b40 ffff881a5019a400
      <0> ffff880fcc9e9b40 ffff880fcc9e9500 ffff881e0ee63fe8 00000000ffffffea
      <0> ffff881fa8007d30 ffffffffa06ade9f ffff881fa8007d10 ffffc900c2888988
      Call Trace:
       [<ffffffffa06ade9f>] seq_query+0x37f/0x6d0 [fid]
       [<ffffffffa0f39322>] mdt_handle_common+0x932/0x1760 [mdt]
       [<ffffffffa0f3a1c5>] mdt_mdss_handle+0x15/0x20 [mdt]
       [<ffffffffa0948bfc>] ptlrpc_server_handle_request+0x41c/0xe00 [ptlrpc]
       [<ffffffffa05b26be>] ? cfs_timer_arm+0xe/0x10 [libcfs]
       [<ffffffffa05c414f>] ? lc_watchdog_touch+0x6f/0x180 [libcfs]
       [<ffffffffa093ffb9>] ? ptlrpc_wait_event+0xa9/0x2a0 [ptlrpc]
       [<ffffffff81051ba3>] ? __wake_up+0x53/0x70
       [<ffffffffa094a1ec>] ptlrpc_main+0xc0c/0x19f0 [ptlrpc]
       [<ffffffffa09495e0>] ? ptlrpc_main+0x0/0x19f0 [ptlrpc]
       [<ffffffff8100c14a>] child_rip+0xa/0x20
       [<ffffffffa09495e0>] ? ptlrpc_main+0x0/0x19f0 [ptlrpc]
       [<ffffffffa09495e0>] ? ptlrpc_main+0x0/0x19f0 [ptlrpc]
       [<ffffffff8100c140>] ? child_rip+0x0/0x20
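
      Each frame in the trace above has the form symbol+0xOFFSET/0xSIZE [module]: the offset is the distance in bytes from the start of the function, and the size is the length of the function. A minimal sketch of decoding one such token follows. The struct and function names (oops_frame, parse_oops_frame) are hypothetical helpers written for this illustration and are not part of the kernel or Lustre.

```c
#include <stdio.h>
#include <string.h>

/* One decoded "symbol+0xOFFSET/0xSIZE [module]" token from an oops.
 * Hypothetical helper type for illustration only. */
struct oops_frame {
    char symbol[64];
    unsigned long offset;  /* bytes from the start of the function */
    unsigned long size;    /* total size of the function in bytes */
    char module[32];       /* left empty for core-kernel symbols */
};

/* Returns 0 on success, -1 if the token does not match the format. */
int parse_oops_frame(const char *token, struct oops_frame *out)
{
    int n;

    memset(out, 0, sizeof(*out));
    /* The trailing " [module]" part is optional, so 3 fields suffice. */
    n = sscanf(token, "%63[^+]+0x%lx/0x%lx [%31[^]]]",
               out->symbol, &out->offset, &out->size, out->module);
    return n >= 3 ? 0 : -1;
}
```

      Applied to the faulting RIP, seq_server_alloc_meta+0x51e/0x700 [fid], this yields offset 0x51e into a 0x700-byte function in the fid module, which is the expression later handed to gdb.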
      

          Activity

            [LU-2186] seq_server_alloc_meta() NULL deref
            pjones Peter Jones made changes -
            Fix Version/s New: Lustre 2.4.0 [ 10154 ]
            bzzz Alex Zhuravlev made changes -
            Resolution New: Fixed [ 1 ]
            Status Original: In Progress [ 3 ] New: Resolved [ 5 ]

            bzzz Alex Zhuravlev added a comment - landed on master
            ian Ian Colle (Inactive) made changes -
            Priority Original: Critical [ 2 ] New: Blocker [ 1 ]

            morrone Christopher Morrone (Inactive) added a comment - I added patch 4280 to our 2.3.54-llnl branch.
            morrone Christopher Morrone (Inactive) made changes -
            Link New: This issue is duplicated by LU-2256 [ LU-2256 ]
            bzzz Alex Zhuravlev added a comment - please try with http://review.whamcloud.com/4280
            bzzz Alex Zhuravlev made changes -
            Status Original: Open [ 1 ] New: In Progress [ 3 ]
            morrone Christopher Morrone (Inactive) made changes -
            Labels New: topsequoia
            behlendorf Brian Behlendorf added a comment -

            (gdb) list *(seq_server_alloc_meta+0x51e)
            0x196e is in seq_server_alloc_meta (/builddir/build/BUILD/lustre-2.3.53/lustre/fid/fid_handler.c:211).
            206     /builddir/build/BUILD/lustre-2.3.53/lustre/fid/fid_handler.c: No such file or directory.
                    in /builddir/build/BUILD/lustre-2.3.53/lustre/fid/fid_handler.c


                    if (range_is_exhausted(loset)) {
                            /* reached high water mark. */
            >>>             struct lu_device *dev = seq->lss_site->ms_lu->ls_top_dev;
                            int obd_num_clients = dev->ld_obd->obd_num_exports;
                            __u64 set_sz;
                    }

            It looks like seq->lss_site->ms_lu is NULL. At least that is consistent with the offset in the NULL dereference (0x10) and is roughly where gdb pointed me. How that can happen, I'm not sure.
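
            The reasoning in the comment above can be sketched in a few lines: when a struct pointer is NULL, loading one of its members touches the address equal to that member's offset, so a fault at 0x10 points at a member 16 bytes into the struct. The mock_lu_site layout below is an assumption chosen to illustrate an ls_top_dev member at offset 0x10 on x86_64; it is not copied from the Lustre headers, and top_dev_load_address is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Mock stand-ins, NOT the real Lustre definitions. Only the order and
 * size of the leading members matter for this illustration. */
struct mock_lu_device;

struct mock_lu_site {
    void                  *ls_obj_hash;    /* 8 bytes on x86_64 */
    uint32_t               ls_purge_start; /* 4 bytes, then 4 bytes padding */
    struct mock_lu_device *ls_top_dev;     /* lands at offset 0x10 */
};

/* Address the CPU would read when evaluating site->ls_top_dev.
 * With site == NULL this is simply the member offset. */
uintptr_t top_dev_load_address(const struct mock_lu_site *site)
{
    return (uintptr_t)site + offsetof(struct mock_lu_site, ls_top_dev);
}
```

            Under this assumed layout, top_dev_load_address(NULL) evaluates to 0x10 on a 64-bit build, matching the CR2 value in the oops. An ms_lu pointer that is NULL when ls_top_dev is read would therefore produce exactly this fault.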

            People

              bzzz Alex Zhuravlev
              behlendorf Brian Behlendorf