
Accessing .lustre/fid/[0x1:0x0:0x0] triggers LBUG in osd_compat_objid_lookup()

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.4.0
    • Lustre 2.3.0
    • None
    • # uname -r
      2.6.32-220.13.1.el6.l22.x86_64
      # cat /proc/fs/lustre/version
      lustre: 2.2.55
      kernel: patchless_client
      build: 2.2.55--CHANGED-2.6.32-220.13.1.el6.l22.x86_64
    • 3
    • 6380

    Description

      [sanity@r62-lustre lustre]$ cat .lustre/fid/[0x1:0x0:0x0]

      LustreError: 2316:0:(osd_compat.c:383:osd_compat_objid_lookup()) ASSERTION( map ) failed:
      LustreError: 2316:0:(osd_compat.c:383:osd_compat_objid_lookup()) LBUG
      Pid: 2316, comm: mdt_00

      Call Trace:
      [<ffffffffa02c2905>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      [<ffffffffa02c2f17>] lbug_with_loc+0x47/0xb0 [libcfs]
      [<ffffffffa0b7f889>] osd_compat_objid_lookup+0x479/0x520 [osd_ldiskfs]
      [<ffffffffa0b74165>] osd_oi_lookup+0x55/0xb0 [osd_ldiskfs]
      [<ffffffffa0b6ea90>] osd_object_init+0x260/0x6a0 [osd_ldiskfs]
      [<ffffffffa048ab0c>] ? lu_object_add+0x2c/0x30 [obdclass]
      [<ffffffffa048c5d5>] lu_object_alloc+0xd5/0x310 [obdclass]
      [<ffffffffa048cba8>] ? htable_lookup+0x108/0x1c0 [obdclass]
      [<ffffffffa048ce61>] lu_object_find_at+0x201/0x450 [obdclass]
      [<ffffffff81273c47>] ? vsscanf+0x617/0x7c0
      [<ffffffffa048d0ef>] lu_object_find_slice+0x1f/0x80 [obdclass]
      [<ffffffffa0a4f160>] mdd_object_find+0x10/0x70 [mdd]
      [<ffffffffa0a79006>] obf_lookup+0xd6/0x270 [mdd]
      [<ffffffffa0b3f136>] cml_lookup+0x66/0x1b0 [cmm]
      [<ffffffffa0ad50e7>] ? mdt_version_get_check+0x47/0xe0 [mdt]
      [<ffffffffa0ae9c37>] mdt_reint_open+0x6f7/0x18b0 [mdt]
      [<ffffffffa0a7782e>] ? md_ucred+0x1e/0x60 [mdd]
      [<ffffffffa0ab81a5>] ? mdt_ucred+0x15/0x20 [mdt]
      [<ffffffffa0acf21c>] ? mdt_root_squash+0x2c/0x3e0 [mdt]
      [<ffffffffa0ad3b21>] mdt_reint_rec+0x41/0xe0 [mdt]
      [<ffffffffa0acd37a>] mdt_reint_internal+0x50a/0x810 [mdt]
      [<ffffffffa0acd94d>] mdt_intent_reint+0x1ed/0x500 [mdt]
      [<ffffffffa0ac9c91>] mdt_intent_policy+0x371/0x6a0 [mdt]
      [<ffffffffa05ae831>] ldlm_lock_enqueue+0x361/0x8f0 [ptlrpc]
      [<ffffffffa05d611a>] ldlm_handle_enqueue0+0x48a/0xf40 [ptlrpc]
      [<ffffffffa0ac9836>] mdt_enqueue+0x46/0x130 [mdt]
      [<ffffffffa0abf2a2>] mdt_handle_common+0x922/0x1740 [mdt]
      [<ffffffffa0ac0195>] mdt_regular_handle+0x15/0x20 [mdt]
      [<ffffffffa0604782>] ptlrpc_server_handle_request+0x412/0xeb0 [ptlrpc]
      [<ffffffffa02c365e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      [<ffffffffa02d3d9f>] ? lc_watchdog_touch+0x6f/0x180 [libcfs]
      [<ffffffffa05fd5e2>] ? ptlrpc_wait_event+0xb2/0x2c0 [ptlrpc]
      [<ffffffff81051ab3>] ? __wake_up+0x53/0x70
      [<ffffffffa06059f7>] ptlrpc_main+0x7d7/0x1610 [ptlrpc]
      [<ffffffffa0605220>] ? ptlrpc_main+0x0/0x1610 [ptlrpc]
      [<ffffffff8100c14a>] child_rip+0xa/0x20
      [<ffffffffa0605220>] ? ptlrpc_main+0x0/0x1610 [ptlrpc]
      [<ffffffffa0605220>] ? ptlrpc_main+0x0/0x1610 [ptlrpc]
      [<ffffffff8100c140>] ? child_rip+0x0/0x20

      LustreError: dumping log to /tmp/lustre-log.1339790039.2316

          Activity

            [LU-1531] Accessing .lustre/fid/[0x1:0x0:0x0] triggers LBUG in osd_compat_objid_lookup()
            rhenwood Richard Henwood (Inactive) added a comment - a fix has landed in master: http://review.whamcloud.com/#change,4255

            This issue is present on Master.

            lfs fid2path /mnt/lustre/ [0x1:0x2:0x0]
            
            LustreError: 31926:0:(mdt_handler.c:2447:mdt_obj()) ASSERTION( lu_device_is_mdt(o->lo_dev) ) failed: 
            LustreError: 31926:0:(mdt_handler.c:2447:mdt_obj()) LBUG
            Pid: 31926, comm: mdt00_000
            
            Call Trace:
             [<ffffffffa03d3905>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
             [<ffffffffa03d3f17>] lbug_with_loc+0x47/0xb0 [libcfs]
             [<ffffffffa0b851cf>] mdt_obj+0x5f/0x80 [mdt]
             [<ffffffffa0b892e6>] mdt_object_find+0x66/0x170 [mdt]
             [<ffffffffa0b8dcaa>] mdt_get_info+0x22a/0xa90 [mdt]
             [<ffffffffa0b8943d>] ? mdt_unpack_req_pack_rep+0x4d/0x4c0 [mdt]
             [<ffffffffa0b91322>] mdt_handle_common+0x932/0x1740 [mdt]
             [<ffffffffa0b92205>] mdt_regular_handle+0x15/0x20 [mdt]
             [<ffffffffa06ed7ac>] ptlrpc_server_handle_request+0x41c/0xe00 [ptlrpc]
             [<ffffffffa03d465e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
             [<ffffffffa06e4b87>] ? ptlrpc_wait_event+0xa7/0x2a0 [ptlrpc]
             [<ffffffff810533f3>] ? __wake_up+0x53/0x70
             [<ffffffffa06eed81>] ptlrpc_main+0xbf1/0x19e0 [ptlrpc]
             [<ffffffffa06ee190>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
             [<ffffffff8100c14a>] child_rip+0xa/0x20
             [<ffffffffa06ee190>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
             [<ffffffffa06ee190>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
             [<ffffffff8100c140>] ? child_rip+0x0/0x20
            
            Kernel panic - not syncing: LBUG
            Pid: 31926, comm: mdt00_000 Not tainted 2.6.32-279.5.1.el6_lustre.x86_64 #1
            Call Trace:
             [<ffffffff814fd58a>] ? panic+0xa0/0x168
             [<ffffffffa03d3f6b>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
             [<ffffffffa0b851cf>] ? mdt_obj+0x5f/0x80 [mdt]
             [<ffffffffa0b892e6>] ? mdt_object_find+0x66/0x170 [mdt]
             [<ffffffffa0b8dcaa>] ? mdt_get_info+0x22a/0xa90 [mdt]
             [<ffffffffa0b8943d>] ? mdt_unpack_req_pack_rep+0x4d/0x4c0 [mdt]
             [<ffffffffa0b91322>] ? mdt_handle_common+0x932/0x1740 [mdt]
             [<ffffffffa0b92205>] ? mdt_regular_handle+0x15/0x20 [mdt]
             [<ffffffffa06ed7ac>] ? ptlrpc_server_handle_request+0x41c/0xe00 [ptlrpc]
             [<ffffffffa03d465e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
             [<ffffffffa06e4b87>] ? ptlrpc_wait_event+0xa7/0x2a0 [ptlrpc]
             [<ffffffff810533f3>] ? __wake_up+0x53/0x70
             [<ffffffffa06eed81>] ? ptlrpc_main+0xbf1/0x19e0 [ptlrpc]
             [<ffffffffa06ee190>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
             [<ffffffff8100c14a>] ? child_rip+0xa/0x20
             [<ffffffffa06ee190>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
             [<ffffffffa06ee190>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
             [<ffffffff8100c140>] ? child_rip+0x0/0x20
            
            rhenwood Richard Henwood (Inactive) added a comment

            Given Andreas's comment that this issue was introduced by an Orion commit, 2.3 should not be affected. I've checked it anyway, and I couldn't reproduce.

            However, this issue is fixed.

            rhenwood Richard Henwood (Inactive) added a comment

            double checking if this is actually fixed on Lustre 2.3 ...

            rhenwood Richard Henwood (Inactive) added a comment

            Correcting resolution message. This issue was a real problem that was /fixed/.

            rhenwood Richard Henwood (Inactive) added a comment

            Richard, it would probably be more appropriate to have closed this as "Fixed" rather than "Cannot Reproduce", since it was a real bug that was fixed with a code change. The "Cannot Reproduce" label indicates that a reported bug was not seen in later testing for whatever reason, and no change was made to the code to resolve the issue.

            In any case, I'm glad this problem is gone.

            adilger Andreas Dilger added a comment

            Seems like this issue has been fixed by LU-1518.

            rhenwood Richard Henwood (Inactive) added a comment

            I couldn't reproduce this issue with a recent Master build (924).

            # cat .lustre/fid/[0x1:0x0:0x0]
            cat: .lustre/fid/[0x1:0x0:0x0]: Invalid argument
            
            rhenwood Richard Henwood (Inactive) added a comment
            jhammond John Hammond added a comment -

            You can also get here using lfs fid2path, in fact by using the sample FID that its own error message suggests.

            [root]# lfs fid2path /mnt/lustre PANTS
            bad FID format [PANTS], should be [0x1:0x2:0x0]

            fid2path error: Invalid argument
            [root]# lfs fid2path /mnt/lustre [0x1:0x2:0x0]

            LustreError: 2446:0:(osd_compat.c:381:osd_compat_objid_lookup()) ASSERTION( map ) failed:
            LustreError: 2446:0:(osd_compat.c:381:osd_compat_objid_lookup()) LBUG
            Pid: 2446, comm: mdt00_002

            Call Trace:
            [<ffffffffa0308905>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
            [<ffffffffa0308f17>] lbug_with_loc+0x47/0xb0 [libcfs]
            [<ffffffffa0b73c59>] osd_compat_objid_lookup+0x479/0x520 [osd_ldiskfs]
            [<ffffffff81275d46>] ? vsnprintf+0x2b6/0x5f0
            [<ffffffffa0b68565>] osd_oi_lookup+0x55/0xc0 [osd_ldiskfs]
            [<ffffffffa0b5dbf8>] osd_object_init+0x338/0xd80 [osd_ldiskfs]
            [<ffffffffa048a22e>] ? dt_object_init+0xe/0x10 [obdclass]
            [<ffffffffa0486f65>] lu_object_alloc+0xd5/0x310 [obdclass]
            [<ffffffffa0487538>] ? htable_lookup+0x108/0x1c0 [obdclass]
            [<ffffffffa04877f1>] lu_object_find_at+0x201/0x450 [obdclass]
            [<ffffffffa05f4271>] ? lustre_pack_reply_v2+0x1e1/0x280 [ptlrpc]
            [<ffffffffa05f330d>] ? lustre_msg_buf+0x5d/0x60 [ptlrpc]
            [<ffffffffa0487a56>] lu_object_find+0x16/0x20 [obdclass]
            [<ffffffffa0a71da6>] mdt_object_find+0x56/0x170 [mdt]
            [<ffffffffa0a75fda>] mdt_get_info+0x22a/0xa90 [mdt]
            [<ffffffffa0a71f0d>] ? mdt_unpack_req_pack_rep+0x4d/0x4d0 [mdt]
            [<ffffffffa0a7a922>] mdt_handle_common+0x922/0x1740 [mdt]
            [<ffffffffa0a7b815>] mdt_regular_handle+0x15/0x20 [mdt]
            [<ffffffffa060457d>] ptlrpc_server_handle_request+0x40d/0xea0 [ptlrpc]
            [<ffffffffa030965e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
            [<ffffffffa05fba37>] ? ptlrpc_wait_event+0xa7/0x2a0 [ptlrpc]
            [<ffffffff81051ba3>] ? __wake_up+0x53/0x70
            [<ffffffffa0605b79>] ptlrpc_main+0xb69/0x1870 [ptlrpc]
            [<ffffffffa0605010>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
            [<ffffffff8100c14a>] child_rip+0xa/0x20
            [<ffffffffa0605010>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
            [<ffffffffa0605010>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
            [<ffffffff8100c140>] ? child_rip+0x0/0x20


            This is code that was landed to master from orion in commit 4980567857699c7f902ebda336ea98fdc4b83100.

            It definitely shouldn't be possible for regular users to trigger an LASSERT(), or otherwise access invalid FIDs via .lustre, so the MDS needs to validate the FID is sane and allowed to be accessed before passing it down to the lower layers.

            adilger Andreas Dilger added a comment
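
            The validation step described in the comment above (check that a user-supplied FID is sane before handing it to the lower layers) could look roughly like the sketch below. It is an illustration only, not the patch that landed. The names sketch_fid, sketch_fid_parse, sketch_fid_check and SKETCH_SEQ_NORMAL are hypothetical, and the threshold value is an assumption about where client-visible sequences begin.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a FID: [sequence:object id:version]. */
struct sketch_fid {
	uint64_t seq;
	uint32_t oid;
	uint32_t ver;
};

/*
 * ASSUMPTION: sequences below this value are treated as reserved for
 * internal objects and must not be reachable through .lustre/fid or
 * fid2path. The exact boundary is illustrative.
 */
#define SKETCH_SEQ_NORMAL 0x200000400ULL

/* Parse "[0xSEQ:0xOID:0xVER]". Returns 0 on success, -EINVAL otherwise. */
static int sketch_fid_parse(const char *s, struct sketch_fid *fid)
{
	unsigned long long seq;
	unsigned int oid, ver;
	char close;
	int end = 0;

	/* %n records how many characters were consumed up to that point. */
	if (sscanf(s, "[%llx:%x:%x%c%n", &seq, &oid, &ver, &close, &end) != 4)
		return -EINVAL;
	/* Require the closing bracket and no trailing characters. */
	if (close != ']' || s[end] != '\0')
		return -EINVAL;

	fid->seq = seq;
	fid->oid = oid;
	fid->ver = ver;
	return 0;
}

/*
 * Reject FIDs a regular user should never be able to look up, so the
 * caller can return an error instead of reaching an assertion further
 * down the stack.
 */
static int sketch_fid_check(const struct sketch_fid *fid)
{
	if (fid->seq < SKETCH_SEQ_NORMAL || fid->oid == 0)
		return -EINVAL;
	return 0;
}
```

            Under these assumptions both FIDs from this ticket, [0x1:0x0:0x0] and [0x1:0x2:0x0], parse correctly but fail the check with -EINVAL, which matches the "Invalid argument" result seen on the fixed build.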

            People

              rhenwood Richard Henwood (Inactive)
              jhammond John Hammond