Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-3239

ofd_internal.h:518:ofd_info_init()) ASSERTION( info )

Details

    • 3
    • 7954

    Description

      Lustre: DEBUG MARKER: == parallel-scale test simul: simul == 22:08:33 (1367125713)
      LustreError: 19190:0:(ofd_internal.h:518:ofd_info_init()) ASSERTION( info ) failed:
      LustreError: 19182:0:(ofd_internal.h:518:ofd_info_init()) ASSERTION( info ) failed:
      LustreError: 19182:0:(ofd_internal.h:518:ofd_info_init()) LBUG
      Pid: 19182, comm: ll_ost_io00_001

      Call Trace:
      [<ffffffffa044e895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      [<ffffffffa044ee97>] lbug_with_loc+0x47/0xb0 [libcfs]
      [<ffffffffa0e03e62>] ofd_info_init+0x92/0x130 [ofd]
      [<ffffffffa0e05835>] ofd_get_info+0x2e5/0xa90 [ofd]
      [<ffffffff812805cd>] ? pointer+0x8d/0x830
      [<ffffffffa029f7e5>] ? lprocfs_counter_add+0x125/0x182 [lvfs]
      [<ffffffffa078528a>] nrs_orr_range_fill_physical+0x18a/0x540 [ptlrpc]
      [<ffffffffa0762dd6>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
      [<ffffffffa073e630>] ? lustre_swab_ost_body+0x0/0x10 [ptlrpc]
      [<ffffffffa07871d7>] nrs_orr_res_get+0x817/0xb80 [ptlrpc]
      [<ffffffffa077d306>] nrs_resource_get+0x56/0x110 [ptlrpc]
      [<ffffffffa077dccb>] nrs_resource_get_safe+0x8b/0x100 [ptlrpc]
      [<ffffffffa0780248>] ptlrpc_nrs_req_initialize+0x38/0x90 [ptlrpc]
      [<ffffffffa074cff0>] ptlrpc_main+0x1170/0x16f0 [ptlrpc]
      [<ffffffffa074be80>] ? ptlrpc_main+0x0/0x16f0 [ptlrpc]
      [<ffffffff8100c0ca>] child_rip+0xa/0x20
      [<ffffffffa074be80>] ? ptlrpc_main+0x0/0x16f0 [ptlrpc]
      [<ffffffffa074be80>] ? ptlrpc_main+0x0/0x16f0 [ptlrpc]
      [<ffffffff8100c0c0>] ? child_rip+0x0/0x20

      LustreError: dumping log to /tmp/lustre-log.1367125716.19182
      LustreError: 19190:0:(ofd_internal.h:518:ofd_info_init()) LBUG
      Pid: 19190, comm: ll_ost_io03_000

      Call Trace:
      [<ffffffffa044e895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      [<ffffffffa044ee97>] lbug_with_loc+0x47/0xb0 [libcfs]
      [<ffffffffa0e03e62>] ofd_info_init+0x92/0x130 [ofd]
      [<ffffffffa0e05835>] ofd_get_info+0x2e5/0xa90 [ofd]
      [<ffffffff812805cd>] ? pointer+0x8d/0x830
      [<ffffffffa029f7e5>] ? lprocfs_counter_add+0x125/0x182 [lvfs]
      [<ffffffffa078528a>] nrs_orr_range_fill_physical+0x18a/0x540 [ptlrpc]
      [<ffffffffa0762dd6>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
      [<ffffffffa073e630>] ? lustre_swab_ost_body+0x0/0x10 [ptlrpc]
      [<ffffffffa07871d7>] nrs_orr_res_get+0x817/0xb80 [ptlrpc]
      [<ffffffffa077d306>] nrs_resource_get+0x56/0x110 [ptlrpc]
      [<ffffffffa077dccb>] nrs_resource_get_safe+0x8b/0x100 [ptlrpc]
      [<ffffffffa0780248>] ptlrpc_nrs_req_initialize+0x38/0x90 [ptlrpc]
      [<ffffffffa074cff0>] ptlrpc_main+0x1170/0x16f0 [ptlrpc]
      [<ffffffffa074be80>] ? ptlrpc_main+0x0/0x16f0 [ptlrpc]
      [<ffffffff8100c0ca>] child_rip+0xa/0x20
      [<ffffffffa074be80>] ? ptlrpc_main+0x0/0x16f0 [ptlrpc]
      [<ffffffffa074be80>] ? ptlrpc_main+0x0/0x16f0 [ptlrpc]
      [<ffffffff8100c0c0>] ? child_rip+0x0/0x20

      LustreError: dumping log to /tmp/lustre-log.1367125719.19190
      LNet: Service thread pid 19182 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      Pid: 19182, comm: ll_ost_io00_001

      I set nrs on mds to crrn, oss02 to orr, oss03 to trr and run parallel-scale

      Attachments

        Issue Links

          Activity

            [LU-3239] ofd_internal.h:518:ofd_info_init()) ASSERTION( info )
            di.wang Di Wang added a comment -

            hmm, yes, probably env_refill should be better. I change it back to env_refill here in the update patch.

            di.wang Di Wang added a comment - hmm, yes, probably env_refill should be better. I change it back to env_refill here in the update patch.

            get_info(KEY_FIEMAP) is used to order requests with regard physical offset, AFAIK.

            bzzz Alex Zhuravlev added a comment - get_info(KEY_FIEMAP) is used to order requests with regard physical offset, AFAIK.
            di.wang Di Wang added a comment -

            hmm, only for ofd_get_info, because get_info is the only use case from handle_request_in, where env(le_ses) is not initialized correctly. so I figured initializing env(le_ses) for all requests in req_in might not worth. Only do this env_init in ofd_get_info should be enough. Besides we can not use env_refill here, because le_ses might be in some uninitialized state, you can see from Minh's test. Hmm, I did not see ofd_get_info is called in the process of read/write. Do I miss sth?

            di.wang Di Wang added a comment - hmm, only for ofd_get_info, because get_info is the only use case from handle_request_in, where env(le_ses) is not initialized correctly. so I figured initializing env(le_ses) for all requests in req_in might not worth. Only do this env_init in ofd_get_info should be enough. Besides we can not use env_refill here, because le_ses might be in some uninitialized state, you can see from Minh's test. Hmm, I did not see ofd_get_info is called in the process of read/write. Do I miss sth?

            env_refill() is fine, but recreation env from scratch on potentially every read/write is not, i think.

            bzzz Alex Zhuravlev added a comment - env_refill() is fine, but recreation env from scratch on potentially every read/write is not, i think.
            di.wang Di Wang added a comment -

            well, those envs might be initialized too early before ofd stack setup. that is why we do env_refill in ost_handle?

            di.wang Di Wang added a comment - well, those envs might be initialized too early before ofd stack setup. that is why we do env_refill in ost_handle?

            we do already have env in ptlrpc_main()..

            bzzz Alex Zhuravlev added a comment - we do already have env in ptlrpc_main()..

            I tend to agree, if this is the only single case.

            tappro Mikhail Pershin added a comment - I tend to agree, if this is the only single case.
            di.wang Di Wang added a comment - - edited

            I just updated the patch. To avoid the hassle in handle_req_in, ofd_get_info(KEY_FIEMAP) will initialize the env itself, since ofd_get_info(KEY_FIEMAP) is the only use case from handle_req_in, adding env initialization in handle_req_in might not worth for now.

            di.wang Di Wang added a comment - - edited I just updated the patch. To avoid the hassle in handle_req_in, ofd_get_info(KEY_FIEMAP) will initialize the env itself, since ofd_get_info(KEY_FIEMAP) is the only use case from handle_req_in, adding env initialization in handle_req_in might not worth for now.
            di.wang Di Wang added a comment -

            yes, I forgot this is in handle_req_in, instead of handle_request.

            di.wang Di Wang added a comment - yes, I forgot this is in handle_req_in, instead of handle_request.

            This is not about missed keys but non-initialized context at all because obd_get_info() is called not from ptlrpc_server_handle_request(). We need all lu_context setup things in ptlrpc_server_handle_req_in like that is done in ptlrpc_server_handle_request() since ptlrpc_server_handle_req_in() can cause calls to device stack now.

            tappro Mikhail Pershin added a comment - This is not about missed keys but non-initialized context at all because obd_get_info() is called not from ptlrpc_server_handle_request(). We need all lu_context setup things in ptlrpc_server_handle_req_in like that is done in ptlrpc_server_handle_request() since ptlrpc_server_handle_req_in() can cause calls to device stack now.
            mdiep Minh Diep added a comment -

            Wang Di,

            I hit this after running your patch

            general protection fault: 0000 1 SMP ^M
            last sysfs file: /sys/devices/pci0000:80/0000:80:05.0/0000:84:00.0/local_cpus^M
            CPU 0 ^M
            Modules linked in: osp(U) ofd(U) ost(U) mgc(U) fsfilt_ldiskfs(U) osd_ldiskfs(U) ldiskfs(U) lquota(U) jbd2 mdd(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic sha256_generic libcfs(U) nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 microcode serio_raw mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ioatdma i7core_edac edac_core ses enclosure sg igb dca ptp pps_core ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix mpt2sas scsi_transport_sas raid_class dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]^M
            ^M
            Pid: 13927, comm: ll_ost_io00_003 Not tainted 2.6.32-358.6.1.el6_lustre.gce942d0.x86_64 #1 Supermicro X8DTH-i/6/iF/6F/X8DTH^M
            RIP: 0010:[<ffffffffa05aa678>] [<ffffffffa05aa678>] keys_fill+0x48/0x190 [obdclass]^M
            RSP: 0018:ffff8806a9a4d9c0 EFLAGS: 00010246^M
            RAX: 5a5a5a5a5a5a5a5a RBX: 0000000000000000 RCX: 0000000000000000^M
            RDX: 00000000000000f8 RSI: ffff8806a9a4dbd6 RDI: ffff88069d88af40^M
            RBP: ffff8806a9a4da00 R08: 0000000000000000 R09: ffff8806a9a4dcd0^M
            R10: 000000000000000f R11: 0000000000000001 R12: ffff88069d88af40^M
            R13: ffffffffa0613a00 R14: ffff8806a9a4dbd0 R15: ffff880fb6667800^M
            FS: 00007f4b3d486700(0000) GS:ffff880028200000(0000) knlGS:0000000000000000^M
            CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b^M
            CR2: 00000000006d4420 CR3: 00000010335b5000 CR4: 00000000000007f0^M
            DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000^M
            DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400^M
            Process ll_ost_io00_003 (pid: 13927, threadinfo ffff8806a9a4c000, task ffff8806b04deaa0)^M
            Stack:^M
            ffffffffffffffff 0000000000000000 0000000000000001 ffff8806ae9a06c0^M
            <d> ffff880fccc03000 0000000000000000 ffff8806a9a4dbd0 ffff880fb6667800^M
            <d> ffff8806a9a4da10 ffffffffa05aa7dd ffff8806a9a4da30 ffffffffa05aa818^M
            Call Trace:^M
            [<ffffffffa05aa7dd>] lu_context_refill+0x1d/0x30 [obdclass]^M
            [<ffffffffa05aa818>] lu_env_refill+0x28/0x30 [obdclass]^M
            [<ffffffffa0e06852>] ofd_get_info+0x302/0xba0 [ofd]^M
            [<ffffffff812805fd>] ? pointer+0x8d/0x830^M
            [<ffffffffa029f7e5>] ? lprocfs_counter_add+0x125/0x182 [lvfs]^M
            [<ffffffffa078528a>] nrs_orr_range_fill_physical+0x18a/0x540 [ptlrpc]^M
            [<ffffffffa0762dd6>] ? __req_capsule_get+0x166/0x700 [ptlrpc]^M
            [<ffffffffa073e630>] ? lustre_swab_ost_body+0x0/0x10 [ptlrpc]^M
            [<ffffffffa07871c6>] nrs_orr_res_get+0x806/0xb90 [ptlrpc]^M
            [<ffffffffa077d306>] nrs_resource_get+0x56/0x110 [ptlrpc]^M
            [<ffffffffa077dccb>] nrs_resource_get_safe+0x8b/0x100 [ptlrpc]^M
            [<ffffffffa0780248>] ptlrpc_nrs_req_initialize+0x38/0x90 [ptlrpc]^M
            [<ffffffffa074cff0>] ptlrpc_main+0x1170/0x16f0 [ptlrpc]^M
            [<ffffffffa074be80>] ? ptlrpc_main+0x0/0x16f0 [ptlrpc]^M
            [<ffffffff8100c0ca>] child_rip+0xa/0x20^M
            [<ffffffffa074be80>] ? ptlrpc_main+0x0/0x16f0 [ptlrpc]^M
            [<ffffffffa074be80>] ? ptlrpc_main+0x0/0x16f0 [ptlrpc]^M
            [<ffffffff8100c0c0>] ? child_rip+0x0/0x20^M

            mdiep Minh Diep added a comment - Wang Di, I hit this after running your patch general protection fault: 0000 1 SMP ^M last sysfs file: /sys/devices/pci0000:80/0000:80:05.0/0000:84:00.0/local_cpus^M CPU 0 ^M Modules linked in: osp(U) ofd(U) ost(U) mgc(U) fsfilt_ldiskfs(U) osd_ldiskfs(U) ldiskfs(U) lquota(U) jbd2 mdd(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic sha256_generic libcfs(U) nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 microcode serio_raw mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ioatdma i7core_edac edac_core ses enclosure sg igb dca ptp pps_core ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix mpt2sas scsi_transport_sas raid_class dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] ^M ^M Pid: 13927, comm: ll_ost_io00_003 Not tainted 2.6.32-358.6.1.el6_lustre.gce942d0.x86_64 #1 Supermicro X8DTH-i/6/iF/6F/X8DTH^M RIP: 0010: [<ffffffffa05aa678>] [<ffffffffa05aa678>] keys_fill+0x48/0x190 [obdclass] ^M RSP: 0018:ffff8806a9a4d9c0 EFLAGS: 00010246^M RAX: 5a5a5a5a5a5a5a5a RBX: 0000000000000000 RCX: 0000000000000000^M RDX: 00000000000000f8 RSI: ffff8806a9a4dbd6 RDI: ffff88069d88af40^M RBP: ffff8806a9a4da00 R08: 0000000000000000 R09: ffff8806a9a4dcd0^M R10: 000000000000000f R11: 0000000000000001 R12: ffff88069d88af40^M R13: ffffffffa0613a00 R14: ffff8806a9a4dbd0 R15: ffff880fb6667800^M FS: 00007f4b3d486700(0000) GS:ffff880028200000(0000) knlGS:0000000000000000^M CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b^M CR2: 00000000006d4420 CR3: 00000010335b5000 CR4: 00000000000007f0^M DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000^M DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400^M Process ll_ost_io00_003 (pid: 13927, threadinfo ffff8806a9a4c000, task ffff8806b04deaa0)^M Stack:^M ffffffffffffffff 0000000000000000 0000000000000001 ffff8806ae9a06c0^M <d> ffff880fccc03000 0000000000000000 ffff8806a9a4dbd0 ffff880fb6667800^M <d> ffff8806a9a4da10 ffffffffa05aa7dd ffff8806a9a4da30 ffffffffa05aa818^M Call Trace:^M [<ffffffffa05aa7dd>] lu_context_refill+0x1d/0x30 [obdclass] ^M [<ffffffffa05aa818>] lu_env_refill+0x28/0x30 [obdclass] ^M [<ffffffffa0e06852>] ofd_get_info+0x302/0xba0 [ofd] ^M [<ffffffff812805fd>] ? pointer+0x8d/0x830^M [<ffffffffa029f7e5>] ? lprocfs_counter_add+0x125/0x182 [lvfs] ^M [<ffffffffa078528a>] nrs_orr_range_fill_physical+0x18a/0x540 [ptlrpc] ^M [<ffffffffa0762dd6>] ? __req_capsule_get+0x166/0x700 [ptlrpc] ^M [<ffffffffa073e630>] ? lustre_swab_ost_body+0x0/0x10 [ptlrpc] ^M [<ffffffffa07871c6>] nrs_orr_res_get+0x806/0xb90 [ptlrpc] ^M [<ffffffffa077d306>] nrs_resource_get+0x56/0x110 [ptlrpc] ^M [<ffffffffa077dccb>] nrs_resource_get_safe+0x8b/0x100 [ptlrpc] ^M [<ffffffffa0780248>] ptlrpc_nrs_req_initialize+0x38/0x90 [ptlrpc] ^M [<ffffffffa074cff0>] ptlrpc_main+0x1170/0x16f0 [ptlrpc] ^M [<ffffffffa074be80>] ? ptlrpc_main+0x0/0x16f0 [ptlrpc] ^M [<ffffffff8100c0ca>] child_rip+0xa/0x20^M [<ffffffffa074be80>] ? ptlrpc_main+0x0/0x16f0 [ptlrpc] ^M [<ffffffffa074be80>] ? ptlrpc_main+0x0/0x16f0 [ptlrpc] ^M [<ffffffff8100c0c0>] ? child_rip+0x0/0x20^M

            People

              di.wang Di Wang
              mdiep Minh Diep
              Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: