Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-9328

if there is error starting OSP, device is not cleaned up

    XMLWordPrintable

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.9.0
    • None
    • 3
    • 9223372036854775807

    Description

      I've been running into an issue where if gss_sec_lookup_ctx_kr fails during the osp connect, there is a problem unloading the OSP device:

      1. lctl dl
        0 UP osd-ldiskfs test-MDT0000-osd test-MDT0000-osd_UUID 4
        8 IN osp test-OST0000-osc-MDT0000 test-MDT0000-mdtlov_UUID 5

      Here are the debug messages:

      00000100:00000001:0.0:1492030557.994779:0:2267:0:(import.c:657:ptlrpc_connect_import()) Process entered
      00000100:00080000:0.0:1492030557.994780:0:2267:0:(import.c:675:ptlrpc_connect_import()) ffff880010d7f800 test-OST0000_UUID: changing import state from NEW to CONNECTING
      00000100:00000001:0.0:1492030557.994782:0:2267:0:(import.c:504:import_select_connection()) Process entered
      00000100:00080000:0.0:1492030557.994784:0:2267:0:(import.c:519:import_select_connection()) test-OST0000-osc-MDT0000: connect to NID 192.168.122.32@tcp last attempt 0
      00000100:00000001:0.0:1492030557.994786:0:2267:0:(connection.c:126:ptlrpc_connection_addref()) Process entered
      00000100:00000040:0.0:1492030557.994788:0:2267:0:(connection.c:131:ptlrpc_connection_addref()) conn=ffff880010da0420 refcount 3 to 192.168.122.32@tcp
      00000100:00000001:0.0:1492030557.994789:0:2267:0:(connection.c:133:ptlrpc_connection_addref()) Process leaving (rc=18446612132596941856 : -131941112609760 : ffff880010da0420)
      00000020:00000001:0.0:1492030557.994791:0:2267:0:(genops.c:707:class_conn2export()) Process entered
      00000020:00000040:0.0:1492030557.994791:0:2267:0:(genops.c:719:class_conn2export()) looking for export cookie 0xc5f1c14268facf09
      00000020:00000001:0.0:1492030557.994792:0:2267:0:(lustre_handles.c:156:class_handle2object()) Process entered
      00000020:00000040:0.0:1492030557.994792:0:2267:0:(genops.c:803:class_export_get()) GETting export ffff8800108f1400 : new refcount 3
      00000020:00000001:0.0:1492030557.994793:0:2267:0:(lustre_handles.c:179:class_handle2object()) Process leaving (rc=18446612132592030720 : -131941117520896 : ffff8800108f1400)
      00000020:00000001:0.0:1492030557.994795:0:2267:0:(genops.c:721:class_conn2export()) Process leaving (rc=18446612132592030720 : -131941117520896 : ffff8800108f1400)
      00000100:00000001:0.0:1492030557.994796:0:2267:0:(connection.c:126:ptlrpc_connection_addref()) Process entered
      00000100:00000040:0.0:1492030557.994797:0:2267:0:(connection.c:131:ptlrpc_connection_addref()) conn=ffff880010da0420 refcount 4 to 192.168.122.32@tcp
      00000100:00000001:0.0:1492030557.994798:0:2267:0:(connection.c:133:ptlrpc_connection_addref()) Process leaving (rc=18446612132596941856 : -131941112609760 : ffff880010da0420)
      00000020:00000040:0.0:1492030557.994799:0:2267:0:(genops.c:813:class_export_put()) PUTting export ffff8800108f1400 : new refcount 2
      00000100:00080000:0.0:1492030557.994801:0:2267:0:(import.c:597:import_select_connection()) test-OST0000-osc-MDT0000: import ffff880010d7f800 using connection 192.168.122.32@tcp/192.168.122.32@tcp
      00000100:00000001:0.0:1492030557.994802:0:2267:0:(import.c:601:import_select_connection()) Process leaving (rc=0 : 0 : 0)
      02000000:00000001:0.0:1492030557.994804:0:2267:0:(sec.c:1434:sptlrpc_import_sec_adapt()) Process entered
      02000000:00000001:0.0:1492030557.994805:0:2267:0:(sec_config.c:900:sptlrpc_conf_choose_flavor()) Process entered
      02000000:00000400:0.0:1492030557.994806:0:2267:0:(sec_config.c:567:target2fsname()) tgt: test-OST0000_UUID fsname test
      02000000:00000001:0.0:1492030557.994808:0:2267:0:(sec_config.c:632:sptlrpc_conf_get()) Process entered
      02000000:00000001:0.0:1492030557.994809:0:2267:0:(sec_config.c:604:sptlrpc_conf_get_tgt()) Process entered
      02000000:00000001:0.0:1492030557.994810:0:2267:0:(sec_config.c:423:sptlrpc_rule_set_choose()) Process entered
      02000000:00000400:0.0:1492030557.994811:0:2267:0:(sec_config.c:425:sptlrpc_rule_set_choose()) dumping rule set, rule count now: 3
      02000000:08000000:0.0:1492030557.994812:0:2267:0:(sec_config.c:458:sptlrpc_rule_set_dump()) <00> from 0 to 2, net 20000, rpc 20222
      02000000:08000000:0.0:1492030557.994814:0:2267:0:(sec_config.c:458:sptlrpc_rule_set_dump()) <01> from 0 to 1, net 20000, rpc 20222
      02000000:08000000:0.0:1492030557.994815:0:2267:0:(sec_config.c:458:sptlrpc_rule_set_dump()) <02> from ff to ff, net ffffffff, rpc 20222
      02000000:00000001:0.0:1492030557.994817:0:2267:0:(sec_config.c:444:sptlrpc_rule_set_choose()) Process leaving (rc=1 : 1 : 1)
      02000000:08000000:0.0:1492030557.994819:0:2267:0:(sec.c:1487:sptlrpc_import_sec_adapt()) import test-OST0000-osc-MDT0000->192.168.122.32@tcp netid 20000: select flavor ski
      02000000:00000001:0.0:1492030557.994820:0:2267:0:(sec.c:1331:sptlrpc_sec_create()) Process entered
      02000000:08000000:0.0:1492030557.994821:0:2267:0:(sec.c:1349:sptlrpc_sec_create()) osp test-OST0000-osc-MDT0000: select security flavor ski
      02000000:00000001:0.0:1492030557.994824:0:2267:0:(gss_keyring.c:613:gss_sec_create_kr()) Process entered
      02000000:00000010:0.0:1492030557.994825:0:2267:0:(gss_keyring.c:615:gss_sec_create_kr()) kmalloced 'gsec_kr': 208 at ffff88003c962100.
      00000020:00000040:0.0:1492030557.994827:0:2267:0:(genops.c:1006:class_import_get()) import ffff880010d7f800 refcount=2 obd=test-OST0000-osc-MDT0000
      02000000:08000000:0.0:1492030557.994829:0:2267:0:(sec_gss.c:1120:gss_sec_create_common()) create gss.keyring@ffff88003c962100
      02000000:00000001:0.0:1492030557.994830:0:2267:0:(gss_keyring.c:636:gss_sec_create_kr()) Process leaving (rc=18446612133330690304 : -131940378861312 : ffff88003c962100)
      02000000:08000000:0.0:1492030557.994832:0:2267:0:(sec_gc.c:75:sptlrpc_gc_add_sec()) added sec ffff88003c962100(gss.keyring)
      02000000:00000001:0.0:1492030557.994833:0:2267:0:(sec.c:1370:sptlrpc_sec_create()) Process leaving (rc=18446612133330690304 : -131940378861312 : ffff88003c962100)
      02000000:00000001:0.0:1492030557.994835:0:2267:0:(sec.c:1505:sptlrpc_import_sec_adapt()) Process leaving (rc=0 : 0 : 0)
      00000100:00000001:0.0:1492030557.994836:0:2267:0:(obd_class.h:822:obd_reconnect()) Process entered
      00000100:00000001:0.0:1492030557.994837:0:2267:0:(obd_class.h:832:obd_reconnect()) Process leaving (rc=0 : 0 : 0)
      00000100:00000010:0.0:1492030557.994839:0:2267:0:(client.c:492:ptlrpc_request_cache_alloc()) slab-alloced 'req': 752 at ffff880035bcc000.
      00000020:00000040:0.0:1492030557.994840:0:2267:0:(genops.c:1006:class_import_get()) import ffff880010d7f800 refcount=3 obd=test-OST0000-osc-MDT0000
      00000100:00000001:0.0:1492030557.994842:0:2267:0:(client.c:699:ptlrpc_request_bufs_pack()) Process entered
      02000000:00000001:0.0:1492030557.994843:0:2267:0:(sec.c:434:sptlrpc_req_get_ctx()) Process entered
      02000000:00000001:0.0:1492030557.994844:0:2267:0:(gss_keyring.c:728:gss_sec_lookup_ctx_kr()) Process entered
      02000000:00000010:0.0:1492030557.994847:0:2267:0:(gss_keyring.c:823:gss_sec_lookup_ctx_kr()) kmalloced 'coinfo': 232 at ffff88000fc04900.
      02000000:08000000:0.0:1492030557.994850:0:2267:0:(gss_keyring.c:834:gss_sec_lookup_ctx_kr()) requesting key for 0@2
      02000000:00000010:0.0:1492030557.999044:0:2267:0:(gss_keyring.c:840:gss_sec_lookup_ctx_kr()) kfreed 'coinfo': 232 at ffff88000fc04900.
      02000000:00020000:0.0:1492030557.999047:0:2267:0:(gss_keyring.c:843:gss_sec_lookup_ctx_kr()) failed request key: -126
      02000000:00000001:0.0:1492030557.999095:0:2267:0:(gss_keyring.c:894:gss_sec_lookup_ctx_kr()) Process leaving (rc=0 : 0 : 0)
      02000000:00020000:0.0:1492030557.999096:0:2267:0:(sec.c:448:sptlrpc_req_get_ctx()) req ffff880035bcc000: fail to get context
      02000000:00000001:0.0:1492030557.999133:0:2267:0:(sec.c:449:sptlrpc_req_get_ctx()) Process leaving (rc=18446744073709551505 : -111 : ffffffffffffff91)
      00000100:00000001:0.0:1492030557.999135:0:2267:0:(client.c:710:ptlrpc_request_bufs_pack()) Process leaving via out_free (rc=18446744073709551505 : -111 : 0xffffffffffffff91)
      00000020:00000001:0.0:1492030557.999136:0:2267:0:(genops.c:1013:class_import_put()) Process entered
      00000020:00000040:0.0:1492030557.999137:0:2267:0:(genops.c:1020:class_import_put()) import ffff880010d7f800 refcount=2 obd=test-OST0000-osc-MDT0000
      00000020:00000001:0.0:1492030557.999138:0:2267:0:(genops.c:1029:class_import_put()) Process leaving
      00000100:00000010:0.0:1492030557.999140:0:2267:0:(client.c:498:ptlrpc_request_cache_free()) slab-freed '(req)': 752 at ffff880035bcc000.
      00000100:00000001:0.0:1492030557.999141:0:2267:0:(import.c:719:ptlrpc_connect_import()) Process leaving via out (rc=18446744073709551505 : -111 : 0xffffffffffffff91)
      00000100:00080000:0.0:1492030557.999143:0:2267:0:(import.c:768:ptlrpc_connect_import()) ffff880010d7f800 test-OST0000_UUID: changing import state from CONNECTING to DISCONN
      00000100:00000001:0.0:1492030557.999144:0:2267:0:(import.c:770:ptlrpc_connect_import()) Process leaving (rc=18446744073709551505 : -111 : ffffffffffffff91)
      00000004:00020000:0.0:1492030557.999145:0:2267:0:(osp_dev.c:1435:osp_obd_connect()) test-OST0000-osc-MDT0000: can't connect obd: rc = -111
      00000004:00000001:0.0:1492030557.999186:0:2267:0:(osp_dev.c:1436:osp_obd_connect()) Process leaving via out (rc=18446744073709551505 : -111 : 0xffffffffffffff91)
      00000004:00000001:0.0:1492030557.999187:0:2267:0:(osp_dev.c:1444:osp_obd_connect()) Process leaving (rc=18446744073709551505 : -111 : ffffffffffffff91)
      00000004:00000001:0.0:1492030557.999188:0:2267:0:(obd_class.h:808:obd_connect()) Process leaving (rc=18446744073709551505 : -111 : ffffffffffffff91)
      00000004:00000010:0.0:1492030557.999190:0:2267:0:(lod_lov.c:299:lod_add_device()) kfreed 'data': 192 at ffff8800108f9600.
      00000004:00020000:0.0:1492030557.999190:0:2267:0:(lod_lov.c:302:lod_add_device()) test-OST0000-osc-MDT0000: cannot connect to next dev test-OST0000_UUID (-111)
      00000004:00000001:0.0:1492030557.999238:0:2267:0:(lod_lov.c:303:lod_add_device()) Process leaving via out_cleanup (rc=18446744073709551505 : -111 : 0xffffffffffffff91)
      00000004:00000001:0.0:1492030557.999240:0:2267:0:(osp_dev.c:629:osp_process_config()) Process entered
      00000004:00000001:0.0:1492030557.999241:0:2267:0:(osp_dev.c:586:osp_shutdown()) Process entered
      00000100:00000001:0.0:1492030557.999242:0:2267:0:(import.c:211:ptlrpc_deactivate_and_unlock_import()) Process entered
      00000100:00080000:0.0:1492030557.999242:0:2267:0:(import.c:214:ptlrpc_deactivate_and_unlock_import()) setting import test-OST0000_UUID INVALID
      00000100:00000001:0.0:1492030557.999243:0:2267:0:(client.c:3089:ptlrpc_abort_inflight()) Process entered
      00000100:00000001:0.0:1492030557.999244:0:2267:0:(client.c:3139:ptlrpc_abort_inflight()) Process leaving
      00000100:00000001:0.0:1492030557.999245:0:2267:0:(obd_class.h:1141:obd_import_event()) Process entered
      00000004:00080000:0.0:1492030557.999247:0:2267:0:(osp_dev.c:1604:osp_import_event()) got inactive
      00000100:00000001:0.0:1492030557.999248:0:2267:0:(obd_class.h:1151:obd_import_event()) Process leaving
      00000100:00000001:0.0:1492030557.999248:0:2267:0:(import.c:222:ptlrpc_deactivate_and_unlock_import()) Process leaving
      00000100:00000001:0.0:1492030557.999249:0:2267:0:(pinger.c:426:ptlrpc_pinger_del_import()) Process entered
      00000100:00000001:0.0:1492030557.999250:0:2267:0:(pinger.c:429:ptlrpc_pinger_del_import()) Process leaving (rc=18446744073709551614 : -2 : fffffffffffffffe)
      00000100:00000001:0.0:1492030557.999251:0:2267:0:(import.c:1560:ptlrpc_disconnect_import()) Process entered
      00000100:00000001:0.0:1492030557.999251:0:2267:0:(import.c:1609:ptlrpc_disconnect_import()) Process leaving via out (rc=0 : 0 : 0x0)
      00000100:00080000:0.0:1492030557.999252:0:2267:0:(import.c:1638:ptlrpc_disconnect_import()) ffff880010d7f800 test-OST0000_UUID: changing import state from DISCONN to CLOSED
      00000100:00000001:0.0:1492030557.999253:0:2267:0:(import.c:1644:ptlrpc_disconnect_import()) Process leaving (rc=0 : 0 : 0)
      00000100:00100000:0.0:1492030557.999254:0:2267:0:(import.c:318:ptlrpc_invalidate_import()) Sleeping 100 sec for inflight to error out
      00000100:00000001:0.0:1492030557.999255:0:2267:0:(obd_class.h:1141:obd_import_event()) Process entered
      00000001:00000001:0.0:1492030557.999256:0:2267:0:(hash.c:1663:cfs_hash_for_each_nolock()) Process entered
      00000001:00000001:0.0:1492030557.999257:0:2267:0:(hash.c:1578:cfs_hash_for_each_relax()) Process entered
      00000001:00000001:0.0:1492030557.999275:0:2267:0:(hash.c:1679:cfs_hash_for_each_nolock()) Process leaving (rc=0 : 0 : 0)
      00000001:00000001:0.0:1492030557.999276:0:2267:0:(hash.c:1663:cfs_hash_for_each_nolock()) Process entered
      00000001:00000001:0.0:1492030557.999276:0:2267:0:(hash.c:1578:cfs_hash_for_each_relax()) Process entered
      00000001:00000001:0.0:1492030557.999291:0:2267:0:(hash.c:1679:cfs_hash_for_each_nolock()) Process leaving (rc=0 : 0 : 0)
      00000100:00000001:0.0:1492030557.999292:0:2267:0:(obd_class.h:1151:obd_import_event()) Process leaving
      02000000:00000001:0.0:1492030557.999293:0:2267:0:(gss_keyring.c:1011:gss_sec_flush_ctx_cache_kr()) Process entered
      02000000:08000000:0.0:1492030557.999293:0:2267:0:(gss_keyring.c:1016:gss_sec_flush_ctx_cache_kr()) sec ffff88003c962100(2, nctx 0), uid -1, grace 1, force 1
      02000000:00000001:0.0:1492030557.999295:0:2267:0:(gss_keyring.c:965:flush_spec_ctx_cache_kr()) Process entered
      02000000:00000001:0.0:1492030557.999295:0:2267:0:(gss_keyring.c:1004:flush_spec_ctx_cache_kr()) Process leaving
      02000000:00000001:0.0:1492030557.999296:0:2267:0:(gss_keyring.c:1023:gss_sec_flush_ctx_cache_kr()) Process leaving (rc=0 : 0 : 0)
      00000004:00000001:0.0:1492030557.999297:0:2267:0:(osp_dev.c:481:osp_disconnect()) Process leaving (rc=0 : 0 : 0)
      00000004:00000001:0.0:1492030557.999298:0:2267:0:(osp_sync.c:1470:osp_sync_fini()) Process entered
      00000004:00080000:0.0:1492030557.999452:0:2276:0:(osp_sync.c:1100:osp_sync_process_queues()) stop llog processing
      

      You can see that gss_sec_create_kr is called, which takes a reference on the import, but gss_sec_destroy_kr is never called. It looks like sptlrpc_sec_put is what should be called, perhaps via sptlrpc_import_sec_put and client_destroy_import. I don't know enough about the osp life cycle to say for sure though.

      Attachments

        Activity

          People

            wc-triage WC Triage
            kit.westneat Kit Westneat
            Votes:
            1 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated: