[LU-11243]  Assertion and hang upon lod_add_device failure Created: 14/Aug/18  Updated: 01/Apr/19  Resolved: 08/Mar/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0, Lustre 2.10.5
Fix Version/s: Lustre 2.13.0, Lustre 2.12.1

Type: Bug Priority: Major
Reporter: Wang Shilong (Inactive) Assignee: Wang Shilong (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

See the following assertion:

    (lod_lov.c:361:lod_add_device()) lustre-OSTe42a-osc-MDT0000: can't set up pool, failed with -12
    10059:0:(osp_dev.c:473:osp_disconnect()) ASSERTION( imp != ((void *)0) ) failed:
    10059:0:(osp_dev.c:473:osp_disconnect()) LBUG
    CPU: 1 PID: 10059 Comm: llog_process_th Kdump: loaded Tainted: G
    
    The problem is that obd_disconnect() cleans up @imp and sets it to NULL:
    |->osp_obd_disconnect
        |->class_manual_cleanup
           |->class_process_config
                 |->class_cleanup
                      |->obd_precleanup
                            |->osp_device_fini
                                  |->client_obd_cleanup
    
    ldo_process_config() then tries to access @imp again:
    |->ldo_process_config
      |->osp_shutdown
       |->osp_disconnect
         LASSERT(imp != NULL) ---> fails here
    
    Another problem arises if we fail before obd_connect().
    In that case the mount hangs:
    |->ldo_process_config
      |->osp_shutdown
       |->osp_disconnect
        |->ptlrpc_disconnect_import
         |->rc = l_wait_event(imp->imp_recovery_waitq,
                !ptlrpc_import_in_recovery(imp), &lwi);

Since connect is never called, the imp state stays at DISCONN.
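
The two failure modes above can be modelled outside the kernel with a small guard-based sketch. This is not the landed patch and does not use the real Lustre types: every `sketch_*` name, the `connected` flag, and the result codes are hypothetical stand-ins invented for illustration. The sketch only shows the general idea of making the shutdown path tolerate an import that was already cleaned up, and one that was never connected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins. These are NOT the real Lustre types. */
enum sketch_imp_state {
	SKETCH_IMP_DISCONN,	/* never connected, or connection lost */
	SKETCH_IMP_FULL,	/* connected and usable */
	SKETCH_IMP_CLOSED,	/* cleanly disconnected */
};

struct sketch_import {
	enum sketch_imp_state state;
};

struct sketch_osp {
	struct sketch_import *imp;	/* NULL once cleanup has run */
	bool connected;			/* assumed flag: set after a successful connect */
};

enum sketch_result {
	SKETCH_SKIPPED_NO_IMPORT,	/* the unguarded path would assert here */
	SKETCH_SKIPPED_NOT_CONNECTED,	/* the unguarded path would wait here */
	SKETCH_DISCONNECTED,
};

/* Models the cleanup step that drops the import and sets it to NULL. */
static void sketch_cleanup(struct sketch_osp *osp)
{
	osp->imp = NULL;
}

/* Guarded disconnect: returns early instead of asserting or waiting. */
static enum sketch_result sketch_disconnect(struct sketch_osp *osp)
{
	if (osp->imp == NULL)
		return SKETCH_SKIPPED_NO_IMPORT;
	if (!osp->connected)
		return SKETCH_SKIPPED_NOT_CONNECTED;

	osp->imp->state = SKETCH_IMP_CLOSED;
	osp->connected = false;
	return SKETCH_DISCONNECTED;
}

/* Walks through both error paths from the description plus the normal one. */
static void sketch_selftest(void)
{
	struct sketch_import imp = { .state = SKETCH_IMP_DISCONN };
	struct sketch_osp osp = { .imp = &imp, .connected = false };

	/* Failure before connect: nothing to wait for, so no hang. */
	assert(sketch_disconnect(&osp) == SKETCH_SKIPPED_NOT_CONNECTED);

	/* Normal path: connect succeeded, disconnect closes the import. */
	imp.state = SKETCH_IMP_FULL;
	osp.connected = true;
	assert(sketch_disconnect(&osp) == SKETCH_DISCONNECTED);
	assert(imp.state == SKETCH_IMP_CLOSED);

	/* Cleanup already ran: a later disconnect must not dereference NULL. */
	sketch_cleanup(&osp);
	assert(sketch_disconnect(&osp) == SKETCH_SKIPPED_NO_IMPORT);
}
```

Calling `sketch_selftest()` from a `main()` exercises all three cases. Whether the real fix guards inside the disconnect path or reorders the error handling in the caller is not stated in this ticket; see the Gerrit changes linked in the comments for the actual approach.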



 Comments   
Comment by Gerrit Updater [ 14/Aug/18 ]

Wang Shilong (wshilong@ddn.com) uploaded a new patch: https://review.whamcloud.com/32994
Subject: LU-11243 lod: fix assertion and hang upon lod_add_device failure
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b5d19865a70f46890ce488d4950384001f48cbfb

Comment by Gerrit Updater [ 08/Mar/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32994/
Subject: LU-11243 lod: fix assertion and hang upon lod_add_device failure
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: f28353b3d810cfbec018a263556ceac84ab9413e

Comment by Peter Jones [ 08/Mar/19 ]

Landed for 2.13

Comment by Gerrit Updater [ 19/Mar/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34450
Subject: LU-11243 lod: fix assertion and hang upon lod_add_device failure
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 494bd1a49962b701dbcd0e0aa6c2fa53f4aabc6c

Comment by Gerrit Updater [ 01/Apr/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34450/
Subject: LU-11243 lod: fix assertion and hang upon lod_add_device failure
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 6c13ed6decc680092d3610518dc30aba40c83563

Generated at Sat Feb 10 02:42:11 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.