Lustre / LU-11241

Unable to umount during MDT fail back


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.10.5
    • Severity: 3

    Description

      env: 2.10.5-RC1 b2_10-ib build #70

      During soak testing, I found that the failover MDT (MDT3) cannot be unmounted on MDS2, which prevents the MDT from failing back to its original MDS (MDS3), so the whole recovery process is stuck.

      MDS2 console:

      LustreError: 0-0: Forced cleanup waiting for soaked-MDT0000-osp-MDT0003 namespace with 3 resources in use, (rc=-110)
       Lustre: 3899:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1534181819/real 1534181819]  req@ffffa06fd3332a00 x1608657214545248/t0(0) o38->soaked-MDT0003-osp-MDT0002@192.168.1.111@o2ib:24/4 lens 520/544 e 0 to 1 dl 1534181830 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      [58161.752944] Lustre: 3899:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 6 previous similar messages
      LustreError: 0-0: Forced cleanup waiting for soaked-MDT0000-osp-MDT0003 namespace with 3 resources in use, (rc=-110)
      LustreError: 0-0: Forced cleanup waiting for soaked-MDT0000-osp-MDT0003 namespace with 3 resources in use, (rc=-110)
      
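      To track how long the cleanup stays stuck, a throwaway parser over these console lines can pull out the namespace name and its in-use resource count. This is just a sketch against the sample line above; the `lctl get_param` hint in the comment is an assumption about where the same count is visible on a live MDS:

      ```shell
      # Sketch: extract the stuck namespace and its in-use resource count from a
      # "Forced cleanup waiting" console line (sample copied from the log above).
      line='LustreError: 0-0: Forced cleanup waiting for soaked-MDT0000-osp-MDT0003 namespace with 3 resources in use, (rc=-110)'
      ns=$(echo "$line" | sed -n 's/.*waiting for \([^ ]*\) namespace.*/\1/p')
      count=$(echo "$line" | sed -n 's/.*with \([0-9]*\) resources in use.*/\1/p')
      echo "$ns $count"
      # Assumption: on a live MDS the same count should be visible via
      #   lctl get_param ldlm.namespaces.*osp*.resource_count
      ```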

      This happened after 30+ hours of running (36 successful recoveries), and only MDS failover was exercised during the test.

      On MDS0 I cannot see the stack trace in the console log, but I found the following. Is this a known issue?

      [52069.665828] LNet: 3948:0:(linux-debug.c:185:libcfs_call_trace()) can't show stack: kernel doesn't export show_task
      [52069.681120] LNet: 3948:0:(linux-debug.c:185:libcfs_call_trace()) Skipped 3 previous similar messages
      [52069.694717] LustreError: dumping log to /tmp/lustre-log.1534184159.5310
      
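      The dump file name embeds the epoch seconds at which the log was flushed, which helps line the dump up with the console timestamps. A small sketch (GNU `date` assumed; the `lctl debug_file` conversion in the comment needs a node with the Lustre utilities installed):

      ```shell
      # /tmp/lustre-log.<epoch>.<pid>: decode the embedded timestamp (GNU date).
      f=/tmp/lustre-log.1534184159.5310
      ts=$(echo "$f" | sed -n 's#.*lustre-log\.\([0-9]*\)\..*#\1#p')
      date -u -d "@$ts" +%Y-%m-%dT%H:%M:%SZ   # → 2018-08-13T18:15:59Z
      # On a node with Lustre utilities, the binary dump can be made readable with:
      #   lctl debug_file "$f" /tmp/lustre-log.txt
      ```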

      Also, many multipath errors are seen on all MDS nodes during failover. When the system was first mounted, before the soak started, this kind of error caused the MDS2 mount to hang; after a reboot it seems to be gone. I am not sure whether it is a related hardware problem.

      [64147.395418] device-mapper: multipath: Reinstating path 8:128.
      [64147.403572] device-mapper: multipath: Failing path 8:128.
      [64149.405305] device-mapper: multipath: Reinstating path 8:96.
      [64149.413410] device-mapper: multipath: Failing path 8:96.
      
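      To gauge how badly the paths are flapping, the Failing/Reinstating events can be tallied per dm path from a saved console log. A sketch; `$log` here is just the four sample lines above pasted inline, and on a live node `multipath -ll` would show the current path states:

      ```shell
      # Tally Failing/Reinstating events per path device from a console capture.
      log='device-mapper: multipath: Reinstating path 8:128.
      device-mapper: multipath: Failing path 8:128.
      device-mapper: multipath: Reinstating path 8:96.
      device-mapper: multipath: Failing path 8:96.'
      echo "$log" | sed -n 's/.*multipath: \(Failing\|Reinstating\) path \([0-9:]*\)\./\2 \1/p' \
        | sort | uniq -c
      ```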

      I will run the soak again with both MDS and OSS failover/restart enabled and see whether this problem can be reproduced.


People

    Assignee: WC Triage (wc-triage)
    Reporter: Sarah Liu (sarah)
    Votes: 0
    Watchers: 3
