Lustre / LU-4385

replay-single test 61d causes oops in osd_device_fini()

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major

    Description

      My local test runs show this bug almost every time in replay-single.sh test 61d:

      Dec 14 13:09:17 nodez kernel: Lustre: DEBUG MARKER: == replay-single test 61d: error in llog_setup should cleanup the llog context correctly == 13:09:16 (1387012156)
      Dec 14 13:09:17 nodez kernel: Lustre: Failing over lustre-MDT0000
      Dec 14 13:09:17 nodez kernel: Lustre: server umount lustre-MDT0000 complete
      Dec 14 13:09:17 nodez kernel: LDISKFS-fs (loop0): mounted filesystem with ordered data mode. quota=on. Opts: 
      Dec 14 13:09:17 nodez kernel: Lustre: *** cfs_fail_loc=605, val=0***
      Dec 14 13:09:17 nodez kernel: LustreError: 8279:0:(llog_obd.c:207:llog_setup()) MGS: ctxt 0 lop_setup=ffffffffa0e26d90 failed: rc = -95
      Dec 14 13:09:17 nodez kernel: LustreError: 8279:0:(obd_config.c:572:class_setup()) setup MGS failed (-95)
      Dec 14 13:09:17 nodez kernel: LustreError: 8279:0:(obd_mount.c:199:lustre_start_simple()) MGS setup error -95
      Dec 14 13:09:17 nodez kernel: LustreError: 8279:0:(obd_mount_server.c:134:server_deregister_mount()) MGS not registered
      Dec 14 13:09:17 nodez kernel: LustreError: 15e-a: Failed to start MGS 'MGS' (-95). Is the 'mgs' module loaded?
      Dec 14 13:09:17 nodez kernel: LustreError: 8279:0:(obd_mount_server.c:844:lustre_disconnect_lwp()) lustre-MDT0000-lwp-MDT0000: Can't end config log lustre-client.
      Dec 14 13:09:17 nodez kernel: LustreError: 8279:0:(obd_mount_server.c:1419:server_put_super()) lustre-MDT0000: failed to disconnect lwp. (rc=-2)
      Dec 14 13:09:17 nodez kernel: LustreError: 8279:0:(obd_mount_server.c:1449:server_put_super()) no obd lustre-MDT0000
      Dec 14 13:09:17 nodez kernel: LustreError: 8279:0:(obd_mount_server.c:134:server_deregister_mount()) lustre-MDT0000 not registered
      Dec 14 13:09:18 nodez kernel: general protection fault: 0000 [#1] SMP 
      Dec 14 13:09:18 nodez kernel: last sysfs file: /sys/devices/system/cpu/possible
      Dec 14 13:09:18 nodez kernel: CPU 1 
      Dec 14 13:09:18 nodez kernel: Modules linked in: lustre ofd osp lod ost mdt mdd mgs osd_ldiskfs ldiskfs lquota lfsck obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass ksocklnd lnet libcfs zfs(P) zcommon(P) znvpair(P) zavl(P) zunicode(P) spl vboxsf vboxguest [last unloaded: libcfs]
      Dec 14 13:09:18 nodez kernel: 
      Dec 14 13:09:18 nodez kernel: Pid: 8279, comm: mount.lustre Tainted: P           ---------------  T 2.6.32 #0 innotek GmbH VirtualBox/VirtualBox
      Dec 14 13:09:18 nodez kernel: RIP: 0010:[<ffffffffa0e46f03>]  [<ffffffffa0e46f03>] lprocfs_remove_nolock+0x33/0x100 [obdclass]
      Dec 14 13:09:18 nodez kernel: RSP: 0018:ffff88003d34d928  EFLAGS: 00010202
      Dec 14 13:09:18 nodez kernel: RAX: ffffffffa0ec08e0 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
      Dec 14 13:09:18 nodez kernel: RDX: 0000000000000000 RSI: 0000000000000030 RDI: ffff8800327b73c0
      Dec 14 13:09:18 nodez kernel: RBP: 6b6b6b6b6b6b6b6b R08: 0000000000000158 R09: 0000000000000000
      Dec 14 13:09:18 nodez kernel: R10: ffff880033c82a98 R11: ffff880033c829c0 R12: ffff8800327b74c8
      Dec 14 13:09:18 nodez kernel: R13: 6b6b6b6b6b6b6b6b R14: 0000000000000002 R15: ffff88003c5e7aa0
      Dec 14 13:09:18 nodez kernel: FS:  00007fadb4325700(0000) GS:ffff880001e80000(0000) knlGS:0000000000000000
      Dec 14 13:09:18 nodez kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      Dec 14 13:09:18 nodez kernel: CR2: 00007f7e81b12ea0 CR3: 000000002b85f000 CR4: 00000000000006e0
      Dec 14 13:09:18 nodez kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      Dec 14 13:09:18 nodez kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Dec 14 13:09:18 nodez kernel: Process mount.lustre (pid: 8279, threadinfo ffff88003d34c000, task ffff88003e7547f0)
      Dec 14 13:09:18 nodez kernel: Stack:
      Dec 14 13:09:18 nodez kernel: ffff88003d6a2ed8 ffff880036490b78 ffff88003d6a2f80 ffff8800327b73c0
      Dec 14 13:09:18 nodez kernel: <d> ffff88003d34d9d8 ffff8800327b74c8 0000000000000008 ffffffffa0e474a8
      Dec 14 13:09:18 nodez kernel: <d> ffff8800327b7330 ffffffffa0660952 ffff88003d620000 ffff88003d34d9d8
      Dec 14 13:09:18 nodez kernel: Call Trace:
      Dec 14 13:09:18 nodez kernel: [<ffffffffa0e474a8>] ? lprocfs_remove+0x18/0x30 [obdclass]
      Dec 14 13:09:18 nodez kernel: [<ffffffffa0660952>] ? qsd_fini+0x72/0x440 [lquota]
      Dec 14 13:09:18 nodez kernel: [<ffffffffa0742152>] ? osd_shutdown+0x32/0xe0 [osd_ldiskfs]
      Dec 14 13:09:18 nodez kernel: [<ffffffffa0742549>] ? osd_device_fini+0x119/0x180 [osd_ldiskfs]
      Dec 14 13:09:18 nodez kernel: [<ffffffffa0e56784>] ? class_cleanup+0x804/0xd90 [obdclass]
      Dec 14 13:09:18 nodez kernel: [<ffffffffa0e35ae0>] ? class_name2dev+0x70/0xd0 [obdclass]
      Dec 14 13:09:18 nodez kernel: [<ffffffffa0e5b645>] ? class_process_config+0x1d45/0x2e50 [obdclass]
      Dec 14 13:09:18 nodez kernel: [<ffffffffa0e5ca0a>] ? class_manual_cleanup+0x2ba/0xd60 [obdclass]
      Dec 14 13:09:18 nodez kernel: [<ffffffff810e6f44>] ? cache_alloc_debugcheck_after+0x123/0x192
      Dec 14 13:09:18 nodez kernel: [<ffffffff810e88bc>] ? __kmalloc+0x123/0x18e
      Dec 14 13:09:18 nodez kernel: [<ffffffffa0e5cc8d>] ? class_manual_cleanup+0x53d/0xd60 [obdclass]
      Dec 14 13:09:18 nodez kernel: [<ffffffffa074a6c4>] ? osd_obd_disconnect+0x164/0x1d0 [osd_ldiskfs]
      Dec 14 13:09:18 nodez kernel: [<ffffffffa0e6243d>] ? lustre_put_lsi+0x19d/0xe90 [obdclass]
      Dec 14 13:09:18 nodez kernel: [<ffffffffa0e641d8>] ? lustre_common_put_super+0x5b8/0xbe0 [obdclass]
      Dec 14 13:09:18 nodez kernel: [<ffffffffa0e95802>] ? server_put_super+0x172/0x2190 [obdclass]
      Dec 14 13:09:18 nodez kernel: [<ffffffffa0e97f8d>] ? server_fill_super+0x76d/0x15c0 [obdclass]
      Dec 14 13:09:18 nodez kernel: [<ffffffffa0e673c0>] ? lustre_fill_super+0x0/0x520 [obdclass]
      Dec 14 13:09:18 nodez kernel: [<ffffffffa0e67598>] ? lustre_fill_super+0x1d8/0x520 [obdclass]
      Dec 14 13:09:18 nodez kernel: [<ffffffffa0e673c0>] ? lustre_fill_super+0x0/0x520 [obdclass]
      Dec 14 13:09:18 nodez kernel: [<ffffffffa0e673c0>] ? lustre_fill_super+0x0/0x520 [obdclass]
      Dec 14 13:09:18 nodez kernel: [<ffffffff810f863f>] ? get_sb_nodev+0x4e/0x84
      Dec 14 13:09:18 nodez kernel: [<ffffffffa0e5f52c>] ? lustre_get_sb+0x1c/0x30 [obdclass]
      Dec 14 13:09:18 nodez kernel: [<ffffffff810f838d>] ? vfs_kern_mount+0x96/0x15b
      Dec 14 13:09:18 nodez kernel: [<ffffffff810f84b3>] ? do_kern_mount+0x49/0xe7
      Dec 14 13:09:18 nodez kernel: [<ffffffff8110dcd5>] ? do_mount+0x7a1/0x824
      Dec 14 13:09:18 nodez kernel: [<ffffffff8110dde0>] ? sys_mount+0x88/0xc4
      Dec 14 13:09:18 nodez kernel: [<ffffffff81008a42>] ? system_call_fastpath+0x16/0x1b
      Dec 14 13:09:18 nodez kernel: Code: ec 18 48 8b 1f 48 c7 07 00 00 00 00 48 85 db 74 4c 48 81 fb 00 f0 ff ff 77 43 4c 8b 6b 48 4d 85 ed 75 08 e9 90 00 00 00 48 89 eb <48> 8b 6b 50 48 85 ed 75 f4 4c 8b 63 08 48 8b 6b 48 4c 89 e7 e8 
      Dec 14 13:09:18 nodez kernel: RIP  [<ffffffffa0e46f03>] lprocfs_remove_nolock+0x33/0x100 [obdclass]
      Dec 14 13:09:18 nodez kernel: RSP <ffff88003d34d928>
      Dec 14 13:09:18 nodez kernel: ---[ end trace 5f7830ce85deef31 ]---
      Dec 14 13:09:18 nodez kernel: Kernel panic - not syncing: Fatal exception
      

      I've found that osd_device_fini() cleans things up in the wrong order: it should remove the osd procfs entries after osd_shutdown(), not before, because the quota shutdown path (qsd_fini()) still uses the osd procfs tree.
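
      To illustrate the intended ordering, here is a minimal sketch of osd_device_fini() with the cleanup reordered as described above. The helper names osd_procfs_fini() and osd_dev(), and the exact shape of the function, are assumptions for illustration only; see the patches referenced in the comments below for the actual fix.

      /* Sketch only: helper names and structure are assumed, not taken from
       * the landed patch. */
      static struct lu_device *osd_device_fini(const struct lu_env *env,
                                               struct lu_device *d)
      {
              struct osd_device *o = osd_dev(d);

              /* Shut down quota and the other OSD subsystems first, while the
               * OSD procfs entries are still alive: the qsd_fini() frame in
               * the stack trace above removes its quota files from under the
               * OSD proc directory. */
              osd_shutdown(env, o);

              /* Only now tear down the OSD procfs tree.  Removing it before
               * osd_shutdown() leaves qsd_fini() walking freed proc entries,
               * which is the 0x6b-poisoned pointer that oopses in
               * lprocfs_remove_nolock(). */
              osd_procfs_fini(o);

              return NULL;
      }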

      Attachments

        Issue Links

          Activity

            [LU-4385] replay-single test 61d causes oops in osd_device_fini()


            adilger Andreas Dilger added a comment - Patch http://review.whamcloud.com/8506 from LU-3857 was landed to master.
            di.wang Di Wang added a comment - Duplicate of https://jira.hpdd.intel.com/browse/LU-3857 ; same patch in http://review.whamcloud.com/#/c/8506/
            tappro Mikhail Pershin added a comment - http://review.whamcloud.com/#/c/8579/

            People

              Assignee: tappro Mikhail Pershin
              Reporter: tappro Mikhail Pershin
              Votes: 0
              Watchers: 3
