Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-2310

Kernel panic when trying to mount MDS after system upgrade from 1.8.8-wc1 to master.

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.4.0
    • Lustre 2.4.0
    • Before Upgrade:
      server: 1.8.8-wc1 RHEL5
      client: 1.8.8-wc1 RHEL5/6

      After Upgrade:
      server: lustre master build #1035 RHEL6
      client: lustre master build #1035 RHEL6
    • 3
    • 5523

    Description

      After upgrade the system from 1.8.8-wc1 to master, hit kernel panic when trying to mount MDS. Here is the console log from MDS:

      [root@fat-amd-1 ~]# mount -t lustre -o user_xattr,acl  /dev/sdb1 /mnt/mds
      LDISKFS-fs (sdb1): recovery complete
      LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. quota=off. Opts: 
      Lustre: MGC10.10.4.132@tcp: Reactivating import
      Lustre: MGS: Logs for fs lustre were removed by user request.  All servers must be restarted in order to regenerate the logs.
      Lustre: Setting parameter lustre-MDT0000-mdtlov.lov.stripesize in log lustre-MDT0000
      Lustre: Setting parameter lustre-clilov.lov.stripesize in log lustre-client
      LustreError: 2535:0:(mgc_request.c:248:do_config_log_add()) failed processing sptlrpc log: -2
      Lustre: lustre-MDT0000: used disk, loading
      Lustre: Mounting lustre-MDT0000 at first time on 1.8 FS, remove all clients for interop needs
      LustreError: 2609:0:(sec_config.c:1024:sptlrpc_target_local_copy_conf()) missing llog context
      Lustre: lustre-MDT0000: Migrate inode quota from old admin quota file(admin_quotafile_v2.usr) to new IAM quota index([0x200000006:0x10000:0x0]).
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000068
      IP: [<ffffffffa09cb2bd>] ldiskfs_quota_read+0x3d/0x140 [ldiskfs]
      PGD 0 
      Oops: 0000 [#1] SMP 
      last sysfs file: /sys/module/mdd/initstate
      CPU 3 
      Modules linked in: mdt(U) lod(U) mgs(U) mgc(U) fsfilt_ldiskfs(U) osd_ldiskfs(U) ldiskfs(U) lquota(U) jbd2 mdd(U) fid(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic sha256_generic libcfs(U) nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc cpufreq_ondemand powernow_k8 freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa igb dca mlx4_ib ib_mad ib_core mlx4_en mlx4_core microcode serio_raw k10temp amd64_edac_mod edac_core edac_mce_amd i2c_piix4 i2c_core sg shpchp ext3 jbd mbcache sd_mod crc_t10dif ata_generic pata_acpi pata_atiixp ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
      
      Pid: 2609, comm: llog_process_th Not tainted 2.6.32-279.5.1.el6_lustre.x86_64 #1 Supermicro H8DGT/H8DGT
      RIP: 0010:[<ffffffffa09cb2bd>]  [<ffffffffa09cb2bd>] ldiskfs_quota_read+0x3d/0x140 [ldiskfs]
      RSP: 0018:ffff88041515b620  EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff88011c4943c0 RCX: 000000000000000c
      RDX: ffff88010a285c00 RSI: 0000000000000000 RDI: ffff88041d1adc00
      RBP: ffff88041515b680 R08: 0000000000000400 R09: ffff88041d1adc00
      R10: 0000000000000004 R11: 0000000000000000 R12: ffff88010a285c00
      R13: 0000000000000000 R14: ffff88010a285c00 R15: 0000000000000400
      FS:  00007f30c588d700(0000) GS:ffff8800282c0000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000000068 CR3: 0000000001a85000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process llog_process_th (pid: 2609, threadinfo ffff88041515a000, task ffff880415152ae0)
      Stack:
       ffff88011fc21600 0000000000000400 0000000000000000 ffff88041d1adc00
      <d> 0000000000000000 0000000000000400 0000000000000010 ffff88011c4943c0
      <d> ffff88041515bbd0 0000000000000000 ffff88010a285c00 0000000000000400
      Call Trace:
       [<ffffffffa0a424d7>] quota_read_blk+0x217/0x280 [osd_ldiskfs]
       [<ffffffffa0a425c2>] find_tree_dqentry+0x82/0x750 [osd_ldiskfs]
       [<ffffffffa0a173ec>] ? osd_trans_stop+0x12c/0x450 [osd_ldiskfs]
       [<ffffffffa0a22302>] ? osd_index_iam_insert+0x252/0x560 [osd_ldiskfs]
       [<ffffffffa0a2eb40>] ? iam_lfix_ipd_alloc+0x0/0x20 [osd_ldiskfs]
       [<ffffffffa0a3cb70>] osd_it_admin_get+0x50/0x230 [osd_ldiskfs]
       [<ffffffffa0a3eeac>] ? write_quota_rec+0x26c/0xb80 [osd_ldiskfs]
       [<ffffffffa0a3cd8b>] osd_it_admin_load+0x3b/0xd0 [osd_ldiskfs]
       [<ffffffffa0a40ca1>] convert_quota_file+0x7b1/0xd80 [osd_ldiskfs]
       [<ffffffffa0a41ceb>] osd_quota_migration+0xa7b/0x1050 [osd_ldiskfs]
       [<ffffffffa0a25c3e>] osd_index_try+0x28e/0x610 [osd_ldiskfs]
       [<ffffffffa0926ad4>] lquota_disk_glb_find_create+0x2c4/0x410 [lquota]
       [<ffffffffa0948bf1>] qmt_pool_prepare+0x1b1/0x710 [lquota]
       [<ffffffffa093b4e3>] qmt_device_prepare+0xd3/0x210 [lquota]
       [<ffffffffa0b7fbf5>] mdt_quota_init+0xdc5/0x1730 [mdt]
       [<ffffffffa0b83aa7>] mdt_device_alloc+0xb47/0xf60 [mdt]
       [<ffffffffa04cd007>] obd_setup+0x1d7/0x2f0 [obdclass]
       [<ffffffffa04cd328>] class_setup+0x208/0x890 [obdclass]
       [<ffffffffa04d48dc>] class_process_config+0xc0c/0x1ca0 [obdclass]
       [<ffffffffa036dbe0>] ? cfs_alloc+0x30/0x60 [libcfs]
       [<ffffffffa04ce533>] ? lustre_cfg_new+0x353/0x7e0 [obdclass]
       [<ffffffffa04d6a3b>] class_config_llog_handler+0x9db/0x1850 [obdclass]
       [<ffffffff814ff18e>] ? mutex_lock+0x1e/0x50
       [<ffffffffa0491b9b>] llog_process_thread+0x8fb/0xe10 [obdclass]
       [<ffffffffa04ee58f>] ? keys_fill+0x6f/0x1a0 [obdclass]
       [<ffffffffa04920b0>] ? llog_process_thread_daemonize+0x0/0x80 [obdclass]
       [<ffffffffa04920b0>] ? llog_process_thread_daemonize+0x0/0x80 [obdclass]
       [<ffffffffa04920fc>] llog_process_thread_daemonize+0x4c/0x80 [obdclass]
       [<ffffffff8100c14a>] child_rip+0xa/0x20
       [<ffffffffa04920b0>] ? llog_process_thread_daemonize+0x0/0x80 [obdclass]
       [<ffffffffa04920b0>] ? llog_process_thread_daemonize+0x0/0x80 [obdclass]
       [<ffffffff8100c140>] ? child_rip+0x0/0x20
      Code: 1f 44 00 00 31 c0 48 63 f6 48 89 7d b8 48 89 4d a8 48 8b b4 f7 a0 01 00 00 49 89 d4 48 89 75 b0 0f b6 4f 20 c7 45 cc 00 00 00 00 <48> 8b 56 68 4c 8b 77 18 49 39 d0 0f 8f c6 00 00 00 48 8b 5d a8 
      RIP  [<ffffffffa09cb2bd>] ldiskfs_quota_read+0x3d/0x140 [ldiskfs]
       RSP <ffff88041515b620>
      CR2: 0000000000000068
      Initializing cgroup subsys cpuset
      Initializing cgroup subsys cpu
      Linux version 2.6.32-279.5.1.el6_lustre.x86_64 (jenkins@builder-2-sdd1-el6-x8664.lab.whamcloud.com) (gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC) ) #1 SMP Mon Oct 29 00:58:20 PDT 2012
      Command line: ro root=UUID=6e137aae-0326-4846-9256-0aed1e37b7d5 rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_NO_MD console=ttyS0,115200 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off  memmap=exactmap memmap=538K@64K memmap=132562K@49690K elfcorehdr=182252K memmap=64K$0K memmap=38K$602K memmap=104K$920K memmap=8K$3668600K memmap=72K#3668608K memmap=184K#3668680K memmap=263296K$3668864K memmap=2048K$4192256K
      KERNEL supported cpus:
        Intel GenuineIntel
        AMD AuthenticAMD
      

      Attachments

        Activity

          People

            niu Niu Yawei (Inactive)
            sarah Sarah Liu
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: