Allen Brain Atlas


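The Allen Brain Atlas project is a publicly available online resource of gene expression data for the human brain, and hundreds of published articles have used it. The expression data were obtained mainly with microarrays (gene chips), which measure gene expression while retaining spatial information about where each sample was taken.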


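  • Transcriptomics: put simply, the study of RNA expression. DNA has to be expressed as protein to take effect, and RNA is the required intermediate that is translated into protein, so RNA is what gets studied
  • Spatial transcriptomics: studying RNA expression while retaining spatial information, i.e. knowing which gene is expressed where in the brain
  • RNA-seq: RNA sequencing
  • Gene expression profile: the gene expression information of a cell or tissue in a given state
  • Gene expression array: a matrix of samples and gene expression values
  • Gene expression: the process, and the result, by which the coding part of DNA is transcribed into RNA, or transcribed and then translated into protein
  • Transcription: DNA -> RNA; in terms of scope, nearly all RNA is transcribed from DNA
  • Translation: RNA -> protein; in terms of scope, proteins are translated from RNA
  • Probe: a sequence that binds only to a specific stretch of a gene, so it can specifically identify that gene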








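  1. Normalized microarray dataset (March 2013), containing the gene expression arrays of all six donors. IDs: H0351.2001, H0351.2002, H0351.1009, H0351.1012, H0351.1015, H0351.1016.
  2. RNA-seq dataset: gene expression levels of two donors measured by sequencing. IDs: H0351.2001, H0351.2002.
  3. Legacy microarray dataset. IDs: H0351.2001, H0351.2002, H0351.1009, H0351.1012.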

For details of the normalization procedure, see the technical white paper, Microarray Data Normalization.



Color indicates the level of gene expression (red = high). Abbreviations: cortex and subcortex (C&SC), brainstem (BS) and cerebellum (CB).


  1. H0351.1009 / H0351.1012

    • T1-weighted MPRAGE structural MRI with 0.98 x 0.98 x 1 mm voxels, three-dimensional acquisition, three averages, TI = 900 ms, TR = 1900 ms, TE = 3.03 ms, 9° flip angle, image matrix = 256 x 256 x 176.
    • T2-weighted images were taken in three-dimensions with 0.98 x 0.98 x 1 mm voxels, TR = 3200 ms, TE = 449 ms, 120° flip angle, image matrix = 256 x 256 x 192 voxels.
    • FLAIR images acquired in 0.98 x 0.98 mm axial slices and 5 mm slices, TI = 2500 ms, TR= 10000 ms, TE= 73 ms, 150° flip angle, image matrix = 256 x 256 x 20 voxels.
    • T2-weighted gradient echo images taken in axial slices with 0.72 x 0.72 mm and 5 mm slices, TR = 889 ms, TE = 18 ms, 60° flip angle, image matrix = 320 x 290 x 24 voxels.
  2. H0351.1015 / H0351.1016

    • T1-weighted MPRAGE structural MRI with 0.98 x 0.98 x 1 mm voxels, three-dimensional acquisition, three averages, TI = 900 ms, TR = 1900 ms, TE = 2.63 ms, 9° flip angle, image matrix = 256 x 256 x 176.
    • T2-weighted / FLAIR / T2-weighted gradient echo: same as for H0351.1009 / H0351.1012
  3. H0351.2001

    • T1-weighted MPRAGE structural MRI with 1 mm isotropic voxels, three-dimensional acquisition, three averages, TI = 900 ms, TR = 1900 ms, TE = 2.63 ms, 9° flip angle, image matrix = 256 x 192 x 256.
    • T2-weighted images were taken in three-dimensions with 0.9 mm isotropic voxels, TR = 3210 ms, TE = 540 ms, 120° flip angle, image matrix = 224 x 256 x 160 voxels.
    • DT images taken in 64 directions using an Echo Planar Imaging sequence taken in 2 mm axial slices with 1.875 x 1.875 mm in plane voxels, TR = 9300 ms, TE = 94 ms, 90° flip angle, image matrix = 128 x 128 x 68.
    • FLAIR images acquired in 0.86 mm x 0.86 mm axial slices and 5 mm slices with a 1 mm gap, TI = 2500 ms, TR = 8000 ms, TE = 67 ms, 170° flip angle, image matrix = 208 x 256 x 26 voxels.
    • T2-weighted gradient echo images taken in axial slices with 0.86 mm x 0.86 mm and 5 mm slices, TR = 613 ms, TE = 20 ms, 20° flip angle, image matrix = 256 x 256 x 26 voxels.
    • Inversion recovery images taken with sagittal sections with 0.86 mm x 0.86 mm sagittal sections and 4 mm slices, TI = 185 ms, TR = 5523.2 ms, TE = 62 ms, 170° flip angle, image matrix = 248 x 256 x 33 voxels.
  4. H0351.2002

    • T1-weighted MPRAGE structural MRI with 1 mm isotropic voxels, three-dimensional acquisition, three averages, TI = 900 ms, TR = 1900 ms, TE = 2.63 ms, 9° flip angle, image matrix = 256 x 192 x 256.
    • T2-weighted images were taken in three-dimensions with 0.9 mm isotropic voxels, TR = 3200 ms, TE = 535 ms, 120° flip angle, image matrix = 224 x 256 x 160 voxels.
    • DT images taken in 64 directions using an Echo Planar Imaging sequence taken in 5 mm axial slices with 1.875 x 1.875 mm in plane voxels, TR = 9300 ms, TE = 94 ms, 90° flip angle, image matrix = 128 x 128 x 24.
    • FLAIR images acquired in 0.94 mm x 0.94 mm axial slices and 5 mm slices with a 1 mm gap, TI = 2000 ms, TR = 8000 ms, TE = 67 ms, 170° flip angle, image matrix = 208 x 256 x 24 voxels.
    • T2-weighted gradient echo images taken in axial slices with 0.94 mm x 0.94 mm and 5 mm slices, TR = 665 ms, TE = 20 ms, 20° flip angle, image matrix = 256 x 256 x 24 voxels.
    • Inversion recovery images taken with sagittal sections with 0.86 mm x 0.86 mm sagittal sections and 3 mm slices, TI = 185 ms, TR = 4230 ms, TE = 62 ms, 170° flip angle, image matrix = 248 x 256 x 25 voxels.


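1. Microarray (gene chip)

  • Search online for a given gene to see how it is expressed in each donor
  • Browse by gene category, i.e. view the expression of all genes in one class
  • Run differential expression analyses between brain structures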

2. ISH (in situ hybridization, spatial transcriptomics)


  • Neurotransmitter Study: 176 genes across cortical regions and 88 genes across subcortical regions in 4 control cases
  • Cortex Study: 1,000 genes in visual and temporal cortices in multiple adult control brains
  • Subcortex Study: 55 genes across subcortical regions and 10 additional genes in hypothalamus in one male and one female donor
  • Schizophrenia Study: 60 genes in dorsolateral prefrontal cortex of over 50 control and schizophrenia cases
  • Autism Study: 25 genes in frontal, temporal and occipital cortical regions of 11 control and 11 autism cases

3. MRI


| Donor      | Age    | Sex | Ethnicity                 | PMI (hours) | Image Files |
|------------|--------|-----|---------------------------|-------------|-------------|
| H0351.2001 | 24 yrs | M   | Black or African American | 23          | DTI, T1, T2 |
| H0351.2002 | 39 yrs | M   | Black or African American | 10          | DTI, T1, T2 |
| H372.0006  | 44 yrs | M   | White or Caucasian        | 24          | DTI, T1, T2 |
| H0351.2003 | 48 yrs | F   | White or Caucasian        | 24          | DTI, T1, T2 |
| H0351.1009 | 57 yrs | M   | White or Caucasian        | 26          | T1, T2      |
| H0351.1012 | 31 yrs | M   | White or Caucasian        | 17          | T1, T2      |
| H0351.1015 | 49 yrs | F   | Hispanic                  | 30          | T1, T2      |
| H0351.1016 | 55 yrs | M   | White or Caucasian        | 18          | T1, T2      |






1. Probe re-annotation




We found that 45,821 probes (78%) were uniquely annotated to a gene and could be related to an entrez ID.
A total of 19% of probes were not mapped to a gene, and just under 3% were mapped to multiple genes and could not be unambiguously annotated.
Of the probes that were unambiguously annotated to a gene, 3438 (7.5%) of the annotations differed from those provided by the AHBA: 1287 probes were re-annotated to new genes and 2151 probes that were not previously assigned to any gene in the AHBA could now be annotated.
Additionally, 6211 (∼ 10%) probes in the initial AHBA dataset had an inconsistent gene symbol, ID or gene name information according to the NCBI database, as of 5th March 2018
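The counts quoted above can be cross-checked with a few lines of arithmetic. This is only a sanity check on the reported numbers, and the variable names are illustrative.

```python
# Counts quoted from the passage above; variable names are illustrative.
uniquely_annotated = 45_821   # probes uniquely annotated to a gene
reassigned = 1_287            # probes re-annotated to a different gene
newly_annotated = 2_151       # probes that previously had no gene in the AHBA

changed = reassigned + newly_annotated
changed_pct = 100 * changed / uniquely_annotated

print(changed)                # 3438
print(round(changed_pct, 1))  # 7.5
```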

2. Data filtering




if we exclude probes that did not exceed the background in at least 50% of all cortical and subcortical samples across all subjects, we exclude 30% of probes (13,844 out of 45,821), assaying 4486 out of 20,232 genes. In other words, if no filtering is performed, > 22% of genes will have expression levels consistent with background noise in at least half of the tissue samples.
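The filtering rule in the quote can be sketched in a few lines. This is a minimal illustration, assuming a binary probe x sample matrix of "above background" calls (similar in spirit to the PACall.csv files). The function name `filter_probes` and the toy data are made up for the example.

```python
import numpy as np

def filter_probes(above_background, threshold=0.5):
    """Return a boolean mask of probes to keep.

    above_background: (n_probes, n_samples) array of 0/1 flags, where 1 means
    the probe signal exceeded background in that sample (hypothetical input).
    """
    above_background = np.asarray(above_background, dtype=float)
    # Fraction of samples in which each probe exceeded background.
    fraction = above_background.mean(axis=1)
    return fraction >= threshold

# Toy data: 4 probes x 4 samples.
calls = np.array([
    [1, 1, 1, 1],   # always above background -> keep
    [1, 1, 0, 0],   # exactly 50% -> keep
    [1, 0, 0, 0],   # 25% -> drop
    [0, 0, 0, 0],   # never above background -> drop
])
keep = filter_probes(calls)
print(keep.tolist())  # [True, True, False, False]
```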

3. Probe selection





4. Mapping samples to brain regions



5. Data normalization



  1. Average all samples within a region
  2. Average within each donor first, then combine across donors
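The two aggregation options above give different answers when donors contribute unequal numbers of samples to a region. The sketch below uses made-up values and pandas to show the difference. It is not the code of any particular toolbox, although the two options appear to correspond to the `samples` and `donors` choices of abagen's `region_agg` setting.

```python
import pandas as pd

# Toy samples: expression of one gene, with donor and region labels (made-up values).
samples = pd.DataFrame({
    "donor":  ["A", "A", "A", "B"],
    "region": [1, 1, 1, 1],
    "gene_x": [1.0, 2.0, 3.0, 6.0],
})

# Option 1: average every sample in the region, ignoring which donor it came from.
pooled = samples.groupby("region")["gene_x"].mean()

# Option 2: average within each donor first, then average the donor means.
per_donor = samples.groupby(["region", "donor"])["gene_x"].mean()
donor_then_region = per_donor.groupby(level="region").mean()

print(pooled.loc[1])             # 3.0
print(donor_then_region.loc[1])  # 4.0
```

Option 2 keeps a donor with many samples from dominating the regional mean.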


6. Gene filtering


7. Removing spatial effects


“Although this spatial autocorrelation is, in itself, an important neurobiological feature of the brain transcriptome (Gryglewski et al., 2018), it is critical for any analysis claiming a specific association between spatial variations in gene expression and a given IDP to show that the association exceeds what would be predicted by lower-order spatial gradients of gene expression.” (Arnatkeviciute et al., 2019, p. 5)
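One common way to account for this kind of spatial trend is to fit a distance-dependent curve to the correlated gene expression (CGE) of region pairs and then work with the residuals. The sketch below assumes an exponential form, `A * exp(-d / n) + B`, and synthetic data. The function name, the grid of length scales and the numbers are all assumptions made for illustration, not the published procedure.

```python
import numpy as np

def remove_distance_trend(cge, dist, length_scales):
    """Fit cge ~ A * exp(-dist / n) + B and return (residuals, best_n).

    cge, dist: 1-D arrays over region pairs. length_scales: candidate values
    of n to try. A and B are solved by linear least squares for each n.
    """
    cge = np.asarray(cge, dtype=float)
    dist = np.asarray(dist, dtype=float)
    best = None
    for n in length_scales:
        design = np.column_stack([np.exp(-dist / n), np.ones_like(dist)])
        coef, *_ = np.linalg.lstsq(design, cge, rcond=None)
        fitted = design @ coef
        sse = float(np.sum((cge - fitted) ** 2))
        if best is None or sse < best[0]:
            best = (sse, n, fitted)
    _, best_n, fitted = best
    return cge - fitted, best_n

# Synthetic example: CGE decays with distance, plus a little noise.
rng = np.random.default_rng(0)
dist = rng.uniform(5, 150, size=500)
cge = 0.8 * np.exp(-dist / 40.0) - 0.1 + rng.normal(0, 0.01, size=500)

residuals, best_n = remove_distance_trend(cge, dist, np.arange(10, 101, 5))
print(best_n, float(np.abs(residuals).max()))
```

Any association tested on the residuals is then, by construction, not explained by the fitted distance trend.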

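8. Summary



"arnatkeviciute_practical_2019" provides corresponding code for these steps, written in MATLAB.

"markello_standardizing_2021" provides the open-source toolbox abagen (documentation), which follows the steps of the paper above.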



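  1. Given an atlas (default: Desikan-Killiany), output the preprocessed region-by-gene expression matrix for that parcellation
  2. Given a mask, output the preprocessed expression data of all samples that fall within the mask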


  1. Command Line

    1. Download the data


      abagen -v --donors all --data_dir $HOME/abagen-data --n_proc 6 atlas
      # --donors: donor IDs (e.g., 9861, 10021) or UIDs (e.g., H0351.2001)
      # --n_proc: number of processes used for downloading
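    2. Process the data


      abagen -v \
           --ibf_threshold 0.5 \
           --probe_selection diff_stability \
           --lr_mirror None \
           --sim_threshold None \
           --missing None \
           --tol 2 \
           --sample_norm srs \
           --gene_norm srs \
           --norm_all \
           --region_agg donors \
           --agg_metric mean \
           --output-file $PWD/abagen_expression.csv \
           --save-counts \
           --save-donors \
           atlas
      # atlas                positional atlas argument, as in the download command
      # --ibf_threshold      minimum fraction of samples in which a probe must exceed background noise (0.5 = 50%)
      # --probe_selection    how to pick the probe that represents each gene
      # --lr_mirror          whether to mirror samples across hemispheres, since 4 of the 6 donors only have left-hemisphere data
      # --sim_threshold      threshold for similarity-based filtering
      # --missing            how to handle regions with no matched samples
      # --tol                distance tolerance for matching samples to regions
      # --sample_norm        method for normalizing each sample across genes
      # --gene_norm          method for normalizing each gene across samples
      # --norm_all           normalize over all samples rather than only those matched to a region
      # --norm_structures    normalize within structures rather than across all samples
      # --region_agg         when several samples match one region, aggregate within each donor first or pool all samples
      # --agg_metric         aggregation metric
      # --no-reannotated     do not use re-annotated probes; re-annotation is applied by default
      # --no-corrected-mni   use the original MNI coordinates instead of the alleninf-corrected ones; corrected coordinates are used by default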
  2. Python

    1. Download and load the data

      import abagen
      # alternatively, fetch the data for all donors:
      # files = abagen.fetch_microarray(donors='all')
      files = abagen.fetch_microarray(donors=['12876', '15496'], data_dir='/path/to/my/data/')


      ├── normalized_microarray_donor10021/
      │   ├── MicroarrayExpression.csv
      │   ├── Ontology.csv
      │   ├── PACall.csv
      │   ├── Probes.csv
      │   └── SampleAnnot.csv
      ├── normalized_microarray_donor12876/
      ├── normalized_microarray_donor14380/
      ├── normalized_microarray_donor15496/
      ├── normalized_microarray_donor15697/
      └── normalized_microarray_donor9861/
    2. Parcellate the brain

      # atlases in MNI or fsaverage/fsaverage5 space are accepted, e.g. the Desikan-Killiany atlas
      atlas = abagen.fetch_desikan_killiany()
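The next step in the Python workflow would be to turn the atlas into a region x gene matrix, or to pull every sample inside a mask. The sketch below wraps both calls in small helper functions. The helper names are hypothetical, the keyword values simply repeat the command-line settings shown earlier, and the exact keyword set may differ between abagen versions, so check the documentation of the installed release.

```python
def regional_expression(data_dir=None):
    """Return a region x gene DataFrame for the Desikan-Killiany atlas."""
    import abagen  # imported lazily; requires the abagen package

    atlas = abagen.fetch_desikan_killiany()
    # Downloads the microarray data on first use if they are not in data_dir.
    return abagen.get_expression_data(
        atlas["image"],
        atlas["info"],
        ibf_threshold=0.5,
        probe_selection="diff_stability",
        data_dir=data_dir,
    )


def samples_in_mask(mask=None, data_dir=None):
    """Return (expression, coords) for all preprocessed samples inside a mask."""
    import abagen

    # With mask=None, all available samples are returned.
    return abagen.get_samples_in_mask(mask=mask, data_dir=data_dir)


# Example (a large download on first run):
# expression = regional_expression("/path/to/my/data/")
# print(expression.shape)
```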





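The data files are stored in this figshare repository. According to the file descriptions, it contains only the processed data for the left cortical hemisphere; any other data have to be computed yourself.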



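  1. For the left hemisphere, the following filtering steps leave only 10,027 genes:

    1. Probes are re-annotated with the Re-Annotator package
    2. Remove probes that fail to exceed background noise in more than half of the samples
    3. Remove genes that are absent from the RNA-seq data
    4. Remove probes that correlate poorly with RNA-seq
    5. For each gene, select the probe with the highest correlation to RNA-seq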


    • ROIxGene_aparcaseg_RNAseq.mat
      34 ROIs per hemisphere: mean 37.8 ± 22.5 (SD) samples assigned per ROI, min= 5; max = 92;
      No regions have been excluded.
    • ROIxGene_cust100_RNAseq.mat
      100 ROIs per hemisphere: mean 7.4 ± 6.5 (SD) samples assigned per ROI, min= 0; max = 37;
      Region 54 has been excluded.
    • ROIxGene_cust250_RNAseq.mat
      250 ROIs per hemisphere: mean 5.1 ± 3.5 (SD) samples assigned per ROI, min= 0; max = 18;
      Regions 122, 127, 180, 183, 223, 230 have been excluded.
    • ROIxGene_HCP_RNAseq.mat
      180 ROIs per hemisphere: mean 7.1 ± 6.7 (SD) samples assigned per ROI, min= 0; max = 41;
      Regions 23, 89 and 104 have been excluded.
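  2. The following filtering steps yield 15,745 genes:

    1. Probes are re-annotated with the Re-Annotator package
    2. Remove probes that fail to exceed background noise in more than half of the samples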


    • ROIxGene_aparcaseg_INT.mat
      34 ROIs per hemisphere: mean 37.8 ± 22.5 (SD) samples assigned per ROI, min= 5; max = 92;
      No regions have been excluded.
    • ROIxGene_cust100_INT.mat
      100 ROIs per hemisphere: mean 7.4 ± 6.5 (SD) samples assigned per ROI, min= 0; max = 37;
      Region 54 has been excluded.
    • ROIxGene_cust250_INT.mat
      250 ROIs per hemisphere: mean 5.1 ± 3.5 (SD) samples assigned per ROI, min= 0; max = 18;
      Regions 122, 127, 180, 183, 223, 230 have been excluded.
    • ROIxGene_HCP_INT.mat
      180 ROIs per hemisphere: mean 7.1 ± 6.7 (SD) samples assigned per ROI, min= 0; max = 41;
      Regions 23, 89 and 104 have been excluded.
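The RNA-seq based probe selection in the first filtering list (drop probes that correlate poorly with RNA-seq, then keep the best probe per gene) can be sketched with pandas. The table, column names and values below are hypothetical. The 0.2 cutoff is an assumption that mirrors `options.RNAseqThreshold = 0.2` in the pipeline options.

```python
import pandas as pd

# Hypothetical table: one row per probe, with its gene and its correlation
# to the RNA-seq measurement of that gene across matched samples.
probes = pd.DataFrame({
    "probe": ["p1", "p2", "p3", "p4", "p5"],
    "gene":  ["G1", "G1", "G2", "G2", "G3"],
    "rnaseq_corr": [0.65, 0.30, 0.10, 0.15, 0.45],
})

threshold = 0.2  # assumed cutoff for "low correlation"

# Drop probes that agree poorly with RNA-seq ...
kept = probes[probes["rnaseq_corr"] >= threshold]
# ... then keep the best-correlated probe for each remaining gene.
best = kept.loc[kept.groupby("gene")["rnaseq_corr"].idxmax()]

print(best["probe"].tolist())  # ['p1', 'p5']
```

Gene G2 disappears entirely because none of its probes passes the cutoff, which is one reason this route ends with fewer genes than the second one.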


  1. Parcellations

    • Four parcellations are provided for each of the 6 donors
      1. defaultparc_NativeAnat.nii
        34 regions per hemisphere + 7 subcortical regions; Desikan et al, 2006.
      2. random200_acpc_uncorr_asegparc_NativeAnat.nii
        100 random regions per hemisphere + 10 subcortical regions.
      3. HCPMMP1_acpc_uncorr.nii
        180 regions per hemisphere; Glasser et al., 2016.
      4. random500_acpc_uncorr_asegparc_NativeAnat.nii
        250 random regions per hemisphere + 15 subcortical regions.
    • Each parcellation comes with an annotation file for the left cortical hemisphere
      1. lh.aparc.annot
      2. lh.random200.annot
      3. lh.HCP-MMP1.annot
      4. lh.random500.annot
    • White and pial (gray matter) surfaces of the left cortical hemisphere from the FreeSurfer average template
      1. lhfsaverage.pial
      2. lhfsaverage.white
    • Spherical representation of the left cortical surface
      1. lh.sphere
    • Volumetric HCPMMP1 parcellation in MNI template space
      1. MMPinMNI.nii
  2. Probe re-annotation

    1. hba_microarray_probes_fixed.xlsx
    2. probes2annotateALL.fasta
    3. probes2annotateALL_merged_readAnnotation.txt
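  3. Processed data

    1. Precomputed distances between samples, along the cortical surface and within the gray matter volume, for the defaultparc_NativeAnat parcellation
    2. Precomputed correlations between the different probe selection methods, before and after filtering
  4. Raw data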


    • reannotatedProbes.mat


    • Probes.xlsx


    • mart_export_updatedProbes.txt



  1. AddPaths.m


  2. processingPipeline.m



    clear all;
    close all;
    options = struct();
    options.ExcludeCBandBS = true;
    options.useCUSTprobes = true;
    % mart_export_updatedProbes.txt in data/genes/rawData
    % reannotatedProbes.mat in data/genes/rawData
    options.updateProbes = 'reannotator';
    % Variance, RNAseq, PC, maxIntensity, maxCorrelation_intensity, maxCorrelation_variance, CV, LessNoise, Mean, Random, DS
    options.probeSelections = {'RNAseq'};
    % 'HCP': 180 nodes per hemisphere
    % 'aparcaseg': 34 nodes per hemisphere + subcortex
    % 'cust100': 100 nodes per hemisphere + subcortex
    % 'cust250': 250 nodes per hemisphere + subcortex
    options.parcellations = {'aparcaseg'};
    % 'ontology' - samples are separated into cortical and subcortical based on AHBA ontology information
    % 'listCortex' - samples are separated into cortical and subcortical based on re-defined list of region names
    options.divideSamples = 'listCortex'; 
    options.excludeHippocampus = false;
    options.distanceThreshold = 2;
    options.signalThreshold = 0.5;
    options.VARfilter = false; 
    options.VARscale = 'normal';
    options.VARperc = 50;
    options.RNAsignThreshold = false;
    options.RNAseqThreshold = 0.2;
    options.correctDistance = false;
    options.calculateDS = true;
    options.distanceCorrection = 'Euclidean';
    options.Fit = {'exp'};
    options.normaliseWhat = 'Lcortex'; 
    options.normMethod = 'scaledRobustSigmoid';
    options.percentDS =  100;
    options.doNormalise = true;
    options.saveOutput = true;
    options.normaliseWithinSample = true;
    options.xrange = [0 220];
    options.plotCGE = true;
    options.plotResiduals = true;
    options.meanSamples = 'meanSamples';
  3. S[1-4]_*.m

    • S1_extractData.m
    • S2_probes.m: select probes
    • S3_samples2parcellation.m: map samples to brain regions
    • S4_normalisation.m: normalize the data and build the region x gene and CGE matrices
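The sample-to-region mapping step can be pictured as a nearest-neighbour search with a distance cutoff, in line with `options.distanceThreshold = 2` above. The sketch below is an illustration in Python, not a port of the MATLAB script. The function name, the label 0 for unassigned samples and the toy coordinates are assumptions.

```python
import numpy as np

def assign_samples(sample_xyz, region_xyz, region_labels, max_dist=2.0):
    """Assign each sample to the region of its nearest labelled voxel.

    sample_xyz: (n_samples, 3) coordinates in mm.
    region_xyz: (n_voxels, 3) coordinates of labelled voxels in mm.
    region_labels: (n_voxels,) integer region label of each voxel.
    Samples farther than max_dist from every labelled voxel get label 0.
    """
    sample_xyz = np.asarray(sample_xyz, dtype=float)
    region_xyz = np.asarray(region_xyz, dtype=float)
    region_labels = np.asarray(region_labels)
    # Pairwise Euclidean distances, shape (n_samples, n_voxels).
    diff = sample_xyz[:, None, :] - region_xyz[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    nearest = dist.argmin(axis=1)
    labels = region_labels[nearest].copy()
    labels[dist.min(axis=1) > max_dist] = 0
    return labels

voxels = [[0, 0, 0], [10, 0, 0]]
voxel_labels = [1, 2]
sample_coords = [[0.5, 0, 0], [9, 1, 0], [5, 0, 0]]
assigned = assign_samples(sample_coords, voxels, voxel_labels)
print(assigned.tolist())  # [1, 2, 0]
```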


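After processing, the Allen gene data take the form of a region x gene matrix whose entries are gene expression values. These values are relative, so they cannot support analyses that depend on absolute expression levels.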
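Normalization is one reason the values are relative. Both configurations shown earlier select a scaled robust sigmoid (`srs`, `scaledRobustSigmoid`). A minimal sketch of that transform, as commonly described, is given below: a sigmoid centred on the median and scaled by the interquartile range, followed by rescaling to the unit interval. The function name and test values are illustrative, and details may differ from the toolbox implementations.

```python
import numpy as np

def scaled_robust_sigmoid(x):
    """Map values to [0, 1] with a sigmoid centred on the median."""
    x = np.asarray(x, dtype=float)
    median = np.median(x)
    q75, q25 = np.percentile(x, [75, 25])
    iqr_norm = (q75 - q25) / 1.35  # IQR rescaled to match the SD of a normal distribution
    s = 1.0 / (1.0 + np.exp(-(x - median) / iqr_norm))
    # Rescale to the unit interval so that genes are comparable.
    return (s - s.min()) / (s.max() - s.min())

raw = np.array([2.0, 4.0, 5.0, 6.0, 100.0])  # one gene across regions, with an outlier
norm = scaled_robust_sigmoid(raw)
print(norm.round(3))
```

After this step only the ordering and relative spacing of regions survive, and the outlier no longer compresses the rest of the range.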


“Interestingly, no statistically significant hemispheric differences could be identified at this fine structural level that were corroborated in both brains (paired one-sided t-tests, P < 0.01, Benjamini-Hochberg (BH)-corrected)” (Hawrylycz et al., 2012, p. 393)

“Preliminary analyses of these data revealed minimal lateralization of microarray expression, and so samples were collected exclusively from the left hemisphere for the following four donors” (Markello et al., 2021, p. 15)





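  • Differences in gene expression between the frontal and parietal lobes, between gray and white matter, or between regions with high and low FA
  • Differences in gene expression between the hippocampi of males and females
  • Differences in the expression of (other) genes between regions where gene A is highly expressed and regions where it is lowly expressed
  • GSEA between groups, i.e. differences in gene set enrichment between groups
  • ...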
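As a rough illustration of such a group comparison, the sketch below computes a Welch t statistic per gene between two sets of regions and ranks genes by its magnitude. The data are synthetic, the group and gene names are made up, and a real analysis would add p-values and multiple-comparison correction.

```python
import numpy as np

def welch_t(group_a, group_b):
    """Welch t statistic per gene for two (n_regions, n_genes) arrays."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    mean_diff = a.mean(axis=0) - b.mean(axis=0)
    se = np.sqrt(a.var(axis=0, ddof=1) / a.shape[0] + b.var(axis=0, ddof=1) / b.shape[0])
    return mean_diff / se

rng = np.random.default_rng(1)
genes = ["gene_0", "gene_1", "gene_2"]
frontal = rng.normal(0.5, 0.05, size=(8, 3))
parietal = rng.normal(0.5, 0.05, size=(8, 3))
frontal[:, 1] += 0.3  # make gene_1 clearly higher in the first group

t = welch_t(frontal, parietal)
ranked = [genes[i] for i in np.argsort(-np.abs(t))]
print(ranked[0])  # gene_1
```

A ranked list like this is also the usual input for a GSEA-style analysis.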


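  • Plots can show the proportion of differentially expressed genes, the size of the differences, their direction (up or down), and so on
  • Enrichment analysis of the differentially expressed genes shows which pathways they belong to, i.e. what functions they carry out
  • GSEA on genes ranked by the size of the difference shows which pathways the differential genes contribute to most
  • GSVA
  • PPI analysis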



Gene co-expression analysis asks whether the spatial distributions of gene A and gene B are correlated.
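With a region x gene matrix in hand, this amounts to correlating columns. The sketch below uses a made-up 5 x 3 matrix and Pearson correlation. Rank-based correlation would work the same way.

```python
import numpy as np

# Toy region x gene matrix (5 regions, 3 genes); values are made up.
expression = np.array([
    [0.1, 0.2, 0.9],
    [0.3, 0.4, 0.7],
    [0.5, 0.5, 0.5],
    [0.7, 0.8, 0.3],
    [0.9, 0.9, 0.1],
])

# Correlate columns (genes) across rows (regions): a gene x gene matrix.
gene_corr = np.corrcoef(expression, rowvar=False)

print(gene_corr.shape)            # (3, 3)
print(round(gene_corr[0, 2], 3))  # -1.0
```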



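Extending this idea to whole expression profiles gives a heat map of gene expression profile correlations across all brain regions, i.e. a region x region matrix whose entries are the correlations between the regions' expression profiles. Combining it with the region x region correlation matrix of another brain imaging measure makes it possible to test how that measure relates to gene expression, i.e. whether its variation is associated with a set of genes.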
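A minimal sketch of this comparison is given below, with random data standing in for both matrices. The imaging matrix is simulated as the expression correlation matrix plus noise, purely for illustration. The sketch reports a raw correlation only and does not test significance, which would need a null model that respects spatial autocorrelation.

```python
import numpy as np

rng = np.random.default_rng(42)
n_regions, n_genes = 10, 200

# Made-up region x gene expression matrix.
expression = rng.normal(size=(n_regions, n_genes))

# Correlate rows (regions) across genes: region x region correlated gene expression.
cge = np.corrcoef(expression)

# Stand-in for a second region x region matrix from imaging (here: CGE plus noise).
noise = rng.normal(scale=0.05, size=cge.shape)
imaging = cge + (noise + noise.T) / 2

# Compare only the upper triangle, since both matrices are symmetric.
iu = np.triu_indices(n_regions, k=1)
r = np.corrcoef(cge[iu], imaging[iu])[0, 1]
print(cge.shape, round(float(r), 2))
```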

