Research
I. Pathway and Network Analysis
Multivariate pathway analysis (also known as gene-set enrichment analysis) forms the backbone of my research on disease risk genes and mechanisms. While standard GWAS examines the association of each individual genetic variant with a phenotype, pathway analysis examines whether sets of nominally associated genetic variants, typically defined by groups of functionally related genes, are cumulatively more strongly associated with a phenotype than would be expected by chance. Pathway analysis thus draws on established knowledge of molecular function and biology, and can offer direct insights into disease mechanisms.
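To make the idea of "more associated than expected by chance" concrete, here is a minimal sketch of one simple form of gene-set testing, a hypergeometric over-representation test. It is not the INRICH algorithm, and it ignores confounders such as gene length and linkage disequilibrium that dedicated tools need to handle. The function name, gene labels, and toy data are all hypothetical.

```python
from math import comb


def enrichment_pvalue(associated_genes, pathway_genes, background_genes):
    """Upper-tail hypergeometric p-value for pathway over-representation.

    Asks: if the associated genes were drawn at random from the background,
    how likely is an overlap with the pathway at least as large as observed?
    """
    background = set(background_genes)
    associated = set(associated_genes) & background
    pathway = set(pathway_genes) & background

    n_background = len(background)         # all genes tested
    n_pathway = len(pathway)               # pathway genes among them
    n_associated = len(associated)         # nominally associated genes
    overlap = len(associated & pathway)    # associated genes in the pathway

    total = comb(n_background, n_associated)
    tail = sum(
        comb(n_pathway, i) * comb(n_background - n_pathway, n_associated - i)
        for i in range(overlap, min(n_pathway, n_associated) + 1)
    )
    return tail / total


# Toy data (hypothetical): 20 genes tested, a 5-gene pathway,
# and 4 nominally associated genes, 3 of which fall in the pathway.
background = [f"G{i}" for i in range(1, 21)]
pathway = ["G1", "G2", "G3", "G4", "G5"]
associated = ["G1", "G2", "G3", "G10"]

p = enrichment_pvalue(associated, pathway, background)
print(f"enrichment p-value: {p:.4f}")
```

A small p-value here means the pathway contains more associated genes than random sampling from the background would typically produce. Practical methods usually replace the random-sampling assumption with permutations that preserve genomic structure.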
 
I developed INRICH ( https://atgu.mgh.harvard.edu/inrich ), a pathway analysis method that has several unique and essential features for the study of whole genome variant data (Lee et al., 2012). Using INRICH, I have demonstrated that disease risk variants underlying major mental disorders converge on common molecular and cellular pathways. I reported these pathway associations in the GWAS of the Psychiatric Genomics Consortium (PGC): PGC1-BD (Nat Genet 2011), PGC1-CDG (Lancet 2013), PGC2-SCZ (Nature 2014), PGC2-SCZ-NPA (in prep), and PGC1-ASD (submitted). The PGC is the largest international consortium in psychiatry, initiated for collaborative genome-wide genetic data analysis ( https://www.med.unc.edu/pgc ). Recently, I led a collaborative study with a group of the PGC Network and Pathway Analysis (NPA) investigators (PGC1-NPA, Nat Neurosci 2015; co-first author).
 
My collaborators are: 
II. Cross Disorder Analysis
The second line of my research focuses on clarifying the nosologic boundaries and etiologic overlap among mental disorders. Psychiatry is the only medical field that defines disorders based on descriptive symptom profiles instead of pathologic evidence. Thus, to make categorical distinctions between ‘normal’ and ‘disordered’ with respect to mental functioning, arbitrary decisions are inevitable. Furthermore, there is obvious overlap of clinical symptoms across various psychiatric and neurodevelopmental disorders, making it challenging to draw diagnostic boundaries. There is a need to re-conceptualize the nosology of mental disorders so that it reflects their underlying neurobiologic bases.
 
Working towards this goal, I developed a log-linear modeling-based testing strategy specifically designed for evaluating shared genetic risk effects in cross disorder GWAS (Lee et al., 2011). Using this method, I performed cross disorder modeling analyses reported in the PGC Cross Disorder Group (CDG) study (Lancet 2013). In genome-wide SNP data from 33,332 cases and 27,888 controls, the primary meta-analysis identified four genomic loci with significant risk effects across a range of psychiatric disorders, particularly autism, ADHD, bipolar disorder, major depressive disorder, and schizophrenia. I applied my modeling method to predict specific pleiotropic disorder models for top GWAS loci underlying core psychopathology. In this work, I also demonstrated: (1) a shared etiologic role of calcium signaling pathway genes (as noted above) and (2) broad implication of altered brain gene expression in mental disorders using post-mortem brain eQTL (expression Quantitative Trait Loci) data.
 
As a lead analytic member of the PGC-CDG, my research in this area will focus on cross disorder analyses of nine psychiatric disorders currently studied at the PGC. This PGC2 cross-disorder study substantially expands the prior PGC1-CDG work. It allows us to study a much broader spectrum of psychopathology with statistical power that is unparalleled in psychiatric genomics research. As of now, I have collected genome-wide association data from 239,551 individuals of 151 PGC study cohorts for this analysis (total of 91,234 cases and 148,317 controls). The final data freeze of this multi-year international effort is planned for early 2016.
 
My collaborators are:           
III. Shared NeuroGenetic Basis of Autism and Schizophrenia  
(NIMH K99/R00 Pathway to Independence Award) 
The third line of my research centers on GWAS of brain neuroimaging, behavioral, and cognitive measures. A major motivation driving this work is to study the shared etiology of distinct neurodevelopmental disorders using an integrative analysis of genomic and neuroimaging data. This motivation led to my K99/R00 research of social dysfunction in autism spectrum disorders (ASD) and schizophrenia (SCZ). ASD and SCZ are two of the most heritable and pervasive neurodevelopmental disorders. They are markedly distinct in terms of developmental trajectories and clinical presentations. Nevertheless, social cognitive deficits are central features of both ASD and SCZ. Moreover, converging evidence from recent epidemiologic, neuroscience, and genetic studies suggests a potential neurobiological link between ASD and SCZ. My K99/R00 research aims to study whether this overlap in social cognitive deficits reflects common etiological mechanisms or represents artifacts of comorbidity or misdiagnosis.
 
For my K99/R00 study, I have secured unique data resources containing both genomic and neuroimaging data through multidisciplinary collaboration. The genetic data for ASD and SCZ include whole-genome SNP genotype (N=32,921) and whole-exome sequencing data (N=7,000). I am also using an independent neuroimaging cohort of ASD and SCZ cases and healthy controls for whom a rich set of brain imaging, genomic, and behavioral/cognitive data are available (N=3,752). To facilitate the analyses of these multidisciplinary data, I completed research training in next-generation sequencing data analysis, and I am currently receiving in-depth training in brain imaging technologies and applications (Lee et al., in prep).
 
My K99 mentors and advisory board members are:
IV. Neuroimaging Genomics: The GSP and ENIGMA Studies
In parallel to my K99 research training, I have participated in several GWAS of neuroimaging and behavioral/cognitive phenotypes. The Brain Genomics Superstruct Project (GSP) is a neuroimaging genetics research project that aims to study the link between brain structure, function, behavior and genetic variation ( http://neuroinformatics.harvard.edu/gsp/ ). Using the first wave of 1,050 healthy young adult neuroimaging genetics samples, my first neuroimaging GWAS study established that variability in amygdala-medial prefrontal cortex circuitry was associated with individuals’ genetic risk for depression within the general population (Holmes and Lee et al. 2012; co-first author). Our latest work includes new method development for brain heritability analysis (Ge et al. 2015) and application of partitioning-based heritability analysis to decipher the shared genetic basis between neuropsychiatric disorders and brain structural traits (Lee et al. submitted).
 
I also participated in GWAS meta-analyses of brain neuroanatomical measures led by the ENIGMA consortium ( http://enigma.ini.usc.edu/ ), which is the largest international effort in neuroimaging genomics (Stein et al. 2012, Hibar et al. 2015). These collaborative studies demonstrated that with sufficient sample sizes, it is feasible to identify and replicate genetic influences on human brain structure, despite large heterogeneity in imaging data generation. 
 
My collaborators are:
V. Whole Genome Sequencing Data Analysis of Autism
Advances in next generation sequencing technology have provided a powerful, unbiased way to dissect the complex genetic architecture of autism spectrum disorders (ASD). Multiple, independent exome sequencing studies have demonstrated an etiologic role of de novo loss-of-function mutations and have discovered rare inherited risk mutations. While encouraging, our understanding of the genetic landscape of ASD is still far from complete. Evaluation of rare genetic variants has focused on protein-coding regions, overlooking more than 97% of human DNA, as a result of limited power due to small sample sizes. Another crucial challenge is a lack of analytic strategies that can effectively harness the enormous number of non-coding variants.
 
In the next five years, my goal is to define a unified statistical framework and develop the accompanying computational software to address these challenges. Through collaborations with the faculty at the Broad Institute, UCLA, and Stanford University, I have secured access to genetic data on people with autism, specifically whole genome sequencing (N=2,500), whole-exome array (N=15,966), and whole-genome SNP data (N=10,610). Through an NIMH/NHGRI research grant application, which I am currently writing as the principal investigator, I aim to develop this new analytic strategy and demonstrate its performance.
 
My collaborators are:
Collaborative Studies in Psychiatric Genomics 
I work with international collaborators on the following genetic studies of psychiatric disorders and brain traits:
The PGC ( https://www.med.unc.edu/pgc ) is the largest international consortium in psychiatry to pursue meta- and mega-analyses of genome-wide genomic data for psychiatric disorders. The PGC was initiated in early 2007 and has rapidly evolved into a collaborative confederation of over 800 investigators from 38 countries. Since 2009, I have contributed to many of the recent PGC studies, leading analyses in several areas including:
 
  • Pathway analysis in: PGC1-BD (2011), PGC1-CDG (2013), PGC2-SCZ (2014), PGC1-NPA (2015), PGC2-SCZ-NPA (in prep)
  • Cross disorder modeling analysis in: PGC1-CDG (2013)
  • Brain eQTL data analysis in: PGC1-CDG (2013), PGC2-SCZ (2014)  
The ICCBD represents an international collaborative effort to establish large-scale DNA and data resources to study bipolar disorder (BD). Through the ICCBD, a large cohort of new bipolar cases (N = 9,000) and unaffected controls (N = 9,000) was ascertained over five years at two U.S. sites (Boston and LA). Along with separately collected European case-control samples (10,000 cases and 10,000 controls) from the UK and Sweden, harmonized data resources for genetic studies were constructed. I am a co-investigator of the NIMH ICCBD grant for genetic data analyses of these resources. Using these data, I am conducting the following studies:
 
  • Subtype phenotype data analysis in BD (with Chiayen Chen, Jordan Smoller)
  • Suicide GWAS in BD (with Sarah Bergen)
The ENIGMA consortium ( http://enigma.ini.usc.edu/ ) is a field-leading international effort that brings together researchers in imaging genomics to understand brain structure, function, and disease, based on brain imaging and genetic data. It is led by Dr. Paul Thompson (IGC) and other key investigators worldwide. I collaborate with the ENIGMA consortium investigators in several research projects including:
 
  • The ENIGMA GWAS (with Derrek Hibar)
  • Collaborative studies in the ENIGMA DTI (with Neda Jahanshad) 
  • Sharing algorithms, data, and information on research studies and methods in neuroimaging genetics