RNA-seq re-analysis of the correlation of
differential gene expression between fetal and adult brains
Summary
I did this project to practice genomic data science and to understand it more deeply.
The source code is available on GitHub.
I re-analyze the data linked below. The purpose of this re-analysis is to examine the correlation of differential gene expression between fetal and adult brains,
as measured by RNA sequencing. If a correlation is found, I then count how many genes are up-regulated and how many are down-regulated.
This part is done in R with RStudio.
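The counting step can be illustrated with a minimal sketch. The project itself does this in R; the Python below is only an illustration. The field names `log2_fold_change` and `adjusted_p`, both cutoffs, and the toy rows are assumptions, not values taken from the analysis. "Up" here simply means a positive log2 fold change under whichever contrast was used.

```python
import math


def count_regulated(results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Count up- and down-regulated genes in a differential expression table.

    `results` is an iterable of dicts with the hypothetical keys
    'log2_fold_change' and 'adjusted_p'. Both cutoffs are illustrative.
    """
    counts = {"up": 0, "down": 0, "not_significant": 0}
    for row in results:
        lfc = row["log2_fold_change"]
        padj = row["adjusted_p"]
        # A gene with a missing statistic cannot be called in either direction.
        if lfc is None or padj is None or math.isnan(lfc) or math.isnan(padj):
            counts["not_significant"] += 1
        elif padj < padj_cutoff and lfc >= lfc_cutoff:
            counts["up"] += 1
        elif padj < padj_cutoff and lfc <= -lfc_cutoff:
            counts["down"] += 1
        else:
            counts["not_significant"] += 1
    return counts


# Toy rows for illustration only; these are not results from the project.
toy = [
    {"gene": "geneA", "log2_fold_change": 2.3, "adjusted_p": 0.001},
    {"gene": "geneB", "log2_fold_change": -1.8, "adjusted_p": 0.01},
    {"gene": "geneC", "log2_fold_change": 0.2, "adjusted_p": 0.60},
    {"gene": "geneD", "log2_fold_change": 3.0, "adjusted_p": float("nan")},
]
print(count_regulated(toy))
```

Requiring both an adjusted p-value cutoff and a fold-change cutoff is one common convention; a stricter or looser rule only changes the two `elif` conditions.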
In addition, I use the genomic datasets (the already statistically analyzed data) to predict and classify sample characteristics such as gender and age.
This part is done in Python on Google Colab.
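As a rough sketch of what such a classification can look like, the example below uses a nearest-centroid classifier scored with leave-one-out cross-validation. This is not necessarily the model used in the project; it was chosen here because it needs only the standard library and suits very small sample counts. The expression values, the two features, and the `fetal`/`adult` labels are made up for illustration.

```python
def centroid(rows):
    """Mean of each feature across a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(train_x, train_y, sample):
    """Assign `sample` to the class whose centroid is closest."""
    by_label = {}
    for x, y in zip(train_x, train_y):
        by_label.setdefault(y, []).append(x)
    centroids = {label: centroid(rows) for label, rows in by_label.items()}
    return min(centroids, key=lambda label: squared_distance(centroids[label], sample))


def leave_one_out_accuracy(xs, ys):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Made-up log-expression values for two hypothetical genes (illustration only).
toy_x = [[8.1, 2.0], [7.9, 2.4], [8.4, 1.8], [3.0, 6.9], [2.6, 7.2], [3.3, 6.5]]
toy_y = ["fetal", "fetal", "fetal", "adult", "adult", "adult"]
print(leave_one_out_accuracy(toy_x, toy_y))
```

Leave-one-out evaluation uses every sample for testing exactly once, which matters when only a handful of samples are available.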
Links to access the data source
The article "Developmental regulation of human cortex transcription and its clinical relevance at base resolution"
The article's RNA-seq data
The article's phenotype metadata for the samples
Brief Overview
- Download the RNA-seq data and phenotype metadata (see code book.docx for details)
- Use the Galaxy Project for the following steps (see code book.docx for details):
  - FASTQ quality control
  - Alignment with HISAT2
  - Read counting per gene with featureCounts
- Get tidy data (count table)
- Do exploratory analysis and statistical analysis in R (PDF, HTML)
- Predict and classify characteristics of samples in Python
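
The step from per-sample counts to a single count table can be sketched as follows. This is a minimal illustration, not the procedure from the code book. It assumes each per-sample file is tab-separated with the gene ID in the first column, the count in the last column, one header row, and optional comment lines starting with `#`. The sample names and counts are invented.

```python
import csv
import io


def read_counts(handle):
    """Read one per-sample table into {gene_id: count}.

    Assumes tab-separated text, gene ID in the first column, count in the
    last column, one header row, and optional comment lines starting with '#'.
    """
    lines = (line for line in handle if not line.startswith("#"))
    rows = csv.reader(lines, delimiter="\t")
    next(rows)  # skip the header row
    return {row[0]: int(row[-1]) for row in rows if row}


def build_count_table(per_sample):
    """Merge {sample: {gene: count}} into rows of [gene_id, count, count, ...]."""
    samples = sorted(per_sample)
    genes = sorted(set().union(*(per_sample[s].keys() for s in samples)))
    header = ["gene_id"] + samples
    # A gene absent from a sample's file is recorded as 0 (an assumption).
    body = [[g] + [per_sample[s].get(g, 0) for s in samples] for g in genes]
    return [header] + body


# Invented file contents for two hypothetical samples (illustration only).
sample_a = "# comment line\nGeneid\tLength\tsample_a.bam\ngene1\t1200\t15\ngene2\t800\t0\n"
sample_b = "Geneid\tsample_b.bam\ngene1\t22\ngene3\t7\n"

table = build_count_table({
    "sample_a": read_counts(io.StringIO(sample_a)),
    "sample_b": read_counts(io.StringIO(sample_b)),
})
for row in table:
    print(row)
```

The result has one row per gene and one column per sample, which is the layout most differential expression tools expect as input.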
Drawback
Only 10 samples are used, so the sample size is too small to support inference about the wider population, and the results may be biased.
References
Genomic Data Science Specialization audit courses
friveramariani/GenomicDataScience_FetalAdultBrain.git
jtleek/datasharing.git
jtleek.com/genstats_site/