Creative Commons License
This blog by Tommy Tang is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

My GitHub page

Tuesday, November 29, 2022

7 links to deeply understand heatmap

Making a heatmap is an essential skill for a bioinformatician. Just check how many figures in genomics or single-cell papers are heatmaps or heatmap variants.

But you probably do not fully understand heatmaps. Here are 7 reading resources to fix that!

1/  Mapping quantitative data to color https://www.nature.com/articles/nmeth.2134 

2/  Heat map from the Nature Methods column  https://www.nature.com/articles/nmeth.1902

3/  A tale of two heatmap functions https://rpubs.com/crazyhottommy/a-tale-of-two-heatmap-functions An old post by me.

4/  Heatmap demystified  https://rpubs.com/crazyhottommy/heatmap_demystified yet another post by me

5/  Understanding color mapping is key https://jokergoo.github.io/ComplexHeatmap-reference/book/a-single-heatmap.html#colors

6/  Understand rasterization  https://jokergoo.github.io/2020/06/30/rasterization-in-complexheatmap/

7/  What happens when you have a huge matrix (20,000 rows/genes x 50 columns) to plot?  https://gdevailly.netlify.app/post/plotting-big-matrices-in-r/


I learned so much from Zuguang Gu. Thanks to him for the awesome ComplexHeatmap package https://jokergoo.github.io/ComplexHeatmap-reference/book/index.html . It is my go-to tool for making heatmaps.
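To make the color mapping point in link 5 concrete, here is a minimal Python sketch of what a color mapping function does: it takes break points and colors, and returns a function that turns a number into a color. The name `color_ramp` and the toy values are made up for illustration, and the plain RGB interpolation is a simplifying assumption; ComplexHeatmap's own color functions are more sophisticated. The point is only that values beyond the break points are clamped, so the choice of break points decides whether one outlier washes out the rest of the row.

```python
def color_ramp(breaks, colors):
    """Return a function mapping a number to a hex color.

    Hypothetical helper for illustration: linear interpolation in RGB
    between increasing break points; values outside the break points
    are clamped to the first or last color.
    """
    rgb = [tuple(int(c[i:i + 2], 16) for i in (1, 3, 5)) for c in colors]

    def to_hex(value):
        # Clamp so that outliers get the end colors instead of stretching the scale.
        v = min(max(value, breaks[0]), breaks[-1])
        for lo, hi, c_lo, c_hi in zip(breaks, breaks[1:], rgb, rgb[1:]):
            if v <= hi:
                t = (v - lo) / (hi - lo)
                mixed = [round(a + t * (b - a)) for a, b in zip(c_lo, c_hi)]
                return "#{:02x}{:02x}{:02x}".format(*mixed)

    return to_hex


# One toy row with a single outlier.
row = [1.0, 2.0, 3.0, 4.0, 100.0]

# Break points at min and max: the outlier owns almost the whole color range.
full_range = color_ramp([min(row), max(row)], ["#ffffff", "#ff0000"])

# Break points chosen by hand: the outlier is clamped, the rest become distinguishable.
clipped = color_ramp([1.0, 4.0], ["#ffffff", "#ff0000"])

print([full_range(x) for x in row])
print([clipped(x) for x in row])
```

With the full-range mapping the first four values come out as nearly the same shade, while the clipped mapping spreads them across the whole white-to-red range.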

Monday, November 28, 2022

6 training resources for data management


* Best Practices for Biomedical Research Data Management https://learn.canvas.net/courses/1854

* Research Data Management Librarian Academy (https://rdmla.github.io/)

* DataONE Data Management Skillbuilding Hub  (https://dataoneorg.github.io/Education)

* Data Management Training Clearinghouse (https://dmtclearinghouse.esipfed.org/)

* Research data management open training materials Zenodo Community (https://zenodo.org/communities/dcc-rdm-training-materials)

* Consortium of European Social Science Data Archives (CESSDA) Training Resources (https://www.cessda.eu/Training-Resources)

Bonus:

Learn from TCGA: Collaborative Genomics Projects: A Comprehensive Guide https://www.sciencedirect.com/book/9780128021439/collaborative-genomics-projects-a-comprehensive-guide

Sunday, November 27, 2022

8 R/command line tools to deal with Excel, TSV and CSV files

 R packages:

* [readxl](https://readxl.tidyverse.org/)

* [tidyxl](https://github.com/nacnudus/tidyxl)

* [janitor](https://github.com/sfirke/janitor)


Command line tools:

* [VisiData](https://www.visidata.org/) is an interactive multitool for tabular data. It combines the clarity of a spreadsheet, the efficiency of the terminal, and the power of Python, into a lightweight utility which can handle millions of rows with ease.

* [csvkit](https://csvkit.readthedocs.io/en/latest/index.html#)

* [csvtk](https://bioinf.shenwei.me/csvtk/usage/) a cross-platform, efficient and practical CSV/TSV toolkit.

* [Miller](https://miller.readthedocs.io/en/latest/) is a command-line tool for querying, shaping, and reformatting data files in various formats including CSV, TSV, JSON, and JSON Lines.

* [eBay's TSV Utilities](https://opensource.ebay.com/tsv-utils/)
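Why reach for a dedicated tool instead of splitting lines on commas? Here is a small Python sketch using only the standard library `csv` module. The sample text and the helper name `csv_to_tsv` are made up for illustration, and this is not how any of the tools above work internally. It only shows the failure that a real parser avoids: a quoted field that contains a comma.

```python
import csv
import io

# Toy CSV with a quoted field that itself contains a comma (made-up sample).
sample = 'gene,description\nTP53,"tumor protein p53, guardian of the genome"\n'

# Naive approach: split the data line on commas. The quoted field is torn apart.
naive_fields = sample.splitlines()[1].split(",")

# A real CSV parser respects the quotes and returns two fields.
parsed_fields = list(csv.reader(io.StringIO(sample)))[1]


def csv_to_tsv(csv_text):
    """Convert CSV text to TSV text (hypothetical helper for illustration)."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


print(len(naive_fields), naive_fields)
print(len(parsed_fields), parsed_fields)
print(csv_to_tsv(sample))
```

The naive split yields three pieces for a two-column row, which is exactly the kind of silent corruption the tools in this list are built to prevent.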

Tuesday, November 15, 2022

8 Resources to study Transcription factor binding, enhancers and histone modification distribution

 1. ENCODE https://www.encodeproject.org/

2. The International Human Epigenome Consortium (IHEC) epigenome data portal http://epigenomesportal.ca/ihec/index.html?as=1

3. Blueprint epigenome http://dcc.blueprint-epigenome.eu/#/home

4. EpiFactors http://epifactors.autosome.ru/ is a database for epigenetic factors, corresponding genes and products.

5. CistromeDB http://cistrome.org/db/#/ by the Shirley Liu group

6. ReMap https://remap2022.univ-amu.fr/ is a large scale integrative analysis of DNA-binding experiments for Homo sapiens, Mus musculus, Drosophila melanogaster and Arabidopsis thaliana transcriptional regulators.

7. ChIP-Atlas http://chip-atlas.org/  An integrative, comprehensive database to explore public epigenetic datasets, including ChIP-Seq, DNase-Seq, ATAC-Seq, and Bisulfite-Seq data. ChIP-Atlas covers almost all public data archived in the Sequence Read Archive of NCBI, EBI, and DDBJ, with over 224,000 experiments.

8. FANTOM5 https://fantom.gsc.riken.jp/5/

Sunday, November 13, 2022

7 Books for you to learn bioinformatics

1.  Data Analysis for the Life Sciences https://leanpub.com/dataanalysisforthelifesciences You can get it for free!

2. Practical Computing for Biologists https://practicalcomputing.org/ The first book I ever used to start learning computational biology.

3. A Primer for Computational Biology https://open.oregonstate.education/computationalbiology/

4. Computational Genomics with R  http://compgenomr.github.io/book/

5. The Biologist’s Guide to Computing https://book.biologistsguide2computing.com/en/stable

6. Bioinformatics Data Skills https://www.oreilly.com/library/view/bioinformatics-data-skills/9781449367480/ A must read to upgrade your bioinformatics skills once you know the basics.

7. Bioinformatics Workbook: A tutorial to help scientists design their projects and analyze their data. https://bioinformaticsworkbook.org/#gsc.tab=0

Thursday, November 10, 2022

7 FREE Books to learn data science

1. Data science: A first introduction https://datasciencebook.ca/

2. Introduction to Data Science http://rafalab.dfci.harvard.edu/dsbook/

3. Agile Data Science with R https://edwinth.github.io/ADSwR/index.html

4. Tidy Modeling with R https://www.tmwr.org/

5. Feature Engineering and Selection: A Practical Approach for Predictive Models https://bookdown.org/max/FES/

6. Another Book on Data Science https://www.anotherbookondatascience.com/ compares R and Python side by side

7. Research Software Engineering with Python https://merely-useful.tech/py-rse/

Wednesday, November 9, 2022

12 resources to bookmark for reproducible computational research

1. A reproducible workflow https://www.youtube.com/watch?v=s3JldKoA0zw This two-minute video will change your mind about reproducible research.

2. Parallel sequencing lives, or what makes large sequencing projects successful https://academic.oup.com/gigascience/article/6/11/gix100/4557140?login=false

3. Common-sense approaches to sharing tabular data alongside publication https://www.sciencedirect.com/science/article/pii/S2666389921002300

4. A Reproducible Data Analysis Workflow with R Markdown, Git, Make, and Docker https://psyarxiv.com/8xzqy/

5. Practical Computational Reproducibility in the Life Sciences https://www.cell.com/cell-systems/fulltext/S2405-4712(18)30140-6

6. A video by Dr. Keith A. Baggerly from MD Anderson, [The Importance of Reproducible Research in High-Throughput Biology](https://www.youtube.com/watch?v=7gYIs7uYbMo). Highly recommended.

7. Ten Simple Rules for Reproducible Computational Research http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003285

8. Good Enough Practices in Scientific Computing http://arxiv.org/abs/1609.00037 

9. Best Practices for Scientific Computing https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1001745

10. A Quick Guide to Organizing Computational Biology Projects http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000424  A must read for computational biologists!

11. Reproducibility of computational workflows is automated using continuous analysis https://www.nature.com/articles/nbt.3780

12. Five selfish reasons to work reproducibly https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0850-7

Monday, November 7, 2022

9 tools for interactively exploring single-cell RNA-seq data

1. cellxgene https://github.com/chanzuckerberg/cellxgene

2. cellar https://github.com/euxhenh/cellar

3. scSVA: an interactive tool for big data visualization and exploration in single-cell omics https://www.biorxiv.org/content/10.1101/512582v1

4. ASAP: a web-based platform for the analysis and interactive visualization of single-cell RNA-seq data https://academic.oup.com/bioinformatics/article/33/19/3123/3852081?login=false

5. [iSEE](https://bioconductor.org/packages/release/bioc/html/iSEE.html) Provides functions for creating an interactive Shiny-based graphical user interface for exploring data stored in SummarizedExperiment objects, including row- and column-level metadata

6. [VISION](https://github.com/YosefLab/VISION) A high-throughput and unbiased module for interpreting scRNA-seq data.

7. [DISCO](http://immunesinglecell.org/): Deep Integration of Single-Cell Omics. Want to visualize millions of cells online and annotate cell types automatically? Try it! Make single cell easier and make life easier!

8. [TISCH](http://tisch.comp-genomics.org/) Tumor Immune Single-cell Hub (TISCH) is a scRNA-seq database focusing on tumor microenvironment (TME).

9. [CancerSCEM](https://ngdc.cncb.ac.cn/cancerscem) To date, CancerSCEM version 1.0 consists of 208 cancer samples across 28 studies and 20 human cancer types.

8 links to BETTER understand principal component analysis (PCA)

8 links to BETTER understand principal component analysis (PCA):

1. https://divingintogeneticsandgenomics.rbind.io/post/pca-in-action/  PCA in action, my blog post to calculate SVD and PCA with #rstats 

2. https://www.youtube.com/watch?v=rYz83XPxiZo MIT 18.06 linear algebra lecture on SVD

3. https://peterbloem.nl/blog/pca-4 THE SINGULAR VALUE DECOMPOSITION (SVD)

4. http://rafalab.github.io/pages/harvardx.html High-dimensional data analysis, week 2.

5. https://towardsdatascience.com/why-pca-looks-triangular-a642daac721a why PCA looks triangular. 

6. https://www.nxn.se/valent/2017/6/12/how-to-read-pca-plots How to read PCA plots for single-cell data.

7. https://twitter.com/AedinCulhane/status/1007110262187544577 PCA horseshoe artifact

8. https://www.youtube.com/watch?v=_UVHneBUBW0  by Josh Starmer
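If you want to see the mechanics behind these links, here is a minimal pure-Python sketch of the first principal component: center the data, compute the covariance matrix, and find its top eigenvector. I use power iteration only to keep the example free of dependencies; that is my stand-in, not what the linked posts do. They go through the SVD of the centered matrix, which gives the same directions. The function names and the toy data are made up for illustration.

```python
import math


def center(rows):
    """Subtract each column (feature) mean; rows are samples."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix (divides by n - 1) of centered data."""
    n = len(centered)
    p = len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_pc(cov, n_iter=200):
    """Top eigenvector and eigenvalue of a covariance matrix by power iteration."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p  # arbitrary unit-length starting vector
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient: the variance captured along v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return v, eigenvalue


# Toy data: 5 samples, 2 strongly correlated features (made-up numbers).
samples = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2], [5.0, 9.8]]

centered = center(samples)
cov = covariance(centered)
pc1, variance_pc1 = first_pc(cov)

# Fraction of total variance explained by PC1.
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained = variance_pc1 / total_variance

# PC1 scores: projection of each centered sample onto the PC1 direction.
scores = [sum(x * w for x, w in zip(row, pc1)) for row in centered]

print(pc1, explained)
print(scores)
```

Because the two toy features move together almost perfectly, PC1 captures nearly all of the variance, and the scores are what you would put on the x-axis of a PCA plot.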

Thursday, November 3, 2022

5 tools to visualize genomic datasets

 1. karyoploteR https://bernatgel.github.io/karyoploter_tutorial/Tutorial/PlotCoverage/PlotCoverage.html I used it to plot single-cell ATACseq tracks https://github.com/crazyhottommy/scATACutils/#plot-atacseq-tracks-for-each-cluster-of-cells, with more examples at https://rpubs.com/crazyhottommy/scATAC_tracks

2. plotgardener is a genomic data visualization package for R. Using `grid` graphics, `plotgardener` empowers users to programmatically and flexibly generate multi-panel figures. https://github.com/PhanstielLab/plotgardener

3. The goal of **g(r)osling** https://github.com/gosling-lang/grosling is to help you build interactive genomics visualizations with [Gosling](https://github.com/gosling-lang/gosling.js). This package uses [reticulate](https://rstudio.github.io/reticulate/) to provide an interface to the [Gos](https://github.com/gosling-lang/gos) Python package.

4.  Intervene: a tool for intersection and visualization of multiple gene or genomic region sets 

 https://bitbucket.org/CBGR/intervene/src/master/

 5. https://42basepairs.com/ saw it yesterday by @RobAboukhalil

Wednesday, November 2, 2022

8 links to bookmark for better data visualization

Data visualization is a critical step in data analysis. Here are 8 links to bookmark for better data visualization:

1. Nature Methods Points of View columns on data visualization  http://blogs.nature.com/methagora/2013/07/data-visualization-points-of-view.html The columns on color mapping and heatmaps are very nice.

2. Ten simple rules to colorize biological data visualization https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008259

3. data visualization resources https://sabahzero.github.io/dataviz/resources

4. Fundamentals of Data Visualization https://clauswilke.com/dataviz/ 

5. Data Visualization https://socviz.co/  by Kieran Healy. I've read books 4 and 5.

6. [R Graphics Cookbook](http://www.cookbook-r.com/Graphs/) by Winston Chang.

7. [ggplot2: Elegant Graphics for Data Analysis](https://www.amazon.com/ggplot2-Elegant-Graphics-Data-Analysis/dp/0387981403) by Hadley Wickham.

8. https://www.data-to-viz.com/ helps you choose the right chart

Tuesday, November 1, 2022

6 links on workflow to make your life easier

Bioinformatics analysis involves a lot of steps. Here are 6 links on workflows to make your life easier:

1. Hundreds of workflow tools and engines https://github.com/pditommaso/awesome-pipeline 

2. see also from the CWL wiki https://github.com/common-workflow-language/common-workflow-language/wiki/Existing-Workflow-systems

3. A review of bioinformatic pipeline frameworks https://academic.oup.com/bib/article/18/3/530/2562749

4. discussion on biostars https://www.biostars.org/p/115745/

5. Two papers from Titus Brown's lab (this one and the next): [Ten simple rules and a template for creating workflows-as-applications](https://osf.io/preprints/8w5j3/)

6.  Streamlining Data-Intensive Biology With Workflow Systems https://dib-lab.github.io/2020-workflows-paper/