Creative Commons License
This blog by Tommy Tang is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

My GitHub page

Sunday, January 29, 2023

10 tips for learning git

1/ A few basic commands will take you a long way:

git clone
git add
git commit -m
git push
Those are enough to get you started. To be honest, they are still the commands I use most often.

2/ Understand git and GitHub. You use git to track files locally, and GitHub can host your repos. You can start with the GitHub Skills page.
GitLab is an alternative to GitHub.

3/ The Software Carpentry git workshop is a nice resource for learning git.

4/ An open source game about learning Git!

5/ Learn it for free on Udemy

6/ The best interactive tutorial for learning git branching.
I had a lot of fun playing it.

7/ Oh Shit, Git!?! You know, sometimes I mess things up so badly locally that I just delete my local copy and do a fresh git clone :)

8/ How to use git with R.

9/ git cheatsheet

10/ If you collaborate with others, you need to understand the GitHub flow.

Tuesday, December 13, 2022

15 tools/papers for multi-sample multi-group single-cell RNAseq differential expression analysis

 1/  [An Empirical Bayes Method for Differential Expression Analysis of Single Cells with Deep Generative Models]( scVI-DE

2/  [muscat](

3/  [Confronting false discoveries in single-cell differential expression]( "These observations suggest that, in practice, pseudobulk approaches provide an excellent trade-off between speed and accuracy for single-cell DE analysis." One needs to consider biological replicates; pseudobulk works well.

4/  [Modelling group heteroscedasticity in single-cell RNA-seq pseudo-bulk data](

5/  [BSDE: barycenter single-cell differential expression for case–control studies](

6/ [distinct]( Both muscat and distinct are from Mark Robinson's group.

7/ [nebula](

8/  [Fast identification of differential distributions in single-cell RNA-sequencing data with waddR](

9/ [CoCoA-diff: counterfactual inference for single-cell gene expression analysis](

10/ [Bias, robustness and scalability in single-cell differential expression analysis]( From Mark Robinson's group.

11/ [Comparative analysis of differential gene expression analysis tools for single-cell RNA sequencing data]( "We observed that current methods designed for scRNAseq data do not tend to show better performance compared to methods designed for bulk RNAseq data."

12/  [Tree-based Correlation Screen and Visualization for Exploring Phenotype-Cell Type Association in Multiple Sample Single-Cell RNA-Sequencing Experiments]( TreeCorTreat is an open source R package that tackles this problem by using a tree-based correlation screen to analyze and visualize the association between phenotype and transcriptomic features and cell types at multiple cell type resolution levels.

13/ [Quantifying the effect of experimental perturbations in single-cell RNA-sequencing data using graph signal processing]( read this thread

14/  [Causal identification of single-cell experimental perturbation effects with CINEMA-OT](


15/ [IDEAS: individual level differential expression analysis for single-cell RNA-seq data](
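
The pseudobulk idea quoted in item 3 can be sketched in a few lines: sum the raw counts of all cells that share a sample and a cell type, so that the biological replicate (not the individual cell) becomes the unit of comparison for a bulk-style DE test. This is only a minimal illustration in plain Python. The data layout, the function name `pseudobulk`, and the counts are all made up for the example and are not taken from any of the tools listed above.

```python
from collections import defaultdict

def pseudobulk(cells):
    """Sum per-cell counts into one profile per (sample, cell_type).

    `cells` is a hypothetical layout: a list of dicts with keys
    'sample', 'cell_type', and 'counts' (gene -> raw count).
    """
    totals = defaultdict(lambda: defaultdict(int))
    for cell in cells:
        key = (cell["sample"], cell["cell_type"])
        for gene, count in cell["counts"].items():
            totals[key][gene] += count
    # Convert nested defaultdicts to plain dicts for easy inspection
    return {key: dict(genes) for key, genes in totals.items()}

# Toy data with invented counts: two samples, two cell types
cells = [
    {"sample": "ctrl_1", "cell_type": "T", "counts": {"CD3E": 5, "GZMB": 0}},
    {"sample": "ctrl_1", "cell_type": "T", "counts": {"CD3E": 3, "GZMB": 1}},
    {"sample": "treat_1", "cell_type": "T", "counts": {"CD3E": 4, "GZMB": 7}},
    {"sample": "treat_1", "cell_type": "B", "counts": {"CD3E": 0, "GZMB": 0}},
]

profiles = pseudobulk(cells)
for key, genes in sorted(profiles.items()):
    print(key, genes)
```

Each resulting profile looks like one bulk RNA-seq sample, which is why methods designed for bulk data can then be applied per cell type across replicates.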

Sunday, December 11, 2022

32 resources for (to-be) faculty on salary negotiation, grant writing, funding, and lab management

1/ [Tips for negotiating salary and startup for newly-hired tenure-track faculty](

2/  [Creating accessibility in academic negotiations](

3/ [Ten Simple Rules to becoming a principal investigator](

4/  [applying for a faculty position]( by Roger Peng.

5/ [A list of publicly available grant proposals in the biological sciences](

6/ [open grant]( find other people's grants.

7/  [Early Career Funding, Awards, and Other Funding]( 


9/  [Funding schemes for postdoctoral fellowships](

10/  [Postdoctoral Funding Opportunities by Johns Hopkins](

11/  [Early Career Funding Opportunities by Johns Hopkins](

12/   [The CommKit]( is a collection of guides to successful communication in the biological sciences, written by the BRCL Fellows.

13/  [writing in sciences stanford online course](

14/ [Ten simple rules for structuring papers](

15/  [NIH grant podcasts](

16/  [NIC guide](

17/ [Thoughts on reviewing NIH proposals: What is the difference between a 2.0 and 3.0 in initial score?]( a blog post.

18/  [how to write a K99]( by Samir Amin (my good buddy). Go and check out this treasure.

19/  [seeking the k99]( a blog post by Tim Stuart.

20/  [AuthorArranger: Conquer journal title pages in seconds](

21/  [typeset]( The quickest way to read and understand scientific literature

22/  [cocites](

23/  [connected papers](

24/  [ZoteroBib]( is a free service that helps you quickly create a bibliography in any citation style.

25/  [How to craft a figure legend for scientific papers]( 

26/  [Ten quick tips for making things findable](

27/  [Making experimental data tables in the life sciences more FAIR: a pragmatic approach](

28/  protocols:

29/  [electronic lab notebooks review by Harvard HMS](

30/ [Rspace]( A next-gen electronic lab notebook.

31/  [How to grow a healthy lab](  Nature collections

32/  [Bench Sci]( Run Successful Experiments with the Right Antibody. Let our AI decode the literature to provide antibody usage data that's unbiased and experiment-specific

Wednesday, December 7, 2022

23 tools to work with (single-cell) TCR/BCR-seq immune repertoire data

1/  [immunarch]( 

2/ [scRepertoire]( 

3/ [dandelion](  A Python package for analyzing single-cell BCR/TCR data from the 10x Genomics 5’ solution!

4/ [TRUST4]( developed in Shirley Liu's group. Use it to extract TCR/BCR information from bulk RNAseq or 5' scRNAseq data.

5/  A dramatic speedup for one of the core computations in adaptive immune receptor repertoire (AIRR) analysis: the discovery and counting of receptors that overlap between repertoires! Check out  [CompAIRR]( With 10^4 repertoires of 10^5 sequences each, CompAIRR ran in 17 minutes while the fastest existing tool took 10 days, amounting to a ~1000x speedup.

6/ [ClusTCR]( a Python interface for rapid clustering of large sets of CDR3 sequences with unknown antigen specificity;

7/ [GLIPH2](

8/  [GIANA allows computationally-efficient TCR clustering and multi-disease repertoire classification by isometric transformation]( from Bo Li.

9/  [tcrdist3]( is a python API-enabled toolkit for analyzing T-cell receptor repertoires

10/ [TCRex]( a web tool for the prediction of TCR–epitope recognition

11/  [ImRex]( TCR-epitope recognition prediction using combined sequence input representation for convolutional neural networks.

12/  [NetTCR - 2.0]( Sequence-based prediction of peptide-TCR binding

13/  [CellaRepertorium](

14/  [enclone]( from 10x. We should give this a try if we want to cluster TCR and BCR clonotypes.

15/  [migec]( RepSeq processing swiss-knife.

16/  [MiXCR]( is a universal software for fast and accurate analysis of T- and B- cell receptor repertoire sequencing data.

17/ [ImReP]( is a computational method for rapid and accurate profiling of the adaptive immune repertoire from regular RNA-Seq data.

18/ [TcellMatch]( Predicting T-cell to epitope specificity. TcellMatch is a collection of models to predict antigen specificity of **single T cells** based on CDR3 sequences and other single-cell modalities, such as RNA counts and surface protein counts.

19/ [scirpy]( A scanpy extension for single-cell TCR analysis. 

20/  [Tessa]( is a Bayesian model to integrate T cell receptor (TCR) sequence profiling with transcriptomes of T cells. Enabled by the recently developed single cell sequencing techniques, which provide both TCR sequences and RNA sequences of each T cell concurrently, Tessa maps the functional landscape of the TCR repertoire, and generates insights into understanding human immune response to diseases. 

21/ [DeepTCR]( Deep Learning Methods for Parsing T-Cell Receptor Sequencing (TCRSeq) Data

22/  [Integrating T cell receptor sequences and transcriptional profiles by clonotype neighbor graph analysis (CoNGA)](

23/ [Echidna: Integrated simulations of single-cell immune receptor repertoires and transcriptomes](

Tuesday, November 29, 2022

7 links to deeply understand heatmap

Making a heatmap is an essential skill for a bioinformatician. Just check how many figures in genomics or single-cell papers are heatmaps or heatmap variants.

But you probably do not fully understand heatmaps. Here are 7 reading resources to help!

1/  Mapping quantitative data to color 

2/  Heat map from the Nature Methods column

3/  A tale of two heatmap functions. An old post by me.

4/  Heatmap demystified. Yet another post by me.

5/  Understanding color mapping is key

6/ Understand rasterization

7/  What happens when you have a huge matrix (20,000 rows/genes x 50 columns) to plot?

I learned so much from Zuguang Gu. Thanks to him for his awesome ComplexHeatmap package. It is my go-to tool for making heatmaps.
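
To make the point about color mapping concrete, here is a small sketch of two steps that often sit between a matrix and the colors you see: scaling each row (gene) to z-scores, and clamping values to a fixed range before mapping them to a blue-white-red gradient. It is plain Python for illustration only. The function names, the clamp limit of 2, and the toy matrix are my own assumptions, not how any particular heatmap package implements it.

```python
from statistics import mean, pstdev

def zscore_rows(matrix):
    """Scale each row to mean 0 and standard deviation 1."""
    scaled = []
    for row in matrix:
        mu, sd = mean(row), pstdev(row)
        # A constant row has sd == 0; map it to all zeros
        scaled.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return scaled

def to_color(z, limit=2.0):
    """Map a z-score to an (R, G, B) tuple on a blue-white-red gradient.

    Values beyond +/- limit are clamped, so one extreme outlier
    cannot wash out the rest of the heatmap.
    """
    z = max(-limit, min(limit, z))
    t = z / limit                      # now in [-1, 1]
    fade = round(255 * (1 - abs(t)))   # 255 at t == 0, 0 at the extremes
    return (255, fade, fade) if t >= 0 else (fade, fade, 255)

# Hypothetical toy matrix: two genes on very different scales
matrix = [
    [1, 2, 3],
    [100, 200, 300],
]

scaled = zscore_rows(matrix)
colors = [[to_color(z) for z in row] for row in scaled]
for row in colors:
    print(row)
```

After row scaling, both genes get the same colors even though their raw values differ by two orders of magnitude. Without scaling, the second gene would dominate the color range.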

Monday, November 28, 2022

6 training resources for data management

* Best Practices for Biomedical Research Data Management

* Research Data Management Librarian Academy

* DataONE Data Management Skillbuilding Hub

* Data Management Training Clearinghouse

* Research data management open training materials Zenodo Community

* Consortium of European Social Science Data Archives (CESSDA) Training Resources


Learn from TCGA: Collaborative Genomics Projects: A Comprehensive Guide

Sunday, November 27, 2022

8 R/command line tools to deal with Excel, TSV and CSV files

R packages:

* [readxl](

* [tidyxl](

* [janitor](

Command line tools:

* [VisiData]( is an interactive multitool for tabular data. It combines the clarity of a spreadsheet, the efficiency of the terminal, and the power of Python, into a lightweight utility which can handle millions of rows with ease.

* [csvkit](

* [csvtk]( a cross-platform, efficient and practical CSV/TSV toolkit.

* [Miller]( is a command-line tool for querying, shaping, and reformatting data files in various formats including CSV, TSV, JSON, and JSON Lines.

* [eBay's TSV Utilities](

Tuesday, November 15, 2022

8 Resources to study Transcription factor binding, enhancers and histone modification distribution


2. The International Human Epigenome Consortium (IHEC) epigenome data portal

3. Blueprint epigenome

4. EpiFactors is a database for epigenetic factors, corresponding genes and products.

5. CistromeDB by Shirley Liu group

6. ReMap is a large-scale integrative analysis of DNA-binding experiments for Homo sapiens, Mus musculus, Drosophila melanogaster and Arabidopsis thaliana transcriptional regulators.

7. ChIP-Atlas. An integrative, comprehensive database to explore public epigenetic datasets, including ChIP-Seq, DNase-Seq, ATAC-Seq, and Bisulfite-Seq data. ChIP-Atlas covers almost all public data archived in the Sequence Read Archive of NCBI, EBI, and DDBJ, with over 224,000 experiments.

8. FANTOM5

Sunday, November 13, 2022

7 Books for you to learn bioinformatics

1.  Data Analysis for the Life Sciences. You can get it for free!

2. Practical Computing for Biologists. The first book I ever used to start learning computational biology.

3. A Primer for Computational Biology

4. Computational Genomics with R

5. The Biologist’s Guide to Computing

6. Bioinformatics Data Skills. A must-read to upgrade your bioinformatics skills once you know the basics.

7. Bioinformatics Workbook: A tutorial to help scientists design their projects and analyze their data.

Thursday, November 10, 2022

7 FREE Books to learn data science

1. Data Science: A First Introduction

2. Introduction to Data Science

3. Agile Data Science with R

4. Tidy Modeling with R

5. Feature Engineering and Selection: A Practical Approach for Predictive Models

6. Another Book on Data Science. It compares R and Python side by side.

7. Research Software Engineering with Python