Diving into Genetics and Genomics

This blog by Tommy Tang is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Sunday, January 29, 2023

10 tips for learning git

1/ A few basic commands will take you a long way:

git clone

git add

git commit -m

git push

Those are enough to get you started. To be honest, those are still the most frequent commands I use.
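As a minimal sketch of how these four commands fit together, the Python snippet below builds the usual clone, add, commit, push sequence and prints it as a dry run. The repository URL, file name, and commit message are placeholders, and the helper name `basic_git_workflow` is made up for this illustration.

```python
import shlex


def basic_git_workflow(repo_url, filename, message):
    """Return the four basic git commands, in the order you would run them.

    All arguments are placeholders supplied by the caller; nothing is executed.
    """
    return [
        ["git", "clone", repo_url],        # copy the remote repo to your machine
        ["git", "add", filename],          # stage a new or changed file
        ["git", "commit", "-m", message],  # record the staged changes locally
        ["git", "push"],                   # send your commits back to the remote
    ]


# Hypothetical values, for illustration only.
commands = basic_git_workflow(
    "https://github.com/your-name/your-repo.git",
    "analysis.R",
    "add analysis script",
)
for cmd in commands:
    print(shlex.join(cmd))  # dry run: print each command instead of running it
```

After `git clone` you would change into the new directory before running the other three commands. To actually execute a command, you could pass its list to `subprocess.run(cmd, check=True)`.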

2/ Understand the difference between git and GitHub. You use git to track files locally, and GitHub hosts your repos. You can start with the GitHub Skills page https://buff.ly/3tO2iaf

GitLab https://buff.ly/3JlGA69 is an alternative to GitHub.

3/ The Software Carpentry git workshop is a nice resource for learning git https://buff.ly/3kUhqB7

4/ An open source game about learning Git! https://buff.ly/2ZPXUrX

5/ Learn it for free on Udemy https://buff.ly/3RvTCA9

6/ The best interactive tutorial for learning git branching https://buff.ly/2tQTJN4

I had a lot of fun playing it.

7/ https://buff.ly/2w5p9zi

Oh Shit, Git!?! You know, sometimes things get so messed up locally that I just delete my local copy and do a fresh git clone :)

8/ https://buff.ly/2U9C8hC How to use git with R.

9/ git cheatsheet https://buff.ly/3H2PrWa

10/ If you collaborate with others, you need to understand the GitHub flow

https://buff.ly/3CcvTio

Tuesday, December 13, 2022

15 tools/papers for multi-sample multi-group single-cell RNAseq differential expression analysis

1/ [An Empirical Bayes Method for Differential Expression Analysis of Single Cells with Deep Generative Models](https://www.biorxiv.org/content/10.1101/2022.05.27.493625v1) scVI-DE

2/ [muscat](http://www.bioconductor.org/packages/release/bioc/html/muscat.html)

3/ [Confronting false discoveries in single-cell differential expression](https://www.nature.com/articles/s41467-021-25960-2) "These observations suggest that, in practice, pseudobulk approaches provide an excellent trade-off between speed and accuracy for single-cell DE analysis." One needs to consider biological replicates; pseudobulk works well.

4/ [Modelling group heteroscedasticity in single-cell RNA-seq pseudo-bulk data](https://www.biorxiv.org/content/10.1101/2022.09.12.507511v1)

5/ [BSDE: barycenter single-cell differential expression for case–control studies](https://academic.oup.com/bioinformatics/article/38/10/2765/6554192?login=false)

6/ [distinct](http://www.bioconductor.org/packages/release/bioc/html/distinct.html) Both muscat and distinct are from Mark Robinson's group.

7/ [nebula](https://github.com/lhe17/nebula) https://www.biorxiv.org/content/biorxiv/early/2020/09/25/2020.09.24.311662.full.pdf

8/ [Fast identification of differential distributions in single-cell RNA-sequencing data with waddR](https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btab226/6207964) https://github.com/goncalves-lab/waddR

9/ [CoCoA-diff: counterfactual inference for single-cell gene expression analysis](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02438-4)

10/ [Bias, robustness and scalability in single-cell differential expression analysis](https://www.nature.com/articles/nmeth.4612) From Mark Robinson's group.

11/ [Comparative analysis of differential gene expression analysis tools for single-cell RNA sequencing data](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2599-6) "We observed that current methods designed for scRNAseq data do not tend to show better performance compared to methods designed for bulk RNAseq data."

12/ [Tree-based Correlation Screen and Visualization for Exploring Phenotype-Cell Type Association in Multiple Sample Single-Cell RNA-Sequencing Experiments](https://www.biorxiv.org/content/10.1101/2021.10.27.466024v1) TreeCorTreat is an open source R package that tackles this problem by using a tree-based correlation screen to analyze and visualize the association between phenotype and transcriptomic features and cell types at multiple cell type resolution levels.

13/ [Quantifying the effect of experimental perturbations in single-cell RNA-sequencing data using graph signal processing](https://www.biorxiv.org/content/10.1101/532846v3) read this thread https://twitter.com/krishnaswamylab/status/1328876444810960896?s=27

14/ [Causal identification of single-cell experimental perturbation effects with CINEMA-OT](https://www.biorxiv.org/content/10.1101/2022.07.31.502173v1)

github https://github.com/vandijklab/CINEMA-OT

15/ [IDEAS: individual level differential expression analysis for single-cell RNA-seq data](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-022-02605-1)
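The pseudobulk idea quoted in item 3 can be sketched in a few lines of Python: sum the raw counts of all cells that come from the same biological replicate, so that each sample contributes one profile. This is only an illustration on made-up toy data, not the code of any tool listed above; the function name `pseudobulk` and the barcodes, sample IDs, and counts are assumptions.

```python
from collections import defaultdict


def pseudobulk(counts, cell_to_sample):
    """Sum per-cell gene counts into one profile per sample.

    counts: dict mapping cell barcode -> dict of gene -> raw count
    cell_to_sample: dict mapping cell barcode -> sample (biological replicate) ID
    """
    totals = defaultdict(lambda: defaultdict(int))
    for cell, gene_counts in counts.items():
        sample = cell_to_sample[cell]
        for gene, n in gene_counts.items():
            totals[sample][gene] += n
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {sample: dict(genes) for sample, genes in totals.items()}


# Toy data (made up): four cells from two samples.
counts = {
    "cell1": {"CD3E": 2, "MS4A1": 0},
    "cell2": {"CD3E": 1, "MS4A1": 3},
    "cell3": {"CD3E": 0, "MS4A1": 5},
    "cell4": {"CD3E": 4, "MS4A1": 1},
}
cell_to_sample = {
    "cell1": "patientA", "cell2": "patientA",
    "cell3": "patientB", "cell4": "patientB",
}

bulk = pseudobulk(counts, cell_to_sample)
print(bulk)
```

In practice the aggregation is usually done separately for each cell type, and the summed counts are then analyzed with a bulk RNA-seq differential expression method.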

Sunday, December 11, 2022

32 resources for (to-be) faculty on salary negotiation, grant writing, funding, and lab management

1/ [Tips for negotiating salary and startup for newly-hired tenure-track faculty](https://dynamicecology.wordpress.com/2017/03/01/tips-for-negotiating-salary-and-startup-for-newly-hired-tenure-track-faculty/)

2/ [Creating accessibility in academic negotiations](https://www.sciencedirect.com/science/article/pii/S0968000422002870?dgcid=authord)

3/ [Ten Simple Rules to becoming a principal investigator](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007448)

4/ [applying for a faculty position](http://effortreport.libsyn.com/15-applying-for-a-faculty-position) by Roger Peng.

5/ [A list of publicly available grant proposals in the biological sciences](https://jabberwocky.weecology.org/2012/08/10/a-list-of-publicly-available-grant-proposals-in-the-biological-sciences/)

6/ [open grant](https://www.ogrants.org/) find other people's grants.

7/ [Early Career Funding, Awards, and Other Funding](https://docs.google.com/spreadsheets/d/1H1aj--VUYr7eMFk_T7x0Oh985LqbyyscXg2wAAevDnU/edit#gid=0)

8/ https://ecrcentral.org/resources

9/ [Funding schemes for postdoctoral fellowships](https://asntech.github.io/postdoc-funding-schemes/)

10/ [Postdoctoral Funding Opportunities by Johns Hopkins](https://research.jhu.edu/rdt/funding-opportunities/postdoctoral/)

11/ [Early Career Funding Opportunities by Johns Hopkins](https://research.jhu.edu/rdt/funding-opportunities/early-career/)

12/ [The CommKit](http://mitcommlab.mit.edu/broad/use-the-commkit/) is a collection of guides to successful communication in the biological sciences, written by the BRCL Fellows.

13/ [writing in sciences stanford online course](https://www.coursera.org/learn/sciwrite/)

14/ [Ten simple rules for structuring papers](http://www.biorxiv.org/content/early/2017/05/23/088278)

15/ [NIH grant podcasts](https://grants.nih.gov/news/virtual-learning/podcasts.htm)

16/ [NIH (NIAID) guide](https://www.niaid.nih.gov/grants-contracts/write-research-plan)

17/ [Thoughts on reviewing NIH proposals: What is the difference between a 2.0 and 3.0 in initial score?](http://mistressoftheanimals.scientopia.org/2018/02/10/thoughts-on-reviewing-nih-proposals-what-is-the-difference-between-a-2-0-and-3-0-in-initial-score/) a blog post.

18/ [how to write a K99](https://k99.sbamin.com/) by Samir Amin (my good buddy). Go and check out this treasure.

19/ [seeking the k99](https://timoast.github.io/blog/seeking-the-k99/) a blog post by Tim Stuart.

20/ [AuthorArranger: Conquer journal title pages in seconds](https://authorarranger.nci.nih.gov/#/)

21/ [typeset](https://www.typeset.io/) The quickest way to read and understand scientific literature

22/ [cocites](http://www.cocites.com/)

23/ [connected papers](https://www.connectedpapers.com/)

24/ [ZoteroBib](https://zbib.org/) is a free service that helps you quickly create a bibliography in any citation style.

25/ [How to craft a figure legend for scientific papers](https://blog.bioturing.com/2018/05/10/how-to-craft-a-figure-legend-for-scientific-papers/)

26/ [Ten quick tips for making things findable](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008469)

27/ [Making experimental data tables in the life sciences more FAIR: a pragmatic approach](https://academic.oup.com/gigascience/article/9/12/giaa144/6034785)

28/ protocols: https://www.protocols.io/

29/ [electronic lab notebooks review by Harvard HMS](https://datamanagement.hms.harvard.edu/electronic-lab-notebooks)

30/ [Rspace](https://www.researchspace.com/) Next-gen Elab notebook.

31/ [How to grow a healthy lab](https://www.nature.com/collections/pmlcrkkyyq) Nature collections

32/ [Bench Sci](https://www.benchsci.com/) Run Successful Experiments with the Right Antibody. Let our AI decode the literature to provide antibody usage data that's unbiased and experiment-specific

Wednesday, December 7, 2022

23 tools to work with (single-cell) TCR/BCR-seq immune repertoire data

1/ [immunarch](https://immunarch.com/index.html)

2/ [scRepertoire](https://github.com/ncborcherding/scRepertoire)

3/ [dandelion](https://sc-dandelion.readthedocs.io/en/latest/) python package for analyzing single cell BCR/TCR data from 10x Genomics 5’ solution!

4/ [TRUST4](https://www.nature.com/articles/s41592-021-01142-2) developed in Shirley Liu's group. Use it to extract TCR/BCR information from bulk RNAseq or 5' scRNAseq data.

5/ [CompAIRR](https://github.com/uio-bmi/compairr) offers a dramatic speedup for one of the core computations in adaptive immune receptor repertoire (AIRR) analysis: the discovery and counting of receptors that overlap between repertoires. With 10^4 repertoires of 10^5 sequences each, CompAIRR ran in 17 minutes while the fastest existing tool took 10 days, a roughly 1000x speedup.

6/ [ClusTCR](https://svalkiers.github.io/clusTCR/): a Python interface for rapid clustering of large sets of CDR3 sequences with unknown antigen specificity.

7/ [GLIPH2](https://www.nature.com/articles/s41587-020-0505-4)

8/ [GIANA allows computationally-efficient TCR clustering and multi-disease repertoire classification by isometric transformation](https://www.nature.com/articles/s41467-021-25006-7) from Bo Li.

9/ [tcrdist3](https://github.com/kmayerb/tcrdist3) is a python API-enabled toolkit for analyzing T-cell receptor repertoires

10/ [TCRex](https://tcrex.biodatamining.be/): a web tool for the prediction of TCR–epitope recognition

11/ [ImRex](https://github.com/pmoris/ImRex) TCR-epitope recognition prediction using combined sequence input representation for convolutional neural networks.

12/ [NetTCR-2.0](https://services.healthtech.dtu.dk/service.php?NetTCR-2.0) Sequence-based prediction of peptide-TCR binding

13/ [CellaRepertorium](https://github.com/amcdavid/CellaRepertorium)

14/ [enclone](https://10xgenomics.github.io/enclone/) from 10x. we should give this a try if we want to cluster TCR and BCR clonotypes.

15/ [migec](https://github.com/mikessh/migec): A RepSeq processing swiss-knife.

16/ [MiXCR](https://github.com/milaboratory/mixcr) is a universal software for fast and accurate analysis of T- and B- cell receptor repertoire sequencing data.

17/ [ImReP](https://sergheimangul.wordpress.com/imrep/) is a computational method for rapid and accurate profiling of the adaptive immune repertoire from regular RNA-Seq data.

18/ [TcellMatch](https://github.com/theislab/tcellmatch): Predicting T-cell to epitope specificity. TcellMatch is a collection of models to predict antigen specificity of **single T cells** based on CDR3 sequences and other single cell modalities, such as RNA counts and surface protein counts.

19/ [scirpy](https://github.com/icbi-lab/scirpy): A scanpy extension for single-cell TCR analysis.

20/ [Tessa](https://github.com/jcao89757/tessa) is a Bayesian model to integrate T cell receptor (TCR) sequence profiling with transcriptomes of T cells. Enabled by the recently developed single cell sequencing techniques, which provide both TCR sequences and RNA sequences of each T cell concurrently, Tessa maps the functional landscape of the TCR repertoire, and generates insights into understanding human immune response to diseases.

21/ [DeepTCR](https://github.com/sidhomj/DeepTCR) Deep Learning Methods for Parsing T-Cell Receptor Sequencing (TCRSeq) Data

https://twitter.com/John_Will_I_Am/status/1570837756787691527

https://www.science.org/doi/10.1126/sciadv.abq5089

22/ [Integrating T cell receptor sequences and transcriptional profiles by clonotype neighbor graph analysis (CoNGA)](https://www.nature.com/articles/s41587-021-00989-2)

23/ [Echidna: Integrated simulations of single-cell immune receptor repertoires and transcriptomes](https://academic.oup.com/bioinformaticsadvances/advance-article/doi/10.1093/bioadv/vbac062/6687122?login=false)
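To make the overlap computation mentioned for CompAIRR in item 5 concrete, here is a naive Python sketch that counts exactly shared sequences between every pair of repertoires using sets. It is not CompAIRR's algorithm and will not scale the way that tool does; the function name `overlap_counts` and the CDR3 strings are made up for illustration.

```python
from itertools import combinations


def overlap_counts(repertoires):
    """Count exactly shared sequences for every pair of repertoires.

    repertoires: dict mapping repertoire name -> iterable of receptor sequences
    Returns a dict mapping (name_a, name_b) -> number of shared sequences.
    """
    sets = {name: set(seqs) for name, seqs in repertoires.items()}
    return {
        (a, b): len(sets[a] & sets[b])
        for a, b in combinations(sorted(sets), 2)
    }


# Toy repertoires (made-up CDR3 amino acid sequences).
repertoires = {
    "rep1": ["CASSLGQETQYF", "CASSPGTEAFF", "CASSQDRGYEQYF"],
    "rep2": ["CASSLGQETQYF", "CASSQDRGYEQYF", "CASRGDTQYF"],
    "rep3": ["CASSPGTEAFF"],
}

overlaps = overlap_counts(repertoires)
print(overlaps)
```

The number of pairs grows quadratically with the number of repertoires, which is why a dedicated tool matters once you have thousands of them.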

Tuesday, November 29, 2022

7 links to deeply understand heatmap

Making a heatmap is an essential skill for a bioinformatician. Just check how many figures in a genomics or single-cell paper are heatmaps or heatmap variants.

But you probably do not fully understand heatmaps. Here are 7 reading resources to help!

1/ Mapping quantitative data to color https://www.nature.com/articles/nmeth.2134

2/ Heat map from the Nature Methods column https://www.nature.com/articles/nmeth.1902

3/ A tale of two heatmap functions https://rpubs.com/crazyhottommy/a-tale-of-two-heatmap-functions An old post by me.

4/ Heatmap demystified https://rpubs.com/crazyhottommy/heatmap_demystified yet another post by me

5/ Understanding color mapping is key https://jokergoo.github.io/ComplexHeatmap-reference/book/a-single-heatmap.html#colors

6/ Understand rasterization https://jokergoo.github.io/2020/06/30/rasterization-in-complexheatmap/

7/ what happens when you have a huge matrix 20,000 rows/genes x 50 columns to plot? https://gdevailly.netlify.app/post/plotting-big-matrices-in-r/

I learned so much from Zuguang Gu. Thanks to him for the awesome ComplexHeatmap package https://jokergoo.github.io/ComplexHeatmap-reference/book/index.html . It is my go-to tool for making heatmaps.
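To see why color mapping matters, here is a small Python sketch of the two steps behind most expression heatmaps: scale each row (gene) to z-scores, then map each value to a color on a blue-white-red scale with the extremes clamped. This is not how ComplexHeatmap is implemented; the function names and the clamp limit of 2 are assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def zscore_row(values):
    """Center and scale one row (gene) so colors reflect relative change."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def value_to_rgb(z, limit=2.0):
    """Map a z-score to blue-white-red; values beyond +/-limit are clamped."""
    z = max(-limit, min(limit, z))
    t = abs(z) / limit           # 0 at the midpoint, 1 at either extreme
    fade = round(255 * (1 - t))  # channels that fade out toward the extreme
    return (fade, fade, 255) if z < 0 else (255, fade, fade)


# Toy row with one outlier (made-up expression values).
row = [1, 2, 3, 100]
scaled = zscore_row(row)
colors = [value_to_rgb(z) for z in scaled]
for raw, z, rgb in zip(row, scaled, colors):
    print(raw, round(z, 2), rgb)
```

Clamping keeps a single outlier from stretching the color scale so far that every other cell looks the same, which is the main reason to set the color breaks yourself rather than rely on the data range.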

Monday, November 28, 2022

6 training resources for data management

* Best Practices for Biomedical Research Data Management https://learn.canvas.net/courses/1854

* Research Data Management Librarian Academy (https://rdmla.github.io/)

* DataONE Data Management Skillbuilding Hub (https://dataoneorg.github.io/Education)

* Data Management Training Clearinghouse (https://dmtclearinghouse.esipfed.org/)

* Research data management open training materials Zenodo Community (https://zenodo.org/communities/dcc-rdm-training-materials)

* Consortium of European Social Science Data Archives (CESSDA) Training Resources (https://www.cessda.eu/Training-Resources)

Bonus:

Learn from TCGA: Collaborative Genomics Projects: A Comprehensive Guide https://www.sciencedirect.com/book/9780128021439/collaborative-genomics-projects-a-comprehensive-guide

Sunday, November 27, 2022

8 R/command line tools to deal with excel, tsv and csv files

R packages:

* [readxl](https://readxl.tidyverse.org/)

* [tidyxl](https://github.com/nacnudus/tidyxl)

* [janitor](https://github.com/sfirke/janitor)

command line tools:

* [VisiData](https://www.visidata.org/) is an interactive multitool for tabular data. It combines the clarity of a spreadsheet, the efficiency of the terminal, and the power of Python, into a lightweight utility which can handle millions of rows with ease.

* [csvkit](https://csvkit.readthedocs.io/en/latest/index.html#)

* [csvtk](https://bioinf.shenwei.me/csvtk/usage/) a cross-platform, efficient and practical CSV/TSV toolkit.

* [Miller](https://miller.readthedocs.io/en/latest/) is a command-line tool for querying, shaping, and reformatting data files in various formats including CSV, TSV, JSON, and JSON Lines.

* [eBay's TSV Utilities](https://opensource.ebay.com/tsv-utils/)

Tuesday, November 15, 2022

8 Resources to study Transcription factor binding, enhancers and histone modification distribution

1. ENCODE https://www.encodeproject.org/

2. The International Human Epigenome Consortium (IHEC) epigenome data portal http://epigenomesportal.ca/ihec/index.html?as=1

3. Blueprint epigenome http://dcc.blueprint-epigenome.eu/#/home

4. EpiFactors http://epifactors.autosome.ru/ is a database for epigenetic factors, corresponding genes and products.

5. CistromeDB http://cistrome.org/db/#/ by Shirley Liu's group

6. Remap https://remap2022.univ-amu.fr/ is a large scale integrative analysis of DNA-binding experiments for Homo sapiens, Mus musculus, Drosophila melanogaster and Arabidopsis thaliana transcriptional regulators.

7. ChIP-Atlas http://chip-atlas.org/ An integrative, comprehensive database to explore public epigenetic datasets, including ChIP-Seq, DNase-Seq, ATAC-Seq, and Bisulfite-Seq data. ChIP-Atlas covers almost all public data archived in the Sequence Read Archive of NCBI, EBI, and DDBJ, with over 224,000 experiments.

8. Fantom5 https://fantom.gsc.riken.jp/5/

Sunday, November 13, 2022

7 Books for you to learn bioinformatics

1. Data Analysis for the Life Sciences https://leanpub.com/dataanalysisforthelifesciences You can get it for free!

2. Practical Computing for Biologists https://practicalcomputing.org/ The first book I ever used to start learning computational biology.

3. A Primer for Computational Biology https://open.oregonstate.education/computationalbiology/

4. Computational Genomics with R http://compgenomr.github.io/book/

5. The Biologist’s Guide to Computing https://book.biologistsguide2computing.com/en/stable/

6. Bioinformatics Data Skills https://www.oreilly.com/library/view/bioinformatics-data-skills/9781449367480/ A must read to upgrade your bioinformatics skills once you know the basics.

7. Bioinformatics Workbook: A tutorial to help scientists design their projects and analyze their data. https://bioinformaticsworkbook.org/#gsc.tab=0

Thursday, November 10, 2022

7 FREE Books to learn data science

1. Data science: A first introduction https://datasciencebook.ca/

2. Introduction to Data Science http://rafalab.dfci.harvard.edu/dsbook/

3. Agile Data Science with R https://edwinth.github.io/ADSwR/index.html

4. Tidy Modeling with R https://www.tmwr.org/

5. Feature Engineering and Selection: A Practical Approach for Predictive Models https://bookdown.org/max/FES/

6. Another Book on Data Science https://www.anotherbookondatascience.com/ compares R and Python side by side

7. Research Software Engineering with Python https://merely-useful.tech/py-rse/