Diving into Genetics and Genomics
A wet-dry hybrid biologist's take on genetics and genomics. Mostly about Linux, R, Python, reproducible research, open science and NGS. Grab my book to transform yourself into a computational biologist https://divingintogeneticsandgenomics.ck.page/
This blog by Tommy Tang is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Sunday, January 29, 2023
10 tips for learning git
Tuesday, December 13, 2022
15 tools/papers for multi-sample multi-group single-cell RNAseq differential expression analysis
1/ [An Empirical Bayes Method for Differential Expression Analysis of Single Cells with Deep Generative Models](https://www.biorxiv.org/content/10.1101/2022.05.27.493625v1) scVI-DE
2/ [muscat](http://www.bioconductor.org/packages/release/bioc/html/muscat.html)
3/ [Confronting false discoveries in single-cell differential expression](https://www.nature.com/articles/s41467-021-25960-2) "These observations suggest that, in practice, pseudobulk approaches provide an excellent trade-off between speed and accuracy for single-cell DE analysis." One needs to consider biological replicates; pseudobulk works well.
4/ [Modelling group heteroscedasticity in single-cell RNA-seq pseudo-bulk data](https://www.biorxiv.org/content/10.1101/2022.09.12.507511v1)
5/ [BSDE: barycenter single-cell differential expression for case–control studies](https://academic.oup.com/bioinformatics/article/38/10/2765/6554192?login=false)
6/ [distinct](http://www.bioconductor.org/packages/release/bioc/html/distinct.html) Both muscat and distinct are from Mark Robinson's group.
7/ [nebula](https://github.com/lhe17/nebula) https://www.biorxiv.org/content/biorxiv/early/2020/09/25/2020.09.24.311662.full.pdf
8/ [Fast identification of differential distributions in single-cell RNA-sequencing data with waddR](https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btab226/6207964) https://github.com/goncalves-lab/waddR
9/ [CoCoA-diff: counterfactual inference for single-cell gene expression analysis](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02438-4)
10/ [Bias, robustness and scalability in single-cell differential expression analysis](https://www.nature.com/articles/nmeth.4612) From Mark Robinson's group.
11/ [Comparative analysis of differential gene expression analysis tools for single-cell RNA sequencing data](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2599-6) "We observed that current methods designed for scRNAseq data do not tend to show better performance compared to methods designed for bulk RNAseq data."
12/ [Tree-based Correlation Screen and Visualization for Exploring Phenotype-Cell Type Association in Multiple Sample Single-Cell RNA-Sequencing Experiments](https://www.biorxiv.org/content/10.1101/2021.10.27.466024v1) TreeCorTreat is an open source R package that tackles this problem by using a tree-based correlation screen to analyze and visualize the association between phenotype and transcriptomic features and cell types at multiple cell type resolution levels.
13/ [Quantifying the effect of experimental perturbations in single-cell RNA-sequencing data using graph signal processing](https://www.biorxiv.org/content/10.1101/532846v3) read this thread https://twitter.com/krishnaswamylab/status/1328876444810960896?s=27
14/ [Causal identification of single-cell experimental perturbation effects with CINEMA-OT](https://www.biorxiv.org/content/10.1101/2022.07.31.502173v1) GitHub: https://github.com/vandijklab/CINEMA-OT
15/ [IDEAS: individual level differential expression analysis for single-cell RNA-seq data](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-022-02605-1)
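The pseudobulk idea mentioned in item 3/ can be illustrated with a minimal sketch: sum the per-cell counts into one profile per sample and cell type, so that biological replicates (samples), not individual cells, become the unit of analysis. The data layout, the function name `pseudobulk`, and the toy counts below are assumptions made up for illustration; they are not taken from any of the tools listed above.

```python
from collections import defaultdict


def pseudobulk(counts, cell_sample, cell_type):
    """Sum per-cell gene counts into one profile per (sample, cell type).

    counts:      dict cell_id -> dict gene -> raw count (hypothetical toy layout)
    cell_sample: dict cell_id -> sample/donor label
    cell_type:   dict cell_id -> cell type label
    """
    profiles = defaultdict(lambda: defaultdict(int))
    for cell_id, gene_counts in counts.items():
        key = (cell_sample[cell_id], cell_type[cell_id])
        for gene, n in gene_counts.items():
            profiles[key][gene] += n
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {key: dict(genes) for key, genes in profiles.items()}


# Toy data: four cells from two donors (values are invented).
counts = {
    "c1": {"CD3E": 5, "MS4A1": 0},
    "c2": {"CD3E": 3, "MS4A1": 1},
    "c3": {"CD3E": 0, "MS4A1": 7},
    "c4": {"CD3E": 4, "MS4A1": 0},
}
cell_sample = {"c1": "donor1", "c2": "donor1", "c3": "donor1", "c4": "donor2"}
cell_type = {"c1": "T", "c2": "T", "c3": "B", "c4": "T"}

pb = pseudobulk(counts, cell_sample, cell_type)
for key in sorted(pb):
    print(key, pb[key])
```

In practice the aggregated profiles would then be handed to a bulk RNA-seq differential expression method, with one column per sample within each cell type.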
Sunday, December 11, 2022
32 resources for (to-be) faculty on salary negotiation, grant writing, funding, and lab management
1/ [Tips for negotiating salary and startup for newly-hired tenure-track faculty](https://dynamicecology.wordpress.com/2017/03/01/tips-for-negotiating-salary-and-startup-for-newly-hired-tenure-track-faculty/)
2/ [Creating accessibility in academic negotiations](https://www.sciencedirect.com/science/article/pii/S0968000422002870?dgcid=authord)
3/ [Ten Simple Rules to becoming a principal investigator](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007448)
4/ [applying for a faculty position](http://effortreport.libsyn.com/15-applying-for-a-faculty-position) by Roger Peng.
5/ [A list of publicly available grant proposals in the biological sciences](https://jabberwocky.weecology.org/2012/08/10/a-list-of-publicly-available-grant-proposals-in-the-biological-sciences/)
6/ [open grant](https://www.ogrants.org/) find other people's grants.
7/ [Early Career Funding, Awards, and Other Funding](https://docs.google.com/spreadsheets/d/1H1aj--VUYr7eMFk_T7x0Oh985LqbyyscXg2wAAevDnU/edit#gid=0)
8/ https://ecrcentral.org/resources
9/ [Funding schemes for postdoctoral fellowships](https://asntech.github.io/postdoc-funding-schemes/)
10/ [Postdoctoral Funding Opportunities by Johns Hopkins](https://research.jhu.edu/rdt/funding-opportunities/postdoctoral/)
11/ [Early Career Funding Opportunities by Johns Hopkins](https://research.jhu.edu/rdt/funding-opportunities/early-career/)
12/ [The CommKit](http://mitcommlab.mit.edu/broad/use-the-commkit/) is a collection of guides to successful communication in the biological sciences, written by the BRCL Fellows.
13/ [writing in sciences stanford online course](https://www.coursera.org/learn/sciwrite/)
14/ [Ten simple rules for structuring papers](http://www.biorxiv.org/content/early/2017/05/23/088278)
15/ [NIH grant podcasts](https://grants.nih.gov/news/virtual-learning/podcasts.htm)
16/ [NIH guide](https://www.niaid.nih.gov/grants-contracts/write-research-plan)
17/ [Thoughts on reviewing NIH proposals: What is the difference between a 2.0 and 3.0 in initial score?](http://mistressoftheanimals.scientopia.org/2018/02/10/thoughts-on-reviewing-nih-proposals-what-is-the-difference-between-a-2-0-and-3-0-in-initial-score/) a blog post.
18/ [how to write a K99](https://k99.sbamin.com/) by Samir Amin (my good buddy). Go and check out this treasure.
19/ [seeking the k99](https://timoast.github.io/blog/seeking-the-k99/) a blog post by Tim Stuart.
20/ [AuthorArranger: Conquer journal title pages in seconds](https://authorarranger.nci.nih.gov/#/)
21/ [typeset](https://www.typeset.io/) The quickest way to read and understand scientific literature
22/ [cocites](http://www.cocites.com/)
23/ [connected papers](https://www.connectedpapers.com/)
24/ [ZoteroBib](https://zbib.org/) is a free service that helps you quickly create a bibliography in any citation style.
25/ [How to craft a figure legend for scientific papers](https://blog.bioturing.com/2018/05/10/how-to-craft-a-figure-legend-for-scientific-papers/)
26/ [Ten quick tips for making things findable](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008469)
27/ [Making experimental data tables in the life sciences more FAIR: a pragmatic approach](https://academic.oup.com/gigascience/article/9/12/giaa144/6034785)
28/ protocols: https://www.protocols.io/
29/ [electronic lab notebooks review by Harvard HMS](https://datamanagement.hms.harvard.edu/electronic-lab-notebooks)
30/ [Rspace](https://www.researchspace.com/) Next-gen Elab notebook.
31/ [How to grow a healthy lab](https://www.nature.com/collections/pmlcrkkyyq) Nature collections
32/ [Bench Sci](https://www.benchsci.com/) Run Successful Experiments with the Right Antibody. Let our AI decode the literature to provide antibody usage data that's unbiased and experiment-specific
Wednesday, December 7, 2022
23 tools to work with (single-cell) TCR/BCR-seq immune repertoire data
1/ [immunarch](https://immunarch.com/index.html)
2/ [scRepertoire](https://github.com/ncborcherding/scRepertoire)
3/ [dandelion](https://sc-dandelion.readthedocs.io/en/latest/) python package for analyzing single cell BCR/TCR data from 10x Genomics 5’ solution!
4/ [TRUST4](https://www.nature.com/articles/s41592-021-01142-2) developed in Shirley Liu's group. Use it to extract TCR/BCR information from bulk RNAseq or 5' scRNAseq data.
5/ [CompAIRR](https://github.com/uio-bmi/compairr) offers a dramatic speedup for one of the core computations in adaptive immune receptor repertoire (AIRR) analysis: the discovery and counting of receptors that overlap between repertoires. With 10^4 repertoires of 10^5 sequences each, CompAIRR ran in 17 minutes while the fastest existing tool took 10 days, amounting to a ~1000x speedup.
6/ [ClusTCR](https://svalkiers.github.io/clusTCR/): a Python interface for rapid clustering of large sets of CDR3 sequences with unknown antigen specificity.
7/ [GLIPH2](https://www.nature.com/articles/s41587-020-0505-4)
8/ [GIANA allows computationally-efficient TCR clustering and multi-disease repertoire classification by isometric transformation](https://www.nature.com/articles/s41467-021-25006-7) from Bo Li.
9/ [tcrdist3](https://github.com/kmayerb/tcrdist3) is a python API-enabled toolkit for analyzing T-cell receptor repertoires
10/ [TCRex](https://tcrex.biodatamining.be/): a web tool for the prediction of TCR–epitope recognition
11/ [ImRex](https://github.com/pmoris/ImRex) TCR-epitope recognition prediction using combined sequence input representation for convolutional neural networks.
12/ [NetTCR - 2.0](https://services.healthtech.dtu.dk/service.php?NetTCR-2.0) Sequence-based prediction of peptide-TCR binding
13/ [CellaRepertorium](https://github.com/amcdavid/CellaRepertorium)
14/ [enclone](https://10xgenomics.github.io/enclone/) from 10x. We should give this a try if we want to cluster TCR and BCR clonotypes.
15/ [migec](https://github.com/mikessh/migec): a RepSeq processing Swiss Army knife.
16/ [MiXCR](https://github.com/milaboratory/mixcr) is a universal software for fast and accurate analysis of T- and B- cell receptor repertoire sequencing data.
17/ [ImReP](https://sergheimangul.wordpress.com/imrep/) is a computational method for rapid and accurate profiling of the adaptive immune repertoire from regular RNA-Seq data.
18/ [TcellMatch](https://github.com/theislab/tcellmatch): Predicting T-cell to epitope specificity. TcellMatch is a collection of models to predict antigen specificity of **single T cells** based on CDR3 sequences and other single cell modalities, such as RNA counts and surface protein counts.
19/ [scirpy](https://github.com/icbi-lab/scirpy): A scanpy extension for single-cell TCR analysis.
20/ [Tessa](https://github.com/jcao89757/tessa) is a Bayesian model to integrate T cell receptor (TCR) sequence profiling with transcriptomes of T cells. Enabled by the recently developed single cell sequencing techniques, which provide both TCR sequences and RNA sequences of each T cell concurrently, Tessa maps the functional landscape of the TCR repertoire, and generates insights into understanding human immune response to diseases.
21/ [DeepTCR](https://github.com/sidhomj/DeepTCR) Deep Learning Methods for Parsing T-Cell Receptor Sequencing (TCRSeq) Data
https://twitter.com/John_Will_I_Am/status/1570837756787691527
https://www.science.org/doi/10.1126/sciadv.abq5089
22/ [Integrating T cell receptor sequences and transcriptional profiles by clonotype neighbor graph analysis (CoNGA)](https://www.nature.com/articles/s41587-021-00989-2)
23/ [Echidna: Integrated simulations of single-cell immune receptor repertoires and transcriptomes](https://academic.oup.com/bioinformaticsadvances/advance-article/doi/10.1093/bioadv/vbac062/6687122?login=false)
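To make the repertoire-overlap computation mentioned in item 5/ concrete, here is a naive sketch that counts exactly shared sequences for every pair of repertoires using set intersection. This is only the brute-force baseline, assuming exact string matching; it is not how CompAIRR is implemented, and the function name `overlap_matrix` and the CDR3-like strings are invented for illustration.

```python
from itertools import combinations


def overlap_matrix(repertoires):
    """Count distinct sequences shared by each pair of repertoires.

    repertoires: dict name -> list of receptor sequences (toy strings here).
    Returns a dict (name_a, name_b) -> number of exactly shared sequences.
    """
    sets = {name: set(seqs) for name, seqs in repertoires.items()}
    return {
        (a, b): len(sets[a] & sets[b])
        for a, b in combinations(sorted(sets), 2)
    }


# Invented toy repertoires.
repertoires = {
    "rep1": ["CASSLGQYF", "CASSPGTEAFF", "CASSIRSSYEQYF"],
    "rep2": ["CASSPGTEAFF", "CASSIRSSYEQYF", "CASRDTQYF"],
    "rep3": ["CASSLGQYF"],
}

overlaps = overlap_matrix(repertoires)
for pair in sorted(overlaps):
    print(pair, overlaps[pair])
```

The number of pairs grows quadratically with the number of repertoires, which is why a dedicated tool matters at the scale of 10^4 repertoires.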
Tuesday, November 29, 2022
7 links to deeply understand heatmap
Making a heatmap is an essential skill for a bioinformatician. Just check how many figures in a genomics or single-cell paper are heatmaps or heatmap variants.
But you may not fully understand heatmaps. Here are 7 reading resources to help!
1/ Mapping quantitative data to color https://www.nature.com/articles/nmeth.2134
2/ Heat map from the Nature Methods column https://www.nature.com/articles/nmeth.1902
3/ A tale of two heatmap functions https://rpubs.com/crazyhottommy/a-tale-of-two-heatmap-functions An old post by me.
4/ Heatmap demystified https://rpubs.com/crazyhottommy/heatmap_demystified yet another post by me
5/ Understanding color mapping is key https://jokergoo.github.io/ComplexHeatmap-reference/book/a-single-heatmap.html#colors
6/ Understand rasterization https://jokergoo.github.io/2020/06/30/rasterization-in-complexheatmap/
7/ What happens when you have a huge matrix (20,000 rows/genes x 50 columns) to plot? https://gdevailly.netlify.app/post/plotting-big-matrices-in-r/
I learned so much from Zuguang Gu; thanks for his awesome ComplexHeatmap package https://jokergoo.github.io/ComplexHeatmap-reference/book/index.html . It is my go-to tool for making heatmaps.
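Two choices that shape what a heatmap shows are how rows are scaled and how values are mapped to colors. The sketch below is a minimal illustration in plain Python, not how ComplexHeatmap or any other package does it: it z-scores each row (using the population standard deviation, an assumption made for simplicity) and then clips and bins the z-scores into a small number of color bins. The function names and the `limit` and `n_bins` parameters are hypothetical.

```python
from statistics import mean, pstdev


def zscore_rows(matrix):
    """Center and scale each row so rows with different ranges are comparable."""
    scaled = []
    for row in matrix:
        mu, sd = mean(row), pstdev(row)
        scaled.append([0.0 if sd == 0 else (x - mu) / sd for x in row])
    return scaled


def to_color_bin(z, limit=2.0, n_bins=5):
    """Clip z to [-limit, limit] and map it to a color bin index 0..n_bins-1."""
    z = max(-limit, min(limit, z))
    frac = (z + limit) / (2 * limit)  # 0.0 at -limit, 1.0 at +limit
    return min(n_bins - 1, int(frac * n_bins))


# Two toy "genes" with the same pattern but very different magnitudes.
matrix = [
    [1, 2, 3],
    [100, 200, 300],
]

scaled = zscore_rows(matrix)
bins = [[to_color_bin(z) for z in row] for row in scaled]
print(bins)
```

Without row scaling, the second row would dominate the color range and the first row would look flat; after scaling, both rows land in the same color bins. Clipping at a fixed limit keeps a single outlier from compressing every other value into one color.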
Monday, November 28, 2022
6 training resources for data management
* Best Practices for Biomedical Research Data Management https://learn.canvas.net/courses/1854
* Research Data Management Librarian Academy (https://rdmla.github.io/)
* DataONE Data Management Skillbuilding Hub (https://dataoneorg.github.io/Education)
* Data Management Training Clearinghouse (https://dmtclearinghouse.esipfed.org/)
* Research data management open training materials Zenodo Community (https://zenodo.org/communities/dcc-rdm-training-materials)
* Consortium of European Social Science Data Archives (CESSDA) Training Resources (https://www.cessda.eu/Training-Resources)
Bonus:
Learn from TCGA: Collaborative Genomics Projects: A Comprehensive Guide https://www.sciencedirect.com/book/9780128021439/collaborative-genomics-projects-a-comprehensive-guide
Sunday, November 27, 2022
8 R/command line tools to deal with excel, tsv and csv files
R packages:
* [readxl](https://readxl.tidyverse.org/)
* [tidyxl](https://github.com/nacnudus/tidyxl)
* [janitor](https://github.com/sfirke/janitor)
command line tools:
* [VisiData](https://www.visidata.org/) is an interactive multitool for tabular data. It combines the clarity of a spreadsheet, the efficiency of the terminal, and the power of Python, into a lightweight utility which can handle millions of rows with ease.
* [csvkit](https://csvkit.readthedocs.io/en/latest/index.html#)
* [csvtk](https://bioinf.shenwei.me/csvtk/usage/) a cross-platform, efficient and practical CSV/TSV toolkit.
* [Miller](https://miller.readthedocs.io/en/latest/) is a command-line tool for querying, shaping, and reformatting data files in various formats including CSV, TSV, JSON, and JSON Lines.
* [eBay's TSV Utilities](https://opensource.ebay.com/tsv-utils/)
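For a quick one-off conversion when none of these tools is installed, Python's built-in `csv` module covers the basics. The sketch below, under the assumption of a small in-memory TSV string, converts it to CSV and tidies the header names. The helper names `clean_name` and `tsv_to_csv` and the toy table are made up for illustration and are not part of any tool listed above.

```python
import csv
import io
import re


def clean_name(name):
    """Lower-case a header and replace runs of non-alphanumerics with '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def tsv_to_csv(tsv_text):
    """Convert TSV text to CSV text, cleaning the header row on the way."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = next(reader)
    writer.writerow([clean_name(h) for h in header])
    writer.writerows(reader)  # remaining rows are copied unchanged
    return out.getvalue()


# Invented toy table with messy column names.
tsv_text = "Sample ID\tGene Name\tCount (raw)\ns1\tTP53\t10\ns2\tMYC\t7\n"
csv_text = tsv_to_csv(tsv_text)
print(csv_text)
```

Using `csv.reader` and `csv.writer` instead of splitting on tabs by hand means fields containing commas or quotes are quoted correctly in the output.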
Tuesday, November 15, 2022
8 Resources to study Transcription factor binding, enhancers and histone modification distribution
1. ENCODE https://www.encodeproject.org/
2. The International Human Epigenome Consortium (IHEC) epigenome data portal http://epigenomesportal.ca/ihec/index.html?as=1
3. Blueprint epigenome http://dcc.blueprint-epigenome.eu/#/home
4. EpiFactors http://epifactors.autosome.ru/ is a database for epigenetic factors, corresponding genes and products.
5. CistromeDB http://cistrome.org/db/#/ by Shirley Liu group
6. Remap https://remap2022.univ-amu.fr/ is a large scale integrative analysis of DNA-binding experiments for Homo sapiens, Mus musculus, Drosophila melanogaster and Arabidopsis thaliana transcriptional regulators.
7. ChIP-Atlas http://chip-atlas.org/ An integrative, comprehensive database to explore public Epigenetic dataset, including ChIP-Seq, DNase-Seq, ATAC-Seq, and Bisulfite-Seq data: ChIP-Atlas covers almost all public data archived in Sequence Read Archive of NCBI, EBI, and DDBJ with over 224,000 experiments.
8. Fantom5 https://fantom.gsc.riken.jp/5/
Sunday, November 13, 2022
7 Books for you to learn bioinformatics
1. Data Analysis for the Life Sciences https://leanpub.com/dataanalysisforthelifesciences You can get it for free!
2. Practical Computing for Biologists https://practicalcomputing.org/ My first ever book to start learning computational biology.
3. A Primer for Computational Biology https://open.oregonstate.education/computationalbiology/
4. Computational Genomics with R http://compgenomr.github.io/book/
5. The Biologist’s Guide to Computing https://book.biologistsguide2computing.com/en/stable/
6. Bioinformatics Data Skills https://www.oreilly.com/library/view/bioinformatics-data-skills/9781449367480/ A must read to upgrade your bioinformatics skills once you know the basics.
7. Bioinformatics Workbook: A tutorial to help scientists design their projects and analyze their data. https://bioinformaticsworkbook.org/#gsc.tab=0
Thursday, November 10, 2022
7 FREE Books to learn data science
1. Data science: A first introduction https://datasciencebook.ca/
2. Introduction to Data Science http://rafalab.dfci.harvard.edu/dsbook/
3. Agile Data Science with R https://edwinth.github.io/ADSwR/index.html
4. Tidy Modeling with R https://www.tmwr.org/
5. Feature Engineering and Selection: A Practical Approach for Predictive Models https://bookdown.org/max/FES/
6. Another Book on Data Science https://www.anotherbookondatascience.com/ compares R and Python side by side.
7. Research Software Engineering with Python https://merely-useful.tech/py-rse/