Creative Commons License
This blog by Tommy Tang is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

My GitHub page

Tuesday, July 21, 2015

I promised a post with heatmaps, and I delivered more with PCA, MDS and clustering

I am back from the 1st Summer Institute in Statistics for Big Data, held at the University of Washington. I have learned so much!

I wrote up a short tutorial on

PCA, MDS, k-means, hierarchical clustering and heatmaps for microarray data

and put it on RPubs.
Please follow the link: https://rpubs.com/crazyhottommy/PCA_MDS

Happy learning!

Tuesday, July 14, 2015

Materials for the 1st Summer Institute in Statistics for Big Data

I am here at the University of Washington for the 1st Summer Institute in Statistics for Big Data. This two-week summer institute covers everything from getting big data and visualizing big biomedical data to machine learning and reproducible research.

Check the website here: http://www.biostat.washington.edu/suminst/sisbid/modules
All the studying materials are on github: https://github.com/SISBID
Free ebook for statistical learning: An Introduction to Statistical Learning

Thursday, July 2, 2015

String manipulation in bash

Update on 04/24/2016: see the post "Shell parameter substitution".

${parameter:-defaultValue}      Use a default value if the variable is unset or empty
${parameter:=defaultValue}      Assign a default value if the variable is unset or empty
${parameter:?"Error Message"}   Display an error message if the variable is unset or empty
${#var}                         Length of the string
${var%pattern}                  Remove the shortest match of pattern from the end
${var%%pattern}                 Remove the longest match of pattern from the end
${var:num1:num2}                Substring of length num2 starting at offset num1
${var#pattern}                  Remove the shortest match of pattern from the front
${var##pattern}                 Remove the longest match of pattern from the front
${var/pattern/string}           Find and replace (first occurrence only)
${var//pattern/string}          Find and replace all occurrences
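
The three default-value forms at the top of the list are not used in the examples further down, so here is a minimal sketch of how they behave. The variable names outdir and input are hypothetical and only serve the illustration.

```bash
unset outdir input   # start clean; both names are made up for this sketch

# :- substitutes a fallback but leaves the variable itself untouched
fallback="${outdir:-results}"
echo "$fallback"
echo "${outdir:-still unset}"

# := substitutes the fallback and also assigns it to the variable
: "${outdir:=results}"
echo "$outdir"

# :? aborts with the message when the variable is unset or empty;
# running it in a subshell keeps this demo from exiting
if ( : "${input:?input file is required}" ) 2>/dev/null; then
    input_status="set"
else
    input_status="missing"
fi
echo "$input_status"
```

In a real script the :? form is usually written without the subshell, so that a missing argument stops the script right away.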


String manipulation is very useful in bash scripts, especially when you process many files with different names (in a for loop or with xargs)
and want to name the output for each file after part of the input file name:
command file part_file.result
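
As a rough sketch of that loop pattern, assuming the inputs end in .txt and the outputs should end in .result: the scratch directory, the sample file names and the use of wc as the command are all stand-ins I made up. It uses the ${var%pattern} form from the list above.

```bash
workdir=$(mktemp -d)   # scratch directory so the sketch is self-contained
touch "$workdir/sampleA.txt" "$workdir/sampleB.txt"   # hypothetical input files

for f in "$workdir"/*.txt; do
    out="${f%.txt}.result"   # strip the .txt suffix, then append .result
    wc -c < "$f" > "$out"    # wc stands in for the real command
done

ls "$workdir"
```

After the loop the directory should hold sampleA.result and sampleB.result next to the two inputs.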
I am going to demonstrate string manipulation below:
Let's create a variable file and print it out:
In [1]:
file=foo.txt
echo "$file"
foo.txt
I want to change the suffix from txt to pdf. One commonly known way is to use the
 basename command (an external utility, not a bash built-in):
In [2]:
echo "$(basename "$file" .txt).pdf"
foo.pdf
However, bash has built-in ways to get the same task done
references here
find and replace
Replace only first match:
${string/pattern/replacement}
Replace all the matches:
${string//pattern/replacement}
The following syntax replaces the match with the replacement string
only when the pattern matches at the beginning of $string:
${string/#pattern/replacement}

The following syntax replaces the match with the replacement string
only when the pattern matches at the end of $string:
${string/%pattern/replacement}
For more complex replacements, use sed; see my previous blog post here
In [3]:
echo "${file/txt/pdf}"
foo.pdf
In [4]:
# a more complex example
file_1=foo.txt.foo.txt
echo "${file_1//foo/bar}"
bar.txt.bar.txt
In [5]:
echo "${file_1/foo/bar}"
bar.txt.foo.txt
In [6]:
echo "${file_1/#foo/bar}"
bar.txt.foo.txt
In [7]:
echo "${file_1/%txt/pdf}"
foo.txt.foo.pdf
${string%substring} will delete the shortest match of substring from back
${string%%substring} will delete the longest match of substring from back
In [8]:
echo "${file_1%txt*}pdf"
foo.txt.foo.pdf
In [9]:
echo "${file_1%%txt*}pdf"
foo.pdf
${string#substring} will delete the shortest match of substring from the beginning
${string##substring} will delete the longest match of substring from the beginning
In [10]:
echo "bar${file_1#foo*}"
bar.txt.foo.txt
In [11]:
echo "bar${file_1##foo*}.pdf"
bar.pdf
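
A common use of the # and % forms is to split a path into pieces without calling basename or dirname. Below is a small sketch; the path is hypothetical, and the comments show what each expansion leaves behind.

```bash
path=/data/run1/sample.fastq.gz   # hypothetical path, for illustration only

dir="${path%/*}"      # shortest match of /* from the end    -> /data/run1
name="${path##*/}"    # longest match of */ from the front   -> sample.fastq.gz
stem="${name%%.*}"    # longest match of .* from the end     -> sample
ext="${name##*.}"     # longest match of *. from the front   -> gz

echo "$dir" "$name" "$stem" "$ext"
```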
string slicing
${string:position} Extract the substring of $string starting at $position
${string:position:length} Extract $length characters of $string starting at $position
In [12]:
echo "${file_1:4}"
txt.foo.txt
In [13]:
echo "${file_1:4:7}"
txt.foo
Finally, the length of the string:
In [14]:
echo "${#file_1}"
15
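
Slicing and length can be combined to take characters from the end of a string. Here is a short sketch using the same file_1 value; the variable names tail_by_length and tail_by_offset are mine. Both forms rely on bash-specific substring expansion.

```bash
file_1=foo.txt.foo.txt

# the offset is evaluated as arithmetic, so "length minus 3" selects the last three characters
tail_by_length="${file_1:${#file_1}-3}"

# a negative offset counts from the end; the space keeps bash from parsing it as the :- default form
tail_by_offset="${file_1: -3}"

echo "$tail_by_length" "$tail_by_offset"
```

Both variables end up holding txt.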