Creative Commons License
This blog by Tommy Tang is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

My GitHub page

Tuesday, July 21, 2015

I promised a post with heatmaps, and I delivered more with PCA, MDS and clustering

I am back from the 1st Summer Institute in Statistics for Big Data, held at the University of Washington. I have learned so much!

I wrote up a short tutorial, "PCA, MDS, k-means, Hierarchical clustering and heatmap for microarray data", and put it on RPubs.
Please follow the link.

Happy learning!

Tuesday, July 14, 2015

Materials for the 1st Summer Institute in Statistics for Big Data

I am here at the University of Washington for the 1st Summer Institute in Statistics for Big Data. This two-week summer institute covers topics ranging from getting big data and visualizing big biomedical data to machine learning and reproducible research.

Check the website here:
All the study materials are on GitHub:
Free ebook for statistical learning: An Introduction to Statistical Learning

Thursday, July 2, 2015

String manipulation in bash

Update on 04/24/2016: see the post Shell parameter substitution.

${parameter:-defaultValue}       Use a default value if the variable is unset or empty
${parameter:=defaultValue}       Assign a default value if the variable is unset or empty
${parameter:?"Error Message"}    Display an error message if the parameter is unset or empty
${#var}                          Find the length of the string
${var%pattern}                   Remove the shortest match of pattern from the rear (end)
${var%%pattern}                  Remove the longest match of pattern from the rear (end)
${var#pattern}                   Remove the shortest match of pattern from the front
${var##pattern}                  Remove the longest match of pattern from the front
${var/pattern/string}            Find and replace (only the first occurrence)
${var//pattern/string}           Find and replace all occurrences
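
The first three forms in the table are not demonstrated later in this post, so here is a minimal sketch of how they behave in bash. The variable names (outdir, threads, input) and their values are made up for illustration.

```bash
# Hypothetical variables, all starting out unset.
unset outdir threads input

# ${parameter:-default} expands to the default but does not assign it.
first="${outdir:-results}"
echo "$first"              # results
echo "${outdir:-unset}"    # unset (outdir itself is still empty)

# ${parameter:=default} expands to the default and also assigns it.
echo "${threads:=4}"       # 4
echo "$threads"            # 4

# ${parameter:?message} stops a non-interactive shell with the message
# when the parameter is unset or empty. Run it in a subshell so this
# demo keeps going.
if ! (: "${input:?input file is required}") 2>/dev/null; then
    status="missing"
fi
echo "$status"             # missing
```

Because of the colon, all three forms treat an empty variable the same as an unset one.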

String manipulation is very useful in bash scripts, especially when you process many files with different names (in a for loop or with xargs)
and want to name the output of each file after part of the input file name.
command file part_file.result
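
As a rough sketch of that pattern, the loop below derives each output name from its input name. The temporary directory, the sample file names and the use of wc -l as the stand-in command are all assumptions made for the example.

```bash
# Work in a throwaway directory with two made-up input files.
workdir="$(mktemp -d)"
printf 'a\nb\n' > "$workdir/sample1.txt"
printf 'c\n'    > "$workdir/sample2.txt"

for file in "$workdir"/*.txt; do
    # Strip the .txt suffix and append a new one for the output name.
    out="${file%.txt}.result"
    # wc -l stands in for whatever command you actually run.
    wc -l < "$file" > "$out"
done

ls "$workdir"
```

Quoting "$file" and "$out" keeps the loop safe for file names that contain spaces.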
I am going to demonstrate string manipulation below:
Let's create a variable file and print it out:
In [1]:
echo "$file"
I want to change the suffix from txt to pdf. One commonly known way is to use the basename command (an external utility, not a bash built-in):
In [2]:
echo "$(basename "$file" .txt).pdf"
However, bash also has built-in ways to get the same task done:
references here
find and replace
Replace only the first match: ${string/pattern/replacement}
Replace all the matches: ${string//pattern/replacement}
The following syntax replaces the pattern with the replacement string
only when the pattern matches at the beginning of $string: ${string/#pattern/replacement}

The following syntax replaces the pattern with the replacement string
only when the pattern matches at the end of $string: ${string/%pattern/replacement}
For more complex replacements, use sed. See my previous blog post here.
In [3]:
echo "${file/txt/pdf}"
In [4]:
# a more complex example
echo "${file_1//foo/bar}"
In [5]:
echo "${file_1/foo/bar}"
In [6]:
echo "${file_1/#foo/bar}"
In [7]:
echo "${file_1/%txt/pdf}"
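
The cells above do not show how file_1 was assigned, so the sketch below uses a made-up value to show what each replacement form would produce. The value foo_foo.txt.txt is an assumption, not the one from the original notebook.

```bash
# Hypothetical value; the post's original assignment is not shown.
file_1="foo_foo.txt.txt"

all="${file_1//foo/bar}"       # every "foo" replaced
first="${file_1/foo/bar}"      # only the first "foo" replaced
front="${file_1/#foo/bar}"     # replaced because "foo" starts the string
back="${file_1/%txt/pdf}"      # replaced because "txt" ends the string
nomatch="${file_1/#txt/pdf}"   # "txt" is not at the start, so nothing changes

printf '%s\n' "$all" "$first" "$front" "$back" "$nomatch"
# bar_bar.txt.txt
# bar_foo.txt.txt
# bar_foo.txt.txt
# foo_foo.txt.pdf
# foo_foo.txt.txt
```

Note that the pattern is a shell glob, not a regular expression, which is one reason to reach for sed when the replacement gets more involved.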
${string%substring} will delete the shortest match of substring from the end
${string%%substring} will delete the longest match of substring from the end
In [8]:
echo "${file_1%txt*}pdf"
In [9]:
echo "${file_1%%txt*}pdf"
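
To make the difference between % and %% concrete, here is a small sketch with a made-up value for file_1 (an assumption, since the original assignment is not shown):

```bash
# Hypothetical value with two "txt" occurrences.
file_1="foo_foo.txt.txt"

# % removes the shortest suffix matching txt*, i.e. only the last "txt".
short="${file_1%txt*}pdf"
echo "$short"    # foo_foo.txt.pdf

# %% removes the longest suffix matching txt*, i.e. from the first "txt" on.
long="${file_1%%txt*}pdf"
echo "$long"     # foo_foo.pdf

# A common idiom: drop just the last extension.
noext="${file_1%.*}"
echo "$noext"    # foo_foo.txt
```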
${string#substring} will delete the shortest match of substring from the beginning
${string##substring} will delete the longest match of substring from the beginning
In [10]:
echo "bar${file_1#foo*}"
In [11]:
echo "bar${file_1##foo*}.pdf"
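
The same two cells can be traced with a made-up value. Both file_1 and the path variable below are assumptions chosen for illustration.

```bash
# Hypothetical value; the original assignment is not shown in the post.
file_1="foo_foo.txt.txt"

# # removes the shortest prefix matching foo*, i.e. just the leading "foo".
short="bar${file_1#foo*}"
echo "$short"    # bar_foo.txt.txt

# ## removes the longest prefix matching foo*, i.e. the whole string.
long="bar${file_1##foo*}.pdf"
echo "$long"     # bar.pdf

# Two everyday uses of ## on a made-up path.
path="/data/run1/foo_foo.txt.txt"
name="${path##*/}"    # strip the directory part
ext="${path##*.}"     # keep only the last extension
echo "$name"     # foo_foo.txt.txt
echo "$ext"      # txt
```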
string slicing
${string:position} Extract the substring from $string starting at $position
${string:position:length} Extract $length characters from $string starting at $position
In [12]:
echo "${file_1:4}"
In [13]:
echo "${file_1:4:7}"
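
Positions are counted from zero. With a made-up value for file_1 (an assumption, as before), the two cells above would behave like this:

```bash
# Hypothetical value; index 4 is the second "f".
file_1="foo_foo.txt.txt"

tail_part="${file_1:4}"
echo "$tail_part"    # foo.txt.txt

middle="${file_1:4:7}"
echo "$middle"       # foo.txt

# A negative offset counts from the end; the space before the minus
# sign keeps bash from reading it as the :- default-value form.
last3="${file_1: -3}"
echo "$last3"        # txt
```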
Finally, the length of the string:
In [14]:
echo "${#file_1}"
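
As a small sketch of how the length combines with slicing, the example below trims a fixed number of characters from the end. The value of file_1 is again made up for illustration.

```bash
# Hypothetical value, 15 characters long.
file_1="foo_foo.txt.txt"

len="${#file_1}"
echo "$len"        # 15

# The offset and length in ${var:offset:length} are arithmetic
# expressions, so the length can be computed inline: keep everything
# except the last 4 characters (".txt").
trimmed="${file_1:0:${#file_1}-4}"
echo "$trimmed"    # foo_foo.txt
```

For removing a known suffix, ${file_1%.txt} is usually simpler; the length-based form is mainly useful when only the number of characters is known.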