Creative Commons License
This blog by Tommy Tang is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


Sunday, October 2, 2016

Cutting out 500 columns from a 26G file using command line

I have a 26G TSV file with several thousand columns, and I want to extract 500 of them based on the column names listed in another file.

How should I do it? Reading it into R may take forever, although one may recommend data.table's fread to save some time. However, R is notorious for having to read the whole data set into memory; 26G is very big, and my desktop does not have that much RAM. Handling large data sets in R gives some alternatives for working with big data in R.

I decided to turn to the all-mighty unix commands.

A dummy example for testing

cat DATA.tsv 
ID	head1	head2	head3	head4
1	25.5	1364.0	22.5	13.2
2	10.1	215.56	1.15	22.2

cat LIST.TXT 
ID
head1
head4

I need to extract the columns ID, head1 and head4 from DATA.tsv.

## the column numbers to be extracted

head -1 DATA.tsv | tr "\t" "\n" | grep -nf LIST.TXT |  sed 's/:.*$//'
1
2
5
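One thing to watch: grep -nf treats each name in LIST.TXT as a regex and matches substrings, so a name like head1 would also hit a column called head10. Adding -x (whole-line match) and -F (fixed strings) avoids that; a small sketch on the dummy files above:

```shell
# recreate the dummy files from above
printf 'ID\thead1\thead2\thead3\thead4\n' > DATA.tsv
printf 'ID\nhead1\nhead4\n' > LIST.TXT

# -F: treat the names as fixed strings, not regexes
# -x: a name must match the whole line, so head1 cannot match head10
head -1 DATA.tsv | tr '\t' '\n' | grep -nxFf LIST.TXT | sed 's/:.*$//'
```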

### save to a variable and format it as 1,2,5 for the cut command

cols=$(head -1 DATA.tsv | tr "\t" "\n" | grep -nf LIST.TXT | sed 's/:.*$//' | tr "\n" "," | sed 's/,$//')
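As an aside, the tr/sed tail that joins the line numbers with commas can be replaced by paste -s (serial paste), which joins all input lines with the given delimiter and leaves no trailing comma to strip; a sketch on the same dummy files:

```shell
# recreate the dummy files from above
printf 'ID\thead1\thead2\thead3\thead4\n' > DATA.tsv
printf 'ID\nhead1\nhead4\n' > LIST.TXT

# paste -s joins all lines into one, -d, uses a comma between them,
# so there is no trailing comma to clean up afterwards
cols=$(head -1 DATA.tsv | tr '\t' '\n' | grep -nf LIST.TXT | sed 's/:.*$//' | paste -sd, -)
echo "$cols"    # 1,2,5
```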

## cut out

cut -f "${cols}" DATA.tsv 
ID	head1	head4
1	25.5	13.2
2	10.1	22.2
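The Stack Overflow post I got the idea from does the same job with awk in a single pass over both files. A rough sketch on the dummy data (recreated inline; not how I ran it on the 26G file):

```shell
# recreate the dummy files from above
printf 'ID\thead1\thead2\thead3\thead4\n1\t25.5\t1364.0\t22.5\t13.2\n' > DATA.tsv
printf 'ID\nhead1\nhead4\n' > LIST.TXT

# first pass (NR == FNR) records the wanted names from LIST.TXT;
# on the header line of DATA.tsv we note which column numbers carry
# those names, then print only those fields for every row
awk -F'\t' -v OFS='\t' '
    NR == FNR { want[$1] = 1; next }
    FNR == 1  { for (i = 1; i <= NF; i++) if ($i in want) keep[++n] = i }
    { line = ""
      for (j = 1; j <= n; j++) line = line (j > 1 ? OFS : "") $(keep[j])
      print line }
' LIST.TXT DATA.tsv
```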

Benchmarking on my 26G file:

time cut -f "${cols}" myfile.tsv > mysubset.txt

real    32m10.947s
user    31m42.511s
sys     0m26.686s


## memory usage very low!
top -M

top - 17:03:17 up 86 days,  4:43, 56 users,  load average: 13.99, 13.72, 13.05
Tasks: 754 total,   2 running, 742 sleeping,   5 stopped,   5 zombie
Cpu(s): 13.8%us,  5.2%sy,  0.0%ni, 80.3%id,  0.0%wa,  0.0%hi,  0.7%si,  0.0%st
Mem:    31.354G total, 6535.461M used,   24.971G free,  274.668M buffers
Swap:   32.000G total, 2132.094M used,   29.918G free, 1367.434M cached

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                         
18042 mtang1    20   0  102m 4808  604 R 100.0  0.0   5:41.71 cut                                                                       
Since I may use this very often, I made it into a shell script, and one can specify the separator as comma or tab.
#! /bin/bash
set -e
set -u
set -o pipefail
#### Author: Ming Tang (Tommy)
#### Date 09/29/2016
#### I got the idea from this stackOverflow post http://stackoverflow.com/questions/11098189/awk-extract-columns-from-file-based-on-header-selected-from-2nd-file
# show help
show_help(){
cat << EOF
This is a wrapper for extracting columns of a (big) dataframe based on a list of column names in
another file, one column name per line. The output goes to stdout. For small files (< 2G) one
can load the data into R and subset it easily, but when the file is big (> 10G) R is quite cumbersome.
Using unix commands, on the other hand, is better because the file does not have to be loaded into memory all at once.
e.g. subsetting a 26G file for 700 columns takes around 30 mins, and the memory footprint is very low (~4MB).
usage: ${0##*/} -f < a dataframe > -c < colNames> -d <delimiter of the file>
-h display this help and exit.
-f the file you want to extract columns from. must contain a header with column names.
-c a file with one column name per line.
-d delimiter of the dataframe: , or \t. default is tab.
e.g.
for tsv file:
${0##*/} -f mydata.tsv -c colnames.txt -d $'\t', or simply omit -d; the default is tab.
for csv file: Note you must specify -d , if your file is csv, otherwise the whole line is treated as one column and printed unchanged.
${0##*/} -f mydata.csv -c colnames.txt -d ,
EOF
}
## if there are no arguments provided, show help
if [[ $# == 0 ]]; then show_help; exit 1; fi
while getopts ":hf:c:d:" opt; do
case "$opt" in
h) show_help;exit 0;;
f) File2extract=$OPTARG;;
c) colNames=$OPTARG;;
d) delim=$OPTARG;;
'?') echo "Invalid option $OPTARG"; show_help >&2; exit 1;;
esac
done
## set up the default delimiter to be tab, Note the way I specify tab
delim=${delim:-$'\t'}
## get the number of columns in the data frame that match the column names in the colNames file.
## change the output to 2,5,6,22,... and get rid of the last comma so cut -f can be used
cols=$(head -1 "${File2extract}" | tr "${delim}" "\n" | grep -nf "${colNames}" | sed 's/:.*$//' | tr "\n" "," | sed 's/,$//')
## cut out the columns
cut -d"${delim}" -f"${cols}" "${File2extract}"
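One caveat worth knowing: cut always emits fields in the order they appear in the file, no matter how the -f list is written, so this script selects columns but cannot rearrange them. A quick check on the dummy data:

```shell
# recreate the dummy file from above
printf 'ID\thead1\thead2\thead3\thead4\n1\t25.5\t1364.0\t22.5\t13.2\n' > DATA.tsv

# both commands print the same thing: cut ignores the requested
# order and outputs fields in file order (1, 2, 5)
cut -f 1,2,5 DATA.tsv
cut -f 5,2,1 DATA.tsv
```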


Again, Unix commands are awesome!
