This blog by Tommy Tang is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

My GitHub page

Saturday, March 30, 2013

Check file and folder size in Linux


How to check file and folder size
http://www.cyberciti.biz/faq/how-do-i-find-the-largest-filesdirectories-on-a-linuxunixbsd-filesystem/


How do I find the largest files and directories on Linux or Unix-like operating systems?

Sometimes you need to find out which files or directories are eating up all your disk space, possibly within a particular location such as /tmp, /var or /home.
There is no single command that lists the largest files/directories on a Linux/UNIX/BSD filesystem. However, by combining the following three commands with pipes, you can easily get a list of the largest files:
  • du : Estimate file space usage.
  • sort : Sort lines of text files or given input data.
  • head : Output the first part of files, e.g. the first 10 lines (here, the 10 largest entries).
Type the following command at the shell prompt to find the top 10 largest files/directories:
# du -a /var | sort -n -r | head -n 10
Output (sizes are in 1 KB blocks, the GNU du default):
1008372 /var
313236  /var/www
253964  /var/log
192544  /var/lib
152628  /var/spool
152508  /var/spool/squid
136524  /var/spool/squid/00
95736   /var/log/mrtg.log
74688   /var/log/squid
62544   /var/cache
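For comparison, here is a minimal Python sketch of what the du -a /var | sort -n -r | head -n 10 pipeline computes. The function names disk_usage and largest are made up for this example, and the sketch sums apparent file sizes (st_size, in bytes) rather than allocated disk blocks, so its numbers will not match du exactly, for instance with sparse files or hard links (du counts each hard-linked file once).

```python
import os


def disk_usage(root):
    """Return {path: size_in_bytes} for every file and directory under root.

    Sizes are apparent sizes (st_size), not allocated blocks, so they only
    approximate du. Directory totals include everything beneath them, as in `du -a`.
    """
    sizes = {}
    # Walk bottom-up so each directory's children are summed before the directory itself
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        total = 0
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.lstat(path).st_size  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            sizes[path] = size
            total += size
        for name in dirnames:
            # Subdirectories were already visited (bottom-up), so reuse their totals
            total += sizes.get(os.path.join(dirpath, name), 0)
        sizes[dirpath] = total
    return sizes


def largest(root, n=10):
    """Rough equivalent of `du -a root | sort -n -r | head -n n` (sizes in bytes)."""
    sizes = disk_usage(root)
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)[:n]
```

Example: for size, path in largest("/var"): print(size, path)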
For more human-readable output, try:
$ cd /path/to/some/where
$ du -hsx * | sort -rh | head -10

Where,
  • du command -h option : display sizes in human readable format (e.g., 1K, 234M, 2G).
  • du command -s option : show only a total for each argument (summary).
  • du command -x option : skip directories on different file systems.
  • sort command -r option : reverse the result of comparisons.
  • sort command -h option : compare human-readable numbers (e.g., 2K, 1G). This option is specific to GNU sort.
  • head command -10 OR -n 10 option : show the first 10 lines.
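The -h flags deal with sizes written as 1K, 234M or 2G. The Python sketch below (hypothetical helper names human, parse_human and sort_human_desc) shows the idea: format bytes using powers of 1024, then sort lines by converting the suffix back to bytes. It only approximates GNU behavior. GNU du -h rounds up, while this sketch uses Python's normal rounding, and GNU sort -h compares the unit suffix first and then the number, which gives the same order as converting to bytes for du-style output.

```python
import re

UNITS = ["", "K", "M", "G", "T", "P"]


def human(num_bytes):
    """Format a byte count roughly like `du -h` (powers of 1024)."""
    size = float(num_bytes)
    for unit in UNITS:
        if size < 1024 or unit == UNITS[-1]:
            if not unit:
                return str(int(size))  # plain bytes, no suffix
            # Like du -h: one decimal place below 10, whole numbers otherwise
            return f"{size:.1f}{unit}" if size < 10 else f"{size:.0f}{unit}"
        size /= 1024


def parse_human(text):
    """Convert a string like '1.5G' or '234M' back to a number of bytes."""
    match = re.fullmatch(r"([0-9.]+)([KMGTP]?)", text.strip())
    if not match:
        raise ValueError(f"not a human-readable size: {text!r}")
    number, unit = match.groups()
    return float(number) * 1024 ** UNITS.index(unit)


def sort_human_desc(lines):
    """Sort 'SIZE<TAB>PATH' lines largest first, similar to `sort -rh`."""
    return sorted(lines, key=lambda line: parse_human(line.split("\t", 1)[0]), reverse=True)
```

For example, human(2048) gives "2.0K" and human(250 * 1024 ** 2) gives "250M".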
The du -hsx * | sort -rh pipeline above works only if GNU sort is installed. On other Unix-like operating systems, use the following version instead:
 
for i in G M K; do du -ah | grep "[0-9]$i" | sort -nr -k 1; done | head -n 11
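The portable loop avoids sort -h by making one pass per unit: it keeps only the lines whose size ends in G, then M, then K, sorts each group numerically, and concatenates the groups, so every gigabyte entry comes before every megabyte entry. The head -n 11 presumably leaves room for the total line of the current directory (.) plus the 10 largest entries. A rough Python equivalent is sketched below (bucket_sort is a hypothetical name); note that it checks only the size column, whereas the grep also matches a digit followed by the unit letter anywhere in the path.

```python
import re


def bucket_sort(du_lines, units=("G", "M", "K")):
    """Mimic `for i in G M K; do du -ah | grep "[0-9]$i" | sort -nr -k 1; done`.

    Each pass keeps lines whose size column ends in one unit and sorts them
    numerically, largest first; passes are concatenated in G, M, K order.
    Sizes without a suffix (plain bytes) are dropped, as in the shell loop.
    """
    result = []
    for unit in units:
        # The size column is everything before the first tab, e.g. "1.5G"
        matching = [line for line in du_lines
                    if re.fullmatch(r"[0-9.]+" + unit, line.split("\t", 1)[0])]
        # Strip the unit letter and compare the numbers, like sort -n
        matching.sort(key=lambda line: float(line.split("\t", 1)[0][:-1]), reverse=True)
        result.extend(matching)
    return result
```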
 

Recursion in Python

I just finished watching Lecture 4 of the MIT OpenCourseWare course on Python (6.00 Introduction to Computer Science and Programming, Fall 2008):
http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-00-introduction-to-computer-science-and-programming-fall-2008/video-lectures/lecture-4/

At the end, the lecture covers recursion, which I find really interesting.

The professor gave two examples:
1. Check whether a string is a palindrome.
2. Return the Fibonacci number of x.

Python code:

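def is_palindrome(s):
    # A string of length 0 or 1 is trivially a palindrome
    if len(s) <= 1:
        return True
    else:
        # Outer characters must match, and so must the inner substring
        return s[0] == s[-1] and is_palindrome(s[1:-1])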


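def fib(x):
    # Base cases: fib(0) and fib(1) are both defined as 1 here
    if x == 0 or x == 1:
        return 1
    else:
        # Each call makes two more recursive calls
        return fib(x-1) + fib(x-2)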

-------------------------------------------------------------------------
There are many ways to solve the first problem, but the Pythonic way would be:
def is_palindrome_v2(s):
    # Simply reverse the string (s[::-1]) and check whether it equals the original
    return s == s[::-1]
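One of those other approaches is the classic two-pointer loop, sketched below (is_palindrome_iter is a name chosen for this example, not from the lecture). It compares characters from both ends and moves inward, so unlike the recursive version it creates no substring copies and cannot hit Python's recursion limit.

```python
def is_palindrome_iter(s):
    """Two-pointer palindrome check: compare both ends, then move inward."""
    left, right = 0, len(s) - 1
    while left < right:
        if s[left] != s[right]:
            return False  # a mismatched pair means it is not a palindrome
        left += 1
        right -= 1
    return True  # pointers met without a mismatch
```

is_palindrome_iter("racecar") returns True, and since it only uses len() and indexing, it also works on lists such as [1, 2, 1].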

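Update 06/13/13:
Recursion in Python can be very expensive in terms of computing time. The naive fib above, for example, recomputes the same values over and over, so its running time grows exponentially with x.
Watch Lesson 6 here: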
https://www.udacity.com/course/viewer#!/c-cs101/l-48756019/m-48532681
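To make the cost concrete, the sketch below counts the calls made by the same recursive definition (fib(0) == fib(1) == 1) and shows two standard fixes that are not from the lecture: memoization with functools.lru_cache and a plain loop. The function names are chosen for this example.

```python
from functools import lru_cache


def fib_naive_counted(x):
    """Return (fib(x), number_of_calls) using the naive recursion from above."""
    calls = 0

    def fib(n):
        nonlocal calls
        calls += 1
        if n == 0 or n == 1:
            return 1
        return fib(n - 1) + fib(n - 2)

    return fib(x), calls


@lru_cache(maxsize=None)
def fib_memo(x):
    """Memoized recursion: each fib_memo(k) is computed once, then cached."""
    if x == 0 or x == 1:
        return 1
    return fib_memo(x - 1) + fib_memo(x - 2)


def fib_iter(x):
    """Iterative version: x additions, constant memory, no recursion."""
    a, b = 1, 1  # fib(0), fib(1)
    for _ in range(x):
        a, b = b, a + b
    return a
```

fib_naive_counted(20) returns (10946, 21891): almost 22,000 calls to compute a single number, while fib_memo(20) computes each fib(k) only once. fib_memo still recurses, so a very large x on a cold cache can exceed CPython's default recursion limit of 1000; fib_iter has no such limit.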


   

Download data from the UCSC database to a local drive

I wanted to download the UCSC refGene table to my local computer (see http://genome.ucsc.edu/goldenPath/help/mysql.html).
However, it looks like it is not possible to pull data directly from UCSC into a local database.
On the Linux command line, I connected to the public MySQL server:
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A
and then ran:
SHOW Databases;
USE hg19;
SHOW tables like "%ref%";


+------------------------+
| Tables_in_hg19 (%ref%) |
+------------------------+
| kgXref                 |
| kgXrefOld5             |
| refFlat                |
| refGene                |
| refLink                |
| refSeqAli              |
| refSeqStatus           |
| refSeqSummary          |
+------------------------+
8 rows in set (1.31 sec)

select * from refGene into outfile "/export.txt";
ERROR 1045 (28000): Access denied for user 'genome'@'%' (using password: NO)

I got this error, and after googling it I found this explanation:
"The SELECT ... INTO OUTFILE 'file_name' form of SELECT writes the selected
rows to a file. The file is created on the server host, so you must have
the FILE privilege to use this syntax.

If you want to create the resulting file on some client host other than 
the server host, you cannot use SELECT ... INTO OUTFILE. In that case, you 
should instead use a command such as

mysql -e "SELECT ..." >  file_name 

to generate the file on the client host." https://lists.soe.ucsc.edu/pipermail/genome/2009-August/019892.html


So instead, on the Linux command line, I typed:
mysql --user=genome --host=genome-mysql.cse.ucsc.edu hg19 -A -sre 'SELECT * from refGene' > ~/Desktop/refGene.hg19



And now it works!
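If you need to repeat this download from a script, the same command can be wrapped with Python's subprocess module. This is a minimal sketch, assuming the mysql command-line client is installed and the UCSC server is reachable; the helper names ucsc_mysql_command and download_table are hypothetical. Because the mysql client prints the rows and Python writes them to a local file, no FILE privilege on the server is needed, unlike SELECT ... INTO OUTFILE.

```python
import subprocess


def ucsc_mysql_command(database, query):
    """Build the same command line used above (public 'genome' account, no password).

    -A skips reading table/column names for completion, -s is silent mode,
    -r is raw output, and -e executes the query and exits.
    """
    return [
        "mysql",
        "--user=genome",
        "--host=genome-mysql.cse.ucsc.edu",
        database,
        "-A",
        "-sre",
        query,
    ]


def download_table(database, query, out_path):
    """Run the query on the UCSC server and save its output on this machine.

    Equivalent to `mysql ... -sre '...' > out_path` in the shell.
    """
    with open(out_path, "w") as out:
        # check=True raises CalledProcessError if mysql exits with an error
        subprocess.run(ucsc_mysql_command(database, query), stdout=out, check=True)
```

Usage, assuming the same query and destination as above: download_table("hg19", "SELECT * from refGene", os.path.expanduser("~/Desktop/refGene.hg19")).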