I use sshfs to mount remote servers.
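For comparison, mounting a Linux server over SSH with sshfs looks like this (the host name and paths here are made-up examples):
# mount a remote directory over SSH; unmount later with fusermount -u
sshfs mtang1@remote.server:/data ~/remote_data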
But I also want to connect Windows servers to my Ubuntu machine.
If there's one good thing I can say about Windows XP, it's that it supports the SMB protocol. This enables a computer running Windows to share files, folders, and more with another PC. All that other PC needs is the right software to take advantage of the SMB protocol. Luckily, that software is available for GNU/Linux.
On a Mac, I can go to Finder --> Go --> Connect to Server and type in the server address.
Here is how to do the same thing on Ubuntu.
Install
First, install cifs-utils
sudo apt-get install cifs-utils
I got Hash Sum mismatch errors:
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
keyutils libsmbclient libwbclient0 python-crypto python-ldb python-samba python-tdb samba-common samba-common-bin samba-libs
Suggested packages:
smbclient winbind python-crypto-dbg python-crypto-doc heimdal-clients
The following NEW packages will be installed:
cifs-utils keyutils python-crypto python-ldb python-samba python-tdb samba-common samba-common-bin
The following packages will be upgraded:
libsmbclient libwbclient0 samba-libs
3 upgraded, 8 newly installed, 0 to remove and 353 not upgraded.
Need to get 7,317 kB of archives.
After this operation, 11.5 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://us.archive.ubuntu.com/ubuntu xenial-updates/main amd64 samba-libs amd64 2:4.3.11+dfsg-0ubuntu0.16.04.8 [5,178 kB]
Err:1 http://security.ubuntu.com/ubuntu xenial-security/main amd64 samba-libs amd64 2:4.3.11+dfsg-0ubuntu0.16.04.8
Hash Sum mismatch
After googling around, I did the following:
sudo apt-get clean
# now it works
sudo apt-get update
sudo apt-get install cifs-utils
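A quick sanity check that the CIFS mount helper is now in place:
which mount.cifs    # should print something like /sbin/mount.cifs
mount.cifs -V       # prints the cifs-utils version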
Mount
# make a folder where the remote server will be mounted
sudo mkdir /mnt/genomic_med
sudo mount -t cifs -o username=mtang1 //d1prpccifs/genomic_med /mnt/genomic_med
# You will be prompted to type in the password.
Password for mtang1@//d1prpccifs/genomic_med: ********
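If you want the share to come back automatically, one common approach is an /etc/fstab entry that points to a credentials file. The paths and uid/gid below are assumptions; adjust them to your own account (the id command shows your uid/gid).
# contents of /home/mtang1/.smbcredentials (chmod 600 so only you can read it)
# username=mtang1
# password=your_password
# line to add to /etc/fstab
//d1prpccifs/genomic_med /mnt/genomic_med cifs credentials=/home/mtang1/.smbcredentials,uid=1000,gid=1000 0 0
# mount everything listed in /etc/fstab
sudo mount -a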
Threads and cores
Some background I got about the CPUs, cores and threads on our cluster:
On our systems there are actually 2 CPUs with 12 cores each for a total of 24 ppn (processors per node). We use CPU and core interchangeably, but we shouldn't. We do not use hyperthreading on any of our clusters because it breaks the MPI software (message passing interface, used for multi-node processing). You can consider one thread per processor/core, so the most threads you can have is 24. If various parts of your pipeline use multiple threads and they're running at the same time, you might want to be sure that all of those add up to 24 and no more.
The other thing is that there is some relatively new (to us) code out there that calls a multi-threaded R without specifying the number of threads, or else it starts up several iterations of itself, such that the scheduler is not aware. This causes lots of issues. I don't recall if the code you were running previously that used so many resources was one of those or not.
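One way to keep such implicitly multi-threaded code in line with what you reserved is to cap the thread count through environment variables before launching the job. This is only a sketch: which variable actually matters depends on the OpenMP/BLAS library your R (or other tool) was built against, and the script name is a placeholder.
# cap implicit threading so it matches the number of reserved cores
export OMP_NUM_THREADS=12        # OpenMP-based code
export OPENBLAS_NUM_THREADS=12   # R linked against OpenBLAS
export MKL_NUM_THREADS=12        # R linked against Intel MKL
Rscript my_analysis.R            # hypothetical script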
For my freebayes-parallel job, at least 1 (cat) + 4 (parallel -j 4) + 3 (the tools on the pipes) = 8 threads will be used.
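Those numbers come from the structure of the freebayes-parallel wrapper, whose core is roughly the pipeline below (a simplified sketch; the reference, BAM and regions file names are placeholders):
# 1 cat + 4 freebayes (parallel -j 4) + 3 tools sitting on the pipes
cat regions.txt \
    | parallel -k -j 4 freebayes -f ref.fa --region {} aln.bam \
    | vcffirstheader \
    | vcfstreamsort -w 1000 \
    | vcfuniq > out.vcf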
Checking how many cores I have on the computing node:
cat /proc/cpuinfo | grep "model name"
model name : Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
model name : Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
model name : Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
model name : Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
model name : Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
model name : Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
model name : Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
model name : Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
model name : Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
model name : Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
model name : Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
model name : Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
model name : Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
model name : Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
model name : Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
model name : Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
model name : Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
model name : Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
model name : Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
model name : Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
model name : Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
model name : Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
model name : Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
model name : Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
grep "model name" /proc/cpuinfo | wc -l
24
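Two shorter ways to get the same count:
nproc                       # number of processing units available
lscpu | grep "^CPU(s):"     # the same number reported by lscpu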
I reserved 12 cores to run the job. Checking the job after submitting it:
bjobs -l 220806
## some output
RUNLIMIT
1440.0 min of chms025
MEMLIMIT
32 G
Mon Jul 3 16:06:49: Started 12 Task(s) on Host(s) <chms025><chms025><chms025
><chms025><chms025><chms025><chms025><chms025><chms0
25><chms025><chms025><chms025>, Allocated 12 Slot(s) on
Host(s) <chms025><chms025><chms025><chms025><chms025><chms025><chms025><chms025><chms025><chms025><chms025
><chms025>, Execution Home </rsrch2/genomic_med/krai>, Ex
ecution CWD </rsrch2/genomic_med/krai/scratch/TCGA_CCLE_SK
CM/TCGA_SKCM_FINAL_downsample_RUN/SNV_calling>;
Mon Jul 3 21:15:41: Resource usage collected.
The CPU time used is 2132 seconds.
MEM: 1.1 Gbytes; SWAP: 2.3 Gbytes; **NTHREAD: 17**
PGID: 26713; PIDs: 26713 26719 26722 26729 26734 26783
26784 26786 26788 1301 1302 1303 1304 26785 26787 26789
MEMORY USAGE:
MAX MEM: 1.9 Gbytes; AVG MEM: 1 Gbytes
It says 17 threads are used.
I went to the computing node and checked the PIDs related to my job:
ssh chms025
uptime
21:19:39 up 410 days, 9:33, 1 user, load average: **5.94, 5.91, 5.87**
top -u krai -M -n 1 -b | grep krai
32381 krai 20 0 486m 314m 1808 R 100.0 0.1 0:01.37 freebayes
32382 krai 20 0 240m 224m 1808 R 98.4 0.1 0:01.15 freebayes
32360 krai 20 0 195m 179m 1912 R 92.6 0.0 0:02.95 freebayes
32390 krai 20 0 204m 188m 1808 R 54.0 0.0 0:00.28 freebayes
32388 krai 20 0 15568 1648 848 R 1.9 0.0 0:00.02 top
26713 krai 20 0 20388 2684 1460 S 0.0 0.0 0:41.56 res
26719 krai 20 0 103m 1256 1032 S 0.0 0.0 0:00.00 1499116008.2208
26722 krai 20 0 103m 804 556 S 0.0 0.0 0:00.00 1499116008.2208
26729 krai 20 0 258m 22m 4352 S 0.0 0.0 0:02.19 python
26734 krai 20 0 105m 1420 1144 S 0.0 0.0 0:00.00 bash
26783 krai 20 0 103m 1300 1060 S 0.0 0.0 0:00.00 freebayes-paral
26784 krai 20 0 103m 488 244 S 0.0 0.0 0:00.00 freebayes-paral
26785 krai 20 0 115m 4872 1928 S 0.0 0.0 0:05.03 python
26786 krai 20 0 100m 1288 480 S 0.0 0.0 0:00.00 cat
26787 krai 20 0 29152 11m 1344 S 0.0 0.0 1:46.80 vcfstreamsort
26788 krai 20 0 139m 9.9m 2036 S 0.0 0.0 1:11.87 perl
26789 krai 20 0 21156 1580 1308 S 0.0 0.0 1:34.24 vcfuniq
31906 krai 20 0 96072 1768 840 S 0.0 0.0 0:00.00 sshd
31907 krai 20 0 106m 2076 1464 S 0.0 0.0 0:00.07 bash
32389 krai 20 0 100m 836 732 S 0.0 0.0 0:00.00 grep
Indeed, there are 4 freebayes processes (-j 4 from parallel) running, along with 1 cat, 1 vcfstreamsort and 1 vcfuniq. I am not sure where the 2 python, 1 grep, 1 perl and 2 bash processes come from. The grep and one of the bash processes are likely just from my own ssh session and the top command above; my guess is that the rest are wrapper scripts used by the pipeline.
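One more way to see where the NTHREAD count reported by bjobs comes from is to ask ps for the number of threads (nlwp) of each of my processes on the node:
# nlwp = number of light-weight processes (threads) per PID
ps -u krai -o pid,nlwp,comm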