Diving into Genetics and Genomics: run gistic2 with sequenza segmentation output from whole exome sequencing

This blog by Tommy Tang is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Wednesday, November 22, 2017

run gistic2 with sequenza segmentation output from whole exome sequencing

Convert sequenza output to gistic input

Gistic was designed for SNP6 array data. I saw many papers use it for whole exome sequencing data as well.
I have the segment files from sequenza and want to convert them to the gistic input.

Input format for gistic:

segment file:
(1) Sample (sample name)
(2) Chromosome (chromosome number)
(3) Start Position (segment start position, in bases)
(4) End Position (segment end position, in bases)
(5) Num markers (number of markers in segment)
(6) Seg.CN (log2() -1 of copy number)

see a link https://groups.google.com/a/broadinstitute.org/forum/?utm_medium=email&utm_source=footer#!msg/gistic-forum/yYxIe58qLkA/4dXWAPuMEgAJ

The conversion should be log2 (logarithm base 2) - 1, so that copy number 2 is 0.
Every segment start and end in the segments file should appear in the markers file, not the other way around.

when the copy number is 0 (a homozygous deletion of both copies). You can’t do a log2(0)-1, just put a small number e.g. -5

marker file:
https://groups.google.com/a/broadinstitute.org/forum/#!searchin/gistic-forum/marker$20file/gistic-forum/Vq9WWDiy7jU/BSFg2zmBZ1EJ

(1) Marker Name
(2) Chromosome
(3) Marker Position (in bases)

Note gistic2 does not require a marker file anymore.

output of sequenza

sequenza gives a segment file. Segmentation was done by copynumberbioconductor package.

13 columns of the *segments.txt file

"chromosome" "start.pos" "end.pos" "Bf" "N.BAF" "sd.BAF" "depth.ratio" "N.ratio" "sd.ratio" "CNt" "A" "B" "LPP"

We only need the chromosome, start.pos, end.pos, N.BAF and depth.ratio columns.

The depth.ratio column is the GC content normalized ratio. a depth ratio of 1 means it has copy number of 2 (the same as the normal blood control in my case).

UPDATED: 12/17/2017. see a comment below. it is not log2(2^ depth.ratio) -1 rather:

To convert to gistic input, I have to do log2(2 * depth.ratio) - 1

UPDATED 01/03/2018
I have a bunch of sgement files in the same folder.
add the sample name in the final column and do the log2 math in R.

library(tidyverse)
library(readr)
seg_files<- list.files(".", pattern = "*segments.txt", full.names = F) 

seg_dat_list <- lapply(seg_files, function(f) {
        dat<- read_tsv(f, col_names = T, col_types = cols(.default = col_character()))
        sample<- gsub("_vs_.*segments.txt", "", f)
        dat$sample<- sample
        return(dat)
})

seg_dat <- do.call(rbind, seg_dat_list)

gistic_input<- seg_dat %>% select(sample, chromosome, start.pos, end.pos, N.BAF, depth.ratio) %>% mutate(depth.ratio = as.numeric(depth.ratio)) %>% mutate(depth.ratio = log2(2 * depth.ratio) -1)

write_tsv(gistic_input, "all_segments.txt")

Back to bash:


## marker file:

cat all_segments.txt | sed '1d' | cut -f2,3 > markers.txt
cat all_segments.txt | sed '1d' | cut -f2,4 >> markers.txt

## sort the files by chromosome, take the unique ones and number the markers.

cat markers.txt | sort -V -k1,1 -k2,2nr | uniq | nl > markers_gistic.txt

Run gistic

modify the gistic2 script a bit. e.g. change MCR_ROOT folder path

#!/bin/sh
## set MCR environment and launch GISTIC executable

## NOTE: change the line below if you have installed the Matlab MCR in an alternative location
MCR_ROOT=/scratch/genomic_med/apps/Matlab_Complier_runTime
MCR_VER=v83

echo Setting Matlab MCR root to $MCR_ROOT

## set up environment variables
LD_LIBRARY_PATH=$MCR_ROOT/$MCR_VER/runtime/glnxa64:$LD_LIBRARY_PATH
LD_LIBRARY_PATH=$MCR_ROOT/$MCR_VER/bin/glnxa64:$LD_LIBRARY_PATH
LD_LIBRARY_PATH=$MCR_ROOT/$MCR_VER/sys/os/glnxa64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH
XAPPLRESDIR=$MCR_ROOT/$MCR_VER/MATLAB_Component_Runtime/v83/X11/app-defaults
export XAPPLRESDIR

## launch GISTIC executable
./gp_gistic2_from_seg $@

I removed ./ from the last line since I have put all executables in my path.

mkdir gistic_out
gistic2 -b gistic_out -seg all_segments.txt -refgene /scratch/genomic_med/apps/gistic/refgenefiles/hg19.mat -mk markers_gistic.txt  -genegistic 1 -smallmem 1 -broad 1 -brlen 0.5 -conf 0.90 -armpeel 1 -savegene 1 -gcm extreme

25 comments:

UnknownDecember 6, 2017 at 11:42 AM
Hi Tommy,

Are you sure about log2(2^depth.ratio) - 1 = depth.ratio -1 ?
I would take log2(2*depth.ratio) - 1 instead, because depth.ratio is not log-transformed in sequenza output.
ReplyDelete
Replies
UnknownFebruary 7, 2018 at 8:56 AM
Super nice, thank you!
I am running GISTIC standalone 2.02.3.

I got this error
Shortened 12 segments in '/vol5/home/wangdo/mybin/GISTIC/myfiles/segmentationfile.txt' that overlap by one marker.

I wanna know which lines are causing this error. Do you know how can we get detail error message such as GISTIC on genepattern?
ReplyDelete
Replies
UnknownMay 7, 2018 at 6:39 PM
I was able to successfully run GISTIC 2.0 using Sequenza outputs following your guidelines above. When I look at the significantly recurrent CNV regions identified by GISTIC on IGV, however, I cannot seem to agree with the GISTIC results by visual inspection. According to the Sequenza vignette, the field 'CNt' denotes the estimated copy number of tumor. Could you please describe the difference between depth.ratio and CNt? When I calculate tumor copy number using depth.ratio based on your formula above (log2(2 * depth.ratio) - 1) it does not match the corresponding CNt value in each row in the Sequenza output.
ReplyDelete
Replies
Dinesh DewasiJune 4, 2020 at 8:25 AM
Hi Dear,
Thanks for sharing such a useful blog. Really! This Blog is very informative for us which contains a lot of information about the Counselling For Men. I like this post. Please visit at "Bioinformatics genomics",We help you find information about Bioinformatics, Proteomics Genomics and Bioinformatics on Data-incites.com. We bring world class talent and deep experience to your projects as evidenced by our extensive list of achievements.

Visit Here - <a href='https://data-incites.com/current-projects>https://data-incites.com/current-projects </a>Thanks
ReplyDelete
Replies
AmarinderJune 10, 2020 at 12:56 AM
Hi All, Thanks for tutorial.

Do we need to depth.ratio by ploidy? It is mentioned in following paper with similar steps they said "To avoid bias from different sample ploidy, depth.ratios were further divided by the ploidy of each sample to correct for ploidy." https://www.nature.com/articles/s41588-019-0569-6#Sec12
ReplyDelete
Replies
Dinesh DewasiJuly 7, 2020 at 12:11 AM
Hi Dear,

I Like Your Blog Very Much. I see Daily Your Blog, is A Very Useful For me.

You can also Find Data incites Data Incites is a Seattle based consulting group providing world class data science expertise and data science for the life sciences. Our Services are included bioinformatics, biostatistics, machine learning, clinical research, data analysis, data visualization

Visit Now:- https://data-incites.com/
ReplyDelete
Replies
UnknownSeptember 18, 2020 at 1:45 AM
Hi,

Thanks for sharing your blog. I have a question. If I use "log2(CNt/ploidy)" as GISTIC input, then the output of GISTIC is (CNt-ploidy)/ploidy. Suppose the sample ploidy=4, if the value of chr1 is -1, it meaning that the CN of chr1 is 2. Is this calculation way reasonable?
ReplyDelete
Replies
sattakingDecember 6, 2021 at 11:28 AM
How to Play the Satta King Game
The Satta King Game is a lottery game in which players bet on collections of numbers ranging from 0 to 99. This game has a lot of similarities with lottery games. It is a popular type of lottery that can be played both online and offline. The Satta King is a kind of betting lottery. The players choose a number from the hundred available and bet on it. If they win, they are awarded a sum of 90 multiple rupees. This game is popular throughout India.
To start playing this game, you need to first log in to the satta matka King website. Click on the play now button and then create a username and password. Remember to choose a username that is not public and is not your family member's name. Once you've created a username and password, you can choose a game and click on the register button. This way, you'll be able to log in and start playing right away.
After logging into the Satta King website, you can begin playing the matka result King game. To register, click the play now button. To create your account, enter your username and password. Then, select the game you want to play and click the register button. This will open your profile for other players to see. satta result You should always be careful when choosing your user name and password. This is important because you don't want your friends and family to know who you are and what you're doing.
The Satta King Game has two main elements - the Satta kalyan matka result and the Satta king record chart. You can use the record chart to predict the Satta result that's about to open. There is also a fixed timing for each Satta kin game. Once you've played your Satta game, you'll be able to see the Satta khana and record chart in advance.
After you've chosen your numbers, you'll need to enter them. In the satka matka King game, players choose one number from one hundred. If they're right, they'll receive ninety multiple rupees. If they don't win, they'll be given eighty times the original amount. As a result, this game is a form of lottery. While the Satta King game may seem to be illegal in some countries, it is legal in India.
When you play the Satta King game online, you'll be satta matka result for winning. However, you'll be required to invest at least 30% of your money to win. This is an essential aspect of this game because you don't want to lose your money. You need to be able to quit when satka matka you're near the maximum amount of money. The more you play, the higher your chance of winning. The Satta King Game is a fun way to spend your free time, and it is a fun way to pass the time.
The Satta King Game is a great way to keep your satta result active. You can spend hours playing the game with friends and family, and the Satta King game is a great way to do so. The online version of the game is free to play and there are many ways to play. The only real disadvantage is that you have to be patient and have a lot of patience. If you want to win, you should invest some time to learn the basics of the game.
You can play the

ReplyDelete
Replies
asdasdApril 26, 2022 at 1:08 AM
เว็บ pg สล็อต คือเกมสล็อตออนไลน์ที่เป็นไข้ใหญ่ค่ายหนึ่งที่ได้รับความนิยมอย่างมากในหมู่นักพนันการเล่นพนันและก็นักเล่นเกมสล็อตโดยเฉพาะเกมประเภทสล็อตเท่านั้นที่จะเป็นเกมที่ได้รับความนิยมสูงสุดที่ PG SLOT
ReplyDelete
Replies
wikiJune 13, 2022 at 10:11 PM
This comment has been removed by the author.
ReplyDelete
Replies
สล็อตเว็บตรงOctober 15, 2022 at 9:51 PM
สล็อตเว็บตรง เป็นสล็อตออนไลน์อันดับหนึ่งที่มีคนให้ความสนใจและให้ความชื่นชมกับตัวเกมส์ PG slot กันเป็นอย่างมากและเป็นเกมส์สล็อตออนไลน์ที่มีระบบพัฒนาดีอย่างต่อเนื่องและเป็น PG slotที่ดึงดูดคนมาเล่นและมาเป็นครอบครัวเดียวกับ PG slot
ReplyDelete
Replies
YunDecember 25, 2022 at 11:35 PM
why not use N.ratio but N.BAF?
ReplyDelete
Replies

Add comment

Diving into Genetics and Genomics

My github papge