Creative Commons License
This blog by Tommy Tang is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


Thursday, October 20, 2016

why not stringsAsFactors: my personal experience

If you use R, you will most likely encounter stringsAsFactors when reading in files. Functions such as read.table set stringsAsFactors to TRUE by default, which may cause various problems.
If you want to know the history of this argument, you may want to read a post by Roger Peng:
I just had an unexpected experience with stringsAsFactors, and I will put down my notes below. This is also my first attempt to use R Notebook in RStudio :)
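
To make the read.table behavior mentioned above concrete, here is a minimal sketch that reads the same small table twice, once with factors and once without. The inline `txt` string is a hypothetical stand-in for a real file, and stringsAsFactors is passed explicitly in both calls because the default depends on the R version (it is FALSE from R 4.0.0 onward).

```r
# Hypothetical inline data standing in for a tab-delimited file on disk
txt <- "chr1\tstart1\ttype\n1\t10\tBND\n2\t20\tBND"

# Strings converted to factors (the old default behavior)
as_factor <- read.table(text = txt, header = TRUE, sep = "\t",
                        stringsAsFactors = TRUE)

# Strings kept as plain character vectors
as_char <- read.table(text = txt, header = TRUE, sep = "\t",
                      stringsAsFactors = FALSE)

class(as_factor$type)  # "factor"
class(as_char$type)    # "character"
```

Passing the argument explicitly every time is one way to avoid depending on whichever default the installed R version happens to use.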

## dummy examples
library(dplyr)
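# stringsAsFactors = TRUE was the data.frame() default before R 4.0.0;
# it is set explicitly here so the example reproduces on newer versions
df <- data.frame(chr1 = c(1, 2, 3), start1 = c(10, 20, 30), end1 = c(30, 40, 50), chr2 = c(1, 2, 5),
                 type = c("BND", "BND", "BND"), stringsAsFactors = TRUE)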
df
   chr1 start1  end1  chr2   type
  <dbl>  <dbl> <dbl> <dbl> <fctr>
1     1     10    30     1    BND
2     2     20    40     2    BND
3     3     30    50     5    BND

Now I want to create a new column, type2: if chr1 is the same as chr2, I set it to foldbackInversion; if not, I keep it the same as type.

df %>% mutate(type2 = ifelse(chr1==chr2, "foldbackInversion", type))
   chr1 start1  end1  chr2   type             type2
  <dbl>  <dbl> <dbl> <dbl> <fctr>             <chr>
1     1     10    30     1    BND foldbackInversion
2     2     20    40     2    BND foldbackInversion
3     3     30    50     5    BND                 1

Did you notice that in row 3, type2 becomes 1?!
This is because type is stored as a factor, and internally R uses integers to represent factor levels to save space. ifelse() drops the factor attributes, so the underlying integer code ends up in the result instead of the label. If you use dplyr’s own if_else() function, which is stricter in checking types, you will get an error.
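
The integer codes behind a factor can be inspected directly in base R. The sketch below uses a standalone `type` vector and a hand-written logical condition (both made up for illustration, mirroring the example data frame) to show where the 1 comes from.

```r
# A factor with a single level, like the type column in the example
type <- factor(c("BND", "BND", "BND"))

levels(type)      # the labels: "BND"
as.integer(type)  # the underlying codes: 1 1 1
typeof(type)      # "integer"

# ifelse() fills its result from the bare values of `no`, so the
# integer code (coerced to character) appears instead of "BND"
res <- ifelse(c(TRUE, TRUE, FALSE), "foldbackInversion", type)
res               # "foldbackInversion" "foldbackInversion" "1"
```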

df %>% mutate(type2 = if_else(chr1==chr2, "foldbackInversion", type))
Error: `false` has type 'integer' not 'character'
How to fix it? Change the factors to characters!

df$type<- as.character(df$type)
df %>% mutate(type2 = if_else(chr1==chr2, "foldbackInversion", type))
   chr1 start1  end1  chr2   type             type2
  <dbl>  <dbl> <dbl> <dbl>  <chr>             <chr>
1     1     10    30     1    BND foldbackInversion
2     2     20    40     2    BND foldbackInversion
3     3     30    50     5    BND               BND
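
When a data frame has several factor columns, converting them one by one gets tedious. As a rough sketch, assuming you want every factor column turned into character, the conversion can be done in one pass with base R. The helper name `is_fct` is made up for illustration, and the data frame is a trimmed copy of the example above.

```r
# Trimmed copy of the example, with factors requested explicitly so the
# sketch behaves the same regardless of the R version's default
df <- data.frame(chr1 = c(1, 2, 3), chr2 = c(1, 2, 5),
                 type = c("BND", "BND", "BND"),
                 stringsAsFactors = TRUE)

# Find the factor columns and convert only those to character
is_fct <- vapply(df, is.factor, logical(1))
df[is_fct] <- lapply(df[is_fct], as.character)

# With type stored as character, base ifelse() keeps the label
df$type2 <- ifelse(df$chr1 == df$chr2, "foldbackInversion", df$type)
df$type2  # "foldbackInversion" "foldbackInversion" "BND"
```

Alternatively, passing stringsAsFactors = FALSE when the data frame is created or read in avoids the conversion step altogether.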
