If you use R, you will most likely encounter `stringsAsFactors` when reading in files. Functions such as `read.table` set `stringsAsFactors` to `TRUE` by default (in R versions before 4.0.0; since R 4.0.0 the default is `FALSE`), which may cause various problems.
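Here is a minimal sketch of the difference. It uses base R's `read.csv()` with an inline `text =` argument instead of a real file, and the two-row CSV is made up for illustration. Passing `stringsAsFactors` explicitly gives the same result no matter which default your R version uses.

```r
# Hypothetical two-row CSV, read from a string instead of a file
csv_text <- "chr,type\n1,BND\n2,DEL"

# Explicitly ask for factors (the pre-R 4.0.0 default)
as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
# Explicitly keep strings as plain character vectors
as_char <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(as_factor$type)   # "factor"
class(as_char$type)     # "character"
levels(as_factor$type)  # "BND" "DEL" (levels are sorted alphabetically)
```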
If you want to know the history of this argument, you may want to read a post by Roger Peng:
I just had an unexpected experience with `stringsAsFactors`. I will put down my notes below. This is also my first attempt to use R Notebook in RStudio :)

```r
## dummy examples
library(dplyr)
# stringsAsFactors = TRUE reproduces the pre-R 4.0.0 default, so `type` becomes a factor
df <- data.frame(chr1 = c(1, 2, 3), start1 = c(10, 20, 30), end1 = c(30, 40, 50),
                 chr2 = c(1, 2, 5), type = c("BND", "BND", "BND"),
                 stringsAsFactors = TRUE)
df
```
| chr1 `<dbl>` | start1 `<dbl>` | end1 `<dbl>` | chr2 `<dbl>` | type `<fctr>` |
|---|---|---|---|---|
| 1 | 10 | 30 | 1 | BND |
| 2 | 20 | 40 | 2 | BND |
| 3 | 30 | 50 | 5 | BND |
Now, I want to create a new column `type2`: if `chr1` is the same as `chr2`, set it to `foldbackInversion`; if not, keep it the same as `type`.

```r
df %>% mutate(type2 = ifelse(chr1 == chr2, "foldbackInversion", type))
```
| chr1 `<dbl>` | start1 `<dbl>` | end1 `<dbl>` | chr2 `<dbl>` | type `<fctr>` | type2 `<chr>` |
|---|---|---|---|---|---|
| 1 | 10 | 30 | 1 | BND | foldbackInversion |
| 2 | 20 | 40 | 2 | BND | foldbackInversion |
| 3 | 30 | 50 | 5 | BND | 1 |
Did you see that? In row 3, `type2` became `1`!!!
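This is because `type` is stored as a factor, and internally R represents a factor as integer codes pointing into its levels (here `BND` is level 1). Base `ifelse()` copies only those integer codes into its result, where they are coerced to the character `"1"`. If you use dplyr's own `if_else()` function, which is stricter about checking that both branches have the same type, you will get an error instead:

```r
df %>% mutate(type2 = if_else(chr1 == chr2, "foldbackInversion", type))
```

```
Error: `false` has type 'integer' not 'character'
```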
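To see the integer codes directly, here is a small base-R sketch that rebuilds the `type`, `chr1` and `chr2` columns as plain vectors (no dplyr needed) and shows both the leaky `ifelse()` result and the fixed one.

```r
# A single-level factor, like the `type` column above
type <- factor(c("BND", "BND", "BND"))

levels(type)       # "BND"
as.integer(type)   # 1 1 1  (the integer codes R actually stores)
typeof(type)       # "integer"

chr1 <- c(1, 2, 3)
chr2 <- c(1, 2, 5)

# ifelse() fills the "no" position with the factor's integer code,
# which is coerced to the string "1"
leaky <- ifelse(chr1 == chr2, "foldbackInversion", type)
leaky              # "foldbackInversion" "foldbackInversion" "1"

# Converting the factor to character first keeps the label
fixed <- ifelse(chr1 == chr2, "foldbackInversion", as.character(type))
fixed              # "foldbackInversion" "foldbackInversion" "BND"
```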
How to fix it? Convert the factor to character!

```r
df$type <- as.character(df$type)
df %>% mutate(type2 = if_else(chr1 == chr2, "foldbackInversion", type))
```
| chr1 `<dbl>` | start1 `<dbl>` | end1 `<dbl>` | chr2 `<dbl>` | type `<chr>` | type2 `<chr>` |
|---|---|---|---|---|---|
| 1 | 10 | 30 | 1 | BND | foldbackInversion |
| 2 | 20 | 40 | 2 | BND | foldbackInversion |
| 3 | 30 | 50 | 5 | BND | BND |
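If a data frame has several factor columns, converting them one at a time gets tedious. Below is a minimal base-R sketch (a variation of my own, not from the notebook above) that converts every factor column in one go. dplyr users could likely do the same with `mutate(across(where(is.factor), as.character))` in dplyr 1.0.0 or later.

```r
# Recreate the example data frame, forcing factors as in older R
df <- data.frame(chr1 = c(1, 2, 3), start1 = c(10, 20, 30), end1 = c(30, 40, 50),
                 chr2 = c(1, 2, 5), type = c("BND", "BND", "BND"),
                 stringsAsFactors = TRUE)

# Find the factor columns and replace them with their character labels;
# assigning into df[is_fac] keeps the data.frame structure and column names
is_fac <- vapply(df, is.factor, logical(1))
df[is_fac] <- lapply(df[is_fac], as.character)

sapply(df, class)  # "numeric" for the number columns, "character" for type

df$type2 <- ifelse(df$chr1 == df$chr2, "foldbackInversion", df$type)
df$type2           # "foldbackInversion" "foldbackInversion" "BND"
```

Of course, passing `stringsAsFactors = FALSE` when reading the file avoids the conversion step entirely.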