If you use R, you will most likely run into stringsAsFactors when reading in files. Functions such as read.table set stringsAsFactors to TRUE by default (in R versions before 4.0.0), which may cause various problems.
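To see the difference the argument makes, here is a minimal sketch. The two-row input is made up for illustration and is passed inline through `text =` rather than read from a real file.

```r
# Hypothetical input, supplied inline instead of read from a file.
csv_text <- "chr,type\n1,BND\n2,DEL"

# Ask for factors explicitly (this was the default before R 4.0.0).
with_factors <- read.table(text = csv_text, header = TRUE, sep = ",",
                           stringsAsFactors = TRUE)

# Keep strings as plain character vectors.
with_strings <- read.table(text = csv_text, header = TRUE, sep = ",",
                           stringsAsFactors = FALSE)

class(with_factors$type)
class(with_strings$type)
```

Passing the argument explicitly means the result does not depend on which R version runs the script.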
If you want to know the history of this argument, you may want to read a post by Roger Peng:
I just had an unexpected experience with stringsAsFactors. I will put down my notes below. This is also my first attempt to use R Notebook in RStudio :)

## dummy examples
library(dplyr)
df <- data.frame(chr1 = c(1, 2, 3), start1 = c(10, 20, 30), end1 = c(30, 40, 50), chr2 = c(1, 2, 5),
                 type = c("BND", "BND", "BND"),
                 stringsAsFactors = TRUE)  # explicit, so type is a factor on R >= 4.0.0 too
df
| chr1 `<dbl>` | start1 `<dbl>` | end1 `<dbl>` | chr2 `<dbl>` | type `<fctr>` |
|---|---|---|---|---|
| 1 | 10 | 30 | 1 | BND |
| 2 | 20 | 40 | 2 | BND |
| 3 | 30 | 50 | 5 | BND |
Now, I want to create a new column type2. If chr1 is the same as chr2, I set it to foldbackInversion; if not, I keep it the same as type.
df %>% mutate(type2 = ifelse(chr1 == chr2, "foldbackInversion", type))
| chr1 `<dbl>` | start1 `<dbl>` | end1 `<dbl>` | chr2 `<dbl>` | type `<fctr>` | type2 `<chr>` |
|---|---|---|---|---|---|
| 1 | 10 | 30 | 1 | BND | foldbackInversion |
| 2 | 20 | 40 | 2 | BND | foldbackInversion |
| 3 | 30 | 50 | 5 | BND | 1 |
Did you see that? In row 3, type2 becomes 1!
This is because type is stored as a factor, and internally R uses integers to represent the factor levels to save space. If you use dplyr’s own if_else() function, which is stricter in checking the types, you will get an error.
df %>% mutate(type2 = if_else(chr1 == chr2, "foldbackInversion", type))
Error: `false` has type 'integer' not 'character'
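The integer codes can be inspected directly. The sketch below uses only base R; the standalone vectors `type` and `same_chr` are stand-ins for the columns in the example above.

```r
type <- factor(c("BND", "BND", "BND"))

levels(type)       # the distinct labels
as.integer(type)   # the integer code R stores for each element

# ifelse() fills its result from the underlying values, so the factor's
# levels are lost and only the integer codes are carried over.
same_chr <- c(TRUE, TRUE, FALSE)
result <- ifelse(same_chr, "foldbackInversion", type)
result
```

With a single level, every element has code 1, which is where the 1 in row 3 comes from.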
How to fix it? Change the factors to characters!
df$type <- as.character(df$type)
df %>% mutate(type2 = if_else(chr1 == chr2, "foldbackInversion", type))
| chr1 `<dbl>` | start1 `<dbl>` | end1 `<dbl>` | chr2 `<dbl>` | type `<chr>` | type2 `<chr>` |
|---|---|---|---|---|---|
| 1 | 10 | 30 | 1 | BND | foldbackInversion |
| 2 | 20 | 40 | 2 | BND | foldbackInversion |
| 3 | 30 | 50 | 5 | BND | BND |
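Two variations on the same fix, sketched in base R so they run without dplyr. The data frame is a cut-down copy of the dummy example, and the names `type2` and `is_fct` are only illustrative.

```r
df <- data.frame(chr1 = c(1, 2, 3), chr2 = c(1, 2, 5),
                 type = c("BND", "BND", "BND"),
                 stringsAsFactors = TRUE)

# Option 1: convert inline, leaving df$type untouched.
type2 <- ifelse(df$chr1 == df$chr2, "foldbackInversion", as.character(df$type))
type2

# Option 2: convert every factor column to character in one pass.
is_fct <- vapply(df, is.factor, logical(1))
df[is_fct] <- lapply(df[is_fct], as.character)
str(df)
```

Option 2 is handy when a data frame has many string columns and you do not want to convert them one by one.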
