How to replace complementary SNP call to original reference/alternate SNP call?
I did sequencing on two genotypes and extracted the corresponding SNP chip SNPs to compare our genotypes to publicly available data on the exact same genotypes. I noticed that some SNPs must be of opposite strand polarity and need to be flipped. How can I use R to check if the sample calls match the reference or alternate SNP calls, and if they do not, to replace the sample calls with the complementary nucleotide?
I tried to incorporate ifelse and chartr, but I failed.
Example Data
test <- data.frame(
  pos = 1:5,
  ref = c("A", "T", "C", "C", "G"),
  alt = c("G", "C", "A", "T", "A"),
  sample = c("A", "A", "C", "G", "G")
)
View(test)
+-----+-----+-----+--------+
| pos | ref | alt | sample |
+-----+-----+-----+--------+
|   1 | A   | G   | A      |
|   2 | T   | C   | A      |
|   3 | C   | A   | C      |
|   4 | C   | T   | G      |
|   5 | G   | A   | G      |
+-----+-----+-----+--------+
Desired Output
+-----+-----+-----+--------+
| pos | ref | alt | sample |
+-----+-----+-----+--------+
|   1 | A   | G   | A      |
|   2 | T   | C   | T      |
|   3 | C   | A   | C      |
|   4 | C   | T   | C      |
|   5 | G   | A   | G      |
+-----+-----+-----+--------+
r bioinformatics
edited Mar 26 at 13:41 by Oka
asked Mar 25 at 20:17 by sxd
3 Answers
You can use the dplyr library to manipulate your dataset. With mutate(), for instance, you can check whether the sample call matches the reference.
Then you can use ifelse() to make changes based on that check.
library(dplyr)
test %>%
  mutate(TEST1 = (ref == sample)) %>%                   # TRUE where the sample call equals ref
  mutate(sample2 = ifelse(TEST1 == TRUE, ref, sample))  # ifelse(condition, value if TRUE, value if FALSE)
# Output (sample2 column not shown):
#   pos ref alt sample TEST1
# 1   1   A   G      A  TRUE
# 2   2   T   C      A FALSE
# 3   3   C   A      C  TRUE
# 4   4   C   T      G FALSE
# 5   5   G   A      G  TRUE
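To turn the check above into an actual strand flip, the FALSE branch of ifelse() can return the complementary base via chartr(), which the question already mentions. The following is only a sketch under a few assumptions: the calls are stored as character strings (hence stringsAsFactors = FALSE), and the column names matches and sample2 are illustrative, not part of the original answer. It keeps a call that equals either ref or alt and complements everything else.

```r
library(dplyr)

# Example data from the question, with calls stored as character strings.
test <- data.frame(
  pos = 1:5,
  ref = c("A", "T", "C", "C", "G"),
  alt = c("G", "C", "A", "T", "A"),
  sample = c("A", "A", "C", "G", "G"),
  stringsAsFactors = FALSE
)

flipped <- test %>%
  mutate(
    # TRUE where the call already equals the reference or the alternate allele
    matches = sample == ref | sample == alt,
    # keep matching calls, replace the others with their complement
    sample2 = ifelse(matches, sample, chartr("ATCG", "TAGC", sample))
  )

flipped$sample2
```

On the example data, sample2 reproduces the sample column of the desired output (A, T, C, C, G).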
answered Mar 25 at 20:32 by demarsylvain
Thank you! I am new to ifelse and %>%.
– sxd, Mar 25 at 21:21
I was able to do i <- sapply(test, is.factor); test[i] <- lapply(test[i], as.character) to make it work with your recommendation. Going to create more rows using chartr to pull the right complementary nucleotides when using ifelse.
– sxd, Mar 25 at 21:27
It is not necessary to use the pipe %>%. You can do test$sample2 = ifelse(test$TEST1 == TRUE, test$ref, test$sample) in order to create a new variable. The syntax is more elegant with dplyr.
– demarsylvain, Mar 25 at 21:33
test$TEST1 is already logical, no need to compare to TRUE.
– zx8754, Mar 26 at 7:19
You're right. It's just to add clarity; the code is easier to understand when the condition is fully "visible".
– demarsylvain, Mar 26 at 13:59
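The factor conversion mentioned in the comments matters because, in R versions before 4.0, data.frame() stored strings as factors by default, and comparing two factors with different level sets raises an error. A minimal base-R sketch of that conversion follows; stringsAsFactors = TRUE is set explicitly here only to reproduce the older default, and the name is_fct is illustrative.

```r
# Reproduce the pre-4.0 default, where character columns become factors.
test <- data.frame(
  pos = 1:5,
  ref = c("A", "T", "C", "C", "G"),
  alt = c("G", "C", "A", "T", "A"),
  sample = c("A", "A", "C", "G", "G"),
  stringsAsFactors = TRUE
)

# Find the factor columns and convert them to character in one step.
is_fct <- vapply(test, is.factor, logical(1))
test[is_fct] <- lapply(test[is_fct], as.character)

# The comparison and ifelse() now operate on plain character vectors.
test$TEST1 <- test$ref == test$sample
test$sample2 <- ifelse(test$TEST1, test$ref, test$sample)
```

Using vapply() instead of sapply() is a small design choice: it guarantees a logical vector even for an empty data frame.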
Using ifelse and chartr:
ifelse(test$sample == test$ref, test$sample, chartr("ATCG", "TAGC", test$sample))
# [1] "A" "T" "C" "C" "G"
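The one-liner above compares the call only against ref. If calls that match alt should also be kept, and calls that match neither allele even after complementing should be flagged, the same ifelse/chartr idea can be wrapped in a small helper. This is a sketch, not part of the original answer: flip_to_ref_alt is a hypothetical name, and returning NA for unresolvable calls is an assumption about how such cases should be handled.

```r
# Hypothetical helper: return the call on the same strand as ref/alt,
# or NA when neither the call nor its complement matches either allele.
flip_to_ref_alt <- function(call, ref, alt) {
  call <- as.character(call)  # also guards against factor input
  ref <- as.character(ref)
  alt <- as.character(alt)
  comp <- chartr("ATCG", "TAGC", call)  # complementary base for each call
  ifelse(
    call == ref | call == alt, call,
    ifelse(comp == ref | comp == alt, comp, NA_character_)
  )
}

test <- data.frame(
  pos = 1:5,
  ref = c("A", "T", "C", "C", "G"),
  alt = c("G", "C", "A", "T", "A"),
  sample = c("A", "A", "C", "G", "G"),
  stringsAsFactors = FALSE
)

test$sample_fixed <- flip_to_ref_alt(test$sample, test$ref, test$alt)
```

Note that for A/T and C/G SNPs the complement of one allele is the other allele, so an allele comparison alone cannot tell whether such a call is flipped; the helper leaves those calls unchanged.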
answered Mar 26 at 7:18 by zx8754
While you asked for a solution in R, you may still want to consider other tools for flipping strands, especially if you need to do it at a larger scale. You can do it with PLINK and many other tools. I would also try to confirm the strand alignment of the public data from its records or methods description. If only some of the SNPs in the public dataset are (or seem to be) flipped, that is suspicious: one would expect the alignment to be uniform within data from the same source.
Also, if you match SNPs by position, you should confirm that your data and the public data use the same genome build/version, because different versions can also be a source of confusion. Finally, I would also make sure that the SNPs in question are not multiallelic, as that can be another source of divergence.
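Two of the checks suggested here can be sketched in base R: what fraction of SNPs only match after complementing, and which SNPs are multiallelic. The code below is an illustration under assumptions that are not in the original answer: multiallelic sites are assumed to list their alternate alleles comma-separated in alt (VCF style), the example data are made up, and check_snps and in_alt are hypothetical names.

```r
# Made-up example; SNP 3 is multiallelic (alt alleles "A" and "T").
snps <- data.frame(
  pos = 1:4,
  ref = c("A", "T", "C", "G"),
  alt = c("G", "C", "A,T", "A"),
  sample = c("A", "A", "G", "C"),
  stringsAsFactors = FALSE
)

# TRUE where call[i] is one of the alternate alleles listed for SNP i.
in_alt <- function(call, alt_list) {
  vapply(seq_along(call), function(i) call[i] %in% alt_list[[i]], logical(1))
}

check_snps <- function(df) {
  alt_list <- strsplit(df$alt, ",", fixed = TRUE)
  comp <- chartr("ATCG", "TAGC", df$sample)
  direct <- df$sample == df$ref | in_alt(df$sample, alt_list)
  data.frame(
    pos = df$pos,
    multiallelic = lengths(alt_list) > 1,
    direct = direct,  # call matches ref or alt as given
    flipped = !direct & (comp == df$ref | in_alt(comp, alt_list))
  )
}

qc <- check_snps(snps)
mean(qc$flipped)  # fraction of SNPs that only match after complementing
```

A flipped fraction that is neither close to 0 nor close to 1 is the "only some SNPs are flipped" situation described above and would be a reason to check the data source and genome build before flipping anything.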
answered Mar 26 at 13:28 by Oka