Figuring out why character matching doesn't workreadr (or other packages from tidyverse) with data.frame instead of tibblecustom functions with group_by tidyverseadd row of non-matching column types to existing dataframeConvert class of a group of backticked columns using the tidyverse?join or merge data frames by if value is in one of multiple columnsSetting row names on a tibble is deprecated. Error: invalid 'row.names' lengthNormalize multiple values using values of one factor in RR: dplyr::lag throws error when trying to lag characters in tibblecan dplyr mutate() create a new tibble from an existing tibble?Splitting a data frame character column into an arbitrary number of columns with dynamic column names in R

Can White Castle?

Find the C-factor of a vote

How would modern naval warfare have to have developed differently for battleships to still be relevant in the 21st century?

What is "industrial ethernet"?

Why does Linux list NVMe drives as /dev/nvme0 instead of /dev/sda?

How is hair tissue mineral analysis performed?

Is a single radon-daughter atom in air a solid?

How do I professionally let my manager know I'll quit over smoking in the office?

How long would it take to cross the Channel in 1890's?

Can humans ever directly see a few photons at a time? Can a human see a single photon?

Why tighten down in a criss-cross pattern?

Interaction between Leyline of Anticipation and Teferi, Time Raveler

Impossible darts scores

Array initialization optimization

Dates on degrees don’t make sense – will people care?

Should developer taking test phones home or put in office?

How much will studying magic in an academy cost?

Run specific apex tests during "sfdx force:source:deploy"

When to remove insignificant variables?

Do I have any obligations to my PhD supervisor's requests after I have graduated?

Why did pressing the joystick button spit out keypresses?

Why do all the teams that I have worked with always finish a sprint without completion of all the stories?

How to model a twisted cylinder like this

Does this Wild Magic result affect the sorcerer or just other creatures?



Figuring out why character matching doesn't work


readr (or other packages from tidyverse) with data.frame instead of tibblecustom functions with group_by tidyverseadd row of non-matching column types to existing dataframeConvert class of a group of backticked columns using the tidyverse?join or merge data frames by if value is in one of multiple columnsSetting row names on a tibble is deprecated. Error: invalid 'row.names' lengthNormalize multiple values using values of one factor in RR: dplyr::lag throws error when trying to lag characters in tibblecan dplyr mutate() create a new tibble from an existing tibble?Splitting a data frame character column into an arbitrary number of columns with dynamic column names in R






.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty height:90px;width:728px;box-sizing:border-box;








0















For the project at hand, I am forced to join several data frames by a character column. This is sometimes problematic due to - for example - trailing whitespace but can also be remedied easily. However, in this case the joining does not work and I am unable to figure out what distinguishes the character values in the column used to do the joining.



Since in its original format the problem was not reproducible, here is a link where data in question can be downloaded. It looks like this:



readRDS("path/sourceA") -> sourceA
sourceA
# A tibble: 1 x 2
Name category
<chr> <dbl>
1 Grundschule Kronsberg 1

readRDS("path/sourceB") -> sourceB
sourceB
# A tibble: 1 x 2
Name value
<chr> <dbl>
1 Grundschule Kronsberg 2


I want to join these data frames together using the common id variable Name. As you can see, it appears the value is exactly the same in both frames. However, when I apply any joining procedure, this happens:



library(tidyverse)
joined.df <- full_join(sourceA, sourceB, by = "Name")

joined.df
# A tibble: 2 x 3
Name category value
<chr> <dbl> <dbl>
1 Grundschule Kronsberg 1 NA
2 Grundschule Kronsberg NA 2


In trying to figure this out, I tried to remove the whitespace from the Name column but, using standard procedure, was only able to so for sourceA. For sourceB it looks as if the procedure does not cut out the whitespace in between "Grundschule" and "Kronsberg".



joined.df %>%
mutate(Name_test = stringr::str_replace_all(Name, fixed(" "), ""))

# A tibble: 2 x 4
Name category value Name_test
<chr> <dbl> <dbl> <chr>
1 Grundschule Kronsberg 1 NA GrundschuleKronsberg
2 Grundschule Kronsberg NA 2 Grundschule Kronsberg


Weirdly, when using stringr::str_replace_all(Name, "\pWHITE_SPACE", ""), it works:



joined.df %>%
mutate(Name_test = stringr::str_replace_all(Name, "\pWHITE_SPACE", ""))

# A tibble: 2 x 4
Name category value Name_test
<chr> <dbl> <dbl> <chr>
1 Grundschule Kronsberg 1 NA GrundschuleKronsberg
2 Grundschule Kronsberg NA 2 GrundschuleKronsberg


I don't know anything about how the lookup in "\pWHITE_SPACE"differs from fixed(" ") under the hood, but I thought it might be good clue for someone who does.










share|improve this question



















  • 1





    Can't reproduce. I get just one row.

    – nicola
    Mar 25 at 8:51











  • Same here, solution with full_join() seems fine.

    – heck1
    Mar 25 at 8:52











  • Hm, interesting. Did some updates last week, checking that now

    – tifu
    Mar 25 at 8:56











  • merge(sourceA,sourceB, by = "Name") works as well, you could try that if there are package/update troubles.

    – heck1
    Mar 25 at 9:03











  • I did try that. The problem is not contingent on using dplyrs joining verbs

    – tifu
    Mar 25 at 9:04

















0















For the project at hand, I am forced to join several data frames by a character column. This is sometimes problematic due to - for example - trailing whitespace but can also be remedied easily. However, in this case the joining does not work and I am unable to figure out what distinguishes the character values in the column used to do the joining.



Since in its original format the problem was not reproducible, here is a link where data in question can be downloaded. It looks like this:



readRDS("path/sourceA") -> sourceA
sourceA
# A tibble: 1 x 2
Name category
<chr> <dbl>
1 Grundschule Kronsberg 1

readRDS("path/sourceB") -> sourceB
sourceB
# A tibble: 1 x 2
Name value
<chr> <dbl>
1 Grundschule Kronsberg 2


I want to join these data frames together using the common id variable Name. As you can see, it appears the value is exactly the same in both frames. However, when I apply any joining procedure, this happens:



library(tidyverse)
joined.df <- full_join(sourceA, sourceB, by = "Name")

joined.df
# A tibble: 2 x 3
Name category value
<chr> <dbl> <dbl>
1 Grundschule Kronsberg 1 NA
2 Grundschule Kronsberg NA 2


In trying to figure this out, I tried to remove the whitespace from the Name column but, using standard procedure, was only able to so for sourceA. For sourceB it looks as if the procedure does not cut out the whitespace in between "Grundschule" and "Kronsberg".



joined.df %>%
mutate(Name_test = stringr::str_replace_all(Name, fixed(" "), ""))

# A tibble: 2 x 4
Name category value Name_test
<chr> <dbl> <dbl> <chr>
1 Grundschule Kronsberg 1 NA GrundschuleKronsberg
2 Grundschule Kronsberg NA 2 Grundschule Kronsberg


Weirdly, when using stringr::str_replace_all(Name, "\pWHITE_SPACE", ""), it works:



joined.df %>%
mutate(Name_test = stringr::str_replace_all(Name, "\pWHITE_SPACE", ""))

# A tibble: 2 x 4
Name category value Name_test
<chr> <dbl> <dbl> <chr>
1 Grundschule Kronsberg 1 NA GrundschuleKronsberg
2 Grundschule Kronsberg NA 2 GrundschuleKronsberg


I don't know anything about how the lookup in "\pWHITE_SPACE"differs from fixed(" ") under the hood, but I thought it might be good clue for someone who does.










share|improve this question



















  • 1





    Can't reproduce. I get just one row.

    – nicola
    Mar 25 at 8:51











  • Same here, solution with full_join() seems fine.

    – heck1
    Mar 25 at 8:52











  • Hm, interesting. Did some updates last week, checking that now

    – tifu
    Mar 25 at 8:56











  • merge(sourceA,sourceB, by = "Name") works as well, you could try that if there are package/update troubles.

    – heck1
    Mar 25 at 9:03











  • I did try that. The problem is not contingent on using dplyrs joining verbs

    – tifu
    Mar 25 at 9:04













0












0








0








For the project at hand, I am forced to join several data frames by a character column. This is sometimes problematic due to - for example - trailing whitespace but can also be remedied easily. However, in this case the joining does not work and I am unable to figure out what distinguishes the character values in the column used to do the joining.



Since in its original format the problem was not reproducible, here is a link where data in question can be downloaded. It looks like this:



readRDS("path/sourceA") -> sourceA
sourceA
# A tibble: 1 x 2
Name category
<chr> <dbl>
1 Grundschule Kronsberg 1

readRDS("path/sourceB") -> sourceB
sourceB
# A tibble: 1 x 2
Name value
<chr> <dbl>
1 Grundschule Kronsberg 2


I want to join these data frames together using the common id variable Name. As you can see, it appears the value is exactly the same in both frames. However, when I apply any joining procedure, this happens:



library(tidyverse)
joined.df <- full_join(sourceA, sourceB, by = "Name")

joined.df
# A tibble: 2 x 3
Name category value
<chr> <dbl> <dbl>
1 Grundschule Kronsberg 1 NA
2 Grundschule Kronsberg NA 2


In trying to figure this out, I tried to remove the whitespace from the Name column but, using standard procedure, was only able to so for sourceA. For sourceB it looks as if the procedure does not cut out the whitespace in between "Grundschule" and "Kronsberg".



joined.df %>%
mutate(Name_test = stringr::str_replace_all(Name, fixed(" "), ""))

# A tibble: 2 x 4
Name category value Name_test
<chr> <dbl> <dbl> <chr>
1 Grundschule Kronsberg 1 NA GrundschuleKronsberg
2 Grundschule Kronsberg NA 2 Grundschule Kronsberg


Weirdly, when using stringr::str_replace_all(Name, "\pWHITE_SPACE", ""), it works:



joined.df %>%
mutate(Name_test = stringr::str_replace_all(Name, "\pWHITE_SPACE", ""))

# A tibble: 2 x 4
Name category value Name_test
<chr> <dbl> <dbl> <chr>
1 Grundschule Kronsberg 1 NA GrundschuleKronsberg
2 Grundschule Kronsberg NA 2 GrundschuleKronsberg


I don't know anything about how the lookup in "\pWHITE_SPACE"differs from fixed(" ") under the hood, but I thought it might be good clue for someone who does.










share|improve this question
















For the project at hand, I am forced to join several data frames by a character column. This is sometimes problematic due to - for example - trailing whitespace but can also be remedied easily. However, in this case the joining does not work and I am unable to figure out what distinguishes the character values in the column used to do the joining.



Since in its original format the problem was not reproducible, here is a link where data in question can be downloaded. It looks like this:



readRDS("path/sourceA") -> sourceA
sourceA
# A tibble: 1 x 2
Name category
<chr> <dbl>
1 Grundschule Kronsberg 1

readRDS("path/sourceB") -> sourceB
sourceB
# A tibble: 1 x 2
Name value
<chr> <dbl>
1 Grundschule Kronsberg 2


I want to join these data frames together using the common id variable Name. As you can see, it appears the value is exactly the same in both frames. However, when I apply any joining procedure, this happens:



library(tidyverse)
joined.df <- full_join(sourceA, sourceB, by = "Name")

joined.df
# A tibble: 2 x 3
Name category value
<chr> <dbl> <dbl>
1 Grundschule Kronsberg 1 NA
2 Grundschule Kronsberg NA 2


In trying to figure this out, I tried to remove the whitespace from the Name column but, using standard procedure, was only able to so for sourceA. For sourceB it looks as if the procedure does not cut out the whitespace in between "Grundschule" and "Kronsberg".



joined.df %>%
mutate(Name_test = stringr::str_replace_all(Name, fixed(" "), ""))

# A tibble: 2 x 4
Name category value Name_test
<chr> <dbl> <dbl> <chr>
1 Grundschule Kronsberg 1 NA GrundschuleKronsberg
2 Grundschule Kronsberg NA 2 Grundschule Kronsberg


Weirdly, when using stringr::str_replace_all(Name, "\pWHITE_SPACE", ""), it works:



joined.df %>%
mutate(Name_test = stringr::str_replace_all(Name, "\pWHITE_SPACE", ""))

# A tibble: 2 x 4
Name category value Name_test
<chr> <dbl> <dbl> <chr>
1 Grundschule Kronsberg 1 NA GrundschuleKronsberg
2 Grundschule Kronsberg NA 2 GrundschuleKronsberg


I don't know anything about how the lookup in "\pWHITE_SPACE"differs from fixed(" ") under the hood, but I thought it might be good clue for someone who does.







r merge tidyverse






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Mar 25 at 14:14







tifu

















asked Mar 25 at 8:37









tifutifu

1,093214




1,093214







  • 1





    Can't reproduce. I get just one row.

    – nicola
    Mar 25 at 8:51











  • Same here, solution with full_join() seems fine.

    – heck1
    Mar 25 at 8:52











  • Hm, interesting. Did some updates last week, checking that now

    – tifu
    Mar 25 at 8:56











  • merge(sourceA,sourceB, by = "Name") works as well, you could try that if there are package/update troubles.

    – heck1
    Mar 25 at 9:03











  • I did try that. The problem is not contingent on using dplyrs joining verbs

    – tifu
    Mar 25 at 9:04












  • 1





    Can't reproduce. I get just one row.

    – nicola
    Mar 25 at 8:51











  • Same here, solution with full_join() seems fine.

    – heck1
    Mar 25 at 8:52











  • Hm, interesting. Did some updates last week, checking that now

    – tifu
    Mar 25 at 8:56











  • merge(sourceA,sourceB, by = "Name") works as well, you could try that if there are package/update troubles.

    – heck1
    Mar 25 at 9:03











  • I did try that. The problem is not contingent on using dplyrs joining verbs

    – tifu
    Mar 25 at 9:04







1




1





Can't reproduce. I get just one row.

– nicola
Mar 25 at 8:51





Can't reproduce. I get just one row.

– nicola
Mar 25 at 8:51













Same here, solution with full_join() seems fine.

– heck1
Mar 25 at 8:52





Same here, solution with full_join() seems fine.

– heck1
Mar 25 at 8:52













Hm, interesting. Did some updates last week, checking that now

– tifu
Mar 25 at 8:56





Hm, interesting. Did some updates last week, checking that now

– tifu
Mar 25 at 8:56













merge(sourceA,sourceB, by = "Name") works as well, you could try that if there are package/update troubles.

– heck1
Mar 25 at 9:03





merge(sourceA,sourceB, by = "Name") works as well, you could try that if there are package/update troubles.

– heck1
Mar 25 at 9:03













I did try that. The problem is not contingent on using dplyrs joining verbs

– tifu
Mar 25 at 9:04





I did try that. The problem is not contingent on using dplyrs joining verbs

– tifu
Mar 25 at 9:04












1 Answer
1






active

oldest

votes


















0














After quite a discussion in the comments, I was able to solve the problem. While the Name variables looked identical (and where parsed identically by dput()), there was a subtle difference when converting the characters into their ASCII codes:



library(gtools)

asc(sourceA$Name)
Grundschule Kronsberg
[1,] 71
[2,] 114
[3,] 117
[4,] 110
[5,] 100
[6,] 115
[7,] 99
[8,] 104
[9,] 117
[10,] 108
[11,] 101
[12,] 32
[13,] 75
[14,] 114
[15,] 111
[16,] 110
[17,] 115
[18,] 98
[19,] 101
[20,] 114
[21,] 103

asc(sourceB$Name)
Grundschule Kronsberg
[1,] 71
[2,] 114
[3,] 117
[4,] 110
[5,] 100
[6,] 115
[7,] 99
[8,] 104
[9,] 117
[10,] 108
[11,] 101
[12,] 194
[13,] 160
[14,] 75
[15,] 114
[16,] 111
[17,] 110
[18,] 115
[19,] 98
[20,] 101
[21,] 114
[22,] 103


sourceB has an extra code compared to sourceA and different values in position 12 and 13. Using chr() (also from gtools), I was able to re-convert the ASCII codes into characters:



 chr(asc(sourceA$Name))
[1] "G" "r" "u" "n" "d" "s" "c" "h" "u" "l" "e" " " "K" "r" "o" "n" "s" "b" "e" "r" "g"

chr(asc(sourceB$Name))
[1] "G" "r" "u" "n" "d" "s" "c" "h" "u" "l" "e" "Â" " " "K" "r" "o" "n" "s" "b" "e" "r" "g"


In sourceB, there is an extra  (ASCII decimal code 194) in the string, and the white space is coded with the decimal 160 instead of 32. I still don't know why in conjunction these two ASCII codes were displayed as a regular white space, but was able to solve the issue by simply replacing all white spaces with " "



sourceB <- sourceB %>%
mutate(Name = stringr::str_replace_all(Name, "\pWHITE_SPACE", " "))

full_join(sourceA, sourceB, by = "Name")
# A tibble: 1 x 3
Name category value
<chr> <dbl> <dbl>
1 Grundschule Kronsberg 1 2


This (somehow) changed the ASCII codes so that they now align with one another:



chr(asc(sourceB$Name))
[1] "G" "r" "u" "n" "d" "s" "c" "h" "u" "l" "e" " " "K" "r" "o" "n" "s" "b" "e" "r" "g"





share|improve this answer



























    Your Answer






    StackExchange.ifUsing("editor", function ()
    StackExchange.using("externalEditor", function ()
    StackExchange.using("snippets", function ()
    StackExchange.snippets.init();
    );
    );
    , "code-snippets");

    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "1"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader:
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    ,
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );













    draft saved

    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55333906%2ffiguring-out-why-character-matching-doesnt-work%23new-answer', 'question_page');

    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    0














    After quite a discussion in the comments, I was able to solve the problem. While the Name variables looked identical (and where parsed identically by dput()), there was a subtle difference when converting the characters into their ASCII codes:



    library(gtools)

    asc(sourceA$Name)
    Grundschule Kronsberg
    [1,] 71
    [2,] 114
    [3,] 117
    [4,] 110
    [5,] 100
    [6,] 115
    [7,] 99
    [8,] 104
    [9,] 117
    [10,] 108
    [11,] 101
    [12,] 32
    [13,] 75
    [14,] 114
    [15,] 111
    [16,] 110
    [17,] 115
    [18,] 98
    [19,] 101
    [20,] 114
    [21,] 103

    asc(sourceB$Name)
    Grundschule Kronsberg
    [1,] 71
    [2,] 114
    [3,] 117
    [4,] 110
    [5,] 100
    [6,] 115
    [7,] 99
    [8,] 104
    [9,] 117
    [10,] 108
    [11,] 101
    [12,] 194
    [13,] 160
    [14,] 75
    [15,] 114
    [16,] 111
    [17,] 110
    [18,] 115
    [19,] 98
    [20,] 101
    [21,] 114
    [22,] 103


    sourceB has an extra code compared to sourceA and different values in position 12 and 13. Using chr() (also from gtools), I was able to re-convert the ASCII codes into characters:



     chr(asc(sourceA$Name))
    [1] "G" "r" "u" "n" "d" "s" "c" "h" "u" "l" "e" " " "K" "r" "o" "n" "s" "b" "e" "r" "g"

    chr(asc(sourceB$Name))
    [1] "G" "r" "u" "n" "d" "s" "c" "h" "u" "l" "e" "Â" " " "K" "r" "o" "n" "s" "b" "e" "r" "g"


    In sourceB, there is an extra  (ASCII decimal code 194) in the string, and the white space is coded with the decimal 160 instead of 32. I still don't know why in conjunction these two ASCII codes were displayed as a regular white space, but was able to solve the issue by simply replacing all white spaces with " "



    sourceB <- sourceB %>%
    mutate(Name = stringr::str_replace_all(Name, "\pWHITE_SPACE", " "))

    full_join(sourceA, sourceB, by = "Name")
    # A tibble: 1 x 3
    Name category value
    <chr> <dbl> <dbl>
    1 Grundschule Kronsberg 1 2


    This (somehow) changed the ASCII codes so that they now align with one another:



    chr(asc(sourceB$Name))
    [1] "G" "r" "u" "n" "d" "s" "c" "h" "u" "l" "e" " " "K" "r" "o" "n" "s" "b" "e" "r" "g"





    share|improve this answer





























      0














      After quite a discussion in the comments, I was able to solve the problem. While the Name variables looked identical (and where parsed identically by dput()), there was a subtle difference when converting the characters into their ASCII codes:



      library(gtools)

      asc(sourceA$Name)
      Grundschule Kronsberg
      [1,] 71
      [2,] 114
      [3,] 117
      [4,] 110
      [5,] 100
      [6,] 115
      [7,] 99
      [8,] 104
      [9,] 117
      [10,] 108
      [11,] 101
      [12,] 32
      [13,] 75
      [14,] 114
      [15,] 111
      [16,] 110
      [17,] 115
      [18,] 98
      [19,] 101
      [20,] 114
      [21,] 103

      asc(sourceB$Name)
      Grundschule Kronsberg
      [1,] 71
      [2,] 114
      [3,] 117
      [4,] 110
      [5,] 100
      [6,] 115
      [7,] 99
      [8,] 104
      [9,] 117
      [10,] 108
      [11,] 101
      [12,] 194
      [13,] 160
      [14,] 75
      [15,] 114
      [16,] 111
      [17,] 110
      [18,] 115
      [19,] 98
      [20,] 101
      [21,] 114
      [22,] 103


      sourceB has an extra code compared to sourceA and different values in position 12 and 13. Using chr() (also from gtools), I was able to re-convert the ASCII codes into characters:



       chr(asc(sourceA$Name))
      [1] "G" "r" "u" "n" "d" "s" "c" "h" "u" "l" "e" " " "K" "r" "o" "n" "s" "b" "e" "r" "g"

      chr(asc(sourceB$Name))
      [1] "G" "r" "u" "n" "d" "s" "c" "h" "u" "l" "e" "Â" " " "K" "r" "o" "n" "s" "b" "e" "r" "g"


      In sourceB, there is an extra  (ASCII decimal code 194) in the string, and the white space is coded with the decimal 160 instead of 32. I still don't know why in conjunction these two ASCII codes were displayed as a regular white space, but was able to solve the issue by simply replacing all white spaces with " "



      sourceB <- sourceB %>%
      mutate(Name = stringr::str_replace_all(Name, "\pWHITE_SPACE", " "))

      full_join(sourceA, sourceB, by = "Name")
      # A tibble: 1 x 3
      Name category value
      <chr> <dbl> <dbl>
      1 Grundschule Kronsberg 1 2


      This (somehow) changed the ASCII codes so that they now align with one another:



      chr(asc(sourceB$Name))
      [1] "G" "r" "u" "n" "d" "s" "c" "h" "u" "l" "e" " " "K" "r" "o" "n" "s" "b" "e" "r" "g"





      share|improve this answer



























        0












        0








        0







        After quite a discussion in the comments, I was able to solve the problem. While the Name variables looked identical (and where parsed identically by dput()), there was a subtle difference when converting the characters into their ASCII codes:



        library(gtools)

        asc(sourceA$Name)
        Grundschule Kronsberg
        [1,] 71
        [2,] 114
        [3,] 117
        [4,] 110
        [5,] 100
        [6,] 115
        [7,] 99
        [8,] 104
        [9,] 117
        [10,] 108
        [11,] 101
        [12,] 32
        [13,] 75
        [14,] 114
        [15,] 111
        [16,] 110
        [17,] 115
        [18,] 98
        [19,] 101
        [20,] 114
        [21,] 103

        asc(sourceB$Name)
        Grundschule Kronsberg
        [1,] 71
        [2,] 114
        [3,] 117
        [4,] 110
        [5,] 100
        [6,] 115
        [7,] 99
        [8,] 104
        [9,] 117
        [10,] 108
        [11,] 101
        [12,] 194
        [13,] 160
        [14,] 75
        [15,] 114
        [16,] 111
        [17,] 110
        [18,] 115
        [19,] 98
        [20,] 101
        [21,] 114
        [22,] 103


        sourceB has an extra code compared to sourceA and different values in position 12 and 13. Using chr() (also from gtools), I was able to re-convert the ASCII codes into characters:



         chr(asc(sourceA$Name))
        [1] "G" "r" "u" "n" "d" "s" "c" "h" "u" "l" "e" " " "K" "r" "o" "n" "s" "b" "e" "r" "g"

        chr(asc(sourceB$Name))
        [1] "G" "r" "u" "n" "d" "s" "c" "h" "u" "l" "e" "Â" " " "K" "r" "o" "n" "s" "b" "e" "r" "g"


        In sourceB, there is an extra  (ASCII decimal code 194) in the string, and the white space is coded with the decimal 160 instead of 32. I still don't know why in conjunction these two ASCII codes were displayed as a regular white space, but was able to solve the issue by simply replacing all white spaces with " "



        sourceB <- sourceB %>%
        mutate(Name = stringr::str_replace_all(Name, "\pWHITE_SPACE", " "))

        full_join(sourceA, sourceB, by = "Name")
        # A tibble: 1 x 3
        Name category value
        <chr> <dbl> <dbl>
        1 Grundschule Kronsberg 1 2


        This (somehow) changed the ASCII codes so that they now align with one another:



        chr(asc(sourceB$Name))
        [1] "G" "r" "u" "n" "d" "s" "c" "h" "u" "l" "e" " " "K" "r" "o" "n" "s" "b" "e" "r" "g"





        share|improve this answer















        After quite a discussion in the comments, I was able to solve the problem. While the Name variables looked identical (and where parsed identically by dput()), there was a subtle difference when converting the characters into their ASCII codes:



        library(gtools)

        asc(sourceA$Name)
        Grundschule Kronsberg
        [1,] 71
        [2,] 114
        [3,] 117
        [4,] 110
        [5,] 100
        [6,] 115
        [7,] 99
        [8,] 104
        [9,] 117
        [10,] 108
        [11,] 101
        [12,] 32
        [13,] 75
        [14,] 114
        [15,] 111
        [16,] 110
        [17,] 115
        [18,] 98
        [19,] 101
        [20,] 114
        [21,] 103

        asc(sourceB$Name)
        Grundschule Kronsberg
        [1,] 71
        [2,] 114
        [3,] 117
        [4,] 110
        [5,] 100
        [6,] 115
        [7,] 99
        [8,] 104
        [9,] 117
        [10,] 108
        [11,] 101
        [12,] 194
        [13,] 160
        [14,] 75
        [15,] 114
        [16,] 111
        [17,] 110
        [18,] 115
        [19,] 98
        [20,] 101
        [21,] 114
        [22,] 103


        sourceB has an extra code compared to sourceA and different values in position 12 and 13. Using chr() (also from gtools), I was able to re-convert the ASCII codes into characters:



         chr(asc(sourceA$Name))
        [1] "G" "r" "u" "n" "d" "s" "c" "h" "u" "l" "e" " " "K" "r" "o" "n" "s" "b" "e" "r" "g"

        chr(asc(sourceB$Name))
        [1] "G" "r" "u" "n" "d" "s" "c" "h" "u" "l" "e" "Â" " " "K" "r" "o" "n" "s" "b" "e" "r" "g"


        In sourceB, there is an extra  (ASCII decimal code 194) in the string, and the white space is coded with the decimal 160 instead of 32. I still don't know why in conjunction these two ASCII codes were displayed as a regular white space, but was able to solve the issue by simply replacing all white spaces with " "



        sourceB <- sourceB %>%
        mutate(Name = stringr::str_replace_all(Name, "\pWHITE_SPACE", " "))

        full_join(sourceA, sourceB, by = "Name")
        # A tibble: 1 x 3
        Name category value
        <chr> <dbl> <dbl>
        1 Grundschule Kronsberg 1 2


        This (somehow) changed the ASCII codes so that they now align with one another:



        chr(asc(sourceB$Name))
        [1] "G" "r" "u" "n" "d" "s" "c" "h" "u" "l" "e" " " "K" "r" "o" "n" "s" "b" "e" "r" "g"






        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited Mar 26 at 8:13

























        answered Mar 25 at 14:05









        tifutifu

        1,093214




        1,093214





























            draft saved

            draft discarded
















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid


            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.

            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55333906%2ffiguring-out-why-character-matching-doesnt-work%23new-answer', 'question_page');

            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Kamusi Yaliyomo Aina za kamusi | Muundo wa kamusi | Faida za kamusi | Dhima ya picha katika kamusi | Marejeo | Tazama pia | Viungo vya nje | UrambazajiKuhusu kamusiGo-SwahiliWiki-KamusiKamusi ya Kiswahili na Kiingerezakuihariri na kuongeza habari

            Swift 4 - func physicsWorld not invoked on collision? The Next CEO of Stack OverflowHow to call Objective-C code from Swift#ifdef replacement in the Swift language@selector() in Swift?#pragma mark in Swift?Swift for loop: for index, element in array?dispatch_after - GCD in Swift?Swift Beta performance: sorting arraysSplit a String into an array in Swift?The use of Swift 3 @objc inference in Swift 4 mode is deprecated?How to optimize UITableViewCell, because my UITableView lags

            Access current req object everywhere in Node.js ExpressWhy are global variables considered bad practice? (node.js)Using req & res across functionsHow do I get the path to the current script with Node.js?What is Node.js' Connect, Express and “middleware”?Node.js w/ express error handling in callbackHow to access the GET parameters after “?” in Express?Modify Node.js req object parametersAccess “app” variable inside of ExpressJS/ConnectJS middleware?Node.js Express app - request objectAngular Http Module considered middleware?Session variables in ExpressJSAdd properties to the req object in expressjs with Typescript