Blob.decode with replacement does not seem to work

This code:

my $þor-blob = Blob.new("þor".ords);
$þor-blob.decode( "ascii", :replacement("0"), :strict(False) ).say

Fails with:

Will not decode invalid ASCII (code point > 127 found)␤

And this one:

my $euro = Blob.new("3€".ords);
$euro.decode( "latin1", :replacement("euro") ).say

Simply does not seem to work, replacing € by ¬.

It's true that those methods are not tested, but is the syntax right?

asked Mar 26 at 8:57

jjmerelo


This question had a bounty worth +100 reputation from me. I was looking for an answer drawing from credible and/or official sources, hoping to get an answer from a core dev like samcv, or from someone else providing a link to core dev discussion (irc or an issue etc.) about it that either corrects or adds value to my current answer by injecting an authoritative response about what currently works and what should work in relation to :replacement and :strict for the various encodings. It looks like the original points were wasted but I'll happily redo the award if someone does as I hoped.

– raiph
Apr 15 at 23:20

1 Answer
TL;DR:

Only samcv or some other core dev can provide an authoritative answer. This is my understanding of the code, comments, and results I see.

If my understanding is correct, some doc and/or code needs to be sorted out to render this SO moot.¹

Specifying the $replacement argument matches a different P6 core multi method than not doing so. Let's call it the "replacer" code path.

The "replacer" code path passes the $replacement and $strict arguments on to a code path in nqp, which in turn passes them on to the code path in the backend that handles replacements.

On the MoarVM backend, the replacement and strict arguments are passed on to the decoders for the windows1252, windows1251, and shiftjis encodings but not to the decoders for other encodings.²

Following the relevant code path

Your code calls this code in Buf.pm6:

multi method decode(Blob:D: $encoding,
                    Str :$replacement!,
                    Bool:D :$strict = False) {
    nqp::p6box_s(
      nqp::decoderepconf(
        self,
        Rakudo::Internals.NORMALIZE_ENCODING($encoding),
        $replacement.defined ?? $replacement !! nqp::null_s(),
        $strict ?? 0 !! 1))
}

The nqp::decoderepconf function directly maps to a corresponding function in the backend.

On the MoarVM backend, it's MVM_string_decode_from_buf_config in ops.c.

This in turn calls MVM_string_decode_config in the same file.

This latter function's comments include a couple of key sentences that presumably explain the relevance of the replacement and strictness arguments:

Unlike MVM_string_decode, it will not pass through codepoints which have no official mapping.

For now windows-1252 and windows-1251 are the only ones this makes a difference on.

Spelunking the code and commits in the repo suggests the latter comment is slightly out-of-date because it looks like it should make a difference on shiftjis too.

Also, to be clear, if one specifies the $replacement argument in P6 then the $strict argument is going to end up being ignored (and $strict = True assumed) if decoding any encoding other than the windows or shiftjis encodings.²
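To make the syntax concrete, here is a minimal sketch of the same call against an encoding whose decoder, on my reading above, does receive the arguments. The byte values are chosen for illustration only (0x81 is one of the bytes with no official mapping in windows-1252), and the variable names are mine. I make no claim about the exact output, since it depends on the Rakudo/MoarVM version in use.

```raku
# 0x41 is "A", 0x42 is "B"; 0x81 has no official mapping in windows-1252.
my $blob = Blob.new(0x41, 0x81, 0x42);

# Same named arguments as in the question, but with an encoding whose
# MoarVM decoder is said to honour them. Output is deliberately not
# stated here; run it to see what your version does.
my $with-strict    = $blob.decode("windows-1252", :replacement("?"), :strict);
my $without-strict = $blob.decode("windows-1252", :replacement("?"), :!strict);

say $with-strict;
say $without-strict;
```

Comparing the two results is a quick way to check whether :strict and :replacement have any effect for a given encoding on a given installation.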

What happens with ascii and latin1 in particular

The current code for MVM_string_decode_config does not pass on the replacement/strictness arguments to the MVM_string_ascii_decode and MVM_string_latin1_decode functions.

So, if you use the encoding "ascii" then the blob must only contain values between 0 and 127, and for "latin1" the values must be between 0 and 255.

say "þor".ords; # (254 111 114)
say "3€".ords; # (51 8364)

The first string (as a Buf) fails to decode, and instead produces an error message, because 254 is more than 127 and the ascii decoder code in MoarVM reacts to an invalid value by throwing an exception with the "invalid ASCII" message.
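If :replacement has no effect for ascii, one way around the exception is to substitute the offending bytes by hand before decoding. This is only a sketch of a workaround, not something the core code does; the variable names and the choice of "?" as the stand-in are mine.

```raku
my $blob = Blob.new("þor".ords);           # bytes 254, 111, 114

# Values the ascii decoder would reject.
my @bad = $blob.list.grep(* > 127);
say @bad;                                  # [254]

# Hand-rolled replacement: swap each byte above 127 for "?" (63),
# then decode the cleaned-up copy.
my $clean = Blob.new($blob.list.map({ $_ > 127 ?? "?".ord !! $_ }));
say $clean.decode("ascii");                # ?or
```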

The second replaces € with ¬. This is because by default a Blob or Buf is an array of 8-bit unsigned integers, so a value above 255 gets truncated to its low byte. For € the value is 8364 (0x20AC), whose low byte is 0xAC (172), and that is the code point of ¬ in both latin1 and Unicode.³
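The truncation can be checked directly, without involving :replacement at all. A small sketch, using only the values from the question:

```raku
my $euro = "€".ord;
say $euro;                       # 8364
say $euro.base(16);              # 20AC
say $euro +& 0xFF;               # 172, i.e. the low byte 0xAC
say 0xAC.chr;                    # ¬

# Blob.new defaults to 8-bit elements, so only the low byte is stored.
my $blob = Blob.new("3€".ords);
say $blob.list;                  # (51 172)
say $blob.decode("latin1");      # 3¬
```

So by the time decode runs, the blob no longer contains anything outside the latin1 range, and there is nothing left for a replacement to act on.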

But it's no better if you use a Buf with a larger element size. The result is still a ¬, combined with tofu (the placeholder glyph shown for a character that can't be rendered). I can't write C, but I can read it well enough to see that the MVM_string_latin1_decode function in MoarVM does not throw exceptions. So presumably, when it encounters values outside the range 0-255, it turns the higher bytes into tofu.

Footnotes

¹ Of course the very thing JJ is doing that led them to post this SO in the first place is fixing the doc. I added this footnote so that other later readers would understand that context and realize that this SO is leading to changes in the doc, and may lead to changes in the code, that will presumably render this SO moot due to the work done.

² It would be nice if there were multis that rejected use of the $replacement argument if the decoder for the specified encoding doesn't do anything with it.

³ See timotimo++'s comment below.

edited Mar 31 at 12:01

answered Mar 26 at 9:08

raiph


It's not that it does not use them. It's that I don't know how they are used, so I couldn't write the documentation that explains what they do. And I know those characters are beyond the range of the representation. That's what the replacement is supposed to be for: to replace those characters (or that's what I thought, and how the underlying NQP code works).

– jjmerelo
Mar 26 at 10:42

Thanks again for the answer, but that's not the code that's used. It's a Blob, which has got its own code. It's quite similar, though... Again, what I gather from that code is that it should replace whatever code point can't be passed through. The tests point in that direction, also.

– jjmerelo
Mar 27 at 5:18

github.com/rakudo/rakudo/blob/master/src/core/Buf.pm6#L297-L309

– jjmerelo
Mar 27 at 9:14

the more precise answer to the latin1 part of the question is that Blob.new is by default Blob[uint8].new, which will truncate the values passed to 8 bit. That's why you get a ¬, as that's what is encoded by 0xac

– timotimo
Mar 27 at 17:31

@jjmerelo "$replacement makes no difference." Based on my read of the MoarVM code and comments, it works for the two windows encodings and the shiftjis encoding but does nothing for other encodings such as ascii and latin1. I've edited the question to make my answer as clear as I think I can make it.

– raiph
Mar 31 at 13:18


