How to replace hex value in a string
While importing data from a flat file, I noticed some embedded hex values in the string (<0x00>, <0x01>).
I want to replace them with specific characters, but am unable to do so. Removing them won't work either.
What it looks like in the exported flat file: https://i.imgur.com/7MQpoMH.png
Another example: https://i.imgur.com/3ZUSGIr.png
This is what I've tried (mind that <0x01> stands for a non-editable entity; it is not recognized here):
    import io
    with io.open('1.txt', 'r+', encoding="utf-8") as p:
        s = p.read()

    # included in case it bears any significance
    import re
    import binascii

    s = "Some string with hex: <0x01>"
    s = s.encode('latin1').decode('utf-8')
    # throws e.g.: >>> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 114: invalid start byte

    s = re.sub(r'<0x01>', r'.', s)
    s = re.sub(r'\0x01', r'.', s)
    s = re.sub(r'\\0x01', r'.', s)
    s = s.replace('x01', '.')
    s = s.replace('<0x01>', '.')
    s = s.replace('0x01', '.')
Or something along these lines, in the hope of getting a grip on it while iterating through the whole string:

    for x in s:
        try:
            base64.encodebytes(x)
            base64.decodebytes(x)
            s.strip(binascii.unhexlify(x))
            s.decode('utf-8')
            s.encode('latin1').decode('utf-8')
        except:
            pass
Nothing seems to get the job done.
I'd expect the characters to be replaceable with the methods I've dug up, but they are not. What am I missing?
NB: I have to preserve umlauts (äöüÄÖÜ)
Edit:
Could I be introducing the hex values myself when exporting? If so, is there a way to avoid that?

    with io.open('out.txt', 'w', encoding="utf-8") as temp:
        temp.write(s)
python-3.x string encoding utf-8 hex
edited Mar 27 at 13:37
P. A. Monsaille
asked Mar 27 at 13:22
1 Answer
Judging from the images, these are actually control characters.
Your editor displays them in this greyed-out way, showing you the value of the bytes in hex notation.
You don't have the characters "0x01" in your data, but really a single byte with the value 1, so unhexlify and friends won't help.
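One way to confirm this is to look at the string with repr() and to list the characters Unicode classifies as control characters. The following is only a minimal sketch: the sample string is made up, and the real data would come from the imported file.

```python
import unicodedata

# Hypothetical sample: "\x01" is ONE character, not the text "<0x01>".
s = "Some string with hex: \x01"

# repr() shows control characters as escape sequences instead of hiding them.
print(repr(s))

# Position and code point of every control character (Unicode category "Cc").
# Note that tab, newline and carriage return are also in this category.
controls = [(i, hex(ord(ch))) for i, ch in enumerate(s)
            if unicodedata.category(ch) == "Cc"]
print(controls)
```

If the list is empty, the file really contains the literal text and a plain text replacement would be the right tool instead.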
In Python, these characters can be produced in string literals with escape sequences using the notation \xHH, with two hexadecimal digits.
The fragment from the first image is probably equal to the following string:

    "sich z\x01 B. irgendeine"

Your attempts to remove them were close. The following should work:

    s = s.replace('\x01', '.')
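As a rough illustration of that replacement, the sketch below replaces the single control character written as the escape sequence \x01 and then strips any remaining control characters while leaving umlauts alone. The sample text and the choice to keep tab, newline and carriage return are assumptions, not something stated in the question.

```python
import re

# Hypothetical sample modelled on the fragment quoted above.
s = "sich z\x01 B. irgendeine \x00 Möglichkeit"

# Replace one specific control character with a visible one.
replaced = s.replace("\x01", ".")

# Remove the remaining C0 control characters and DEL,
# but keep tab (\x09), newline (\x0a) and carriage return (\x0d).
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")
cleaned = CONTROL_CHARS.sub("", replaced)

print(repr(replaced))
print(repr(cleaned))
```

Because the character class only covers code points below 0x20 plus 0x7f, characters such as ä, ö and ü are not affected.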
Yep, that did it … thank you. FYI, I figured out that I introduced the characters myself during re.sub replacements. For example, re.sub('(?<=\w)([,.!?;])(?=\w)', u'\1 ', s) backreferenced the replaced character and thus introduced the "single byte with the value 1". The regex module apparently does a better job at this: '(?<=w)p;,.!?(?=w)'. (via reference)
– P. A. Monsaille
Mar 27 at 15:18
I don't think re.sub with \n backreferences will introduce control characters. That is, unless you misspell the backreferences as \x01, of course. Btw, if this answer solved the problem you described, consider accepting it through the tick on the left.
– lenz
Mar 27 at 18:57
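A small experiment may help with the point discussed in these comments. In a regular (non-raw) Python string literal, '\1' is an octal escape for the character with code point 1, so it never reaches re.sub as a backreference; in a raw string, r'\1' does. The sketch below uses a made-up input and a raw-string version of the pattern from the comment.

```python
import re

text = "z.B. irgendeine"  # hypothetical sample input

pattern = r"(?<=\w)([,.!?;])(?=\w)"

# Non-raw replacement: '\1' is the octal escape for chr(1), so the matched
# punctuation is replaced by a control character followed by a space.
broken = re.sub(pattern, '\1 ', text)

# Raw replacement: r'\1' is passed through to re.sub as a backreference
# to group 1, so the punctuation is kept and a space is added after it.
fixed = re.sub(pattern, r'\1 ', text)

print(repr(broken))
print(repr(fixed))
```

Writing regex patterns and replacement templates as raw strings avoids this class of surprise.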
answered Mar 27 at 14:03
lenz