How to replace hex value in a string


While importing data from a flat file, I noticed some embedded hex-values in the string (<0x00>, <0x01>).

I want to replace them with specific characters, but am unable to do so. Removing them won't work either.
What it looks like in the exported flat file: https://i.imgur.com/7MQpoMH.png
Another example: https://i.imgur.com/3ZUSGIr.png

This is what I've tried:
(Note that <0x01> stands for a non-editable entity; it cannot be reproduced here.)

import io
with io.open('1.txt', 'r+', encoding="utf-8") as p:
    s = p.read()
# included in case it bears any significance

import re
import base64
import binascii

s = "Some string with hex: <0x01>"

s = s.encode('latin1').decode('utf-8')
# throws e.g.: >>> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 114: invalid start byte

s = re.sub(r'<0x01>', r'.', s)
s = re.sub(r'\0x01', r'.', s)
s = re.sub(r'\\0x01', r'.', s)
s = s.replace('x01', '.')
s = s.replace('<0x01>', '.')
s = s.replace('0x01', '.')

or something along these lines in hopes to get a grasp of it while iterating through the whole string:

for x in s:
    try:
        base64.encodebytes(x)
        base64.decodebytes(x)
        s.strip(binascii.unhexlify(x))
        s.decode('utf-8')
        s.encode('latin1').decode('utf-8')
    except:
        pass

Nothing seems to get the job done.

I'd expect the characters to be replaceable with the methods I've dug up, but they are not. What am I missing?
NB: I have to preserve umlauts (äöüÄÖÜ)

-- edit:

Could I be introducing the hex values myself when exporting? If so, is there a way to avoid that?

with io.open('out.txt', 'w', encoding="utf-8") as temp:
    temp.write(s)

edited Mar 27 at 13:37

asked Mar 27 at 13:22

P. A. Monsaille


python-3.x string encoding utf-8 hex


1 Answer

Judging from the images, these are actually control characters.
Your editor displays them in this greyed-out way, showing you the value of each byte in hex notation.
You don't have the characters "0x01" in your data, but a single byte with the value 1, so unhexlify and friends won't help.

In Python, these characters can be written in string literals with escape sequences of the form \xHH, with two hexadecimal digits.
The fragment from the first image is probably equal to the following string:

"sich z\x01 B. irgendeine"

Your attempts to remove them were close.
s = s.replace('\x01', '.') should work.
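A minimal sketch of the replacement suggested above. The sample string and the CONTROL_CHARS pattern are assumptions made for illustration; they do not come from the question's data. The regular expression is one possible way to drop any remaining control characters while leaving umlauts, tabs and line breaks untouched.

```python
import re

# Hypothetical sample: control characters (code points 1 and 0) embedded in text with an umlaut.
sample = "sich z\x01 B. irgendeine Prüfung\x00"

# Replace one specific control character with a visible placeholder.
replaced = sample.replace("\x01", ".")

# Assumed helper pattern: C0 control characters except tab, newline and carriage return, plus DEL.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")

# Remove whatever control characters are left; non-ASCII letters such as "ü" are not matched.
cleaned = CONTROL_CHARS.sub("", replaced)

print(repr(cleaned))
```

Printing with repr() makes control characters visible as escape sequences, which helps to check what is really in the string.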

answered Mar 27 at 14:03

lenz


Yep, that did it … thank you. FYI, I figured out that I introduced the characters myself during re.sub replacements. For example, re.sub('(?<=\w)([,.!?;])(?=\w)', u'\1 ', s) backreferenced the replaced character and thus introduced the "single byte with the value 1". The regex module apparently does a better job at this: '(?<=\w)p;,.!?(?=\w)'. (via reference)

– P. A. Monsaille
Mar 27 at 15:18

I don't think re.sub with \n backreferences (such as \1) will introduce control characters. That is, unless you misspell the backreferences as \x01, of course. Btw, if this answer solved the problem you described, consider accepting it through the tick on the left.

– lenz
Mar 27 at 18:57
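The effect discussed in these comments can be reproduced with a short sketch. The input string is made up for illustration. In a non-raw Python string literal, '\1' is an octal escape for the character with code point 1, so the regex engine never sees a backreference; a raw string keeps the backslash intact.

```python
import re

text = "Hallo,Welt"  # hypothetical sample input
pattern = r"(?<=\w)([,.!?;])(?=\w)"

# Non-raw replacement: Python turns '\1' into chr(1) before re.sub is called.
broken = re.sub(pattern, '\1 ', text)

# Raw replacement: re.sub receives a backslash followed by 1 and treats it as group 1.
fixed = re.sub(pattern, r'\1 ', text)

print(repr(broken))  # 'Hallo\x01 Welt'
print(repr(fixed))   # 'Hallo, Welt'
```

Writing both the pattern and the replacement as raw strings avoids this class of problem.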
