I have read through similar questions on Stack Overflow, but none of them solve the Unicode problem I have: 'ascii' codec can't decode byte 0xc3 in position 302.
I have tried: import sys; reload(sys); sys.setdefaultencoding('utf-8')
Note: If you inspect the source code of an HTML document, you may also see the character set stated in a so-called 'meta tag'. Browsers generally give the charset in the HTTP Content-Type header precedence over the meta tag, so don't be confused by this. Ensure that the encoding declared by the web server matches the encoding used in your documents and you'll be fine.
However, the reload(sys) attempt raises an error: NameError: name 'reload' is not defined
I am trying to read a file containing the Danish vowels æ, ø, å. In return I receive: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 302, etc. Position 302 and the positions after it contain Danish vowels. Is there a way to fix this?
So far I have tried putting a specially-formatted comment as the first line of the source code:
# -*- coding: <ascii> -*-
This had no effect. I also tried:
f = open(fname, encoding='ascii', errors='surrogateescape')
But instead of reading the file with the characters as they are, for example in the word 'Europæiske', I get 'Europ\udcc3\udca6iske'. Then I tried a suggestion from a blog (I have lost the link) to 'import unicodedata'; however, it was not well explained where to take it from there.
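For reference, the output described above can be reproduced in a few lines. This is only a sketch, and it assumes the file is UTF-8 encoded, which the question does not state. The sample word is taken from the question; the variable names are made up for illustration.

```python
# Hypothetical sample: the word from the question, encoded as UTF-8
# (assumption: the real file uses UTF-8).
raw = 'Europæiske'.encode('utf-8')

# In UTF-8, 'æ' is the two bytes 0xc3 0xa6, which is why the ascii
# codec complains about byte 0xc3.
ae_bytes = 'æ'.encode('utf-8')

# surrogateescape does not raise; it maps each undecodable byte to a
# lone surrogate code point (0xc3 -> U+DCC3, 0xa6 -> U+DCA6).
escaped = raw.decode('ascii', errors='surrogateescape')

# Decoding with the encoding the bytes were written in restores the text.
decoded = raw.decode('utf-8')

# ascii() keeps the printed output readable on any terminal.
print(ascii(escaped))
print(ascii(decoded))
```

Under that assumption, the surrogates are the original bytes in disguise, so the data is not lost. It is only being decoded with the wrong codec.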
Mr Lister
Nadia S
closed as off-topic by Bhargav Rao♦, aschipfl, Brendan Abel, Rick Smith, Pierre Lafortune Mar 15 '16 at 19:32
This question appears to be off-topic. The users who voted to close gave this specific reason:
- 'Questions seeking debugging help ('why isn't this code working?') must include the desired behavior, a specific problem or error and the shortest code necessary to reproduce it in the question itself. Questions without a clear problem statement are not useful to other readers. See: How to create a Minimal, Reproducible Example.' – Bhargav Rao, aschipfl, Rick Smith, Pierre Lafortune
2 Answers
Simply open with the correct encoding. You have to know the encoding that the file was saved in. On Western versions of Windows it might be Windows-1252, or perhaps utf8. Modules such as chardet can perform an educated guess. Also, for the csv module, open with newline='' as well (see the documentation for csv.reader).
Mark Tolonen
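A minimal sketch of the advice above, assuming the file is UTF-8. The file name and its contents are hypothetical, created here only so the example runs on its own.

```python
import csv
import os
import tempfile

# Hypothetical sample file, written as UTF-8 so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'sample.csv')
with open(path, 'w', encoding='utf-8', newline='') as f:
    csv.writer(f).writerows([['name', 'note'], ['Europæiske', 'æ, ø, å']])

# Pass the encoding the file was saved in. newline='' leaves line-ending
# handling to the csv module, as its documentation recommends.
with open(path, encoding='utf-8', newline='') as f:
    rows = list(csv.reader(f))

# ascii() keeps the printed output readable on any terminal.
print(ascii(rows))
```

If the file was saved as Windows-1252 instead, passing encoding='cp1252' to open() would be the corresponding change.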
That # -*- coding: thing only applies to what is used in the program source itself, for example if you define a variable or function with Danish characters. What you're dealing with is I/O, so remember the rule: bytes on the edges, Unicode inside. This means using str.decode when reading in and unicode.encode when writing out (in Python 3, these are bytes.decode and str.encode).
jcomeau_ictx
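The "bytes on the edges, Unicode inside" rule might look like this in Python 3. It is a sketch only: the shout function and the sample bytes are hypothetical, and UTF-8 is assumed as the encoding on both edges.

```python
def shout(text: str) -> str:
    """Hypothetical processing step that works on text (str) only."""
    return text.upper()

# Bytes arrive at the edge, e.g. from a file opened in binary mode.
incoming = b'Europ\xc3\xa6iske'

# Decode once on the way in (assumption: the bytes are UTF-8)...
text = incoming.decode('utf-8')

# ...work with str everywhere inside the program...
result = shout(text)

# ...and encode once on the way out.
outgoing = result.encode('utf-8')

print(outgoing)
```

Opening a file in text mode with an explicit encoding argument performs the same decode and encode steps for you at the file boundary.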