Charset problem - please help

2012-01-06 13:34:43


Hello, first of all, thank you very much for this great script: I installed it on my website and it worked right out of the box. I am just having a problem with the charset: both the webpage where the code is embedded and the RSS I am importing have the charset set to iso-8859-1. Still, for some reason, when I import the RSS, the text (which is in Portuguese) does not display correctly. For example: "Índios isolados localizados perto de obras de hidrelétricas" appears as "Índios isolados localizados perto de obras de hidrelétricas". When I change the page charset to UTF-8, the RSS displays correctly, but the rest of the content doesn't. I am guessing that the script is converting the RSS to UTF-8, and I don't know how to change that. Any advice would be welcome. Thank you in advance. Dani
2012-01-07 00:28:36


My webpage is encoded in UTF-8 and I don't want to change that. I want to know how to handle, in PHP, an RSS feed encoded in windows-1250 so that its data displays correctly on my webpage.
2012-01-07 03:30:36


Hello, The documents are loaded with the charset specified in the original file. For example, if the RSS has this declaration
<?xml version="1.0"?>
It gets a default encoding. The encoding may be specified:
<?xml version="1.0" encoding="ISO-8859-1"?>
How do you convert the charset to that of the page? If the encoding is specified in the constructor:
$doc= new DOMDocument('1.0', 'UTF-8');
The charset will be overridden by that of the loaded file. But the encoding can be changed at the string level, for example:
$content = mb_convert_encoding($content, "ISO-8859-1", "UTF-8");
This converts from UTF-8 to ISO-8859-1: the second argument is the target encoding and the third is the source encoding. The conversions must be done in the RSS_Tags() function:
$y["title"] = mb_convert_encoding($title, "ISO-8859-1", "UTF-8");
$y["description"] = mb_convert_encoding($description, "ISO-8859-1", "UTF-8");
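As a rough sketch of the same idea, the two conversions could be wrapped in one small helper. The function name to_page_charset() and its $pageCharset parameter are hypothetical and not part of the script. The sketch assumes the strings come from DOMDocument, which hands text back as UTF-8 whatever encoding the feed declares, and that the mbstring extension is available.

```php
<?php
// Hypothetical helper, not part of the original script: converts one feed
// field from UTF-8 (what DOMDocument returns) to the charset of the page.
function to_page_charset(string $text, string $pageCharset = 'ISO-8859-1'): string
{
    // Nothing to do when the page itself is UTF-8.
    if (strcasecmp($pageCharset, 'UTF-8') === 0) {
        return $text;
    }
    // Argument order: string, target encoding, source encoding.
    return mb_convert_encoding($text, $pageCharset, 'UTF-8');
}

// "Índios" written as explicit UTF-8 bytes, so the example does not depend
// on the encoding this file happens to be saved in.
$title = "\xC3\x8Dndios isolados";
$converted = to_page_charset($title);

// Print both byte sequences in hex to make the difference visible.
echo bin2hex($title), "\n", bin2hex($converted), "\n";
```

Inside RSS_Tags() this would be used as $y["title"] = to_page_charset($title); and likewise for the description. For a page that is already UTF-8, passing 'UTF-8' as the second argument leaves the text untouched.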
2012-01-08 21:07:45


Hello, good evening. Thank you so much for the reply. I managed to solve the problem, but I was a little confused at first, because both the RSS and the page declared the same encoding and even so the RSS import was not working. I only managed to make it work when I actually re-encoded it to UTF-8 at the string level. I don't know why this happens, though.