Dear Community,
First of all, as a former FCKeditor user, I have found CKEditor very successful. However, I have a subtle issue that prevents me from using it, and I think it is related to encoding. I have searched for this topic but found only a few forum threads that were left unanswered.
I am using PHP 5 under Apache with MySQL (charset UTF-8). I am also using the server-side configuration of CKEditor and jQuery for Ajax posting. The Ajax posts are encoded as UTF-8, and the server-side PHP files are UTF-8 as well.
The problem is that when I retrieve data from the editor (using getData()) in order to perform an Ajax post, the data includes Unicode control characters at the beginning and end of the line.
For instance, while the editor contains "<p> Hello World!</p>", after the post the database records the string as
"u000au0009<p>Hello World!</p>
u000a"
I have tried replacing those characters in JavaScript, converting them to UTF-8, and every other hack I could come up with, but I have not been able to solve this issue.
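For reference, u000a and u0009 in the stored string correspond to the code points U+000A (line feed) and U+0009 (tab). Below is a minimal sketch of the kind of client-side replacement meant here. It assumes the characters arrive as real whitespace at the edges of the getData() result, and trimEditorData is a hypothetical helper name, not part of CKEditor or jQuery:

```javascript
// Hypothetical helper (not a CKEditor API): remove tabs, line feeds,
// carriage returns and spaces from both ends of the editor output.
function trimEditorData(html) {
  return html.replace(/^[\t\n\r ]+|[\t\n\r ]+$/g, '');
}

// Assumed shape of the data, based on the stored string shown above.
const raw = '\n\t<p>Hello World!</p>\n';
const cleaned = trimEditorData(raw);
console.log(JSON.stringify(cleaned));
```

This only covers real whitespace characters. If the literal text u000a still ends up in the database, the escaping may be getting altered somewhere between the Ajax call and the database rather than in the editor.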
I would be really grateful if someone could post a solution. As far as I can tell from my searches, this problem has been encountered a few times before and nobody seems to have resolved it.
Thanks in advance.
Re: Encoding Problem
I just had a similar issue after switching from FCKeditor to CKEditor. Most posts (in English) were OK, but some posts containing text copied from Outlook, with Word as the email editor, had weird characters inserted. Updates caused them to grow in length with each submission.
I also remember having this problem with FCKeditor a few years ago, when I originally wrote the editor API. In FCKeditor, there were modules that set the "codepage" at the top of many of the .js files. This caused the codepage to be reset from UTF-8 to something else. At the time, I had to manually edit these files and delete the codepage characters.
If you paste text copied from Word HTML that contains "bullet" characters into CKEditor and submit the form, the browser will encode the bullet incorrectly (not in UTF-8), and when it is reloaded it will be gibberish. The same goes for characters from other languages.
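The gibberish and the growth in length can be reproduced outside the editor. The sketch below is an illustration under assumptions, not the actual code path of CKEditor or the browser: it encodes a bullet as UTF-8 and then misreads each byte as one single-byte (ISO-8859-1 style) character, which is roughly what happens when the page is not treated as UTF-8. The helper name misdecodeAsLatin1 is hypothetical.

```javascript
// Hypothetical helper: encode text as UTF-8, then misread every byte as a
// separate single-byte character, the way a non-UTF-8 codepage would.
function misdecodeAsLatin1(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes
  return Array.from(bytes, (b) => String.fromCharCode(b)).join('');
}

const bullet = '\u2022'; // the bullet character, common in text pasted from Word
const lengths = [bullet.length];
let value = bullet;

// Each save/reload round trip misreads the previous result again.
for (let round = 0; round < 3; round++) {
  value = misdecodeAsLatin1(value);
  lengths.push(value.length);
}

console.log(lengths.join(', ')); // 1, 3, 6, 12
```

Plain ASCII text passes through unchanged, which would explain why most English posts looked fine while pasted bullets and non-English characters did not.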
My solution was to correctly specify the CODEPAGE and CHARACTER SET in the application program as UTF-8.
For ASP/VBScript, this was done using the code below at the very top of the program. There are PHP HEADER settings that do the same thing, and server-level settings too.
'# -------------------------------------------------------------------------------------------
'# RTS: - specify codepage/charset to support UTF-8 by default (and CKEditor) - 11/22/2010 ele
'# http://htmlpurifier.org/docs/enduser-utf8.html
'# http://www.asp-dev.com/main.asp?page=96#axzz161JEhyHi
'# Setting Response.CodePage explicitly affects a single page whereas Session.CodePage affects all responses in a session.
'# *** Must be before any characters are sent/processed. Does not work when specified in the HEAD tag since chars normally have been written to the browser by RMS
'# Don't know why RMS is defaulting to non-UTF-8 codepage?
'# -------------------------------------------------------------------------------------------
Response.CodePage = 65001
Response.CharSet = "utf-8" '# international character support
The bottom line is that the HTML page must correctly specify UTF-8 as the character set on initial page load. Specifying UTF-8 in the HTML HEAD section may not work, since it must be specified before any text is output back to the browser.
See the 2 links above that provided me with information.
I just solved this issue today myself.
Bottom line, HTML pages should be forced to UTF-8 to support generic character encoding...
Eric Edberg