PHP – MySQL: Unicode solution to Chinese, Russian or any language
Hey Guys,
I am a freelance web developer and my main tools are PHP and MySQL. A few days ago, I got a project where I had to develop a real estate site in Chinese. We usually build websites in English, with the database content in English too, so the default MySQL configuration works fine every time.
But when it comes to a language other than English, many people do not know what to do. When I started the project, I did not even know that the default MySQL settings would not work for Chinese. So I started searching for a stable solution that would let my program add, update and search data in any language in the MySQL database.
And Yeah.
I found it!
OK.
Let us see the solution now.
It is very very simple.
Step One: SET THE CHARSET TO UTF-8 IN THE HEAD SECTION
First of all, the browser needs to know that the page uses Unicode. So, go to your <HEAD></HEAD> section and set the charset to utf-8, so that the browser decodes and displays the text correctly. You can copy and paste the line below:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
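If your pages are generated by PHP, you can also declare the charset in the HTTP response header, which browsers read before they reach the meta tag. This is only a small sketch; the helper name htmlContentTypeHeader is made up for illustration and is not part of the original steps.

```php
<?php
// Hypothetical helper: builds the Content-Type header line for an HTML page.
function htmlContentTypeHeader(string $charset = 'utf-8'): string
{
    return 'Content-Type: text/html; charset=' . $charset;
}

// header() must run before any output is sent to the browser.
if (!headers_sent()) {
    header(htmlContentTypeHeader());
}
```

Keeping the meta tag as well does no harm, as long as both name the same charset.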
Step Two: CREATING THE DATABASE
When you create (a) your database and (b) any table in it, set the collation of both to utf8_unicode_ci. This is very easy if you are using phpMyAdmin.
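If you prefer SQL to phpMyAdmin, the same settings can be written as statements. The sketch below only builds the SQL text. The helper names and the column definitions are assumptions for illustration; database_name and TABLE_NAME are the placeholders used elsewhere in this post.

```php
<?php
// Hypothetical helpers: they return SQL text and do not touch a database.
function createDatabaseSql(string $database): string
{
    return "CREATE DATABASE `$database` CHARACTER SET utf8 COLLATE utf8_unicode_ci";
}

function createTableSql(string $table): string
{
    // The columns are examples only; adjust them to your own schema.
    return "CREATE TABLE `$table` ("
        . "id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, "
        . "field_name_one VARCHAR(255) NOT NULL, "
        . "field_name_two TEXT"
        . ") DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci";
}

echo createDatabaseSql('database_name'), ";\n";
echo createTableSql('TABLE_NAME'), ";\n";
```

Note that MySQL's utf8 stores at most three bytes per character. If you need four-byte characters such as emoji, utf8mb4 with utf8mb4_unicode_ci (MySQL 5.5.3 and later) is the usual choice.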
Step Three: DATABASE INITIALIZATION
When you initialize the database connection, please add the "extra lines" marked below:
<?php
define('HOSTNAME', 'localhost');
define('USERNAME', 'database_user_name');
define('PASSWORD', 'database_password');
define('DATABASE', 'database_name');

$dbLink = mysql_connect(HOSTNAME, USERNAME, PASSWORD);
mysql_query("SET character_set_results=utf8", $dbLink); // extra line
mb_language('uni');                                     // extra line
mb_internal_encoding('UTF-8');                          // extra line
mysql_select_db(DATABASE, $dbLink);
mysql_query("set names 'utf8'", $dbLink);               // extra line
?>
But why are you adding the extra lines? Because they tell the MySQL server that this connection sends and receives UTF-8, and they tell PHP's mbstring functions to treat strings as UTF-8.
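The mysql_* functions used in this post were removed in PHP 7.0, so on a current PHP version the same setup has to be written with mysqli or PDO. The following is a sketch with mysqli, not the code from this post; the function name openUtf8Connection is an assumption and the credentials are placeholders. SET NAMES 'utf8' sets character_set_client, character_set_connection and character_set_results in one go, and mysqli's set_charset() does the same while also informing the client library.

```php
<?php
// Sketch: open a mysqli connection and switch it to utf8.
function openUtf8Connection(string $host, string $user, string $password, string $database): mysqli
{
    $db = new mysqli($host, $user, $password, $database);
    if ($db->connect_error) {
        throw new RuntimeException('Connection failed: ' . $db->connect_error);
    }
    // Equivalent in effect to running SET NAMES 'utf8' on this connection.
    if (!$db->set_charset('utf8')) {
        throw new RuntimeException('Could not switch to utf8: ' . $db->error);
    }
    return $db;
}

// Usage (placeholder credentials):
// $db = openUtf8Connection('localhost', 'database_user_name', 'database_password', 'database_name');
```

Because the charset is set once per connection, there should be no need to repeat the SET statements before every query when you connect this way.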
Step Four: INSERTING INPUTS/DATA IN THE DATABASE
<?php
mysql_query("SET character_set_client=utf8", $dbLink);     // extra line
mysql_query("SET character_set_connection=utf8", $dbLink); // extra line
$sql_query = "INSERT INTO TABLE_NAME(field_name_one, field_name_two) VALUES('field_value_one', 'field_value_two')";
mysql_query($sql_query, $dbLink);
?>
Why are you adding the first two lines? Because the database should know which encoding the data you are about to store uses.
Step Five: UPDATING INPUTS/DATA IN THE DATABASE
<?php
mysql_query("SET character_set_client=utf8", $dbLink);
mysql_query("SET character_set_connection=utf8", $dbLink);
$id = (int) $id; // assumes $id holds the numeric record id; the cast keeps other input out of the SQL
$sql_query = "UPDATE TABLE_NAME SET field_name_one='field_value_one', field_name_two='field_value_two' WHERE id='$id'";
mysql_query($sql_query, $dbLink);
?>
So, you add the two extra lines before you run your query string, because you are working with Unicode.
Step Six: SEARCHING DATA FROM THE DATABASE
<?php
mysql_query("SET character_set_results=utf8", $dbLink);
$id = (int) $id; // assumes $id holds the numeric record id
$sql_query = "SELECT * FROM TABLE_NAME WHERE id='$id'";
$dbResult = mysql_query($sql_query, $dbLink);
?>
Adding the one extra line every time you search your Unicode data is enough.
OKKK.
You are done. This should work smoothly for handling your data in any language, no matter whether it is Bangla (my mother tongue), Hindi, Chinese, French, German, Spanish, Russian, Arabic, Urdu, or any other language.
And do not forget to leave a comment if you have one, because I may need to update the post.
Thanks for reading and please check if it works for you.
I’m new to php world. Everytime that I got the result from my query was ????? because those text stored inside my DB were Unicode. This article saved my day. Thank you.
I know.
It happened with me too.
I am happy to know that it helped you.
What am I doing wrong. I have a MySQL database with Russian/Cyrillic words. They look fine in phpmyadmin, but on the utf-8 encoded html page I got ?????. After following your tutorial I got мобРinstead.
Database charset is set to utf8 and collation is set to utf8_unicode_bin, collation for the table is utf8_unicode_bin, and the same for the column.
I work with Smarty templates and the database call looks now like this:
$db = DB::connect("mysql://$dbuser:$dbpass@$dbhost/$dbname");
mysql_query('SET character_set_results=utf8');
mb_language('uni');
mb_internal_encoding('UTF-8');
mysql_query("SET NAMES 'utf8'");
I need to get this working for Monday… and am pulling my hair right now. Any help is really appreciated. Thanks
@ Smokey
You are using ‘utf8_unicode_bin’.
My article suggests using only ‘utf8_unicode_ci’.
Can you try ‘utf8_unicode_ci’ in database and table creation?
And please follow the article exactly as described because this is a sensitive issue.
Thank you.
@ Smokey
Make sure the following have the ‘utf8_unicode_ci’ attribute:
1. Database
2. Tables
3. Fields in the table
All three need to have the ‘utf8_unicode_ci’ attribute. OK?
Yes, I have them already set to utf8_unicode_ci. Any other suggestions??
Can you zip your database and upload it somewhere?
By the way,
I can take a look by tomorrow since I am very busy today.
Ok, after hours of hair pulling I figured it finally out. In my PHP smarty code I had the htmlentities set. I removed the smarty code $smarty->register_modifier(“variable”,”htmlentities”); and everything worked fine. I will read up on this, but here is a post that mentions this problem. Here is the php documentation http://us2.php.net/htmlentities and another article about it http://annevankesteren.nl/2004/05/unicode-support
PS: could you please remove my email from my last post… thanks
… I forgot: as mentioned in your post, I also had to run mysql_query('SET character_set_results=utf8'); after initializing the database connection.
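A note on the htmlentities problem described above: before PHP 5.4, htmlentities() and htmlspecialchars() assumed ISO-8859-1 unless a charset was passed, so UTF-8 text went through them byte by byte and came out garbled. Passing the charset explicitly avoids that. This is a minimal sketch and the helper name escapeHtmlUtf8 is made up for illustration.

```php
<?php
// Hypothetical helper: escape text for HTML output while keeping UTF-8 intact.
function escapeHtmlUtf8(string $text): string
{
    // Only the HTML special characters are replaced; Cyrillic, Chinese etc. pass through unchanged.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

echo escapeHtmlUtf8('<b>моб</b>'), "\n"; // prints &lt;b&gt;моб&lt;/b&gt;
```

With Smarty, a modifier registered with a function like this instead of plain htmlentities should keep escaping in place without breaking the text.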
@ Smokey
I am happy to hear that your problem has been solved.
Great!
And I deleted the message you wanted removed.
I do appreciate the effort you make to teach us in an easy way. I am in the process of developing an application that utilizes unicode and it is real good place to start it.
I promise to let you know any difficulties I got, the solution I make, things I found important …and even more to post to this site.
Thank you.
@ YeGullelew
I appreciate your positive approach too.
Excellent post.
Many thanks
- Yohan
Thank you so much for this tutorial. It is very simple and easy to follow and you saved my bacon.
You're a genius, thank you
Hey, big thanks to you. I had to implement a page with 3 different languages. This tutorial saved me a lot of time and stress. It works really great.
Thank you for this great solution.
I want to work with Unicode Bengali numbers. I can sum English numbers with the SUM() function in MySQL, but is there any way to sum Unicode Bengali numbers with SUM() in MySQL, or with something like PHP's array_sum() function?
@ Nobin
Please check my new post:
http://www.tanzilo.com/2008/12/23/php-mysql-unicode-number-add-subtract-etc-for-any-language/
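One simple way to approach the question above on the PHP side is to translate the Bengali digits to ASCII digits first and then use the normal numeric functions. This is a sketch of that idea only, not necessarily the method used in the linked post; the function name bengaliToAsciiDigits is an assumption.

```php
<?php
// Hypothetical helper: map Bengali digits (U+09E6..U+09EF) to ASCII digits.
function bengaliToAsciiDigits(string $text): string
{
    return strtr($text, [
        '০' => '0', '১' => '1', '২' => '2', '৩' => '3', '৪' => '4',
        '৫' => '5', '৬' => '6', '৭' => '7', '৮' => '8', '৯' => '9',
    ]);
}

$values = ['১২', '৩০', '৮']; // 12, 30 and 8 written with Bengali digits
$numbers = array_map('intval', array_map('bengaliToAsciiDigits', $values));
echo array_sum($numbers), "\n"; // prints 50
```

For SUM() in MySQL, storing the value in a numeric column and converting only for display is usually easier than summing text.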
Thank you so much for sharing this. I struggled for a few days trying to resolve this issue, and finally, you saved my life.
Happy holiday!
[...] Step Four: You need to check my other posting for SELECT, INSERT & UPDATE your local language. Here goes my other article so that you can perform all required operations for storing and displaying information in browser smoothly: http://www.tanzilo.com/2008/10/13/php-mysql-unicode-solution-to-chinese-russian-or-any-language/ [...]
You, sir, have saved my day. I’ve been struggling for quite a while for a simple way to work with unicode data in MySQL. All of my problems with invalid characters messing up my site have suddenly vanished after applying your concise steps. Thanks a million.
Hi, I'm looking for a PHP function that can convert Chinese characters to hexadecimal. Do you mind sharing your opinion or hints? I have struggled with this issue for a few months and it is getting urgent. I hope you or someone here can help.
@tootcsen
hey,
check my other two articles for hints
1. http://www.tanzilo.com/2008/12/29/php-mysql-creating-a-website-in-your-local-language-smoothly/
2. http://www.tanzilo.com/2008/12/23/php-mysql-unicode-number-add-subtract-etc-for-any-language/
You will need to modify the code to do the hexadecimal replacement.
i hope that will help
thanks.
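For the hexadecimal question above, PHP's built-in bin2hex() already shows the bytes of a string, and mbstring can convert to UTF-16 first if code units are wanted. A small sketch, assuming the text is UTF-8 and mbstring is installed; the function names are made up for illustration.

```php
<?php
// Hex of the UTF-8 bytes exactly as PHP holds them.
function utf8BytesHex(string $text): string
{
    return bin2hex($text);
}

// Hex of the UTF-16BE code units; for characters in the Basic Multilingual
// Plane (most Chinese characters) this equals the Unicode code point.
function utf16Hex(string $text): string
{
    return bin2hex(mb_convert_encoding($text, 'UTF-16BE', 'UTF-8'));
}

echo utf8BytesHex('汉'), "\n"; // prints e6b189
echo utf16Hex('汉'), "\n";     // prints 6c49
```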
Another cool tutorial, thanks again for sharing your experiences …
I ran into trouble with displaying Chinese using PHP/MySQL. I googled and found your article. Problem solved in 2 minutes. Thanks a lot!
Hi,
I am also working on several real estate websites, and the changing market forces my clients to offer the site in English, Spanish, Polish and Russian. How do I handle this in the database and the PHP code? Do I use UTF-8 for all of them?
Thanks
@Brecht
Hi there,
if you follow this article as it is, you can show single or multiple language output in a page. Simply follow the article and test after you’ve done so!
Hi Bro,
It works fine. Excellent post. Thanks
pn
Hi… this is Swathi. I have a problem with my website, can you please help me… I have to develop my website with Telugu font using PHP… how can I do this simply?
@ swathi
if you are developing a simple site, Step One of this article should be enough for you.
if you are developing a database based website in PHP, you need to follow all the steps.
Hello Bhia Assalmualikume
It is nice to say that I have solved my problem with inserting and retrieving Bangla correctly using the idea from this page
thanks a lot to you
Allah hafaz
Arif
Chittagong University
Bangladesh
Hi,
A good day to you.
There is a little query regarding PHP, MySQL and Unicode in cPanel. I'm new to it. My MySQL is running well. I have a lot of Hindi and Urdu Unicode text in different databases. What I want is to store, retrieve and display the characters as they are, not as their hex equivalents, since hex codes make it hard to edit the text stored in MySQL. When I store the Hindi/Urdu text after converting it into hex, it is retrieved and displayed nicely, but then there are no characters, only hex codes, in MySQL and in the page source in the browser. Editing the data becomes impossible: I need to keep the Hindi/Urdu text somewhere else, edit it there, convert it into hex and then store it in MySQL. When I store the data in MySQL as the original characters, the browser and the page source show only ???? ???? ??? instead. The stored Unicode characters are not recognized. I want the original characters to be stored in MySQL (storing is no problem), retrieved through echo in the PHP page, and displayed in the browser and in the page source as they are, not as hex numbers.
By the way, where do I place your class.unicode.php, and what is this DATABASE INITIALIZATION?
I attach the URL of one of my pages here (http://www.nehi.in/hil/hil4.php). I work with cPanel. I have set UTF-8 and utf8_unicode_ci for all the fields, tables and databases through phpMyAdmin. I think I am not converting the Unicode successfully in htmlentities.
I wonder if you have something to say.
Thanks
@ S.G. Hussain
Why don’t you develop a small application first in your localhost and try it?
It would be better if you develop a small application and go to a complex one step by step.
Most helpful! Thank you very, very much!!!!!
Thanks a lot, mate!!! Thumbs up for this solution
Hi.
I’m making a website in english and Chinese using mysql databases
I’m using one database and table for both languages
like 1 tables with clubname_eng and clubname_chn
I followed all the instructions, but somehow it doesn’t update the chinese text
The first page has a form that contains both English and Chinese.
When pressing the send button, it runs updateclub.php,
but it only updates the English text. The Chinese text stays empty.
The strange part is that when I print the Chinese text on that update page it appears correctly. It's just not updating.
Then I tried the utf8_encode command, but then I ended up with 2x more characters and that does not make any sense.
But I got Chinese text in my database and was able to read it. It's just the wrong text.
The last thing I tried was to add the text directly with phpMyAdmin.
When I run the SQL it shows the text correctly, but on my page it gives a square and a wrong character for the word Beijing (北京).
What am i doing wrong
And please help my solve this nightmare.
Cheers
@ Berry
Write two functions for two languages.
Then use each function separately to handle each language.
What do you mean by 2 functions?
UTF-8 should be able to handle all kinds of characters, or not?
When I read the data from my form page with $_REQUEST, I can use it in the page, both languages at the same time. Only the text in the query is empty if it's in Chinese.
Like the value $eng_city as the value beijing and the value $chn_city has the value 北京.
when I echo this value they appear correct
but when I use :
mysql_query("SET character_set_client=utf8", $dbLink);
mysql_query("SET character_set_connection=utf8", $dbLink);
$sql_query = "UPDATE clubs SET city_eng = '$eng_city' WHERE id = '$club_id'; ";
mysql_query($sql_query, $dbLink) or die;
mysql_query("SET character_set_client=utf8", $dbLink) or die;
mysql_query("SET character_set_connection=utf8", $dbLink) or die;
$sql_query = "UPDATE clubs SET city_chn = '$chn_city' WHERE id = '$club_id'; ";
mysql_query($sql_query, $dbLink) or die;
it only updates the english text. the $chn_city value in the query is empty, but not if I echo it later on???
also I added or die to the mysql_query to see if there is an error.
but it doesn’t die
Please help
Cheers Berry
@ berry
First try to insert the Chinese text.
Follow all the steps described here carefully.
When you succeed, then try to add English text.
One more thing I want to ask.
If I add the Chinese text with phpMyAdmin, why doesn't it show correctly on my page?
@ berry
I think you missed a step somewhere.
Again try to follow all the steps described here one by one.
Make sure you did not miss any one.
Thanks for the fast reply
ok I made a new database with 1 table called Chinese
in this table I made 2 fields.
field 1: chn_name – Type = text – Collation = utf8_unicode_ci
field 2: eng_name – rest same as field 1
with operations I changed the database and table collation, too
then I made 3 simple pages called
http://www.1945mf-china.com/edit/test1.php
http://www.1945mf-china.com/edit/test2.php
http://www.1945mf-china.com/edit/test3.php
test1.php has 2 textboxes, but I only programmed the first one so that it is Chinese only.
here are the sources.
test1.php
Untitled Document
test2.php
Untitled Document
test3.php
Untitled Document
I still have the same problem.
If I type Beijing in Chinese, it won't add the record.
What am I doing wrong??????
Cheers
I see that I can’t post the source code
@ berry
Did you follow:
Step One: SET THE CHARSET TO UTF-8 IN THE HEAD SECTION ??
yeah I did that,
Dreamweaver does it for me, so I even can’t forget it.
@ berry
I think you miss somewhere…
Follow all the steps and try to develop a very small and very simple Chinese supporting code.
This gave me a big headache
But I solved the problem.
I added
$sql = "SET NAMES 'utf8'";
mysql_query($sql, $dbLink);
this step is not in your solution
Now it writes and reads correctly
thanks for all the help
Cheers
@ berry
No.
You are wrong.
That line is the last line of:
Step Three: DATABASE INITIALIZATION
Sorry you’re right,
I made a file connect.php that I include in all the php files that need to connect to that database
this works for me, but as soon as I take the last 2 lines away, it doesn’t work anymore
Very strange cause they both do the same thing but code differently
Cheers
code one more time
define('HOSTNAME', 'localhost');
define('USERNAME', '*****');
define('PASSWORD', '*****');
define('DATABASE', '*****');
$dbLink = mysql_connect(HOSTNAME, USERNAME, PASSWORD);
mysql_query("SET character_set_results=utf8", $dbLink);
mb_language('uni');
mb_internal_encoding('UTF-8');
mysql_select_db(DATABASE, $dbLink);
mysql_query("set names 'utf8'", $dbLink);
$sql = "SET NAMES 'utf8'";
mysql_query($sql, $dbLink);
@ berry
But I am really really happy to know that your problem is solved at last.
Hello.
Thank you for the post, it’s really helpful.
I faced with a problem which is not exactly the same as you provided solution for, but may be you could help me.
I read the post very careful, and made all changes (double checked).
table, database, all fields collation set to utf8_unicode_ci.
Proper meta tag is added as well.
The problem:
In phpmyadmin I see fields absolutely correctly:
field1 field2
室沙萨那(some chinese text here) Benefits: пока думаем… (Mix english and russian)
The problem is that after I made all the changes you advised, I got
this output string for the fields:
瀹ゆ矙钀ㄩ偅 (for field1)
Benefits: 锌芯泻邪 写褍屑邪械屑… (for field2)
Its chinese but not the same characters as in database. And as you can see russian text also replace by chinese characters.
I double checked all preferences and so on but still cant understand what is wrong.
Could you please advise me a direction..
I created simple table, made everything according with your steps.
After "UPDATE `probe` SET `NAME` = '你好' WHERE `ID` = 1;" and a SELECT afterwards, I get wrong output again, while phpMyAdmin shows the string correctly.
anyone could help me?
can anyone help me?
I tried applying SET NAMES and SET CHARSET in different orders, and I studied the MySQL documentation on which the author based this article (with some additions which are not necessary), but the funny thing is that neither that guidance nor this article is correct. At least not always. I checked everything and tried the exact stages step by step more than 10 times and still can't find a way to fix it.
But there is way – in PHP admin everything shows correct. But code of PHP Admin is quite difficult to understand and I couldnt find how they do it.
I can provide access to DB if someone could help me…
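The garbled output in the comment above (Chinese and Russian both turning into unrelated Chinese characters) is the pattern you get when correct UTF-8 bytes are decoded as GBK. One possible cause is the page or the connection being treated as GBK/GB2312 somewhere along the way, so the data in the database may be fine. The sketch below only reproduces the effect so you can recognize it; it assumes mbstring is installed with CP936 (GBK) support, and the function names are made up for illustration.

```php
<?php
// Decode UTF-8 bytes as if they were GBK, the way a misconfigured reader would.
function misreadUtf8AsGbk(string $utf8): string
{
    return mb_convert_encoding($utf8, 'UTF-8', 'CP936');
}

// Reverse the mistake: turn the garbled text back into the original bytes.
function undoGbkMisread(string $garbled): string
{
    return mb_convert_encoding($garbled, 'CP936', 'UTF-8');
}

$original = '室沙'; // first two characters of the commenter's field1
$garbled = misreadUtf8AsGbk($original);
echo $garbled, "\n";                 // compare with the start of the garbled field1 above
echo undoGbkMisread($garbled), "\n"; // back to the original text
```

If your output looks like this, check the charset the browser actually uses for the page (HTTP header and meta tag) and the connection charset, rather than re-encoding the data.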
Thanks a lot!! You helped me very well with Russian!!
As it happened to Smokey on November 2nd, 2008 first I passed from ??????? before to мобРafter doing what is stated in this article. But it was because I had forgotten to take out the previous utf8_encode(….) I had on my code. After taking these ones out perfect!!
God bless you & us!!
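The leftover utf8_encode() mentioned above causes double encoding: the function assumes its input is ISO-8859-1, so text that is already UTF-8 gets encoded a second time and every non-ASCII byte becomes two bytes. The sketch below shows the effect using mb_convert_encoding() with the same ISO-8859-1 to UTF-8 conversion, because utf8_encode() itself is deprecated in recent PHP versions; the function name is an assumption.

```php
<?php
// Performs the same conversion utf8_encode() does: ISO-8859-1 -> UTF-8.
function encodeLatin1AsUtf8(string $bytes): string
{
    return mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');
}

$utf8 = 'моб';                       // already UTF-8: 3 characters, 6 bytes
$double = encodeLatin1AsUtf8($utf8); // encoded a second time by mistake

echo strlen($utf8), ' bytes -> ', strlen($double), " bytes\n"; // prints 6 bytes -> 12 bytes

// Converting back once recovers the original text.
echo mb_convert_encoding($double, 'ISO-8859-1', 'UTF-8'), "\n"; // prints моб
```

So if the data coming from the database is already UTF-8, no utf8_encode() call is needed at all.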
Thanks a lot! This saved my day
Thank you, your blog looks very nice!
I’m a newbie of PHP and found these words on manual:
———-
A string is series of characters. Before PHP 6, a character is the same as a byte. That is, there are exactly 256 different characters possible. This also implies that PHP has no native support of Unicode. See utf8_encode() and utf8_decode() for some basic Unicode functionality.
- PHP Manual: Language Reference -> Types -> Strings
———-
I don't understand that! So I searched for it on Google and found your article. But you have not even mentioned that.
According to the manual, PHP strings have ‘exactly 256 characters’, that is to say you can't create a string like ‘汉字’ in PHP. But I did do it in my code.
It works! That puzzled me. Could I have some suggestions from you?
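On the manual quote above: the 256 refers to the possible values of one byte, not to which text a string can hold. A PHP string is a sequence of bytes, so UTF-8 text fits in it fine; only the byte-oriented functions count bytes instead of characters. A short sketch, assuming the source file is saved as UTF-8 and mbstring is installed:

```php
<?php
$text = '汉字';

echo strlen($text), "\n";             // prints 6 (bytes)
echo mb_strlen($text, 'UTF-8'), "\n"; // prints 2 (characters)
echo bin2hex($text), "\n";            // prints e6b189e5ad97

// Byte-based substr() can cut a character in half; mb_substr() cuts on character boundaries.
echo mb_substr($text, 0, 1, 'UTF-8'), "\n"; // prints 汉
```

That is why the mb_* functions, or the mb_internal_encoding('UTF-8') line from Step Three, matter once you start measuring or cutting strings.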
I created a website on byethost.com and then imported my WordPress blog into it. The blog uses Unicode (the Sinhala language of Sri Lanka), but all its posts display like “?????”. How do I change the DB type to Unicode?
@ sanjeew
I think there is wordpress plugin for solution to this problem.
how can i find this and its name please
@ sanjeew
use google.com and find it yourself
Excellent, great … personally, I only needed to add
mysql_query("set names 'utf8'", $dbLink);
and it worked a treat (already had the charset one)!
Dear Sir,
You have done an excellent job because you have worked exactly on the problem.
I am also an Oracle database administrator and developer, and nowadays I have a website in the Pashto language.
To convert it to PHP I am learning PHP. I have a lot of knowledge about SQL and database management, so that is not a problem for me.
It is no exaggeration if I tell you I have become your FAN. I would like your personal email id to discuss a lot of PHP and database issues with you.
And lastly, sir,
how can we store wrapped paragraphs like GHAZALS in SQL and then retrieve those records in PHP the same way we entered them, or is there any way to wrap the paragraph in PHP programmatically?
I want to use this code for my site with Turkish, English and Russian languages. But I cannot get this code to work for Turkish characters like “ıöü”
@ Kader
It should work for any language.
I think you are missing one or more steps. Give a try and test!
hello friends,
I have created a website, but I need to develop it in Tamil. How do I develop it step by step…
by mani
Yes, you are right. I have some problem in my pages, i check step by step what i do. And correct the problem.
Thank a lot for this article.
Hello, you are the genius of this conversion.
I have one problem: I am making a dictionary to search words… the words are in the Punjabi language and it's not working…
after applying your code..
Please help
Regards
Narinder
dear sir
i want to make a search from database by words
my code is:::
mysql_query("SET character_set_results=utf8", $this->dbLink);
$this->sqlQuery = "select * from view where Title like '%".$searchterm."%' or Description like'%".$searchterm."%'";
$this->dbResult = mysql_query($this->sqlQuery, $this->dbLink);
while($this->dbRow = mysql_fetch_object($this->dbResult))
{
echo '' . $this->convertToLocalHtml($this->dbRow->Title) . '';
echo ' ' . $this->convertToLocalHtml($this->dbRow->Description) . '';
}
but the query not working.
@ Narinder
I think you should use “Step Six: SEARCHING DATA FROM THE DATABASE” for searching.
I can’t thank you enough. Your article made my life as a fellow developer so much easier. May you find success and wealth as a result of your generosity with your knowledge.
I want to insert my web form arabic input into mysql db by php Please please please help me how can i do this
Please if you can show me the complete source code of html mysql db and php I will be very thank full to you Mr. Admin
@ Sayed Karim
Did you follow all the steps and observed the result?
If you follow the above steps, i think you can fix it.
Hai Sir,
The above code is not working, i did according to your instructions but i dont know why….?
in phpmyadmin i have selected for my database as well as table utf_unicode_ci.
my code is:
define('HOSTNAME', 'localhost');
define('USERNAME', 'root');
define('PASSWORD', '');
define('DATABASE', 'bulb_china');
$dbLink = mysql_connect(HOSTNAME, USERNAME, PASSWORD);
mysql_query("SET character_set_results=utf8", $dbLink);
mb_language('uni');
mb_internal_encoding('UTF-8');
mysql_select_db(DATABASE, $dbLink);
mysql_query("set names 'utf8'", $dbLink);
if(isset($_POST['add_country_cost']))
{
$country_name = $_POST['country_name'];
$cost_kilowatt = $_POST['cost_kilowatt'];
$currency = $_POST['currency'];
mysql_query("SET character_set_client=utf8", $dbLink);
mysql_query("SET character_set_connection=utf8", $dbLink);
$sql_query = "insert into country_dollarcost (`id`, `country_name`, `currency`, `cost`) values ('', '$country_name', '$currency', '$cost_kilowatt')";
mysql_query($sql_query, $dbLink);
Can U give suggestions for this
Thanks & Regards
Rajeshkumar
grajeshkumar.0186@gmail.com
@ Rajeshkumar
I don’t see any problem in your PHP code.
But other than PHP, there are other steps you must follow.
Please follow each and every step one by one.
That should solve the problem.
I’m NEW to this stuff, but very happy with your code; it worked OK, except when searching. I’m from Vietnam, by the way. The search only works with unaccented text; for example, when I search for “Tháng Bảy” (July), it returns “Thang Bay” only.
Could you point out what is the problem?
Many thanks.
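If the issue above is that accented and unaccented spellings match each other, the collation is a likely reason: utf8_unicode_ci treats letters that differ only by accent (or by case) as equal when comparing. One way around it is to ask for a binary comparison in the query. The sketch below only builds the SQL text; the helper name and the table and column names are placeholders, and the ? stands for the search term bound through a prepared statement.

```php
<?php
// Hypothetical helper: returns a search query that compares accent-sensitively.
function accentSensitiveSearchSql(string $table, string $column): string
{
    // utf8_bin compares the raw bytes, so accents and letter case must match exactly.
    return "SELECT * FROM `$table` WHERE `$column` LIKE ? COLLATE utf8_bin";
}

echo accentSensitiveSearchSql('TABLE_NAME', 'field_name_one'), "\n";
```

The trade-off is that the search also becomes case-sensitive, so "tháng" would no longer match "Tháng".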
@ johnypham
I guess you are missing somewhere in the process or line(s) of code.
Hello sir,
I am trying to work with Unicode. Is it possible for me to store data with unicode_ci and show the information in English or, say, Slovak? If it is possible, please guide me on this.
@ sunil rana
if you follow the above steps, you can store data in your desired language.
But I am not sure about unicode_ci. You can give a try and see what comes up.
Smokey, you saved my day with that bloody htmlentities… thanks a bunch, man
yippieeee
and thanks to the author of course, but my problem from the start was the entities
I’m new to PHP and tried to make a website in the Nepali language with a database.
Now I’m facing a problem in adding and updating the records,
mostly when an entry contains the ” ‘ ” symbol, as this is one of the important symbols used in Nepali fonts.
Please help me out with this.
@ Pankaj
I think this is because your PHP version has a configuration that restricts the use of the single quote ( ‘ ).
Some servers accept direct use of ( ‘ ) and some others do not.
Search google how to handle this.
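A more direct explanation for the single quote problem: when a value is pasted straight into the SQL string, a ' inside the value ends the quoted string early and the query breaks. Prepared statements avoid this because the values travel separately from the SQL text. The following is a sketch using mysqli rather than the mysql_* functions from the post; the function name insertRow is an assumption, TABLE_NAME and the field names are the placeholders from Step Four, and $db is assumed to be an open connection already switched to utf8.

```php
<?php
// Sketch: insert two text values without building the SQL by hand.
function insertRow(mysqli $db, string $valueOne, string $valueTwo): bool
{
    $stmt = $db->prepare('INSERT INTO TABLE_NAME (field_name_one, field_name_two) VALUES (?, ?)');
    if ($stmt === false) {
        return false; // the statement could not be prepared
    }
    // 'ss' means both parameters are bound as strings; quotes inside them are harmless.
    $stmt->bind_param('ss', $valueOne, $valueTwo);
    $ok = $stmt->execute();
    $stmt->close();
    return $ok;
}

// Usage, with $db being an open mysqli connection:
// insertRow($db, "text with a ' quote", 'second value');
```

The same approach also protects values like $id in the UPDATE and SELECT steps from breaking or altering the query.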
I want to enter an English letter, match it against Chinese characters and search for an exact match with
%character%
Please guide me on how I can do this.
And thanks for this solution, I have implemented it successfully.
I have some problems with Russian. I tried utf-8, utf8_unicode_ci and everything that is possible, but I can’t fix this problem. Please can you help me? Thank you!
@ Anahit
If you follow the steps one by one, Russian language supporting problem should be solved.
Try all the steps carefully!
And that’s it!
Sir,
I followed your article. It inserts the values into MySQL in the local language very nicely. But when I tried to retrieve the values through an array,
I got the output as “अब मेरा ” . Can you possibly brief me on the reasons for this? If you want, I can send you the source code if you mention your email id.
thanks
@ yogesh
Welcome…I think you are missing something…follow each and every step carefully.