Category Archives: MySQL

HTML forms, PHP, MySQL and UTF-8 – solving the unnerving MS Word special characters problem

Ever had the problem that you paste copy from a MS Word document into a webform, hit the save or submit button and the outcome somehow looks like this…

�The quick brown fox � and so on and so forth�

…when it’s supposed to look like this…

“The quick brown fox – and so on and so forth”

…? Yes, I am specifically talking about the fancy quitation makrs and the elongated dash or hyphen that MS word automatically uses.

Well, welcome to the crazy world of character encoding. Just search the web and you’ll come across gazillion of blog posts and forum entries. All have a set of suggestions and it seems to work for some people while it doesn’t or others. Obviously I had the problem as well, otherwise I wouldn’t write about it now. Now how did I finally manage to solve the issue?

1. Set the correct character encoding on the page header

That was one of the suggestion I read most often, simply because it is the most important rule to follow. The html page that is supposed to display text with non-standard characters needs to be told that it should use the UTF-8 unicode character encoding.

 

Was my problem solved? Nope. Not yet.

2. Set the correct character encoding for html forms

The data that I was trying to display was actually extracted from a MySQL database which itself was populated through an HTML form. When I checked the database entries I realised that the data in there was already screwed up. So I suspected the HTML form to submit the data in an incorrect character encoding. So next step was to set the correct character encoding for the html forms submitting data to MySQL:

 

Note the bit at the end of the opening form tag, i.e. accept-charset=”utf-8″.

Was my problem solved? Nope. Not yet.

But when I sent the php file’s form data to itself and echoed the POST data, it displayed the special characters correctly. So the problem had something to do with the PHP/MySQL interface or MySQL itself.

3. Pass the data from PHP to MySQL with the correct character encoding

After reading lots of stackoverflow.com posts I ended up adjusting my data handling PHP scripts (i.e. the script that adds the form data to the database) to set the character encoding straight after the script the database connection and before the database selection:

$link=mysql_connect('localhost', 'usr', 'pass');
mysql_set_charset('utf8',$link);

Was my problem solved? Nope. Not yet.

4. Make sure the MySQL database & tables are set to use utf8 and utf_unicode_ci

It seems you can screw up a lot with the wrong settings in MySQL. First of all, you can chose the character encoding of your database as well as your individual table fields. On top of that, you then have to  chose the collation of the character set which, as far as I understand it, sets which alphabet will be used and which special characters it contains.

The biggest problem I had was that even though I used utf-8 as main character encoding in both database and table field settings and used utf8_bin as the collation, simply because I didn’t know it any better. But the copy was still stored in the messed up state. And that’s where I started to pull my hair out. I read tons of posts but they more or less all told me to do what I had done already (see points 1-3). Okay, after I while I noticed that it might be the collation that causes the problem and I set everythin to utf8_general_ci, simply because everybody and their dogs seemed to use it.

But it did not solve my problem.

Finally, I stumbled accross this post which to me suggested that the collation uft8_unicode_ci supports more special characters than utf8_general_ci (at the cost of performance). At that stage I couldn’t care less about performance. I just wanted the blood MS word quotation marks and hypens to render. So I changed database and table field settings to

character encoding: utf8
collation:          utf8_unicode_ci

And finally, the stuff ended up in the database just the way MS word threw it in the clipboard. That was it. Simple enough… once you know it.

P.S. I probably should point out that leaving out any of the above 4 points results in garbled text again.

MySQL remote user access not working on Ubuntu + Plesk

Last week one of my mate’s private vServer was hacked, hijacked and used for DoC attacks which prompted his provider (Strato) to disable the machine and recommended a reinstallation of the OS. That was not as bad as it sounds, as all data on the server were on backup and the OS was outdated to boot (OpenSuse 9!).

The vServer was reinstalled with Ubuntu 10.04 LTS and Plesk 10. After all data were back in place, all sites back up and running, all that was missing was the remote connection to the MySQL server. My mate uses Navicat (as do I, btw) to remotely manage his databases. So a new user was added to MySQL with remote access rights, the Navicat settings were altered accordingly and… a connection could not be established. D’Oh!

What went wrong? Well, I had a look at the issue. First off, I checked the [mysqld] settings in the MySQL configuration (aka my.cnf).

user            = mysql
socket          = /var/run/mysqld/mysqld.sock
port            = 3306
bind-address    = 0.0.0.0
basedir         = /usr
datadir         = /var/lib/mysql
tmpdir          = /tmp
skip-external-locking

That looked all good. The standard port was set to 3306, the bind-address was set to listen to all incoming traffic (although a security related “no-no” in my opinion), and no trace of the “skip-networking” directive that might have negated the bind-address settings.

Just to be on the safe side I restarted the mysql server (service restart mysql), but Navicat would still not connect.

Next stop was to have a look at the output generated by iptables -L and there I found this:

DROP       tcp  --  anywhere             anywhere            tcp dpt:mysql

Hmm. It seems that tcp connections to MySQL are dropped for some reason. Probably not a bad idea to keep that setting (and set my.cnf’s bind-address to 127.0.0.1) just to avoid new system intrusions via the MySQL route.

Of course that meant that the Navicat connection problem remained… that is, as far as a direct connection to the MySQL server is concerned. However, Navicat permits to connect to MySQL via SSH tunneling*. And using that option works like a treat minus the security risks of an open MySQL server.

* A bit confusing is that you have to manually set the MySQL Server address to “localhost” in Navicat’s main connection property window. I would have thought that the software would choose this setting automatically, if SSH tunneling is activated. Well, just a minor nuisance.

MySQL and the mystery of wrong numbers… always check your numeric types!

Doh! Another half hour wasted!

I just spend the better part of half an hour searching for a bug in my PHP scripts that update a MySQL database via an AJAX call. One of the fields in the database table is numeric, integer to be precise. When I tried to update the field with the value 228, the database would always change the value to 127. Why? All the other queries worked correctly. The field happily accepted a 48, a 85 and a 126. But the 228 always turned out as a 127.

Of course, it’s a late Friday afternoon and technically I should be doing more mundane things like writing a blog post, doing my timesheets or having a beer or two. But instead I was stuck with this issue. Just fix this issue and the scripts were good to go! Just this one tiny problem. Must be a typo somewhere in the PHP script. Or is jQuery somehow tinkering with the input field value before the form data is being sent? What could it be? What, what, what?!?

Then, all of a sudden, something made click and I couldn’t avoid a solid facepalm. I checked the database setup and found that the numeric type for the field was set to “tinyint” instead of “smallint”. And as we all should know unsigned tinyint’s maximum value is 127. Everything greater than that will be “rounded” to 127.

Now the field type is correctly set to “smallint” and my script is working. I just wish I could get that wasted half hour back and use it for mundane things like writing a blog post about those little blackouts that often cost too much time.

Oh well, time for a beer…