Category Archives: Coding

HTML forms, PHP, MySQL and UTF-8 – solving the unnerving MS Word special characters problem

Ever had the problem that you paste copy from a MS Word document into a webform, hit the save or submit button and the outcome somehow looks like this…

�The quick brown fox � and so on and so forth�

…when it’s supposed to look like this…

“The quick brown fox – and so on and so forth”

…? Yes, I am specifically talking about the fancy quitation makrs and the elongated dash or hyphen that MS word automatically uses.

Well, welcome to the crazy world of character encoding. Just search the web and you’ll come across gazillion of blog posts and forum entries. All have a set of suggestions and it seems to work for some people while it doesn’t or others. Obviously I had the problem as well, otherwise I wouldn’t write about it now. Now how did I finally manage to solve the issue?

1. Set the correct character encoding on the page header

That was one of the suggestion I read most often, simply because it is the most important rule to follow. The html page that is supposed to display text with non-standard characters needs to be told that it should use the UTF-8 unicode character encoding.

 

Was my problem solved? Nope. Not yet.

2. Set the correct character encoding for html forms

The data that I was trying to display was actually extracted from a MySQL database which itself was populated through an HTML form. When I checked the database entries I realised that the data in there was already screwed up. So I suspected the HTML form to submit the data in an incorrect character encoding. So next step was to set the correct character encoding for the html forms submitting data to MySQL:

 

Note the bit at the end of the opening form tag, i.e. accept-charset=”utf-8″.

Was my problem solved? Nope. Not yet.

But when I sent the php file’s form data to itself and echoed the POST data, it displayed the special characters correctly. So the problem had something to do with the PHP/MySQL interface or MySQL itself.

3. Pass the data from PHP to MySQL with the correct character encoding

After reading lots of stackoverflow.com posts I ended up adjusting my data handling PHP scripts (i.e. the script that adds the form data to the database) to set the character encoding straight after the script the database connection and before the database selection:

$link=mysql_connect('localhost', 'usr', 'pass');
mysql_set_charset('utf8',$link);

Was my problem solved? Nope. Not yet.

4. Make sure the MySQL database & tables are set to use utf8 and utf_unicode_ci

It seems you can screw up a lot with the wrong settings in MySQL. First of all, you can chose the character encoding of your database as well as your individual table fields. On top of that, you then have to  chose the collation of the character set which, as far as I understand it, sets which alphabet will be used and which special characters it contains.

The biggest problem I had was that even though I used utf-8 as main character encoding in both database and table field settings and used utf8_bin as the collation, simply because I didn’t know it any better. But the copy was still stored in the messed up state. And that’s where I started to pull my hair out. I read tons of posts but they more or less all told me to do what I had done already (see points 1-3). Okay, after I while I noticed that it might be the collation that causes the problem and I set everythin to utf8_general_ci, simply because everybody and their dogs seemed to use it.

But it did not solve my problem.

Finally, I stumbled accross this post which to me suggested that the collation uft8_unicode_ci supports more special characters than utf8_general_ci (at the cost of performance). At that stage I couldn’t care less about performance. I just wanted the blood MS word quotation marks and hypens to render. So I changed database and table field settings to

character encoding: utf8
collation:          utf8_unicode_ci

And finally, the stuff ended up in the database just the way MS word threw it in the clipboard. That was it. Simple enough… once you know it.

P.S. I probably should point out that leaving out any of the above 4 points results in garbled text again.

Post to Twitter via PHP script or twitterfeed

Oh how difficult it can be to find an easy solutions. After probing the net for ways how to post stuff to Twitter I finally came across a blog post that was not outdated and is still working. Lest I never forget (until Twitter changes their authentication protocol again)!

http://tips4php.net/2010/12/twitter-oauth-the-easy-way-simple-post-to-twitter-script/

However, the above method does not provide an oauth solution if you want to post to Twitter accounts other than your personal one. To post a news feed to a client’s twitter account requires them to complete the “Get all Oauth key” quests in Twitterland and that can be too much of a grind. But fortunately there is

http://twitterfeed.com/

The good folks there provide a pretty painless Twitter/Facebook integration as long as they can tap into an RSS feed. Oh, well, and there is the tiny catch. To use bit.ly’s URL shortening service, twitterfeed users have to register and obtain a bit.ly API key. But that is reasonably straight forward and easy to explain:

1. Go to http://bit.ly and sign up for a new account
2. Go to http://bit.ly/a/your_api_key
3. Copy and paste both keys into the corresponding fields of the twitterfeed advanced settings

MySQL and the mystery of wrong numbers… always check your numeric types!

Doh! Another half hour wasted!

I just spend the better part of half an hour searching for a bug in my PHP scripts that update a MySQL database via an AJAX call. One of the fields in the database table is numeric, integer to be precise. When I tried to update the field with the value 228, the database would always change the value to 127. Why? All the other queries worked correctly. The field happily accepted a 48, a 85 and a 126. But the 228 always turned out as a 127.

Of course, it’s a late Friday afternoon and technically I should be doing more mundane things like writing a blog post, doing my timesheets or having a beer or two. But instead I was stuck with this issue. Just fix this issue and the scripts were good to go! Just this one tiny problem. Must be a typo somewhere in the PHP script. Or is jQuery somehow tinkering with the input field value before the form data is being sent? What could it be? What, what, what?!?

Then, all of a sudden, something made click and I couldn’t avoid a solid facepalm. I checked the database setup and found that the numeric type for the field was set to “tinyint” instead of “smallint”. And as we all should know unsigned tinyint’s maximum value is 127. Everything greater than that will be “rounded” to 127.

Now the field type is correctly set to “smallint” and my script is working. I just wish I could get that wasted half hour back and use it for mundane things like writing a blog post about those little blackouts that often cost too much time.

Oh well, time for a beer…