HTML forms, PHP, MySQL and UTF-8 – solving the unnerving MS Word special characters problem

Ever had the problem that you paste copy from a MS Word document into a webform, hit the save or submit button and the outcome somehow looks like this…

�The quick brown fox � and so on and so forth�

…when it’s supposed to look like this…

“The quick brown fox – and so on and so forth”

…? Yes, I am specifically talking about the fancy quitation makrs and the elongated dash or hyphen that MS word automatically uses.

Well, welcome to the crazy world of character encoding. Just search the web and you’ll come across gazillion of blog posts and forum entries. All have a set of suggestions and it seems to work for some people while it doesn’t or others. Obviously I had the problem as well, otherwise I wouldn’t write about it now. Now how did I finally manage to solve the issue?

1. Set the correct character encoding on the page header

That was one of the suggestion I read most often, simply because it is the most important rule to follow. The html page that is supposed to display text with non-standard characters needs to be told that it should use the UTF-8 unicode character encoding.

 

Was my problem solved? Nope. Not yet.

2. Set the correct character encoding for html forms

The data that I was trying to display was actually extracted from a MySQL database which itself was populated through an HTML form. When I checked the database entries I realised that the data in there was already screwed up. So I suspected the HTML form to submit the data in an incorrect character encoding. So next step was to set the correct character encoding for the html forms submitting data to MySQL:

 

Note the bit at the end of the opening form tag, i.e. accept-charset=”utf-8″.

Was my problem solved? Nope. Not yet.

But when I sent the php file’s form data to itself and echoed the POST data, it displayed the special characters correctly. So the problem had something to do with the PHP/MySQL interface or MySQL itself.

3. Pass the data from PHP to MySQL with the correct character encoding

After reading lots of stackoverflow.com posts I ended up adjusting my data handling PHP scripts (i.e. the script that adds the form data to the database) to set the character encoding straight after the script the database connection and before the database selection:

$link=mysql_connect('localhost', 'usr', 'pass');
mysql_set_charset('utf8',$link);

Was my problem solved? Nope. Not yet.

4. Make sure the MySQL database & tables are set to use utf8 and utf_unicode_ci

It seems you can screw up a lot with the wrong settings in MySQL. First of all, you can chose the character encoding of your database as well as your individual table fields. On top of that, you then have to  chose the collation of the character set which, as far as I understand it, sets which alphabet will be used and which special characters it contains.

The biggest problem I had was that even though I used utf-8 as main character encoding in both database and table field settings and used utf8_bin as the collation, simply because I didn’t know it any better. But the copy was still stored in the messed up state. And that’s where I started to pull my hair out. I read tons of posts but they more or less all told me to do what I had done already (see points 1-3). Okay, after I while I noticed that it might be the collation that causes the problem and I set everythin to utf8_general_ci, simply because everybody and their dogs seemed to use it.

But it did not solve my problem.

Finally, I stumbled accross this post which to me suggested that the collation uft8_unicode_ci supports more special characters than utf8_general_ci (at the cost of performance). At that stage I couldn’t care less about performance. I just wanted the blood MS word quotation marks and hypens to render. So I changed database and table field settings to

character encoding: utf8
collation:          utf8_unicode_ci

And finally, the stuff ended up in the database just the way MS word threw it in the clipboard. That was it. Simple enough… once you know it.

P.S. I probably should point out that leaving out any of the above 4 points results in garbled text again.

Installing the GeoIP extension on Cent OS 5 running WHM/cPanel

We’re running a managed dedicated server hosted by Liquidweb. Those guys are usually a pretty good bunch, but sometimes it shows that they are also just mere mortals. Last weekend I asked them to upgrade MySQL from version 4 to 5 and I also casually asked whether our server OS could be upgraded from Cent OS 4 to 5. The weekend was coming up so it seemed like a good idea to get that underway.

Well, it turned out they actually had to upgrade the OS in order to get MySQL 5 to work. So the server was reinstalled, all hosting accounts re-imaged etc. Unfortunately, none of the PHP/server extensions – such as ImageMagick and GeoIP – survived the upgrade. No sweat, I thought, logged into our WHM site and tried to install the missing modules there. But the PECL installation always failed. So I contacted Liquidweb again – but for the first time ever I did not hear back from them for more than an hour (as I said, they’re usually pretty good, but I guess even they need a break every now and then).

It was kind of unnerving that some of our websites were not working because the extensions were missing. So I took it upon myself to install the stuff with the aid of Google and my moderate Linux knowledge. While imagemagick proved to be a piece of cake following directions given here, GeoIP was a somewhat tougher nut to crack. That was because the installation instructions provided here did not work for me as our yum repositories did not know anything about GeoIP. But at least I knew which packages were necessary:

GeoIP
GeoIP-devel
GeoIP-data

Since the server was running Cent OS 5 now, I needed the correct rpms. God was I glad that a site like http://rpm.pbone.net/ exists! Here I found everything I needed (careful – watch for the correct/corresponding package versions!):

http://rpm.pbone.net/index.php3/stat/4/idpl/14025435/dir/centos_5/com/GeoIP-1.4.5-1.el5.centos.x86_64.rpm.html
http://rpm.pbone.net/index.php3/stat/4/idpl/15707308/dir/centos_5/com/GeoIP-devel-1.4.5-1.el5.centos.x86_64.rpm.html
http://rpm.pbone.net/index.php3/stat/4/idpl/15706982/dir/centos_5/com/GeoIP-data-20090201-1.el5.centos.i386.rpm.html

I logged into our server via SSH and navigated to the /root/tmp/ folder where I could safely execute the installation of the packages. I downloaded the three rpms from one of the mirrors:

wget ftp://ftp.muug.mb.ca/mirror/centos/5.6/extras/x86_64/RPMS/GeoIP-1.4.5-1.el5.centos.x86_64.rpm
wget ftp://ftp.muug.mb.ca/mirror/centos/5.6/extras/x86_64/RPMS/GeoIP-devel-1.4.5-1.el5.centos.x86_64.rpm
wget ftp://ftp.muug.mb.ca/mirror/centos/5.6/extras/i386/RPMS/GeoIP-data-20090201-1.el5.centos.i386.rpm

And installed them one by one:

rpm -ivh GeoIP-1.4.5-1.el5.centos.x86_64.rpm
rpm -ivh GeoIP-devel-1.4.5-1.el5.centos.x86_64.rpm
rpm -ivh GeoIP-data-20090201-1.el5.centos.i386.rpm

With sweaty palms I then typed the command

pecl install geoip

and jumped for joy when I did not get any error messages! I restarted apache via the WHM panel and – woohoo! – GeoIP extension worked as it should!

MySQL remote user access not working on Ubuntu + Plesk

Last week one of my mate’s private vServer was hacked, hijacked and used for DoC attacks which prompted his provider (Strato) to disable the machine and recommended a reinstallation of the OS. That was not as bad as it sounds, as all data on the server were on backup and the OS was outdated to boot (OpenSuse 9!).

The vServer was reinstalled with Ubuntu 10.04 LTS and Plesk 10. After all data were back in place, all sites back up and running, all that was missing was the remote connection to the MySQL server. My mate uses Navicat (as do I, btw) to remotely manage his databases. So a new user was added to MySQL with remote access rights, the Navicat settings were altered accordingly and… a connection could not be established. D’Oh!

What went wrong? Well, I had a look at the issue. First off, I checked the [mysqld] settings in the MySQL configuration (aka my.cnf).

user            = mysql
socket          = /var/run/mysqld/mysqld.sock
port            = 3306
bind-address    = 0.0.0.0
basedir         = /usr
datadir         = /var/lib/mysql
tmpdir          = /tmp
skip-external-locking

That looked all good. The standard port was set to 3306, the bind-address was set to listen to all incoming traffic (although a security related “no-no” in my opinion), and no trace of the “skip-networking” directive that might have negated the bind-address settings.

Just to be on the safe side I restarted the mysql server (service restart mysql), but Navicat would still not connect.

Next stop was to have a look at the output generated by iptables -L and there I found this:

DROP       tcp  --  anywhere             anywhere            tcp dpt:mysql

Hmm. It seems that tcp connections to MySQL are dropped for some reason. Probably not a bad idea to keep that setting (and set my.cnf’s bind-address to 127.0.0.1) just to avoid new system intrusions via the MySQL route.

Of course that meant that the Navicat connection problem remained… that is, as far as a direct connection to the MySQL server is concerned. However, Navicat permits to connect to MySQL via SSH tunneling*. And using that option works like a treat minus the security risks of an open MySQL server.

* A bit confusing is that you have to manually set the MySQL Server address to “localhost” in Navicat’s main connection property window. I would have thought that the software would choose this setting automatically, if SSH tunneling is activated. Well, just a minor nuisance.

Post to Twitter via PHP script or twitterfeed

Oh how difficult it can be to find an easy solutions. After probing the net for ways how to post stuff to Twitter I finally came across a blog post that was not outdated and is still working. Lest I never forget (until Twitter changes their authentication protocol again)!

http://tips4php.net/2010/12/twitter-oauth-the-easy-way-simple-post-to-twitter-script/

However, the above method does not provide an oauth solution if you want to post to Twitter accounts other than your personal one. To post a news feed to a client’s twitter account requires them to complete the “Get all Oauth key” quests in Twitterland and that can be too much of a grind. But fortunately there is

http://twitterfeed.com/

The good folks there provide a pretty painless Twitter/Facebook integration as long as they can tap into an RSS feed. Oh, well, and there is the tiny catch. To use bit.ly’s URL shortening service, twitterfeed users have to register and obtain a bit.ly API key. But that is reasonably straight forward and easy to explain:

1. Go to http://bit.ly and sign up for a new account
2. Go to http://bit.ly/a/your_api_key
3. Copy and paste both keys into the corresponding fields of the twitterfeed advanced settings

MySQL and the mystery of wrong numbers… always check your numeric types!

Doh! Another half hour wasted!

I just spend the better part of half an hour searching for a bug in my PHP scripts that update a MySQL database via an AJAX call. One of the fields in the database table is numeric, integer to be precise. When I tried to update the field with the value 228, the database would always change the value to 127. Why? All the other queries worked correctly. The field happily accepted a 48, a 85 and a 126. But the 228 always turned out as a 127.

Of course, it’s a late Friday afternoon and technically I should be doing more mundane things like writing a blog post, doing my timesheets or having a beer or two. But instead I was stuck with this issue. Just fix this issue and the scripts were good to go! Just this one tiny problem. Must be a typo somewhere in the PHP script. Or is jQuery somehow tinkering with the input field value before the form data is being sent? What could it be? What, what, what?!?

Then, all of a sudden, something made click and I couldn’t avoid a solid facepalm. I checked the database setup and found that the numeric type for the field was set to “tinyint” instead of “smallint”. And as we all should know unsigned tinyint’s maximum value is 127. Everything greater than that will be “rounded” to 127.

Now the field type is correctly set to “smallint” and my script is working. I just wish I could get that wasted half hour back and use it for mundane things like writing a blog post about those little blackouts that often cost too much time.

Oh well, time for a beer…

google.maps.LatLng spits out NaN

Well, this is one of those examples that shows it pays off to sleep a night over certain problems.

Yesterday afternoon I scratched my head over the Google Maps V3 API’s behaviour. All I wanted to do is change the zoom level of a map using the fitBounds() method. The latitude and longitude values were coming from a jQuery Ajax request that returned an JSON object. The object looked something like this:

{
     "bounds": {
          "sw":"-45.99579575,170.05493469",
          "ne":"-45.48679779,170.77724915"
     }
}

I tried the following script to set the new bounds:

var obj = jQuery.parseJSON(ajaxResponse);
sw = new google.maps.LatLng(obj.bounds.sw);
ne = new google.maps.LatLng(obj.bounds.ne);
newbounds = new google.maps.LatLngBounds(sw,ne);

Now, the result was that it completely caked the Map. The map literally zoomed (or panned, who knows?) to uncharted territory after which any interactions with it would not result in any updates. The map had effectively crashed.

WTF, why? I asked myself. So I checked what values google.maps.LatLng were producing. NaN. Not numbers obviously, but why? I chewed on that for half an hour before I decided to leave it for the day.

Now this morning I had a look at it again and it quickly dawned on me… the JSON data was delivered as a string but the google.map.LatLng class expects floats. So I altered the script accordingly:

var obj = jQuery.parseJSON(ajaxResponse);
var sw=obj.bounds.sw.split(','); // split string to allow parseFloat of data
var ne=obj.bounds.ne.split(',');
sw = new google.maps.LatLng(parseFloat(sw[0]),parseFloat(sw[1]));
ne = new google.maps.LatLng(parseFloat(ne[0]),parseFloat(ne[1]));
newbounds = new google.maps.LatLngBounds(sw,ne);
map.fitBounds(newbounds);

And lo and behold… it works.

Create marker with custom labels in Google Maps API v3

Took me quite some time to get to the bottom of how to put custom labels on markers (aka “placemarks”) in Google Maps API v3.

I must admit that I am quite puzzled as to why Google hasn’t built in this functionality in their google.Maps.Marker class. You can do a lot with that class like define a title via setTitle (i.e. creating a mouseover-tooltip for markers). Why a setLabel feature is missing… I don’t know. Instead it is necessary to write your own class – or at the very least search for a solution on some developers blog.

Now, finding a code snippet to create marker labels turned out to be more complicated than I thought. I always ended up on pages that dealt with MarkerClusterer, which primarily focusses on creating, well, marker clusters in maps that hold lots of markers. At first I tried to dig through the source code of MarkerClusterer to see how the marker labels were done. But I gave up when I realised that I had stared at the code for an hour without making any progress.

So finally I came across Marc Ridey’s blog post “Label overlay example for Google Maps API v3″. The post has the solution I was looking for, albeit in a bit of a chopped up form as some of the info required to understand what he’s done is actually explained somewhere in the depths of the extensive comments section. Some vital info about how to manipulate the zIndex of the label (to have it on top of the marker) and setting the custom label are also only found in the comments. Furthemore, the code snippet wasn’t styled up which made it a bit tricky to follow the code in the context of the blog post.

After some fiddling I came up with a version that suited my needs – i.e. place marker labels on top of each marker (rather then below) and have a customizable text (rather than the automatic label generation from the markers geographic position. So here it is, the slightly altered Label class:

// Define the overlay, derived from google.maps.OverlayView
function Label(opt_options) {
     // Initialization
     this.setValues(opt_options);
 
     // Here go the label styles
     var span = this.span_ = document.createElement('span');
     span.style.cssText = 'position: relative; left: -50%; top: -10px; ' +
                          'white-space: nowrap;color:#ffffff;' +
                          'padding: 2px;font-family: Arial; font-weight: bold;' +
                          'font-size: 12px;';
 
     var div = this.div_ = document.createElement('div');
     div.appendChild(span);
     div.style.cssText = 'position: absolute; display: none';
};
 
Label.prototype = new google.maps.OverlayView;
 
Label.prototype.onAdd = function() {
     var pane = this.getPanes().overlayImage;
     pane.appendChild(this.div_);
 
     // Ensures the label is redrawn if the text or position is changed.
     var me = this;
     this.listeners_ = [
          google.maps.event.addListener(this, 'position_changed',
               function() { me.draw(); }),
          google.maps.event.addListener(this, 'text_changed',
               function() { me.draw(); }),
          google.maps.event.addListener(this, 'zindex_changed',
               function() { me.draw(); })
     ];
};
 
// Implement onRemove
Label.prototype.onRemove = function() {
     this.div_.parentNode.removeChild(this.div_);
 
     // Label is removed from the map, stop updating its position/text.
     for (var i = 0, I = this.listeners_.length; i < I; ++i) {
          google.maps.event.removeListener(this.listeners_[i]);
     }
};
 
// Implement draw
Label.prototype.draw = function() {
     var projection = this.getProjection();
     var position = projection.fromLatLngToDivPixel(this.get('position'));
     var div = this.div_;
     div.style.left = position.x + 'px';
     div.style.top = position.y + 'px';
     div.style.display = 'block';
     div.style.zIndex = this.get('zIndex'); //ALLOW LABEL TO OVERLAY MARKER
     this.span_.innerHTML = this.get('text').toString();
};

To print labels I used the setMarker functions from the Google Map documentation example “Complex Icons” with the added label functionality:

var beaches = [
     ['Bondi Beach', -33.890542, 151.274856, 4],
     ['Coogee Beach', -33.923036, 151.259052, 5],
     ['Cronulla Beach', -34.028249, 151.157507, 3],
     ['Manly Beach', -33.80010128657071, 151.28747820854187, 2],
     ['Maroubra Beach', -33.950198, 151.259302, 1]
];
 
function setMarkers(map, locations) {
     var image = new google.maps.MarkerImage('images/beachflag.png',
     new google.maps.Size(20, 32),
     new google.maps.Point(0,0),
     new google.maps.Point(0, 32));
     var shadow = new google.maps.MarkerImage('images/beachflag_shadow.png',
          new google.maps.Size(37, 32),
          new google.maps.Point(0,0),
          new google.maps.Point(0, 32));
     var shape = {
          coord: [1, 1, 1, 20, 18, 20, 18 , 1],
          type: 'poly'
     };
     for (var i = 0; i < locations.length; i++) {
          var beach = locations[i];
          var myLatLng = new google.maps.LatLng(beach[1], beach[2]);
          var marker = new google.maps.Marker({
               position: myLatLng,
               map: map,
               shadow: shadow,
               icon: image,
               shape: shape,
               title: beach[0],
              zIndex: beach[3]
          });
          var label = new Label({
               map: map
          });
          label.set('zIndex', 1234);
          label.bindTo('position', marker, 'position');
          label.set('text', beach[0]);
          //label.bindTo('text', marker, 'position');
     }
}

A working example of above code can be found here.

Of course, there’s also an alternative method:

Google Maps only using paper and scissors !

Reverse Geocoding via the Open Street Map and Google Maps APIs

How do you get the corresponding street address details for a geographic location expressed as latitude and longitude data only? Oh, and it should be a server side procdure too (i.e. no JavaScript). Well, here a a couple of solutions for PHP.

1. Google Maps

The following HTTP GET request…

http://maps.google.com/maps/geo?q=37.34,-121.94&output=json&sensor=false

…spits out a JSON response from the Google servers:

{
     "name": "37.34,-121.94",
     "Status": {
          "code": 200,
          "request": "geocode"
     },
     "Placemark": [ {
          "id": "p1",
          "address": "1060 Highland Ct, Santa Clara, CA 95050, USA",
          "AddressDetails": {
               "Accuracy" : 8,
               "Country" : {
                    "AdministrativeArea" : {
                         "AdministrativeAreaName" : "CA",
                         "SubAdministrativeArea" : {
                              "Locality" : {
                                   "LocalityName" : "Santa Clara",
                                   "PostalCode" : {
                                        "PostalCodeNumber" : "95050"
                                   },
                                   "Thoroughfare" : {
                                        "ThoroughfareName" : "1060 Highland Ct"
                                   }
                              },
                              "SubAdministrativeAreaName" : "Santa Clara"
                              }
                         },
                         "CountryName" : "USA",
                         "CountryNameCode" : "US"
                   }
               },
               "ExtendedData": {
               "LatLonBox": {
                    "north": 37.3431416,
                    "south": 37.3368464,
                    "east": -121.9368374,
                    "west": -121.9431326
               }
          },
          "Point": {
               "coordinates": [ -121.9399850, 37.3399940, 0 ]
          }
     } ]
}

Accessing the JSON info via PHP is fairly easy:

$url = 'http://maps.google.com/maps/geo?q=37.34,-121.94&amp;output=json&amp;sensor=false';
$data = @file_get_contents($url);          //read the HTTP request
$jsondata = json_decode($data,true);       //parse the JSOPN response
if(is_array($jsondata) &amp;&amp; $jsondata['Status']['code']==200)    //check data
{
     $addr = $jsondata['Placemark'][0]['address'];
}
echo $addr;

There you go. There’s one catch, though. Google’s terms and conditions disallow using the JSON requests outside the context of a Google Map. So it can’t be used to automatically populate a database for example.

2. Open Street Map (OSM)

Open Street Map offers a legal alternative, albeit a limited and not as accurate one as the Google API. The JSON request URL for OSM is as follows:

http://nominatim.openstreetmap.org/reverse?format=json&lat=37.34&lon=-121.94

This returns the following – compared to the Google output rather simple – output:

{
     "place_id":"16539418",
     "licence":"Data Copyright OpenStreetMap Contributors, Some Rights Reserved. CC-BY-SA 2.0.",
     "osm_type":"way","osm_id":"8936657",
     "display_name":"Highland Court, Santa Clara, Santa Clara County, California, 95050, United States of America",
     "address":{"road":"Highland Court",
          "city":"Santa Clara",
          "county":"Santa Clara County",
          "state":"California",
          "postcode":"95050",
          "country":"United States of America",
          "country_code":"us"
     }
}

The corresponding PHP code is pretty much the same as already stated above but for clarity here is the whole block

$url = 'http://nominatim.openstreetmap.org/reverse?format=json&amp;lat=37.34&amp;lon=-121.94';
$data = @file_get_contents($url);          //read the HTTP request
$jsondata = json_decode($data,true);       //parse the JSOPN response
if(is_array($jsondata))                    //check data
{
     $addr = $jsondata['display_name'];
}
echo $addr;

As I said, the OSM data is not as accurate and for some regions doesn’t feature house numbers or suburb names. That is not to say that this won’t get better in time.

Compared to the Google request, OSM responses are quite slow, so parsing lots of addresses takes time. So PHP script execution time limits need to be considered.

IE Ajax woes – Internet Explorer caching of Ajax requests

It happened again. A stone old IE bug got me cold. For some strange reasons Internet Explorer caches Ajax requests. Which means that if database info is extracted via an Ajax GET request, Internet Explorer will store the Ajax response in its cache for the current session. So if the database is updated during the session, Internet Explorer will show the zombie values from before the database update.

Best way to avoid this is to issue an Ajax POST request instead of GET.

More info here.

Mac OS X Network & NAS – the “Unix Executable” file type problem

So we are on an Apple Network here. But rather than relying on time machine as a backup solution – which isn’t really the way to go for multiple machine backups – we’re using Retrospect. (I could write a novel about the grief that piece of software has caused me – but why bother?). And while we ran our backups for yonks onto DVDs, the stack of media just took on ridiculous dimensions. That is why we installed a NAS (LevelOne FNS-5000B, good hardware, but shoddy interface, should have gone with a QNAP), with two 1.5 Tb hard disk in a RAID1 configuration. The NAS offers multiple protocols to connect to the network.

Retrospect is maintaining its backup files via the network on the NAS. And out of convenience (and for security reasons) we also keep older backup sets on the NAS. Now some of our guys have the Retrospect software (not just the client) installed one their machine so that they can access and retrieve files from historic backup sets.

And that’s where we encountered problems on a random basis. Sometimes on the Macs Retrospect simply refused to accept the backup files on the NAS as valid Retrospect catalogues. In the finder, the files did not show up with their usual icon, but instead featured the wee terminal icon. Apple-I showed the files as “Unix Executables”. Strangey enough, this problem would occur on some machine one day, but would be miraculously fixed the next day. If it did not fix itself I used to fix the issue by using a tool called FileType to change file type and creator codes. However, for some reason that did not work more recently. Plus I never felt quite comfortable fiddling with the backup set files anyhow.

So I did a bit of research and found that the “Unix Executable” problem has something to do with Macs not being able to “read the file’s resource fork“. So I asked myself, why the resource fork was readable one day, but not the next and I figured that it might have something to do with the network protocoll the Macs were using to access the NAS.

Sure enough, when the problem reoccured today, I did an Apple-I and found that the NAS was mounted using the Samba protocoll (smb://nas/). Now since the NAS also support AFP I wondered whether connecting via that protocol would fix the problem.

So I unmounted the NAS volume on the Mac (by clicking the eject icon next to the volume name in Finder) and then rather than using the Finder to re-connect to the NAS, I pressed Apple-K to get to the Finder’s  “Connect To Server…” window and typed in afp://192.168.1.* (i.e. our NAS IP). The NAS remounted into my finder and – voila! – gone was the “Unix Executable” problem.

Why the Macs sometimes connected via smb and then again via afp is beyond me. But at least now I know how to fix the problem.