
Make it human (or how to crack a CAPTCHA)

A CAPTCHA is a picture that is supposed to tell humans and computers apart: a human should be able to read its content, but a computer should not. Sometimes, though, it is pretty easy to make your computer look human by solving CAPTCHAs. There are some quite good CAPTCHAs out there, e.g. reCAPTCHA from Google, but some people still prefer to write their own. During a boring weekend I tried to show how easy such a self-written CAPTCHA is to crack.

As I don't want to offend the creator of this CAPTCHA or reveal his identity, I created an emulator for this CAPTCHA on my own server, which simply delivers one of 100 downloaded CAPTCHAs at random. The address of the emulator is http://leo.buettiker.org/captcha/emulator.php. Such a CAPTCHA is delivered as an animated GIF and looks like the following picture:

To crack it I used PHP as the scripting language, with WideImage for the image handling, Tesseract-OCR for the OCR and ImageMagick for handling the GIF. To start, let us define some variables and include the WideImage script.

<?php
include ".\image\WideImage.php";

$path = './captcha/';
$tmpCaptchaName = 'captcha.gif';
$resultFile = 'result.jpg';
$imagemagickconvert ='C:\Users\Leo\Desktop\PHP\ImageMagick-6.6.6-4\convert.exe';
$tesseract = '"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"';
$img = null;
$picOffStep = 4;
$initialOff = 51;
$picOff = $initialOff;

Then we need to download the image with the script. We use curl for this. We set CURLOPT_COOKIESESSION to 1 so that each request starts a new cookie session and we get a new image each time instead of sticking to one session. We save the image to disk for later use.

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://leo.buettiker.org/captcha/emulator.php");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt($ch, CURLOPT_COOKIESESSION, 1); 
$content=curl_exec($ch);
file_put_contents($path.$tmpCaptchaName, $content);
curl_close($ch);
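
The download can fail, or the server may answer with an error page instead of an image. A minimal sketch of a sanity check follows; the helper looksLikeGif is a hypothetical name and not part of the original script. It only compares the first six bytes with the two GIF signatures.

```php
<?php
// Hypothetical helper (not part of the original script): returns true when
// the data starts with one of the two GIF signatures, "GIF87a" or "GIF89a".
function looksLikeGif(string $data): bool
{
    $signature = substr($data, 0, 6);
    return $signature === 'GIF87a' || $signature === 'GIF89a';
}

// Possible use right after curl_exec(), which returns false on failure:
// if ($content === false || !looksLikeGif($content)) {
//     die("Download failed or response is not a GIF\n");
// }
```

With such a check the script would stop early instead of handing a broken file to ImageMagick.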

Then we create an image in which we assemble the CAPTCHA word. It is stored in the $img variable. With an image tool we see that the animated image consists of 38 single frames. As we only need the first part of the animation (the spot moves from left to right), we iterate over the first 19 frames of the animation. With the command

convert captcha.gif[0] captcha0.jpg

we are able to extract the first frame of the animation. Right afterwards we load the extracted frame again, crop a rectangular part out of the spot and insert it into the image. The position of the spot was found manually with an image program and is hardcoded. This is all done in the following snippet.

$img = WideImage::createPaletteImage(91,27);
for($i = 0; $i < 19; ++$i) {
	echo ".";
	$file = "image$i.jpg";
	exec ($imagemagickconvert.' '.$path.$tmpCaptchaName.'['.$i.'] '.$path.$file);
	$img = $img->merge(WideImage::load($path.$file)->crop($picOff,9,23,27),$picOff-$initialOff);
	$picOff += $picOffStep;
}
echo "\n";
$img->saveToFile($path.$resultFile);
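
To make the offset arithmetic of the loop easier to follow, here is a small sketch that only prints where each strip is cut out and where it is pasted. The numbers are the ones from the script above; the helper stripPosition is a hypothetical name introduced for illustration, and no image library is involved.

```php
<?php
// Values taken from the script above.
$initialOff = 51;  // x position of the spot in frame 0
$picOffStep = 4;   // pixels the spot moves per frame
$cropWidth  = 23;  // width of the cropped strip
$frames     = 19;  // number of frames used

// Hypothetical helper: source and destination x position for frame $i.
function stripPosition(int $i, int $initialOff, int $picOffStep): array
{
    $src = $initialOff + $i * $picOffStep; // where the strip is cut out of the frame
    $dst = $src - $initialOff;             // where it is pasted into the result
    return ['src' => $src, 'dst' => $dst];
}

for ($i = 0; $i < $frames; ++$i) {
    $p = stripPosition($i, $initialOff, $picOffStep);
    printf(
        "frame %2d: crop x=%3d..%3d, paste at x=%2d\n",
        $i,
        $p['src'],
        $p['src'] + $cropWidth - 1,
        $p['dst']
    );
}
```

As the strip is 23 pixels wide but only moves 4 pixels per frame, neighbouring strips overlap, and each later frame overwrites part of the one before it.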

Now we have a picture like the following saved on the disk:

We now use Tesseract-OCR to do OCR on the image. We do this with the command:

tesseract.exe result.jpg toutput -l eng letters

As we know that the words in the CAPTCHA consist only of digits and lowercase letters, we put these characters into the letters config file. It looks like this:

tessedit_char_whitelist 0123456789abcdefghijklmnopqrstuvwxyz

To do this automatically, we add the following lines to our script:

exec($tesseract.' '.$path.$resultFile.' '.$path."toutput -l eng letters");
echo preg_replace('/\s/', '', file_get_contents($path."toutput.txt"))."\n";

Now we get an output that looks like:

...................
Tesseract Open Source OCR Engine with Leptonica
fpbmd
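
The whitespace stripping from the script could be combined with a check that the result really contains only the expected characters. The following is a minimal sketch of that idea, not part of the original script; cleanOcrOutput is a hypothetical helper name, and the allowed characters are the digits and lowercase letters from the whitelist.

```php
<?php
// Hypothetical helper: strips all whitespace from the OCR output and returns
// the word, or null when it is empty or contains unexpected characters.
function cleanOcrOutput(string $raw): ?string
{
    $word = preg_replace('/\s/', '', $raw);
    if ($word === null || $word === '') {
        return null;
    }
    // Same character set as the whitelist in the letters config file.
    return preg_match('/^[0-9a-z]+$/', $word) === 1 ? $word : null;
}

// Possible use with the output file written by Tesseract:
// $word = cleanOcrOutput(file_get_contents($path . "toutput.txt"));
```

A null result would then be a signal to fetch another CAPTCHA instead of submitting a word that cannot be right.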

It would now be easy to use this extracted information to submit a form. But on the other hand, who wants to do something like this :-) The script does not work 100% reliably. Especially the difference between z's and 2's is sometimes hard for Tesseract. But this could probably be improved with training, or you just ignore every CAPTCHA containing one of these characters and try another one.
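
The idea of ignoring CAPTCHAs with confusable characters could be sketched like this. Both function names are hypothetical, and $solve stands for any callable that downloads one CAPTCHA and returns the recognised word; the original script has no such function, so this is only an outline of the retry logic.

```php
<?php
// Hypothetical helper: true when the word contains a character that the
// OCR step tends to mix up (by default 'z' and '2').
function isAmbiguous(string $word, string $confusable = 'z2'): bool
{
    return strpbrk($word, $confusable) !== false;
}

// Hypothetical retry loop: asks $solve for a word until one without
// confusable characters turns up, or gives up after $maxTries attempts.
function solveUnambiguous(callable $solve, int $maxTries = 5): ?string
{
    for ($try = 0; $try < $maxTries; ++$try) {
        $word = $solve();
        if ($word !== '' && !isAmbiguous($word)) {
            return $word;
        }
    }
    return null;
}
```

The cost is a few extra downloads, which hardly matters as long as the server hands out a fresh CAPTCHA on every request.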

The lesson from this is surely that it is not too hard to crack homebrewed CAPTCHAs. So I would recommend that you stick, whenever you can, with a well-known CAPTCHA like reCAPTCHA. On the other hand, a CAPTCHA is never really safe: according to Wikipedia, letting humans solve CAPTCHAs can be quite cheap.
