
Make it human (or how to crack a CAPTCHA)

A CAPTCHA is a picture that is supposed to tell a computer from a human: a human should be able to read its content, but a computer should not. Sometimes, though, it's pretty easy to make your computer look human by solving CAPTCHAs. There are some quite good CAPTCHAs out there, e.g. reCAPTCHA from Google, but some people still prefer to write their own. During a boring weekend I tried to show how easy such a self-written CAPTCHA is to crack.

As I don't want to offend the creator of this CAPTCHA or reveal his identity, I created an emulator for this CAPTCHA on my own server, which just randomly delivers one of 100 downloaded CAPTCHAs. The address of the emulator is http://leo.buettiker.org/captcha/emulator.php. Such a CAPTCHA is delivered as an animated GIF and looks like the following picture:

To crack it I used PHP as the scripting language, with WideImage for the image handling, Tesseract-OCR for the OCR and ImageMagick for handling the GIF. To start, let us define some variables and import the WideImage script.

<?php
include ".\image\WideImage.php";

$path = './captcha/';
$tmpCaptchaName = 'captcha.gif';
$resultFile = 'result.jpg';
$imagemagickconvert ='C:\Users\Leo\Desktop\PHP\ImageMagick-6.6.6-4\convert.exe';
$tesseract = '"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"';
$img = null;
$picOffStep = 4;
$initialOff = 51;
$picOff = $initialOff;

Then the script needs to download the image. We use curl for this. We set CURLOPT_COOKIESESSION to 1 so that we get a new image each time instead of sticking to one session. We save the image to disk for later use.

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://leo.buettiker.org/captcha/emulator.php");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt($ch, CURLOPT_COOKIESESSION, 1); 
$content=curl_exec($ch);
file_put_contents($path.$tmpCaptchaName, $content);
curl_close($ch);

Then we create an image to hold the CAPTCHA word; it is stored in the $img variable. With an image tool we see that the animated image consists of 38 single pictures. As we only need the first part of the animation (the spot goes from left to right), we iterate over the first 19 steps of the animation. With the command

convert captcha.gif[0] captcha0.jpg

we are able to extract the first picture of the animation. Right afterwards we load the image again, crop a rectangular part around the spot and insert it into the result image. The position of the spot was found manually with an image program and is hardcoded. This is all done in the following snippet.

$img = WideImage::createPaletteImage(91,27);
for($i = 0; $i < 19; ++$i) {
	echo ".";
	$file = "image$i.jpg";
	exec ($imagemagickconvert.' '.$path.$tmpCaptchaName.'['.$i.'] '.$path.$file);
	$img = $img->merge(WideImage::load($path.$file)->crop($picOff,9,23,27),$picOff-$initialOff);
	$picOff += $picOffStep;
}
echo "\n";
$img->saveToFile($path.$resultFile);
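To make the hardcoded arithmetic of the loop easier to follow, here is a small sketch that only computes the positions, without touching any image. The function name stripOffsets is made up for this illustration, and the type hints assume PHP 7 or newer; the numbers are the ones from the script above.

```php
<?php
// Hypothetical helper (not part of the original script): lists, for every
// frame, where the strip is cut out and where it lands in the result image.
function stripOffsets(int $frames = 19, int $initialOff = 51, int $step = 4): array
{
    $offsets = [];
    for ($i = 0; $i < $frames; ++$i) {
        $source = $initialOff + $i * $step; // x position of the crop in frame $i
        $offsets[] = [
            'frame'  => $i,
            'source' => $source,
            'target' => $source - $initialOff, // x position in the result image
        ];
    }
    return $offsets;
}

foreach (stripOffsets() as $o) {
    printf("frame %2d: crop at x=%3d, paste at x=%2d\n", $o['frame'], $o['source'], $o['target']);
}
```

So the spot moves 4 pixels per frame, and each 23 pixel wide strip overlaps the previous one in the result image.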

Now we have a picture like the following saved on the disk:

We now use Tesseract-OCR to run OCR on the image. We do this with the command:

tesseract.exe result.jpg toutput -l eng letters

As we know that the words in the CAPTCHA consist only of digits and lowercase letters, we store these characters in the letters config file. It looks like this:

tessedit_char_whitelist 0123456789abcdefghijklmnopqrstuvwxyz

To do this automatically, we add the following lines to our script:

exec($tesseract.' '.$path.$resultFile.' '.$path."toutput -l eng letters");
echo preg_replace('/\s/', '', file_get_contents($path."toutput.txt"))."\n";

Now we get an output that looks like:

...................
Tesseract Open Source OCR Engine with Leptonica
fpbmd

It would now be easy to use this extracted information to submit a form. But on the other hand, who wants to do something like this :-) The script is not 100% reliable. Especially the difference between z's and 2's is sometimes hard for Tesseract. But this could probably be improved with training, or you just ignore every CAPTCHA containing one of these characters and try again with another one.
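The "ignore and retry" idea could be sketched roughly like this. The function names cleanOcrText and isTrustworthy are invented for this example and are not part of the script above; the type hints assume PHP 7 or newer.

```php
<?php
// Hypothetical post-processing step, not part of the original script.

// Same clean-up as in the script: drop every whitespace character.
function cleanOcrText(string $raw): string
{
    return preg_replace('/\s/', '', $raw);
}

// Accept a result only if it is non-empty, consists of whitelisted characters
// and contains neither "z" nor "2", the pair Tesseract tends to mix up.
function isTrustworthy(string $text): bool
{
    return $text !== ''
        && preg_match('/^[0-9a-z]+\z/', $text) === 1
        && strpbrk($text, 'z2') === false;
}

$word = cleanOcrText("fpbmd\n\n");
echo $word, isTrustworthy($word) ? " looks usable\n" : " should be skipped\n";
```

A result that fails the check would simply be thrown away and a fresh CAPTCHA requested.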

The lesson is surely that it is not too hard to crack homebrewed CAPTCHAs. So I would recommend that you stick, whenever you can, with a well-known CAPTCHA like reCAPTCHA. On the other hand, a CAPTCHA is never really safe: according to Wikipedia, letting humans solve CAPTCHAs can be quite cheap.


Jira status

After my last post I thought it might also be helpful to publish in my Skype status how many open Jira tickets I have.

You can also get each Jira search result as an RSS feed. Your browser shows the link to the result as RSS. This URL might look something like:

http://jira.example.com/sr/jira.issueviews:searchrequest-xml/temp/SearchRequest.xml?&&resolution=-1&assigneeSelect=specificuser&assignee=leo.buettiker&sorter/field=priority&sorter/order=DESC&tempMax=100&reset=true&decorator=none

To call this URL even if you have no valid session, you can add your user credentials at the end. This looks like:

&os_username=$username&os_password=$password
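Credentials with special characters need to be URL-encoded before they are appended. A minimal sketch, assuming PHP 7 or newer; the helper name withCredentials, the shortened URL and the sample password are made up for illustration:

```php
<?php
// Hypothetical helper: appends the credentials to a search URL that already
// has a query string, URL-encoding both values on the way.
function withCredentials(string $url, string $username, string $password): string
{
    $query = http_build_query(
        ['os_username' => $username, 'os_password' => $password],
        '',
        '&'
    );
    return $url . '&' . $query;
}

// Shortened example URL, not the full search URL from above.
$url = 'http://jira.example.com/sr/jira.issueviews:searchrequest-xml/temp/SearchRequest.xml?resolution=-1';
echo withCredentials($url, 'leo.buettiker', 'p&ss word'), "\n";
```

Keep in mind that the password ends up in plain text in the URL, so this is only something for a trusted network.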

You could now use an XML or RSS parser to interpret the returned feed. But for my purpose even that is too much; I only count how many items are in the feed. The PHP snippet to do this looks like:

$jiraRss = file_get_contents($url);
$jiraCount = substr_count($jiraRss,'<item>');	
$jiraMessage = $jiraCount?" and $jiraCount open Jira Issues":"";
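If you prefer a real parser after all, the same count can be done with SimpleXML. This is only a sketch: the function name countFeedItems and the sample feed are made up, and it assumes the feed contains plain item elements like a normal RSS feed.

```php
<?php
// Sketch: count the <item> elements with SimpleXML instead of substr_count().
function countFeedItems(string $rss): int
{
    $feed = simplexml_load_string($rss);
    if ($feed === false) {
        return 0; // not well-formed XML, e.g. a login page instead of the feed
    }
    return count($feed->xpath('//item'));
}

// Made-up sample feed, for illustration only.
$sample = '<rss version="2.0"><channel><title>Search</title>'
        . '<item><title>ABC-1</title></item>'
        . '<item><title>ABC-2</title></item>'
        . '</channel></rss>';

$jiraCount = countFeedItems($sample);
echo $jiraCount ? " and $jiraCount open Jira Issues" : "", "\n";
```

Unlike substr_count, this does not get confused if the string '<item>' shows up inside a ticket description.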

There might be a lot of other cool use cases you can implement just as simply (the ticket you are currently working on, tickets closed in the last week, etc.). It's just a little bit sad that there is no REST API for Skype, which would make it easier to change the status across platforms.


Mailstatus in Skype

You all know the trouble with overflowing inboxes. I'm a big fan of Inbox Zero and I have found a lot of ways to work through my mails fast. I switched off notifications for incoming mails, and I use a lot of filtering and a good folder structure.

But sometimes my own laziness gets in my way. So I started to put "Inbox Zero" into my Skype status whenever I got my box empty. But after some time I decided to automate this message.

I know that Patrice does automated Skype updates with his Mac. After a quick search I found out that on Windows Skype has a COM API and they even provide a little PHP example. With PHP it is also pretty easy to access an IMAP inbox (MS Exchange also provides IMAP access). So I wrote a quick script that updates my Skype message:

$mail = imap_open('{mail.example.com}INBOX','leo.buettiker', 'password');

// Create a Skype4COM object:
$skype = new COM("Skype4COM.Skype");

// Create a conversion object:
$convert = $skype->convert;
$convert->language = "en";

// Start the Skype client:
if (!$skype->client()->isRunning()) {
  $skype->client()->start(true, true);
}


while(true) {
	imap_check($mail);
	$number = imap_num_msg($mail);
	$skype->CurrentUserProfile()->MoodText= 
		"Leo has currently $number mails in his inbox";
	sleep(5);
}
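The message itself could also bring back the original "Inbox Zero" text when the box is empty. A small sketch, assuming PHP 7 or newer; the function moodText is made up here and would replace the string built inside the loop:

```php
<?php
// Hypothetical helper: builds the mood text from the number of mails.
function moodText(int $number): string
{
    if ($number === 0) {
        return 'Inbox Zero';
    }
    $mails = $number === 1 ? 'mail' : 'mails';
    return "Leo currently has $number $mails in his inbox";
}

echo moodText(0), "\n";
echo moodText(12), "\n";
```

In the loop above, the result of moodText($number) would then be assigned to MoodText.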

This not only demonstrates how you can overcome your own laziness with open communication and automated tools. In my view it's also a nice example of what is possible with PHP outside of classical website rendering.


PHP Quine

OK, Mirko made me lose a hell of a lot of time again. He wrote about his implementation of a quine in Ruby. Quines are just programs that can replicate themselves without opening a file (not even their own, because that would be too easy in PHP). As usual I had to try this in PHP myself. I found the article by Patrick Schneider very helpful. He explains a quite cool approach with base64-encoded DNA pretty clearly. I just wrote the solution a bit shorter, which brought it down to 159 chars (you have to have it all on one line):


<?=($dna='PD89KCRkbmE9JyonKT9zdHJfcmVwbGFjZShjaHIoNDIpLCAkZG5hLCBiYXNlNjRfZGVjb2RlKCRkbmEpKTonJz8+Cg==')?str_replace(chr(42), $dna, base64_decode($dna)):''?>

Unfortunately Mirko did not allow my copy-paste solution (damn academics!). And to me the solution with a generator does not feel too natural either, as using another program to generate a quine is probably not how it was supposed to be. So with the help of diff I tried to find my own solution:

php quine | diff -u quine -

I still nearly got a knot in my brain (much nicer in Swiss German: "chnopf im chopf"). But after some trying I had a solution which, at 113 characters, is even shorter:


<?=($a=array (
  0 => '<?=($a=',
  1 => ')?$a[0].var_export($a,1).$a[1]:"";',
))?$a[0].var_export($a,1).$a[1]:"";

By the way, as a nice start for the language of your choice you should look at the messy c2 wiki (although not all solutions there may work).


What's php like?

Lots of functions listed on PHP.net- thank god!

[stumbled over this on twitter]

Zend Framework 1.5 is out

I know, that's definitely old news. But it's still worth mentioning that Zend Framework 1.5 has been out for some weeks. It's a big jump from Zend 1.0, and they also have a lot of new features in there (and probably some Zend developers drink too much Java). They also have a new and cooler website for the project now.

In my view, especially the improvements in Zend MVC make the framework usable now for companies with a lot of developers working on the same project (without patching the code over and over again).

The full list of improvements:
  • New Zend_Form component with support for AJAX-enabled form elements
  • New action and view helpers for automating and facilitating AJAX requests and alternate response formats
  • Infocard, OpenID, and LDAP authentication adapters
  • Support for complex Lucene searches, including fuzzy, date-range, and wildcard queries
  • Support for Lucene 2.1 index file format
  • Partial, Placeholder, Action, and Header view helpers for advanced view composition and rendering
  • New Zend_Layout component for automating and facilitating site layouts
  • UTF-8 support for PDF documents
  • New Nirvanix, Technorati, and SlideShare web services

Coding Contest addicted

As I already mentioned, I can't keep my fingers off coding contests. Unfortunately Bob pointed out, in a comment on my blog, more nasty stuff about links in HTML comments, which makes parsing even harder.

I trimmed my script again to below the size of the original script (OK, nearly the original), but I think if my regex skills were a bit better, I could still squeeze some bytes out of it. But as I'm finally going on holiday tomorrow, I will send my script to Paul and hope to get some points for the shortest script, as it will definitely not win any prize for speed or beauty (I haven't written such ugly code in ages).

BTW: If you are still trimming your script, I put up a new test file. You should still come up with the same 11 links. This test file is so ugly that my old Konqueror is not able to parse it correctly (but the comments are absolutely valid, according to the documentation and the validator).

Coding Contest

Unfortunately I cannot resist when somebody brings up a coding contest. This time Travis and Paul wrote about the coding contest of php architect on planet-php. I did not invest a lot of time in it, but still way more than I planned.

The problem is that there are two rankings, one by speed and one by size of the script: two parameters which usually do not go well together. After having some great ideas for speeding up my code (even parallel processing, shared memory and map-reduce came to my mind), I decided to leave this race to others and concentrate fully on the size. I don't even run benchmarks anymore.

Unfortunately some nasty HTML special cases (whitespace, case insensitivity, single and double quoting, various attributes and so on) bloated my perfectly sized script a bit. But with some nasty PHP method tricks it's hopefully still the shortest possible script that handles all valid cases.

Just so you don't feel too safe, I wrote a nasty little HTML example that might break your own script. (You should get exactly 11 links out of it.)

Array instead of switch-case in php

First of all, be warned: this article has no practical relevance. It might even guide you to bad code. But this week it just popped into my mind that I could use an array instead of a switch-case construct. So let's see how we can do this. This is the example for switch from the PHP manual.

switch ($i) {
    case 0:
        echo "i equals 0";
        break;
    case 1:
        echo "i equals 1";
        break;
    case 2:
        echo "i equals 2";
        break;
}

Now I can implement this with an array: I put the code for every case statement into an array field. Afterwards I can access the field via the parameter and execute the code in it with eval. Here's the example (take care not to forget the semicolon in the code string):

$case[0] = "echo \"i equals 0\";";
$case[1] = "echo \"i equals 1\";";
$case[2] = "echo \"i equals 2\";";
eval($case[$i]);

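Looks pretty, but what do you do if you need the default statement? Nothing easier than that: we just check whether there is an entry for the parameter and whether eval went OK, and if not we do something after a short-circuit operator. (For a missing entry eval on its own returns NULL rather than false, so the entry has to be tested explicitly.)

(isset($case[$i]) && eval($case[$i]) !== false) || print("default");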
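If you like the lookup idea but not eval, the same table can hold closures instead of code strings. This is a different technique from the one above and just as much a toy; it assumes PHP 7 or newer for the ?? operator, and the function name dispatch is made up.

```php
<?php
// Sketch of the same lookup table with closures instead of eval'd strings.
$case = [
    0 => function () { return "i equals 0"; },
    1 => function () { return "i equals 1"; },
    2 => function () { return "i equals 2"; },
];
$default = function () { return "default"; };

// Pick the handler for $i, or fall back to the default handler.
function dispatch(array $case, callable $default, $i): string
{
    $handler = $case[$i] ?? $default;
    return $handler();
}

echo dispatch($case, $default, 1), "\n"; // prints "i equals 1"
echo dispatch($case, $default, 7), "\n"; // prints "default"
```

Here the default case is an ordinary fallback value, so nothing depends on the return value of eval.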

Webserver market share (or "I only believe in statistics that I doctored myself.")

First of all: "PHP 4 is dead, finally!". There will only be security fixes until my birthday (2008-08-08), nothing more. I read a lot about this on the planet last week (for example "So long, and thanks for all the fish!"). This article brought me to the statistic that only 25 percent of all PHP hosts already run PHP 5, so there are a hell of a lot of hosts to update during the next months.

This statistic also mentions that only 50% of all internet hosts run ASP or PHP, so I tried to figure out who owns the other half of the cake. I looked on Netcraft and didn't find it. But I stumbled over this article, and it made me think: it's strange that domain parking decides the market share of a web server. This brings me again to "I only believe in statistics that I doctored myself" (which is definitely not from Churchill).