
Make it human (or how to crack a CAPTCHA)

A CAPTCHA is a picture that is supposed to tell a computer from a human: a human should be able to read its content, but a computer should not. Sometimes, though, it is pretty easy to make your computer look human by solving CAPTCHAs. There are some quite good CAPTCHAs out there, e.g. reCAPTCHA from Google, but some people still prefer to write their own. During a boring weekend I tried to show how easy such a self-written CAPTCHA is to crack.

As I don't want to offend the creator of this CAPTCHA or reveal his identity, I created an emulator for this CAPTCHA on my own server, which just randomly delivers one of 100 downloaded CAPTCHAs. The address of the emulator is http://leo.buettiker.org/captcha/emulator.php. Such a CAPTCHA is delivered as an animated GIF and looks like the following picture:

So for cracking it I used PHP as the scripting language, with WideImage for the image handling, Tesseract-OCR for OCR and ImageMagick for handling the GIF. To start, let us define some variables and import the WideImage script.

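<?php
// WideImage does the cropping and merging of the frames.
include './image/WideImage.php';

// Working directory and file names for the downloaded and the merged image.
$path = './captcha/';
$tmpCaptchaName = 'captcha.gif';
$resultFile = 'result.jpg';
// Paths to the external tools (adjust to your installation).
$imagemagickconvert = 'C:\Users\Leo\Desktop\PHP\ImageMagick-6.6.6-4\convert.exe';
$tesseract = '"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"';
$img = null;
// The crop window starts at x = 51 and advances 4 pixels per frame.
$picOffStep = 4;
$initialOff = 51;
$picOff = $initialOff;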

Then the script needs to download the image. We use curl for this. We set CURLOPT_COOKIESESSION to 1 so that each request starts a new cookie session and we get a new image every time instead of sticking to one session. We save the image to disk for later use.

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://leo.buettiker.org/captcha/emulator.php");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt($ch, CURLOPT_COOKIESESSION, 1); 
$content=curl_exec($ch);
file_put_contents($path.$tmpCaptchaName, $content);
curl_close($ch);

Then we create an image in which we save the CAPTCHA word; it is kept in the $img variable. With an image tool we see that the animated image consists of 38 single pictures. As we only need the first part of the animation (the spot goes from left to right), we iterate over the first 19 steps of the animation. With the command

convert captcha.gif[0] captcha0.jpg

we are able to extract the first picture of the animation. Right afterwards we load the image again, crop a rectangular part out of the spot and insert it into the result image. The position of the spot was found manually with an image program and is hardcoded. This is all done in the following snippet.

$img = WideImage::createPaletteImage(91,27);
for($i = 0; $i < 19; ++$i) {
	echo ".";
	$file = "image$i.jpg";
	exec ($imagemagickconvert.' '.$path.$tmpCaptchaName.'['.$i.'] '.$path.$file);
	$img = $img->merge(WideImage::load($path.$file)->crop($picOff,9,23,27),$picOff-$initialOff);
	$picOff += $picOffStep;
}
echo "\n";
$img->saveToFile($path.$resultFile);
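The offsets used in the loop above can be tabulated without touching any image. The following is a minimal sketch: the helper name cropWindows is hypothetical and not part of the original script, and the values 19, 51 and 4 are simply the ones from the snippet above.

```php
<?php
// Hypothetical helper (not part of the original script): lists, per frame,
// where the strip is cut from the frame and where it lands in the result.
function cropWindows(int $frames = 19, int $initialOff = 51, int $step = 4): array
{
    $windows = [];
    for ($i = 0; $i < $frames; ++$i) {
        $windows[] = [
            'frame'   => $i,
            'sourceX' => $initialOff + $i * $step, // left edge of the crop in the frame
            'targetX' => $i * $step,               // left edge in the merged image
        ];
    }
    return $windows;
}

foreach (cropWindows() as $w) {
    printf("frame %2d: crop at x=%3d -> paste at x=%2d\n", $w['frame'], $w['sourceX'], $w['targetX']);
}
```

Since each strip is 23 pixels wide but the window only advances 4 pixels per frame, neighbouring strips overlap heavily.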

Now we have a picture like the following saved on the disk:

We now use Tesseract-OCR to do OCR on the image. We do this with the command:

tesseract.exe result.jpg toutput -l eng letters

As we know that the words in the CAPTCHA consist only of digits and lowercase letters, we save these characters in the letters config file. It looks like:

tessedit_char_whitelist 0123456789abcdefghijklmnopqrstuvwxyz

To do this automatically, we add the following lines to our script:

exec($tesseract.' '.$path.$resultFile.' '.$path."toutput -l eng letters");
echo preg_replace('/\s/', '', file_get_contents($path."toutput.txt"))."\n";

Now we get an output that looks like:

...................
Tesseract Open Source OCR Engine with Leptonica
fpbmd

It would now be easy to use this extracted information to submit a form. But on the other hand, who wants to do something like this :-) The script does not work 100% reliably. Especially the difference between z's and 2's is sometimes hard for Tesseract. But this could probably be improved with training, or you just ignore every CAPTCHA containing one of these characters and try another one.
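The "ignore and try another one" idea can be sketched as a small filter on the OCR result. This is only an illustration: the function name isTrustworthyGuess is hypothetical, and the expected length of 5 is an assumption taken from the sample output "fpbmd".

```php
<?php
// Hypothetical helper: decide whether an OCR result is worth submitting.
// $expectedLength = 5 is an assumption based on the sample output "fpbmd".
function isTrustworthyGuess(string $guess, int $expectedLength = 5): bool
{
    // Only digits and lowercase letters can occur (same set as the whitelist).
    if (!preg_match('/^[0-9a-z]+$/', $guess)) {
        return false;
    }
    if (strlen($guess) !== $expectedLength) {
        return false;
    }
    // Skip guesses containing the characters Tesseract tends to confuse.
    return strpbrk($guess, 'z2') === false;
}

var_dump(isTrustworthyGuess('fpbmd')); // bool(true)
var_dump(isTrustworthyGuess('fz9md')); // bool(false)
```

If the guess is rejected, the script would simply download a fresh CAPTCHA and start over.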

The lesson is surely that it is not too hard to crack homebrewed CAPTCHAs. So I would recommend that you stick, whenever you can, with a well-known CAPTCHA like reCAPTCHA. On the other hand, a CAPTCHA is never really safe: according to Wikipedia, letting humans solve CAPTCHAs can be quite cheap.


Twitter

It looks like I will use the "@" and "#" keys more often in the future. After hanging around on Facebook, spending a lot of time organizing conveniat, examining some exams and even finding some time for some fancy hacks, I finally found some time to open a Twitter account(*), like all the cool kids did years ago.

You will find my random ramblings and rants, in 140 characters, at @LeoBuettiker.

I have not found out yet what is supposed to be so cool about Twitter. But I like to see what my geek friends are thinking about. We will see if there will be more output than we currently have on this blog.

(*) To be honest, there's already a second one @MyTvTipps. But more about this probably later.


2009 Forecast

Well, I do not tend to write technological forecasts. There are people out there who are much better at this, and even they are often wrong. But this year I will give it a try. Some ideas about next year's trends on the web:

Crisis

The credit crunch will get worse next year. It will also affect the web world. A lot of startups will close their doors because they will not find any money. For others this will also be a good opportunity: in a time where you can't even trust the banks, investors will search for startups with serious business plans.

For the big ones on the web the time will be pretty hard. Their shareholders will ask questions about how they will get their money back. Facebook has to find a way to make more money. I think we are likely to see more ads on Facebook in 2009, or "pro accounts" for which users have to pay, possibly both.

Google possibly has to stop some projects that do not generate enough money, or create a way to charge for these services. This might affect Google Chrome, Google Analytics and others. Likely Google Chrome will not reach more than 1-2 percent market share, and Google will stop this project and support Firefox instead.

As there is still no usable micropayment system out there (well, PayPal is still not there and will probably never be), I expect Google or Facebook (or a new startup) to come up with one. Possibly they buy a bank for this (UBS anyone?! OK, this is not serious.)

Cloud

Well, there was already a lot of buzz around cloud computing in 2008. But I think it will take off in 2009. Google's App Engine will support fast development of small apps. I think Facebook will introduce something similar for scripting apps for Facebook. Most likely this will be in PHP (or JavaScript). Amazon's AWS will be the platform for a lot of startups.

But with the acceptance of cloud computing, a lot of classical providers (the big German ones) will introduce new price models and also some kind of cloud computing. I think it is very likely that some open source cloud computing solution based on Xen and the LAM* stack will appear as well. Some commercial solutions will probably follow. This might include Sun with GlassFish, MySQL and Solaris, Microsoft with the .NET platform, and probably others.

Big companies will provide users (open source projects and big customers) with cloud systems to test their software. Microsoft already does this for some open source projects. But commercial projects (that write client software) might also profit from this. Probably Red Hat, SuSE/Novell or Sun will follow this example too.

Not out (yet)

I think 2009 will be a year of vaporware. We will not see PHP 6, Perl 6, MySQL 6 or Duke Nukem Forever. The first three of these will struggle with a bigger community, and it will be hard to reach decisions. I doubt that MySQL 6 will ever arrive; forks of it (like Percona's version and Drizzle) are very likely to take over the lead and will stop Sun from working on MySQL beyond version 5.

Hard Times

2009 will be a hard year for the whole web. But as long as you think you cannot be replaced by the three below, your job might still be secure. So I wish a very good 2009 to all of you.

job security by merlin mann


Jira status

After my last post I thought it might also be helpful to publish how many open Jira tickets I have in my Skype status.

You can also get each Jira search result as an RSS feed. Your browser indicates the link to the result as RSS. This URL might look something like:

http://jira.example.com/sr/jira.issueviews:searchrequest-xml/temp/SearchRequest.xml?&&resolution=-1&assigneeSelect=specificuser&assignee=leo.buettiker&sorter/field=priority&sorter/order=DESC&tempMax=100&reset=true&decorator=none

To call this URL even if you have no valid session, you can add your user credentials at the end. This looks like:

&os_username=$username&os_password=$password

You could now use an XML or RSS parser to interpret the returned feed. But for my purpose even that is too much; I only count how many items are in the feed. The PHP snippet to do this looks like:

$jiraRss = file_get_contents($url);
$jiraCount = substr_count($jiraRss,'<item>');	
$jiraMessage = $jiraCount?" and $jiraCount open Jira Issues":"";
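For comparison, the parser-based variant mentioned above could look roughly like the following sketch. It uses PHP's SimpleXML extension; the function name countFeedItems and the tiny sample feed are made up for illustration, and a real Jira feed carries many more fields per item.

```php
<?php
// Hypothetical sample feed, only for illustration.
$jiraRss = <<<XML
<rss version="0.92">
  <channel>
    <title>Sample search</title>
    <item><title>[DEMO-1] First issue</title></item>
    <item><title>[DEMO-2] Second issue</title></item>
  </channel>
</rss>
XML;

// Hypothetical helper: count the <item> elements of an RSS feed.
function countFeedItems(string $rss): int
{
    // Collect parse errors internally instead of emitting warnings.
    libxml_use_internal_errors(true);
    $feed = simplexml_load_string($rss);
    if ($feed === false) {
        return 0; // not well-formed XML, treat as "no issues"
    }
    return count($feed->channel->item);
}

$jiraCount = countFeedItems($jiraRss);
$jiraMessage = $jiraCount ? " and $jiraCount open Jira Issues" : "";
echo $jiraMessage . "\n";
```

Unlike substr_count, this does not miscount if the string "<item>" happens to appear inside an issue description, at the price of parsing the whole feed.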

There might be a lot of other cool use cases you can easily implement (the ticket you are currently working on, tickets closed in the last week, etc.). It's just a little bit sad that there is no REST API for Skype, which would make it easier to change the status across platforms.


Mailstatus in Skype

You all know the troubles with overflowing inboxes. I'm a big fan of Inbox Zero and I found a lot of ways to work fast with my mails. I switched off notifications for incoming mails, and I use a lot of filtering and a good folder structure.

But sometimes my own laziness gets in my way. So I started to put "Inbox Zero" into my Skype status whenever I got my inbox empty. But after some time I decided to automate this message.

I know that Patrice does automated Skype updates with his Mac. After a quick search I found out that on Windows Skype has a COM API, and they even provide a little PHP example. With PHP it is also pretty easy to access an IMAP inbox (MS Exchange also provides IMAP access). So I wrote a quick script that updates my Skype message:

$mail = imap_open('{mail.example.com}INBOX','leo.buettiker', 'password');

// Create a Skype4COM object:
$skype = new COM("Skype4COM.Skype");

// Create a conversion object:
$convert = $skype->convert;
$convert->language = "en";

// Start the Skype client:
if (!$skype->client()->isRunning()) {
  $skype->client()->start(true, true);
}


while(true) {
	imap_check($mail);
	$number = imap_num_msg($mail);
	$skype->CurrentUserProfile()->MoodText= 
		"Leo has currently $number mails in his inbox";
	sleep(5);
}
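The manual "Inbox Zero" status from before could be folded into the automated message. A minimal sketch, assuming a hypothetical helper inboxMoodText whose result would be assigned to MoodText inside the loop above:

```php
<?php
// Hypothetical helper: build the mood text from the number of mails.
function inboxMoodText(int $count): string
{
    if ($count === 0) {
        return 'Inbox Zero';
    }
    // Use the singular form for exactly one mail.
    $noun = $count === 1 ? 'mail' : 'mails';
    return "Leo has currently $count $noun in his inbox";
}

echo inboxMoodText(0) . "\n";  // Inbox Zero
echo inboxMoodText(12) . "\n"; // Leo has currently 12 mails in his inbox
```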

This does not only demonstrate how you can overcome your own laziness with open communication and automated tools. In my view it is also a nice example of what is possible with PHP outside of classical website rendering.


100@facebook

OK, having 100 "friends" on Facebook is nothing special. Even I now have 100 "friends". Nothing special so far, but I did not add a single one of them; they all added me. After being forced to open a Facebook account half a year ago, I decided, as a silent protest, to use it but not add anyone unless they send me a request. Unlike on other networks, I add just everybody (as long as he's not a real spammer with hundreds of contacts).

I do not like these crappy, spammy applications which get in my way and in the way of usability in general. But it's incredible how many people have a Facebook account. Not only the usual suspects (like my internet-savvy nerd friends) are on Facebook; even my friends from primary school all seem to have an account.

I just want to let you know that Facebook is absolutely crazy, no matter if you like it or not. (And as for the geek in me, I think I cannot help myself and will soon lose a lot of time with their puzzles.)


delicious friends

I had the idea to write a script that finds out which people you might add to your delicious network, based on the links you have in common. Probably this idea was unconsciously influenced by my co-worker Stefan, who had a similar idea for tilllate users. Unfortunately you don't get the information you need for this out of the del.icio.us API. So I wrote a little screen scraper that sucked the information directly from the del.icio.us frontend. Unfortunately, the first few times I ran into the Yahoo "you shall not steal" guard. (Even though I waited one second between requests, it seems that for the frontend you have to wait even longer.) This weekend I increased the timeout between requests massively and it worked (but very slowly).

The idea is pretty simple:
  • The script gets your last 100 links
  • It then looks up, for each link, the other people who have saved it as well
  • It records the username of every person who has saved the link
  • Afterwards it calculates how often each username was recorded
  • It orders the usernames by occurrence
  • It prints the first 100 usernames, with information on whether they are a fan of you or in your network
As the script is very slow because of the throttling, I can only give you a small sample of the output here. If you're close to my "network", the likelihood of finding yourself on the list might be high. As you can see, I did not spend a lot of time on HTML formatting ;-) I used PHP for scripting, XML_HTMLSax3 for parsing and Cache_Lite for, well, caching.
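The counting and ordering steps from the list above can be sketched in a few lines of PHP. This is not the original script: the function name rankUsers and the sample data are hypothetical, and the scraping part is left out entirely.

```php
<?php
// Hypothetical input: for each of your links, the usernames that also saved it.
$saversPerLink = [
    'http://example.com/a' => ['alice', 'bob'],
    'http://example.com/b' => ['bob', 'carol'],
    'http://example.com/c' => ['bob', 'alice'],
];

// Hypothetical helper: rank usernames by the number of links in common.
function rankUsers(array $saversPerLink, int $limit = 100): array
{
    $all = [];
    foreach ($saversPerLink as $users) {
        // array_unique guards against a user being listed twice for one link.
        foreach (array_unique($users) as $user) {
            $all[] = $user;
        }
    }
    $counts = array_count_values($all); // username => number of shared links
    arsort($counts);                    // highest count first
    return array_slice($counts, 0, $limit, true);
}

print_r(rankUsers($saversPerLink));
```

With the sample data, bob comes first with three shared links, followed by alice with two and carol with one.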

Scaling is not about...

Amdahl's Law, do you remember it from school?! But it is not important for this article.

I hear and read the word scaling so often lately that I am seriously thinking about giving it a fixed field on my bullshit bingo card. Like a lot of words in bullshit bingo, scaling is often misused.

After talking with Mirko and reading a lot of blogs I really think the world needs yet another one. So this article tries to kill 4 common misunderstandings about scaling, because scaling is not about…

[more after the jump]


You shall not steal!

You shall not steal data from big Y! (at least not without waiting "AT LEAST ONE SECOND")


Xobni - Public beta



Xobni, the Outlook plugin which turns your Outlook into a mini social network, is now in public beta. Everybody who is using (or is forced to use) Outlook should give it a try and download it. It makes the email experience in Outlook much more pleasant. Congratulations to Gabor and his team on this step.