Google’s Street View neural network can now decrypt captchas better than a human

Updated @ 05:25 April 19: Four months after it first published this
research (detailed in the story below), Google is now promoting this
deep neural network as a win for both Street View and its Recaptcha
product. As far as I can tell, nothing has changed since January —
Google is just now framing it as “our neural network is so advanced
that it can decrypt our captchas as well as a human,” rather than
as an improvement to Street View’s ability to read hard-to-decipher house numbers.
The software can decrypt the hardest type of Recaptcha captchas with
99.8% accuracy (which is a lot better than my own accuracy).

Original story
Having spent some time on the internet, you have no doubt been
forced to prove your humanity by typing words and numbers into a
captcha. Google’s own Recaptcha variant has been used not only to
keep bots at bay, but to help the search giant identify the text in
scanned books and house numbers from Street View. Google doesn’t
rely exclusively on hijacking your brain cycles anymore, though. A
new research paper from Google details how the company trained a
neural network to read the millions of unidentified house numbers
captured by Street View cameras without human intervention.
An artificial neural network is a computational model that seeks to
replicate the parallel nature of a living brain. This system works
directly on the pixel images that are captured by Street View cars, and
it works more like your brain than many previous models. Instead of
breaking each address image up into individual digits then identifying
each one, it looks at the whole number and recognizes it, just like we
do.
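The paper's whole-number approach can be sketched as a decoding step over two kinds of network output: a probability distribution over how many digits the number has, plus a softmax over digits 0-9 for each position. The helper name and toy probabilities below are illustrative assumptions, not Google's actual code.

```python
def read_whole_number(length_probs, digit_probs):
    """Pick the most likely digit string from the network's outputs.

    length_probs: list where index n is P(number has n digits)
    digit_probs:  per-position lists of 10 probabilities, one per digit 0-9
    """
    # Most likely sequence length, then the most likely digit at each slot.
    best_len = max(range(len(length_probs)), key=lambda n: length_probs[n])
    digits = []
    for pos in range(best_len):
        probs = digit_probs[pos]
        digits.append(str(max(range(10), key=lambda d: probs[d])))
    return "".join(digits)

# Toy outputs for a picture of "25": the length head favours 2 digits,
# position 0 favours "2", position 1 favours "5".
length_probs = [0.01, 0.04, 0.90, 0.03, 0.01, 0.01]
digit_probs = [[0.02] * 10, [0.01] * 10]
digit_probs[0][2] = 0.84
digit_probs[1][5] = 0.91

print(read_whole_number(length_probs, digit_probs))  # -> 25
```

The point of the joint output is that the network never has to segment the image into per-digit crops; segmentation mistakes simply cannot happen.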
When you type in an address on Google Maps, you expect it to return the
correct location. Having the right address for each structure is
essential to that, especially in areas where building numbers are not
linear. That’s why there is value in knowing what it actually says on
the front door, and why the company would go to the trouble of building
a synthetic brain to do it.
To train its neural network, Google used the publicly available Street
View House Numbers (SVHN) Dataset. This is exactly what it sounds
like — a massive dataset with 200,000 addresses split up into number
blocks for a total of 600,000 numerical images to train an electronic
brain. It takes six days for the system to learn the dataset and be able
to identify digits in Street View pictures at a high level of accuracy.
Google simplified the process by placing some constraints on the
images analyzed by the neural network. The addresses must have
already been identified and automatically cropped so that the number
is at least one third the total width of the final image. They also
assume that the number is five or fewer digits long, which works fine
in most regions. Because the network reads the whole number at once
rather than one digit at a time, its output layer needs a fixed number
of digit slots, which makes the limit on length essential.
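The two constraints above amount to a simple pre-filter on cropped images. The function name and parameters below are assumptions for illustration; only the one-third-width and five-digit limits come from the article.

```python
MAX_DIGITS = 5            # the network has a fixed number of digit slots
MIN_WIDTH_FRACTION = 1 / 3  # number must fill at least a third of the crop

def suitable_for_network(image_width, number_width, digit_count):
    """Return True if a cropped address image meets both constraints."""
    if digit_count > MAX_DIGITS:
        return False
    return number_width >= image_width * MIN_WIDTH_FRACTION

print(suitable_for_network(image_width=300, number_width=120, digit_count=4))  # True
print(suitable_for_network(image_width=300, number_width=50, digit_count=4))   # False: number too small
print(suitable_for_network(image_width=300, number_width=200, digit_count=6))  # False: too many digits
```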

Examples of numbers the machine failed to recognize.
Humans transcribing the numbers from Street View images are about
98% accurate, so that is the threshold Google is shooting for with the
machine. That doesn’t mean 98% of all images necessarily — it refers
to a subset of images that are suitable for the automated system to
identify. About 95% of captured addresses fall into this category and
the neural network meets the 98% accuracy requirement on them.
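One way to read that coverage/accuracy trade-off is as confidence thresholding: keep only the predictions the network is sure about, send the rest to humans, and measure accuracy on the kept subset. A minimal sketch with made-up data:

```python
def coverage_and_accuracy(predictions, threshold):
    """predictions: list of (confidence, was_correct) pairs.

    Returns (fraction of images kept, accuracy on the kept subset).
    """
    kept = [(conf, ok) for conf, ok in predictions if conf >= threshold]
    if not kept:
        return 0.0, 0.0
    coverage = len(kept) / len(predictions)
    accuracy = sum(1 for _, ok in kept if ok) / len(kept)
    return coverage, accuracy

# 95 confident correct reads, 5 uncertain wrong ones (toy numbers).
preds = [(0.99, True)] * 95 + [(0.40, False)] * 5
cov, acc = coverage_and_accuracy(preds, threshold=0.9)
print(cov, acc)  # -> 0.95 1.0
```

Raising the threshold trades coverage for accuracy, which is how an automated system can match human performance on the subset it chooses to answer.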
Google says it has used this system to read 100 million physical
street numbers so far. [Research paper: arxiv.org/abs/1312.6082 -
"Multi-digit Number Recognition from Street View Imagery using Deep
Convolutional Neural Networks"]
This computer model has lightened the load on human eyeballs
considerably, but there are still some images that require a human’s
assessment. As the neural network is improved, Google researchers
hope that it could be of use in reading street signs or phone numbers
on billboards.
