2. Literature Review
First, we aim to understand how current captchas are actually attacked. The concept of captcha was introduced in 2000, and the technology available to attackers has advanced considerably since then. Even before the machine learning era, a sufficiently capable optical character recognition (OCR) engine was all that was needed to defeat a traditional text-based captcha.
In (Yan & El Ahmad,), an attack on Microsoft’s captcha was conducted to test its strength, and the captcha was found to be vulnerable to a low-cost, segmentation-based attack. A more efficient attack on the Yahoo captcha is described in (Gao et al., 2012). Although that attack targets the captcha of a single provider, the same concept can be applied to the captchas of other providers such as Google and Microsoft.
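To illustrate why segmentation makes such attacks cheap, the sketch below splits a captcha into character candidates using vertical projection, i.e. by cutting at columns that contain no ink. It assumes the image has already been binarized into a grid of 0/1 pixels (1 = ink); the function name `segment_columns` and the toy image are hypothetical and are not taken from (Yan & El Ahmad,) or (Gao et al., 2012), whose published attacks use more elaborate segmentation.

```python
from typing import List, Tuple


def segment_columns(image: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges (end exclusive) of ink runs.

    Hypothetical helper: assumes `image` is a binarized captcha given as a
    list of rows, each a list of 0/1 values where 1 marks an ink pixel.
    """
    if not image:
        return []
    width = len(image[0])
    # Vertical projection: number of ink pixels in each column.
    projection = [sum(row[x] for row in image) for x in range(width)]
    segments = []
    start = None
    for x, count in enumerate(projection):
        if count > 0 and start is None:
            start = x  # first inked column of a new character candidate
        elif count == 0 and start is not None:
            segments.append((start, x))  # blank column closes the candidate
            start = None
    if start is not None:
        segments.append((start, width))  # candidate touching the right edge
    return segments


# Toy 3x9 "captcha": three glyphs separated by blank columns.
toy = [
    [1, 1, 0, 0, 1, 0, 0, 1, 1],
    [1, 0, 0, 1, 1, 0, 0, 0, 1],
    [1, 1, 0, 0, 1, 0, 0, 1, 1],
]
print(segment_columns(toy))  # [(0, 2), (3, 5), (7, 9)]
```

Each returned range would then be cropped and passed to a per-character classifier, so the attacker only has to recognize isolated characters.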
The main way captchas defend against these attacks is by adding background noise and distorting the characters so that they are harder to read. This may work against a simple OCR engine, but against a program that leverages machine learning, even these enhancements may not be enough. An experimental study was conducted in (Alqahtani & Alsulaiman, 2020), in which machine-learning-based attacks were performed on Google reCAPTCHA; the experiment showed how easily machine learning can be used to attack existing captchas. (Wang et al., 2019) show how a deep convolutional neural network (CNN) can be trained to recognize different variations of a captcha. The most complete solution is presented in (Wang et al., 2020): the neural network developed with this method was tested on more than 20 captcha variations and achieved results above 95%.
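The effect of such clutter on a naive, segmentation-based attacker can be shown with a small hypothetical example. The sketch below assumes a binarized 0/1 image and uses one simple form of clutter, a single occluding line drawn across a row; the helper names `segment_columns` and `add_occluding_line` are illustrative and not taken from any of the cited papers. Because the line inks every column, projection-based segmentation can no longer find the gaps between characters.

```python
from typing import List, Tuple


def segment_columns(image: List[List[int]]) -> List[Tuple[int, int]]:
    """Naive vertical-projection segmentation: cut at columns with no ink."""
    width = len(image[0]) if image else 0
    projection = [sum(row[x] for row in image) for x in range(width)]
    segments, start = [], None
    for x, count in enumerate(projection):
        if count and start is None:
            start = x  # first inked column of a candidate
        elif not count and start is not None:
            segments.append((start, x))  # blank column ends the candidate
            start = None
    if start is not None:
        segments.append((start, width))
    return segments


def add_occluding_line(image: List[List[int]], row: int) -> List[List[int]]:
    """Return a copy of `image` with every pixel in `row` set to ink (1)."""
    noisy = [list(r) for r in image]  # copy so the original is unchanged
    noisy[row] = [1] * len(noisy[row])
    return noisy


toy = [
    [1, 1, 0, 0, 1, 0, 0, 1, 1],
    [1, 0, 0, 1, 1, 0, 0, 0, 1],
    [1, 1, 0, 0, 1, 0, 0, 1, 1],
]
print(segment_columns(toy))                         # [(0, 2), (3, 5), (7, 9)]
print(segment_columns(add_occluding_line(toy, 1)))  # [(0, 9)]
```

A learned model that reads the whole image without an explicit segmentation step does not depend on those blank columns, which is one reason clutter of this kind offers less protection against the CNN-based attacks described above.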
As mentioned before, techniques have also been developed to make captchas harder to break. (Almazyad et al., 2011) propose a captcha that combines an image with text to make the task more difficult for a bot. Another variation, described in (Imsamai & Phimoltares, 2010), renders the characters as 3D shapes. Although this may be effective against OCR, a neural network could easily be designed to beat this variation as well.