
For all our notable achievements – from building supercomputers to placing our own kind on extra-terrestrial bodies – the humble form remains the most common and reliable way to gather and validate user data. Forms offer a familiar UX, are relatively easy to complete and, from a development point of view, are also rather simple to build.
These are also the reasons they are fresh meat for rogue bots, developed by shadowy actors to scour the web and attack vulnerable entry points – such as login forms and APIs – in order to extract user data, create botnets or, perhaps, just for the hack of it!
The power of modern web-building tools makes it easy to create web applications and build complex forms to gather user data, but naivety – which could be described as a lack of negative experiences – means that many such forms are poorly prepared for the attack vectors they may be subjected to, and some are simply open doors inviting bots to enter.
Recaptcha to De-capture
Once developers have been made aware of the inherent risk of user-facing forms – either through training or as the result of an attack – there is often a knee-jerk reaction to apply strong layers of protection capable of withstanding brute-force attacks. These measures are not without reason, but the risks of such an approach are also high: they swing the responsibility of proving human-ness back onto your genuine users, which is truly hard to justify.
Bad UX is the short answer to why nearly all Captcha-like solutions are not the right way to secure user-submitted data in WordPress applications – the longer answer is much more nuanced and deserves a deeper analysis.
It is a subject most people have a clear opinion on – either as end-users or as application developers – perhaps the only people who like Captchas are the security teams, who can see their obvious benefits in terms of reducing nefarious activity – but at what cost are those important gains made?
The central issues with Captcha-like solutions can be summarized as follows:
- Biases against users – be it photo identification for the visually impaired or mathematical challenges for those with dyscalculia – each solution presents new accessibility issues which alienate users and, in some cases, contravene laws designed to ensure universal access to web-based services.
- Notable performance hits – which can be application-wide if scripts are included and instantiated globally – and which may be render-blocking if not deferred.
- On a human level – they add a layer of user frustration and additional friction to every form – simple or complex – lowering conversion rates – which is another way of saying that people are not able to use the tools you are developing, because the security layer is overwhelming.
- And, while developing a single form with Captcha is simple enough, developing a complex application with multiple forms per view quickly becomes difficult to manage – often leading developers to reduce the complexity of the verification steps, which in turn, negates their core purpose.
This list does not make great reading – so we should also point out how hugely successful Captcha security layers can be – they block most spam submissions – they do what they say on the tin.
Spam, Spam, Spam..
Spam is annoying, it’s data noise, evaluating it eats away valuable time and, while it helpfully exposes security cracks in our systems, these can be expensive – in both time and money – to resolve.
Captchas offer a quick fix – they come in many shapes and sizes, are often free, and most solutions offer good documentation and are simple to set up – developers can breathe a sigh of relief – the integrity of their system is secure – the door is bolted shut…
When it’s cold outside and there is a crack in the window glass, it can be easier to board up the entire window than to take the time to replace the pane – but the benefit of fixing the problem correctly is that we do not obstruct our view of the outside world.
It is an interesting paradox. We build beautiful houses, guide users to the address and then slam the door shut when they arrive.
When we should be welcoming visitors inside, we instead waylay them with childish challenges and beguile them with indiscernible scribblings – are we leading them on some mystic journey or simply testing their humanness?
Alternatives
If you decide that you would rather welcome your guests, but also wish to deter robotic visitors – there are many alternative options available, here is a quick list:
- No security – party time, all are welcome – and you get to manually sift through the junk on a daily basis – this gets old very quickly and you’ll be running back to Captcha before you can count to 3 + 7…
- Invisible reCAPTCHA offers a lower-resistance route and also shifts the decision-making to the developer, but it still presents user, performance and development problems and relies on behavioural analysis, which means it needs to be loaded globally across the application.
- And then we have Honey Pots… great name, simple concept, fast loading – in short, a reliable but not bullet-proof solution which is simple to develop and maintain.
Honey Pots?
Like bees to honey, or more like flies to $h1t
…
The concept is pretty simple – place a trap that attracts a hungry visitor – in this case an input-filling bot with an insatiable appetite for adding data to every possible gap it finds – but which is hidden from “real” users, then discard all submissions ( on the server-side ) in which the trap has been filled.
But, as with all UX questions, it’s not quite so simple – we need to look back at our original objections to Captchas, make sure we have not created different traps for real users, and examine what other new accessibility issues we might be introducing.
Firstly, let’s take a look at a simple code example of a working Honey Pot and then we’ll delve into the detail:
Here is a complete input, which should be generated using JavaScript hooked to an event listener – for example DOMContentLoaded.
<input type="text" name="honey" class="honey" id="honey" data-form-honey tabindex="-1" autocomplete="false" data-form-required value="" />
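As a sketch of that approach, the markup above could be generated and appended once DOMContentLoaded fires, so the honeypot never appears in the raw HTML that naive bots scrape – buildHoneypot is a hypothetical helper name, and the attribute set simply mirrors the example input:

```javascript
// Minimal sketch: inject the honeypot input once the DOM is ready.
// The attribute values mirror the example above.
const HONEY_ATTRS = {
  type: 'text',
  name: 'honey',         // in production, use a less obvious ( or rotating ) name
  class: 'honey',
  id: 'honey',
  tabindex: '-1',        // keep the field out of keyboard navigation
  autocomplete: 'false', // discourage browser autofill
  value: ''              // empty value - tempting for input-filling bots
};

// Render the attributes as an HTML string ( easy to test outside a browser ).
function buildHoneypot(attrs = HONEY_ATTRS) {
  const parts = Object.entries(attrs).map(([k, v]) => `${k}="${v}"`);
  return `<input ${parts.join(' ')} data-form-honey data-form-required />`;
}

// In the browser, hook the injection to DOMContentLoaded:
if (typeof document !== 'undefined') {
  document.addEventListener('DOMContentLoaded', () => {
    document.querySelectorAll('form').forEach((form) => {
      form.insertAdjacentHTML('beforeend', buildHoneypot());
    });
  });
}
```

Because the string builder is a pure function, the same helper can be unit-tested without a DOM.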
Now we’ll break this example down and show how to optimize it.
Hide the Element
This can be achieved using CSS by adding a class selector, such as .honey – you can use either visibility or display to ensure the input is removed from the visual flow of the page and is ignored by screen readers.
.honey {
    visibility: hidden !important;
    display: none !important;
}
This could also be achieved using inline CSS as follows:
<input type="text".. style="display:none" />
Keyboard Navigation
By assigning tabindex="-1" we remove the possibility for users or devices that navigate via keystrokes – such as Tab – to focus the element.
Auto-complete
We are adding a text input – because this is more tempting for the bot – but we also add the HTML attribute autocomplete="false" to ensure that the browser does not attempt to autofill the field. Strictly speaking, false is an invalid value – only on and off are valid – but false ensures that values are never added to the input.
Obscure Purpose
Remember, we are trying to trick robots programmed by people – these bots have been programmed with attack patterns – find forms, fill them in and see what happens next. Login or registration pages have common fields ( username, email etc. ), comment forms are nearly universally predictable and, in some cases, bots are programmed to pry away at specific high-traffic targets using dedicated instructions.
We can add some complexity to our honey pot by changing certain attributes and by using less obvious naming for other parts ( remember that bots can also gather and return data to their programmers to enable them to be optimized ) – some examples include:
- In the example we did – but don’t call the input honey or byebyebots or anything so obvious – the fewer clues we give the bot the better.
- As the bots are capable of both recording and learning, a more secure model is to randomize the naming of the honeypot element – you can store a value in a transient which is revoked on a daily or weekly basis.
- The form input has an empty value – this is important, as it’s more tempting for the bots, which hungrily fill every input they find.
- We have invented a data attribute data-form-required to try to add more sugar – its effectiveness is unproven.
Backend Validation
The front-end provides the bait, but all validation happens on the backend – safely out of the reach of any bots.
We can do a very simple check for the honey key in the $_POST array – and if this is either missing ( no JavaScript on the front-end ) or present with a value set ( a robot filled out the field ) – we can take action – either returning the form with a warning or ramping up the protection.
if (
    ! isset( $_POST['honey'] ) // field missing - no JS ran on the front-end
    || $_POST['honey'] !== ''  // field filled - something took the bait
) {
    // take avoiding action - bots ahead!
}
Note that all submissions made without JS will fail – as the honeypot is added programmatically ( the stats show a tiny proportion of users have JS disabled and all major crawlers are now JS-capable, but it’s important to also consider the JS-less experience ).
If we do not discard all submissions which are missing the honeypot element – empty or not – then we are basically introducing a simple backdoor which negates the entire honey pot.
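The same two-part rule can be expressed as a small pure function – the article’s check is PHP, but here is a language-agnostic sketch in JavaScript, where postData stands for the parsed form body ( the equivalent of $_POST ) and isBotSubmission is an invented helper name:

```javascript
// Returns true when a submission should be treated as suspect:
// either the honeypot field never arrived ( no front-end JS ran ),
// or it arrived with a value ( a bot filled it in ).
function isBotSubmission(postData, honeyField = 'honey') {
  if (!(honeyField in postData)) {
    return true; // missing field - the programmatic honeypot was never injected
  }
  return postData[honeyField] !== ''; // filled field - something took the bait
}
```

Keeping the rule in one function makes it trivial to reuse across every form handler and to cover with tests.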
You can also add extra layers of POST validation, for example by creating and validating a nonce or by defining an action in each form and validating that it is set in the POST data.
Bonus Log
The only real-world way to validate a technical solution is by tracking its usage and reviewing the data – in this case we can log data for each security-check failure to see how many false positives ( for sure there will be some ) we have bounced back for repeat submission.
We can simply log the reason for the failure and the posted data – being careful to encrypt sensitive data, for example from login or registration forms, which might contain passwords – we can also capture IP and user-agent data, plus whatever unique identifiers we add to each form, to ensure we know the source clearly.
This data should be regularly audited and tweaks made to the system to attempt to reduce the number of false positives and to block additional bots.
Next Level
Once we start to think about more flexible security solutions, instead of over-engineered quick fixes – we also start to consider the users more, and the flows and steps they will need to take to play an active part in our applications.
Some further suggestions for tweaks to user-facing forms might include:
- Leveraging specific tools for each requirement, rather than trying to make one system fit all use-cases, for example Akismet can help to protect against comment spam, while adding email verification and holding new users in low-capability “pending” roles are very effective against registration attacks.
- Incrementally increasing the challenge on each subsequent submission attempt – from testing the water to lure in robots, to presenting Turing tests of the highest complexity – gradually…
- Introducing some of the more traditional Captcha-style systems once we feel more confident that we are blocking a bot and not a genuine user.
- Adding a time delay to each submission – for example by requiring the user to wait 15 seconds longer on each attempt – bots probably don’t get frustrated or tired… so they will keep firing back submissions at the same rate.
- Blocking IP addresses for repeat violations ( add time and action parameters to your algorithm for more fine-grain control ).
- There are many advanced bot-mitigation processes which can be applied ( most notably behavioural-based approaches ) – these are normally complex and expensive, but would be easily justified on high-value projects or where dealing with very sensitive data.
Honey Pots are not a new concept, but they remain a viable solution for reducing spam, while also returning control to the developer over what happens when a suspected malicious submission is made – escalate, terminate the process, or take no action at all and simply log and review the data. At the very least, we are aware of what is happening and back in control of our own applications.
Add your thoughts below and happy coding 🙂