Sanitize Inputs
Sanitizing inputs can be a good option when the input format is not strict but still somewhat predictable, such as phone numbers or other free-text fields. There are a few different ways to sanitize inputs: you can use an allowlist, use a blocklist, or escape the input.
Sanitize Input Using an Allowlist
When sanitizing data with an allowlist, only the characters or strings that match a given pattern are kept. For example, people write phone numbers in several formats: a US phone number could be written as 555-123-1245, (555) 123-1245, 555.123.1245, or a similar combination. Running any of these through an allowlist that permits only numeric characters leaves 5551231245.
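As a minimal sketch of the allowlist approach, the Python snippet below keeps only ASCII digits. The function name `sanitize_phone` is hypothetical and chosen for illustration; a real application would likely also validate the resulting length.

```python
import re

def sanitize_phone(raw: str) -> str:
    """Keep only characters on the allowlist (ASCII digits); drop everything else."""
    return re.sub(r"[^0-9]", "", raw)

for raw in ("555-123-1245", "(555) 123-1245", "555.123.1245"):
    print(sanitize_phone(raw))  # each line prints 5551231245
```

The pattern names what is allowed rather than what is forbidden, so unexpected characters are removed without having to be anticipated.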
Sanitize Input Using a Blocklist
A blocklist is the opposite of an allowlist: instead of keeping known-good characters, it removes known-bad ones. A blocklist can be used to strip HTML <script> tags or other non-conforming text from inputs before the values are used. This technique suffers from the same shortcomings as the approach in the section above, Rejecting Bad Inputs. This type of sanitization must also be applied repeatedly until the value no longer changes. For example, if the value <scr<scriptipt foo bar is processed only once, the result still contains <script, but if it is processed repeatedly, the result is foo bar.
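The difference between a single pass and repeated passes can be sketched in Python as follows. The helper `strip_blocked` is a hypothetical name, and the match is a plain case-sensitive substring match, which is one reason blocklists are easy to bypass.

```python
def strip_blocked(value: str, blocked: str = "<script") -> str:
    """Remove `blocked` repeatedly until the value stops changing."""
    previous = None
    while previous != value:
        previous = value
        value = value.replace(blocked, "")
    return value

payload = "<scr<scriptipt foo bar"

# One pass removes the inner "<script", which splices a new one together.
print(payload.replace("<script", ""))   # <script foo bar

# Repeating until the value is stable removes the spliced tag as well.
print(strip_blocked(payload).strip())   # foo bar
```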
Sanitize Input Using Escaping
Escaping input is one of the easiest and best ways to deal with free-form text. Essentially, instead of trying to determine which parts of the input are safe (as with the strategies above), you assume the input is unsafe and encode every character that has special meaning in the context where the value will be used. There are a few different ways to encode strings depending on how the value is used:
HTML/XML Encoding
Example Input:
<img src onerror='alert("haxor")'>
Result:
&lt;img src onerror=&#x27;alert(&quot;haxor&quot;)&#x27;&gt;
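In Python, one way to produce this kind of encoding is the standard library's `html.escape`. This is a sketch rather than a complete output-encoding strategy; the variable name `untrusted` is illustrative.

```python
import html

untrusted = "<img src onerror='alert(\"haxor\")'>"

# html.escape replaces &, <, > and (by default) both quote characters
# with character references, so the browser renders text, not markup.
print(html.escape(untrusted))
# &lt;img src onerror=&#x27;alert(&quot;haxor&quot;)&#x27;&gt;
```

Other encoders may write the apostrophe as `&#39;` instead of `&#x27;`; both refer to the same character.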
HTML/XML Attribute Encoding
Example Input:
<div attr="" injectedAttr="a value here"><div attr="">
Result:
<div attr="&quot; injectedAttr=&quot;a value here&quot;&gt;&lt;div attr=&quot;">
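A hedged Python sketch of attribute encoding: the hypothetical helper `render_div` always wraps the value in double quotes and escapes it with `html.escape(..., quote=True)`, so an injected quote cannot close the attribute. It assumes the injected value is the text between the first and last quote of the example above.

```python
import html

def render_div(attr_value: str) -> str:
    """Build a <div> whose attribute value is escaped and always double-quoted."""
    return f'<div attr="{html.escape(attr_value, quote=True)}">'

# The attacker tries to close the attribute and add one of their own.
injected = '" injectedAttr="a value here"><div attr="'
print(render_div(injected))
# <div attr="&quot; injectedAttr=&quot;a value here&quot;&gt;&lt;div attr=&quot;">
```

Escaping only protects quoted attributes; an unquoted attribute can still be broken out of with a space.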
JSON Encoding
Example Input:
{"key": "", anotherKey": "anotherValue"}
Result:
{"key": "\", anotherKey\": \"anotherValue"}
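Rather than escaping quotes by hand, a JSON serializer can do the encoding. The sketch below uses Python's `json.dumps` and assumes the injected value is the text placed between the quotes of `key` in the example input.

```python
import json

# Untrusted value that tries to close the string and add another key.
untrusted = '", anotherKey": "anotherValue'

document = json.dumps({"key": untrusted})
print(document)  # {"key": "\", anotherKey\": \"anotherValue"}

# Parsing the document returns the original value as a single string.
assert json.loads(document) == {"key": untrusted}
```

Building the document by string concatenation instead would let the embedded quotes change its structure.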
Base64 Encoding
Example Input:
any random string or binary data
Result:
YW55IHJhbmRvbSBzdHJpbmcgb3IgYmluYXJ5IGRhdGE=
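The same result can be reproduced with Python's standard `base64` module. This is a small sketch; the variable names are illustrative.

```python
import base64

raw = b"any random string or binary data"

# b64encode works on bytes and returns bytes; decode to get a text value.
encoded = base64.b64encode(raw).decode("ascii")
print(encoded)  # YW55IHJhbmRvbSBzdHJpbmcgb3IgYmluYXJ5IGRhdGE=

# Decoding restores the original bytes exactly.
assert base64.b64decode(encoded) == raw
```

Base64 makes a value safe to carry through text-only channels, but it is reversible by anyone and provides no confidentiality.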
There are ways to escape values for just about any format you need: SQL, CSV, LDAP, and so on.
Do Nothing
The last type of input validation is the no-op. Along with being the easiest to implement, it is the most dangerous and the most strongly discouraged. Almost every application takes input from an untrusted source, and not validating that input puts your application and its users at risk.