HTML parsing
Interweave doesn't rely on an HTML parser for rendering HTML safely; instead, it uses the DOM
itself. It accomplishes this by using DOMImplementation.createHTMLDocument (documented on MDN),
which creates an HTML document in memory, allowing us to easily set markup, aggregate nodes, and
generate React elements. This implementation is supported by all modern browsers and IE9+.
DOMImplementation has the added benefit of not requesting resources (images, scripts, etc.) until
the document has been rendered to the page. This provides an extra layer of security by avoiding
possible CSRF and arbitrary code execution.
Furthermore, Interweave maintains lists of both HTML tags and attributes that may be rendered, further increasing security and reducing the risk of XSS and other vulnerabilities.
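As a rough illustration of the in-memory document approach, the sketch below parses markup into a detached document and lists the top-level tag names. The helper name listTopLevelTags and its injectable impl parameter are assumptions made for this example; they are not part of Interweave's API, and this is not how Interweave itself is implemented.

```typescript
// Sketch only: parse markup in a detached document and list the top-level tags.
// `listTopLevelTags` is a hypothetical helper, not part of Interweave.
function listTopLevelTags(
  markup: string,
  // Defaults to the browser's DOMImplementation; injectable for non-browser use.
  impl: DOMImplementation = document.implementation,
): string[] {
  // The document exists only in memory and is never attached to the page.
  const doc = impl.createHTMLDocument('');
  doc.body.innerHTML = markup;

  // Walk the parsed top-level elements and collect their lowercased tag names.
  const tags: string[] = [];
  for (let i = 0; i < doc.body.children.length; i += 1) {
    tags.push(doc.body.children[i].tagName.toLowerCase());
  }
  return tags;
}
```

In a browser, calling listTopLevelTags('<p>Hi</p><a href="#">x</a>') returns the tags p and a without inserting anything into the visible page.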
Allowed tags
Interweave keeps a mapping of renderable HTML tags to parsing configurations. These configurations handle the following rules and processes.
- Defines the type of rule: allow or deny.
- Defines the type of tag: inline, block, inline-block.
- Flags whether inline children can be rendered.
- Flags whether block children can be rendered.
- Flags whether children of the same tag name can be rendered.
- Maps the parent tags the current element can render in.
- Maps the child tags the current element can render.
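The rules above can be pictured as a small per-tag record. The sketch below is one possible shape, with hypothetical field names (rule, kind, inlineChildren, and so on), a hypothetical sample table, and a hypothetical canRenderChild helper; Interweave's internal configuration may be structured differently.

```typescript
// Hypothetical per-tag parsing configuration; names are illustrative only.
type TagKind = 'inline' | 'block' | 'inline-block';

interface TagConfig {
  rule: 'allow' | 'deny';
  kind: TagKind;
  inlineChildren: boolean; // may inline children be rendered?
  blockChildren: boolean; // may block children be rendered?
  selfChildren: boolean; // may children with the same tag name be rendered?
  parents: string[]; // if non-empty, the only parents this tag may render in
  children: string[]; // if non-empty, the only children this tag may render
}

// A tiny sample table, not the library's real mapping.
const SAMPLE_CONFIGS: Record<string, TagConfig> = {
  p: { rule: 'allow', kind: 'block', inlineChildren: true, blockChildren: false, selfChildren: false, parents: [], children: [] },
  ul: { rule: 'allow', kind: 'block', inlineChildren: false, blockChildren: true, selfChildren: false, parents: [], children: ['li'] },
  li: { rule: 'allow', kind: 'block', inlineChildren: true, blockChildren: true, selfChildren: false, parents: ['ul', 'ol'], children: [] },
  em: { rule: 'allow', kind: 'inline', inlineChildren: true, blockChildren: false, selfChildren: true, parents: [], children: [] },
};

function getConfig(tag: string): TagConfig | undefined {
  return Object.prototype.hasOwnProperty.call(SAMPLE_CONFIGS, tag) ? SAMPLE_CONFIGS[tag] : undefined;
}

function canRenderChild(parentTag: string, childTag: string): boolean {
  const parent = getConfig(parentTag);
  const child = getConfig(childTag);
  // Unknown or denied tags are never rendered in this sketch.
  if (!parent || !child || parent.rule === 'deny' || child.rule === 'deny') return false;
  // Same-tag nesting must be explicitly permitted.
  if (parentTag === childTag && !parent.selfChildren) return false;
  // Respect explicit child and parent maps when they are defined.
  if (parent.children.length > 0 && parent.children.indexOf(childTag) === -1) return false;
  if (child.parents.length > 0 && child.parents.indexOf(parentTag) === -1) return false;
  // Assumption: inline-block children are treated like inline children here.
  return child.kind === 'block' ? parent.blockChildren : parent.inlineChildren;
}
```

With this sample table, an li is accepted inside a ul but rejected inside a p, and a p is rejected inside an em because em does not permit block children.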
The following tags are not supported, but their children will still be rendered:
acronym, area, basefont, bgsound, big, blink, center, col, content, data, datalist, dialog, dir,
font, form, hgroup, image, input, isindex, keygen, listing, marquee, menu, menuitem, meter,
multicol, nobr, noembed, noframes, optgroup, option, param, plaintext, progress, select, shadow,
slot, spacer, strike, template, textarea, tt, wbr, xmp.
The following tags and their children will never be rendered, even when the allow list is disabled:
applet, base, body, canvas, command, embed, frame, frameset, head, html, link, meta, noscript,
object, script, style, title.
The list of allowed tags can be customized using the allowList prop, which accepts a list of HTML tag names.
Allowed attributes
Interweave takes parsing a step further by also filtering attribute values and HTML nodes. As with tags, a mapping of renderable HTML attributes to parser rules exists. A rule can be one of: allow and cast to string (default), allow and cast to number, allow and cast to boolean, or deny.
Any attribute not found in the mapping will be ignored unless allowAttributes is passed.
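The four rule types can be sketched as a lookup followed by a cast. The table contents, the castAttribute helper, and the handling of unparseable numbers below are assumptions for illustration only; they do not reproduce Interweave's actual attribute mapping.

```typescript
// Hypothetical rule names mirroring the four rules described above.
type AttributeRule = 'string' | 'number' | 'boolean' | 'deny';

// Sample mapping only; the real list of attributes lives inside the library.
const SAMPLE_ATTRIBUTE_RULES: Record<string, AttributeRule> = {
  href: 'string',
  colspan: 'number',
  disabled: 'boolean',
  srcdoc: 'deny',
};

function castAttribute(name: string, value: string): string | number | boolean | undefined {
  // Attributes missing from the mapping are ignored (the default behavior).
  if (!Object.prototype.hasOwnProperty.call(SAMPLE_ATTRIBUTE_RULES, name)) return undefined;

  switch (SAMPLE_ATTRIBUTE_RULES[name]) {
    case 'string':
      return value;
    case 'number': {
      // Assumption: values that do not parse as a number are dropped.
      const parsed = parseFloat(value);
      return isNaN(parsed) ? undefined : parsed;
    }
    case 'boolean':
      // Assumption: a boolean attribute that is present counts as true.
      return true;
    default:
      // 'deny': the attribute is never rendered.
      return undefined;
  }
}
```

Under these assumptions, castAttribute('colspan', '2') yields the number 2, while castAttribute('onclick', 'alert(1)') yields undefined because onclick is not in the mapping.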
Render precedence
There are 3 levels of rendering, in order:
- Banned - Tags that will never be rendered, regardless of the allow list or what the consumer configures. This is based on the BANNED_TAG_LIST constant. This takes the highest precedence.
- Blocked - Tags that will not be rendered and are configured by the consumer with the blockList prop. This takes precedence over allowList and allowElements.
- Allowed - Tags that will be rendered. The default allow list is based on the ALLOWED_TAG_LIST constant, or can be configured by the consumer with the allowList prop. The allowElements prop has a higher precedence than allowList, but both are lower than blocked or banned tags.
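The precedence order above can be expressed as a short decision function. This is a minimal sketch: shouldRenderTag and the two sample arrays are hypothetical stand-ins for the library's BANNED_TAG_LIST and ALLOWED_TAG_LIST constants and contain only a few entries each.

```typescript
// Partial stand-ins for the library's constants, shortened for illustration.
const BANNED_TAGS_SAMPLE: string[] = ['applet', 'embed', 'object', 'script', 'style'];
const DEFAULT_ALLOWED_SAMPLE: string[] = ['a', 'b', 'em', 'li', 'p', 'strong', 'ul'];

interface TagFilterOptions {
  allowList?: string[];
  blockList?: string[];
  allowElements?: boolean;
}

function shouldRenderTag(tag: string, options: TagFilterOptions = {}): boolean {
  const name = tag.toLowerCase();

  // 1. Banned tags are never rendered, whatever the consumer configures.
  if (BANNED_TAGS_SAMPLE.indexOf(name) !== -1) return false;

  // 2. Blocked tags take precedence over allowList and allowElements.
  if ((options.blockList || []).indexOf(name) !== -1) return false;

  // 3. allowElements outranks allowList and lets every remaining tag through.
  if (options.allowElements) return true;

  // 4. Otherwise fall back to the custom or default allow list.
  const allowed = options.allowList || DEFAULT_ALLOWED_SAMPLE;
  return allowed.indexOf(name) !== -1;
}
```

For example, shouldRenderTag('script', { allowElements: true }) is false because banned tags always win, while shouldRenderTag('marquee', { allowElements: true }) is true.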
By-passing allowed
If need be, the allowed tag list can be disabled with the allowElements prop, which renders all
HTML elements except for banned tags (hard-coded) and blocked tags (provided by blockList).
Furthermore, the allowed attribute list can be disabled with allowAttributes, which renders all
attributes except event handlers and known XSS attack vectors.
These props are highly discouraged because they open up possible XSS and injection attacks, and
should only be used if the markup passed to Interweave has been sanitized beforehand.
That being said, banned tags like script, applet, and a few others are always removed.
Replacing elements
By default, Interweave converts tags to an <Element /> React component, which renders the
appropriate DOM node. For custom block-level elements, the transform function prop can be passed.
This function receives the parsed DOM node, and can return either a React element (which is inserted
into the React element tree), undefined to use the default <Element /> component, or null to skip
the element entirely.
For example, to replace anchor (a) elements with a custom element:
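import React from 'react';
import { Interweave, Node } from 'interweave';

// Link is assumed to be your application's own link component.
function transform(node: HTMLElement, children: Node[]): React.ReactNode {
  // Replace anchor tags with the custom Link component.
  if (node.tagName === 'A') {
    return <Link href={node.getAttribute('href')}>{children}</Link>;
  }
  // Returning undefined falls back to the default <Element /> component.
  return undefined;
}

<Interweave transform={transform} />;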
Note that transform is run before checking the allowed list, permitting you to use non-allowed
tags in a controlled way. If the transformOnlyAllowList prop is true, transform will only be run
on tags that are in the allowList. Banned tags like script will not be transformed.
Disabling HTML
The HTML parser cannot be disabled; however, a noHtml boolean prop can be passed to both the
<Interweave /> and <Markup /> components. This prop marks all HTML elements as pass-through,
so only text nodes are rendered, recursively. This applies to elements produced by matchers as well.
If you want to strip user-provided HTML, but allow HTML from matchers, use the
noHtmlExceptMatchers prop instead.
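To make pass-through rendering concrete, here is a minimal sketch that uses a hypothetical SketchNode tree in place of real DOM nodes. Element wrappers are dropped and only their text survives, in document order. The type and the renderPassThrough helper are assumptions for illustration, not Interweave internals.

```typescript
// Minimal stand-in for a parsed node tree: either text or an element with children.
type SketchNode = string | { tag: string; children: SketchNode[] };

// Recursively collect text nodes, ignoring the elements that wrap them.
function renderPassThrough(node: SketchNode): string[] {
  if (typeof node === 'string') {
    return [node];
  }
  let output: string[] = [];
  for (const child of node.children) {
    output = output.concat(renderPassThrough(child));
  }
  return output;
}

const tree: SketchNode = {
  tag: 'p',
  children: ['Hello ', { tag: 'b', children: ['world'] }, '!'],
};

// Joining the collected text nodes gives "Hello world!".
const text = renderPassThrough(tree).join('');
```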