Class HtmlTool
- java.lang.Object
  - org.apache.velocity.tools.generic.SafeConfig
    - lt.velykis.maven.skins.reflow.HtmlTool
@DefaultKey("htmlTool") public class HtmlTool extends org.apache.velocity.tools.generic.SafeConfig
An Apache Velocity tool that provides utility methods to manipulate HTML code using the jsoup HTML5 parser. The methods use CSS selectors to refer to specific elements for manipulation.
- Since:
- 1.0
- Author:
- Andrius Velykis
- See Also:
- jsoup HTML parser, jsoup CSS selectors
Nested Class Summary
- static interface HtmlTool.ExtractResult: A container to carry element extraction results.
- static interface HtmlTool.IdElement: Representation of an HTML element with an ID and text content.
- static class HtmlTool.JoinSeparator: Enum indicating the separator handling strategy for document partitioning.
Constructor Summary
- HtmlTool()
Method Summary
- String addClass(String content, String selector, String className): Adds the given class to the elements in HTML.
- String addClass(String content, String selector, List<String> classNames): Adds the given class names to the elements in HTML.
- String addClass(String content, String selector, List<String> classNames, int amount): Adds the given class names to the elements in HTML.
- static List<String> concat(List<String> elements, String text, boolean append): Utility method to concatenate a String to a list of Strings.
- protected void configure(org.apache.velocity.tools.generic.ValueParser values)
- String ensureHeadingIds(String content, String idSeparator): Transforms the given HTML content by adding IDs to all heading elements (h1-6) that do not have one.
- HtmlTool.ExtractResult extract(String content, String selector, int amount): Extracts HTML elements from the main HTML content.
- String fixTableHeads(String content): Fixes table heads: wraps rows with <th> (table heading) elements into a <thead> element if they are currently in <tbody>.
- List<String> getAttr(String content, String selector, String attributeKey): Retrieves attribute values of elements in HTML.
- String headingAnchorToId(String content): Transforms the given HTML content by moving anchor (<a name="myheading">) names to IDs for heading elements.
- List<? extends HtmlTool.IdElement> headingTree(String content): Reads all headings in the given HTML content as a hierarchy.
- static org.jsoup.nodes.Element parseBodyFragment(String content): A generic method to use the jsoup parser on an arbitrary HTML body fragment.
- String remove(String content, String selector): Removes elements from HTML.
- String reorderToTop(String content, String selector, int amount): Reorders elements in HTML content so that selected elements are found at the top of the content.
- String reorderToTop(String content, String selector, int amount, String wrapRemaining): Reorders elements in HTML content so that selected elements are found at the top of the content.
- String replace(String content, String selector, String replacement): Replaces elements in HTML.
- String replaceAll(String content, Map<String,String> replacements): Replaces elements in HTML.
- String setAttr(String content, String selector, String attributeKey, String value): Sets an attribute to the given value on elements in HTML.
- static String slug(String input): Creates a slug (Latin text with no whitespace or other symbols) for a longer text (e.g. for use in URLs).
- static String slug(String input, String separator): Creates a slug (Latin text with no whitespace or other symbols) for a longer text (e.g. for use in URLs).
- List<String> split(String content, String separatorCssSelector): Splits the given HTML content into partitions based on the given separator selector.
- List<String> split(String content, String separatorCssSelector, String separatorStrategy): Splits the given HTML content into partitions based on the given separator selector.
- List<String> split(String content, String separatorCssSelector, HtmlTool.JoinSeparator separatorStrategy): Splits the given HTML content into partitions based on the given separator selector. The separators are either dropped or joined with the preceding or following partition, depending on the indicated separator strategy.
- List<String> splitOnStarts(String content, String separatorCssSelector): Splits the given HTML content into partitions based on the given separator selector.
- List<String> text(String content, String selector): Retrieves the text content of the selected elements in HTML.
- String wrap(String content, String selector, String wrapHtml, int amount): Wraps elements in HTML with the given HTML.
Method Detail
configure
protected void configure(org.apache.velocity.tools.generic.ValueParser values)
- Overrides:
- configure in class org.apache.velocity.tools.generic.SafeConfig
- See Also:
- SafeConfig.configure(ValueParser)
split
public List<String> split(String content, String separatorCssSelector)
Splits the given HTML content into partitions based on the given separator selector. The separators themselves are dropped from the results.
- Parameters:
- content - HTML content to split
- separatorCssSelector - CSS selector for separators
- Returns:
- a list of HTML partitions split on separator locations, but without the separators
- Since:
- 1.0
- See Also:
- split(String, String, JoinSeparator)
splitOnStarts
public List<String> splitOnStarts(String content, String separatorCssSelector)
Splits the given HTML content into partitions based on the given separator selector. The separators are kept as the first elements of the partitions.
Note that the first part is removed if the split was successful, because the first part does not include a separator.
- Parameters:
- content - HTML content to split
- separatorCssSelector - CSS selector for separators
- Returns:
- a list of HTML partitions split on separator locations (except the first one), with separators at the beginning of each partition
- Since:
- 1.0
- See Also:
- split(String, String, JoinSeparator)
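To make the "separators start each partition" rule concrete, here is a minimal sketch under a simplifying assumption: the content is modelled as a flat list of top-level HTML element strings, and separators are recognised by a Java `Predicate` instead of a CSS selector. The class `SplitOnStartsSketch` is hypothetical; unlike the real method, it does not parse HTML with jsoup or resolve nested elements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Simplified model: content is a flat list of top-level element strings.
final class SplitOnStartsSketch {

    static List<String> splitOnStarts(List<String> elements, Predicate<String> isSeparator) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String element : elements) {
            if (isSeparator.test(element)) {
                // Close the partition collected so far; the separator opens the next one.
                parts.add(current.toString());
                current.setLength(0);
            }
            current.append(element);
        }
        parts.add(current.toString());
        // If a split happened, the leading partition has no separator, so drop it.
        return parts.size() > 1 ? new ArrayList<>(parts.subList(1, parts.size())) : parts;
    }
}
```

For example, splitting `<p>intro</p>`, `<h2>A</h2>`, `<p>a</p>`, `<h2>B</h2>` on `<h2>` elements yields `<h2>A</h2><p>a</p>` and `<h2>B</h2>`; the introductory paragraph goes away with the first part.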
split
public List<String> split(String content, String separatorCssSelector, String separatorStrategy)
Splits the given HTML content into partitions based on the given separator selector. The separators are either dropped or joined with the preceding or following partition, depending on the indicated separator strategy.
- Parameters:
- content - HTML content to split
- separatorCssSelector - CSS selector for separators
- separatorStrategy - strategy to drop or keep separators; one of "after", "before" or "no"
- Returns:
- a list of HTML partitions split on separator locations
- Since:
- 1.0
- See Also:
- split(String, String, JoinSeparator)
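The string form of `separatorStrategy` presumably maps onto the `HtmlTool.JoinSeparator` values. A minimal sketch of such a mapping follows; the `Join` enum with constants `AFTER`, `BEFORE` and `NO`, the case-insensitive matching and the exception on unknown values are all assumptions, not documented behaviour of this method.

```java
import java.util.Locale;

final class StrategySketch {

    // Hypothetical mirror of HtmlTool.JoinSeparator; the constant names are assumed.
    enum Join { AFTER, BEFORE, NO }

    /** Maps "after", "before" or "no" (in any letter case) to a Join constant. */
    static Join parse(String separatorStrategy) {
        // Enum.valueOf throws IllegalArgumentException for any other value.
        return Join.valueOf(separatorStrategy.trim().toUpperCase(Locale.ROOT));
    }
}
```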
split
public List<String> split(String content, String separatorCssSelector, HtmlTool.JoinSeparator separatorStrategy)
Splits the given HTML content into partitions based on the given separator selector. The separators are either dropped or joined with the preceding or following partition, depending on the indicated separator strategy.
Note that the splitting algorithm tries to resolve nested elements so that the returned partitions are self-contained HTML elements. The nesting is normally contained within the first applicable partition.
- Parameters:
- content - HTML content to split
- separatorCssSelector - CSS selector for separators
- separatorStrategy - strategy to drop or keep separators
- Returns:
- a list of HTML partitions split on separator locations. If no splitting occurs, the original content is returned as the single element of the list.
- Since:
- 1.0
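The three separator strategies can be illustrated on a simplified model: a flat list of top-level element strings, with a `Predicate` recognising separators. This sketch assumes that `AFTER` attaches the separator to the start of the following partition and `BEFORE` attaches it to the end of the preceding one; the `SplitSketch` class and its `Join` enum are hypothetical, and the nesting resolution performed by the real method is not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class SplitSketch {

    // Hypothetical mirror of HtmlTool.JoinSeparator; the constant names are assumed.
    enum Join { AFTER, BEFORE, NO }

    static List<String> split(List<String> elements, Predicate<String> isSeparator, Join strategy) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String element : elements) {
            if (!isSeparator.test(element)) {
                current.append(element);
                continue;
            }
            if (strategy == Join.BEFORE) {
                current.append(element); // separator closes the preceding partition
            }
            parts.add(current.toString());
            current.setLength(0);
            if (strategy == Join.AFTER) {
                current.append(element); // separator opens the following partition
            }
            // Join.NO: the separator is dropped.
        }
        parts.add(current.toString()); // with no separators, this is the whole input
        return parts;
    }
}
```

With the elements `<p>a</p>`, `<hr/>`, `<p>b</p>`, the sketch returns `[<p>a</p>, <p>b</p>]` for `NO`, `[<p>a</p><hr/>, <p>b</p>]` for `BEFORE`, and `[<p>a</p>, <hr/><p>b</p>]` for `AFTER`.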
reorderToTop
public String reorderToTop(String content, String selector, int amount)
Reorders elements in HTML content so that selected elements are found at the top of the content. Can be limited to a certain amount, e.g. to bring just the first of the selected elements to the top.
- Parameters:
- content - HTML content to reorder
- selector - CSS selector for elements to bring to the top of the content
- amount - Maximum number of elements to reorder
- Returns:
- HTML content with reordered elements, or the original content if no such elements are found
- Since:
- 1.0
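The effect of `amount` can be seen in a minimal sketch that again models the content as a flat list of top-level element strings and selects elements with a `Predicate` rather than a CSS selector. `ReorderSketch` is hypothetical; it only illustrates the documented ordering rule (selected elements first, up to `amount`, everything else keeping its relative order).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class ReorderSketch {

    static String reorderToTop(List<String> elements, Predicate<String> selected, int amount) {
        List<String> top = new ArrayList<>();
        StringBuilder rest = new StringBuilder();
        for (String element : elements) {
            if (top.size() < amount && selected.test(element)) {
                top.add(element);       // moved to the top, in original order
            } else {
                rest.append(element);   // keeps its relative position
            }
        }
        // With no matches, this reproduces the original content.
        return String.join("", top) + rest;
    }
}
```

With `amount = 1`, only the first matching element moves up; later matches stay where they were.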
reorderToTop
public String reorderToTop(String content, String selector, int amount, String wrapRemaining)
Reorders elements in HTML content so that selected elements are found at the top of the content. Can be limited to a certain amount, e.g. to bring just the first of the selected elements to the top.
- Parameters:
- content - HTML content to reorder
- selector - CSS selector for elements to bring to the top of the content
- amount - Maximum number of elements to reorder
- wrapRemaining - HTML to wrap the remaining (non-reordered) part
- Returns:
- HTML content with reordered elements, or the original content if no such elements are found
- Since:
- 1.0
extract
public HtmlTool.ExtractResult extract(String content, String selector, int amount)
Extracts HTML elements from the main HTML content. The result consists of the extracted HTML elements and the remainder of the HTML content, with these elements removed. Can be limited to a certain amount, e.g. to extract just the first of the selected elements.
- Parameters:
- content - HTML content to extract elements from
- selector - CSS selector for elements to extract
- amount - Maximum number of elements to extract
- Returns:
- HTML content of the extracted elements together with the remainder of the original content. If no elements are found, the remainder contains the original content.
- Since:
- 1.0
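The split between "extracted" and "remainder" can be sketched on the same flat list-of-strings model. `ExtractSketch.Result` is a hypothetical stand-in for `HtmlTool.ExtractResult`, whose accessor names are not listed on this page; the `Predicate` replaces the CSS selector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class ExtractSketch {

    // Hypothetical stand-in for HtmlTool.ExtractResult.
    static final class Result {
        final List<String> extracted;
        final String remainder;

        Result(List<String> extracted, String remainder) {
            this.extracted = extracted;
            this.remainder = remainder;
        }
    }

    static Result extract(List<String> elements, Predicate<String> selected, int amount) {
        List<String> extracted = new ArrayList<>();
        StringBuilder remainder = new StringBuilder();
        for (String element : elements) {
            if (extracted.size() < amount && selected.test(element)) {
                extracted.add(element);      // taken out of the content
            } else {
                remainder.append(element);   // stays in the remainder
            }
        }
        // With no matches, the remainder is the original content.
        return new Result(extracted, remainder.toString());
    }
}
```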
setAttr
public String setAttr(String content, String selector, String attributeKey, String value)
Sets an attribute to the given value on elements in HTML.
- Parameters:
- content - HTML content to set attributes on
- selector - CSS selector for elements to modify
- attributeKey - Attribute name
- value - Attribute value
- Returns:
- HTML content with modified elements. If no elements are found, the original content is returned.
- Since:
- 1.0
getAttr
public List<String> getAttr(String content, String selector, String attributeKey)
Retrieves attribute values of elements in HTML. Returns all attribute values for the selector, since more than one element can match.
- Parameters:
- content - HTML content to read attributes from
- selector - CSS selector for elements to find
- attributeKey - Attribute name
- Returns:
- Attribute values for all matching elements. If no elements are found, an empty list is returned.
- Since:
- 1.0
addClass
public String addClass(String content, String selector, List<String> classNames, int amount)
Adds the given class names to the elements in HTML.
- Parameters:
- content - HTML content to modify
- selector - CSS selector for elements to add classes to
- classNames - Names of classes to add to the selected elements
- amount - Maximum number of elements to modify
- Returns:
- HTML content with modified elements. If no elements are found, the original content is returned.
- Since:
- 1.0
addClass
public String addClass(String content, String selector, List<String> classNames)
Adds the given class names to the elements in HTML.
- Parameters:
- content - HTML content to modify
- selector - CSS selector for elements to add classes to
- classNames - Names of classes to add to the selected elements
- Returns:
- HTML content with modified elements. If no elements are found, the original content is returned.
- Since:
- 1.0
addClass
public String addClass(String content, String selector, String className)
Adds the given class to the elements in HTML.
- Parameters:
- content - HTML content to modify
- selector - CSS selector for elements to add the class to
- className - Name of class to add to the selected elements
- Returns:
- HTML content with modified elements. If no elements are found, the original content is returned.
- Since:
- 1.0
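All three `addClass` overloads end up changing the `class` attribute of each selected element. The sketch below shows only that attribute-level step on a plain string, assuming that a class already present is not added a second time and that the existing order is kept. `ClassAttrSketch` is hypothetical; element selection and the `amount` limit are not modelled.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ClassAttrSketch {

    /** Merges classNames into an existing class attribute value. */
    static String addClasses(String classAttr, List<String> classNames) {
        Set<String> classes = new LinkedHashSet<>(); // keeps order, ignores duplicates
        for (String existing : classAttr.trim().split("\\s+")) {
            if (!existing.isEmpty()) {
                classes.add(existing);
            }
        }
        classes.addAll(classNames);
        return String.join(" ", classes);
    }
}
```

For instance, adding `table-striped` and `table` to `class="table"` gives `table table-striped`.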
wrap
public String wrap(String content, String selector, String wrapHtml, int amount)
Wraps elements in HTML with the given HTML.
- Parameters:
- content - HTML content to modify
- selector - CSS selector for elements to wrap
- wrapHtml - HTML to use for wrapping the selected elements
- amount - Maximum number of elements to modify
- Returns:
- HTML content with modified elements. If no elements are found, the original content is returned.
- Since:
- 1.0
remove
public String remove(String content, String selector)
Removes elements from HTML.
- Parameters:
- content - HTML content to modify
- selector - CSS selector for elements to remove
- Returns:
- HTML content with removed elements. If no elements are found, the original content is returned.
- Since:
- 1.0
replace
public String replace(String content, String selector, String replacement)
Replaces elements in HTML.
- Parameters:
- content - HTML content to modify
- selector - CSS selector for elements to replace
- replacement - HTML replacement (must parse to a single element)
- Returns:
- HTML content with replaced elements. If no elements are found, the original content is returned.
- Since:
- 1.0
replaceAll
public String replaceAll(String content, Map<String,String> replacements)
Replaces elements in HTML.
- Parameters:
- content - HTML content to modify
- replacements - Map of CSS selectors to their replacement HTML texts. The CSS selectors find elements to be replaced with the HTML in the mapping. The HTML must parse to a single element.
- Returns:
- HTML content with replaced elements. If no elements are found, the original content is returned.
- Since:
- 1.0
text
public List<String> text(String content, String selector)
Retrieves the text content of the selected elements in HTML. Renders each element's text as it would be displayed on the web page (including its children).
- Parameters:
- content - HTML content with the elements
- selector - CSS selector for elements to extract contents from
- Returns:
- A list of element texts as rendered for display. An empty list if no elements are found.
- Since:
- 1.0
headingAnchorToId
public String headingAnchorToId(String content)
Transforms the given HTML content by moving anchor (<a name="myheading">) names to IDs for heading elements.
The anchors are used to indicate positions within an HTML page. In HTML5, however, the name attribute is no longer supported on the <a> tag. Positions within pages are indicated using the id attribute instead, e.g. <h1 id="myheading">.
The method finds anchors inside, immediately before or immediately after the heading tags and uses their name as the heading id instead. The anchors themselves are removed.
- Parameters:
- content - HTML content to modify
- Returns:
- HTML content with modified elements. Anchor names are used for adjacent headings, and anchor tags are removed. If no elements are found, the original content is returned.
- Since:
- 1.0
concat
public static List<String> concat(List<String> elements, String text, boolean append)
Utility method to concatenate a String to a list of Strings. The text can be either appended or prepended.
- Parameters:
- elements - list of elements to append/prepend the text to
- text - the given text to append/prepend
- append - if true, text will be appended to the elements. If false, it will be prepended.
- Returns:
- list of elements with the text appended/prepended
- Since:
- 1.0
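A straightforward sketch of the documented behaviour. Whether the real method returns a new list or modifies the given one is not stated here, so this version (in a hypothetical `ConcatSketch` class) leaves the input untouched and returns a new list.

```java
import java.util.ArrayList;
import java.util.List;

final class ConcatSketch {

    /** Returns a new list with text appended to (or prepended to) every element. */
    static List<String> concat(List<String> elements, String text, boolean append) {
        List<String> result = new ArrayList<>(elements.size());
        for (String element : elements) {
            result.add(append ? element + text : text + element);
        }
        return result;
    }
}
```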
ensureHeadingIds
public String ensureHeadingIds(String content, String idSeparator)
Transforms the given HTML content by adding IDs to all heading elements (h1-6) that do not have one.
IDs on heading elements are used to indicate positions within an HTML page in HTML5. If a heading tag without an id is found, its "slug" is generated automatically from the heading contents and used as the ID.
Note that the algorithm also modifies existing IDs that contain symbols not allowed in CSS selectors, e.g. ":", ".", etc. Such symbols are removed.
- Parameters:
- content - HTML content to modify
- Returns:
- HTML content with all heading elements having id attributes. If all headings already had IDs, the original content is returned.
- Since:
- 1.0
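The per-heading decision can be sketched as a small function on strings. `HeadingIdSketch.ensureId` is hypothetical; it assumes that `idSeparator` is the separator placed between words of the generated slug, and that keeping only ASCII letters, digits, `-` and `_` is an acceptable stand-in for "removing symbols not allowed in CSS selectors". The real tool may use a different character set and a fuller slug algorithm.

```java
import java.util.Locale;

final class HeadingIdSketch {

    /**
     * Hypothetical helper returning the id a heading ends up with.
     * existingId may be null when the heading has no id attribute.
     */
    static String ensureId(String existingId, String headingText, String idSeparator) {
        if (existingId == null || existingId.isEmpty()) {
            // No id: build a simple slug from the heading text, using idSeparator
            // between words (assumed role of the idSeparator parameter).
            String words = headingText.replaceAll("[^A-Za-z0-9\\s]", "").trim();
            return String.join(idSeparator, words.split("\\s+")).toLowerCase(Locale.ROOT);
        }
        // Existing id: drop symbols such as ':' and '.' that clash with CSS selectors.
        // Keeping only letters, digits, '-' and '_' is an assumption.
        return existingId.replaceAll("[^A-Za-z0-9_-]", "");
    }
}
```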
fixTableHeads
public String fixTableHeads(String content)
Fixes table heads: wraps rows with <th> (table heading) elements into a <thead> element if they are currently in <tbody>.
- Parameters:
- content - HTML content to modify
- Returns:
- HTML content with all table heads fixed. If all heads were correct, the original content is returned.
- Since:
- 1.0
slug
public static String slug(String input, String separator)
Creates a slug (Latin text with no whitespace or other symbols) for a longer text (e.g. for use in URLs).
- Parameters:
- input - text to generate the slug from
- separator - separator for whitespace replacement
- Returns:
- the slug of the given text, containing only alphanumeric symbols and the separator
- Since:
- 1.0
- See Also:
- https://www.codecodex.com/wiki/Generate_a_url_slug
slug
public static String slug(String input)
Creates a slug (Latin text with no whitespace or other symbols) for a longer text (e.g. for use in URLs). Uses "-" as the whitespace separator.
- Parameters:
- input - text to generate the slug from
- Returns:
- the slug of the given text, containing only alphanumeric symbols and "-"
- Since:
- 1.0
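A minimal sketch of a slug function in the spirit of both overloads: accents are stripped via Unicode NFD decomposition (`java.text.Normalizer`), other non-alphanumeric symbols are dropped, and each run of whitespace becomes the separator. This is not necessarily the exact algorithm used by HtmlTool or by the codecodex page it references; `SlugSketch` is hypothetical.

```java
import java.text.Normalizer;
import java.util.Locale;

final class SlugSketch {

    static String slug(String input) {
        return slug(input, "-");
    }

    static String slug(String input, String separator) {
        // Decompose accented letters (e.g. "é" -> "e" + combining accent) and drop the accents.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
        // Keep only ASCII letters, digits and whitespace.
        String cleaned = decomposed.replaceAll("[^A-Za-z0-9\\s]", "").trim();
        // Replace each run of whitespace with a single separator.
        String joined = String.join(separator, cleaned.split("\\s+"));
        return joined.toLowerCase(Locale.ROOT);
    }
}
```

With this sketch, `slug("Héllo, Wörld!")` gives `hello-world`, and `slug("Getting Started", "_")` gives `getting_started`.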
headingTree
public List<? extends HtmlTool.IdElement> headingTree(String content)
Reads all headings in the given HTML content as a hierarchy. Subsequent smaller headings are nested within bigger ones, e.g. <h2> is nested under the preceding <h1>.
Only headings with IDs are included in the hierarchy. The result elements contain the ID and heading text for each heading. The hierarchy is useful for generating a table of contents for a page.
- Parameters:
- content - HTML content to extract the heading hierarchy from
- Returns:
- a list of top-level heading items (with ID and text). The remaining headings are nested within these top-level items. An empty list if there are no headings in the content.
- Since:
- 1.0
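The nesting rule ("smaller headings go under the preceding bigger one") can be implemented with a stack of currently open headings. A minimal sketch follows, assuming the headings have already been read from the HTML in document order as (level, id, text) triples; `HeadingTreeSketch` and its `Heading` class are hypothetical stand-ins for `HtmlTool.IdElement`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class HeadingTreeSketch {

    // Hypothetical stand-in for HtmlTool.IdElement: heading level, id, text and children.
    static final class Heading {
        final int level;
        final String id;
        final String text;
        final List<Heading> children = new ArrayList<>();

        Heading(int level, String id, String text) {
            this.level = level;
            this.id = id;
            this.text = text;
        }
    }

    /** Builds the hierarchy from headings given in document order. */
    static List<Heading> tree(List<Heading> headings) {
        List<Heading> roots = new ArrayList<>();
        Deque<Heading> open = new ArrayDeque<>(); // chain of currently open ancestors
        for (Heading h : headings) {
            if (h.id == null || h.id.isEmpty()) {
                continue; // only headings with IDs are included
            }
            // Close ancestors that are not bigger than this heading (an h2 closes h2..h6).
            while (!open.isEmpty() && open.peek().level >= h.level) {
                open.pop();
            }
            if (open.isEmpty()) {
                roots.add(h);
            } else {
                open.peek().children.add(h);
            }
            open.push(h);
        }
        return roots;
    }
}
```

For the sequence h1, h2, h3, h2, h1, the sketch returns two roots; the first h1 holds both h2 headings, and the h3 sits under the first h2.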
parseBodyFragment
public static org.jsoup.nodes.Element parseBodyFragment(String content)
A generic method to use the jsoup parser on an arbitrary HTML body fragment. Allows writing HTML manipulations in the template without adding Java code to the class.
- Parameters:
- content - HTML content to parse
- Returns:
- the wrapper element for the parsed content (i.e. the body element, as if the content were the body contents)
- Since:
- 1.0