public class Parser
extends java.lang.Object
Parses HTML or XML into a Document. Generally, it is simpler to use one of the parse methods in Jsoup.
Note that a Parser instance object is not threadsafe. To reuse a Parser configuration in a multi-threaded environment, use newInstance() to make copies.
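The thread-safety note above can be sketched as follows. This is a minimal example, assuming jsoup is on the classpath: one Parser holds the shared configuration, and each task receives its own copy from newInstance(). The class name `PerThreadParserExample`, the pool size, and the sample HTML are illustrative assumptions, not part of the library.

```java
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PerThreadParserExample {
    // Holds the shared configuration only; it is never used to parse on a worker thread.
    private static final Parser TEMPLATE = Parser.htmlParser().setTrackErrors(10);

    static List<String> parseTitles(List<String> pages) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String html : pages) {
                // Copy the configuration on the submitting thread, one Parser per task.
                Parser parser = TEMPLATE.newInstance();
                futures.add(pool.submit(() -> {
                    Document doc = parser.parseInput(html, "https://example.com/");
                    return doc.title();
                }));
            }
            List<String> titles = new ArrayList<>();
            for (Future<String> future : futures) {
                titles.add(future.get());
            }
            return titles;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> pages = Arrays.asList(
                "<title>First</title><p>a</p>",
                "<title>Second</title><p>b</p>");
        System.out.println(parseTitles(pages));
    }
}
```

Copying on the submitting thread keeps the template itself free of concurrent access; calling newInstance() inside each task is another option if the template is never modified after setup.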
Modifier and Type | Field and Description |
---|---|
private ParseErrorList | errors |
private ParseSettings | settings |
private boolean | trackPosition |
private TreeBuilder | treeBuilder |
Modifier | Constructor and Description |
---|---|
private | Parser(Parser copy) |
public | Parser(TreeBuilder treeBuilder): Create a new Parser, using the specified TreeBuilder. |
Modifier and Type | Method and Description |
---|---|
ParseErrorList | getErrors(): Retrieve the parse errors, if any, from the last parse. |
TreeBuilder | getTreeBuilder(): Get the TreeBuilder currently in use. |
static Parser | htmlParser(): Create a new HTML parser. |
boolean | isContentForTagData(java.lang.String normalName): An internal method, visible for Element. |
boolean | isTrackErrors(): Check if parse error tracking is enabled. |
boolean | isTrackPosition(): Test if position tracking is enabled. |
Parser | newInstance(): Creates a new Parser as a deep copy of this, including initializing a new TreeBuilder. |
static Document | parse(java.lang.String html, java.lang.String baseUri): Parse HTML into a Document. |
static Document | parseBodyFragment(java.lang.String bodyHtml, java.lang.String baseUri): Parse a fragment of HTML into the body of a Document. |
static java.util.List<Node> | parseFragment(java.lang.String fragmentHtml, Element context, java.lang.String baseUri): Parse a fragment of HTML into a list of nodes. |
static java.util.List<Node> | parseFragment(java.lang.String fragmentHtml, Element context, java.lang.String baseUri, ParseErrorList errorList): Parse a fragment of HTML into a list of nodes. |
java.util.List<Node> | parseFragmentInput(java.lang.String fragment, Element context, java.lang.String baseUri) |
Document | parseInput(java.io.Reader inputHtml, java.lang.String baseUri) |
Document | parseInput(java.lang.String html, java.lang.String baseUri) |
static java.util.List<Node> | parseXmlFragment(java.lang.String fragmentXml, java.lang.String baseUri): Parse a fragment of XML into a list of nodes. |
ParseSettings | settings(): Gets the current ParseSettings for this Parser. |
Parser | settings(ParseSettings settings): Update the ParseSettings of this Parser, to control the case sensitivity of tags and attributes. |
Parser | setTrackErrors(int maxErrors): Enable or disable parse error tracking for the next parse. |
Parser | setTrackPosition(boolean trackPosition): Enable or disable source position tracking. |
Parser | setTreeBuilder(TreeBuilder treeBuilder): Update the TreeBuilder used when parsing content. |
static java.lang.String | unescapeEntities(java.lang.String string, boolean inAttribute): Utility method to unescape HTML entities from a string. |
static Parser | xmlParser(): Create a new XML parser. |
private TreeBuilder treeBuilder
private ParseErrorList errors
private ParseSettings settings
private boolean trackPosition
public Parser(TreeBuilder treeBuilder)
treeBuilder - TreeBuilder to use to parse input into Documents.
private Parser(Parser copy)
public Parser newInstance()
public Document parseInput(java.lang.String html, java.lang.String baseUri)
public Document parseInput(java.io.Reader inputHtml, java.lang.String baseUri)
public java.util.List<Node> parseFragmentInput(java.lang.String fragment, Element context, java.lang.String baseUri)
public TreeBuilder getTreeBuilder()
public Parser setTreeBuilder(TreeBuilder treeBuilder)
treeBuilder - new TreeBuilder
public boolean isTrackErrors()
public Parser setTrackErrors(int maxErrors)
maxErrors - the maximum number of errors to track. Set to 0 to disable.
public ParseErrorList getErrors()
See also: setTrackErrors(int)
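The error-tracking methods above fit together roughly as in the following sketch, which assumes jsoup is on the classpath. It enables tracking with setTrackErrors(int), parses with parseInput, and reads the result from getErrors(). The class name `TrackErrorsExample` and the malformed sample markup are illustrative assumptions; `ParseError.getPosition()` and `ParseError.getErrorMessage()` come from the jsoup parser package rather than from this page.

```java
import org.jsoup.parser.ParseError;
import org.jsoup.parser.Parser;

import java.util.ArrayList;
import java.util.List;

class TrackErrorsExample {
    // Parses html and returns up to maxErrors messages; 0 disables tracking.
    static List<String> collectErrors(String html, int maxErrors) {
        Parser parser = Parser.htmlParser().setTrackErrors(maxErrors);
        parser.parseInput(html, "");
        List<String> messages = new ArrayList<>();
        for (ParseError error : parser.getErrors()) {
            messages.add(error.getPosition() + ": " + error.getErrorMessage());
        }
        return messages;
    }

    public static void main(String[] args) {
        // An end tag carrying attributes and a stray </div> are both malformed.
        String html = "<p>One</p href='no'></div>";
        for (String message : collectErrors(html, 5)) {
            System.out.println(message);
        }
    }
}
```

Because the limit caps how many errors are recorded, a small value keeps the overhead low when only the presence of errors matters.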
public boolean isTrackPosition()
public Parser setTrackPosition(boolean trackPosition)
trackPosition - position tracking setting; true to enable
public Parser settings(ParseSettings settings)
settings - the new settings
public ParseSettings settings()
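As a hedged illustration of the two configuration setters above, the sketch below parses the same markup with different ParseSettings and with position tracking switched on or off. It assumes jsoup is on the classpath and that `Node.sourceRange()` and `Range.isTracked()` are available in the jsoup version that provides setTrackPosition; those two calls, the `ParseSettings.preserveCase` and `ParseSettings.htmlDefault` constants, and the class name `ParserSettingsExample` are not documented on this page.

```java
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.ParseSettings;
import org.jsoup.parser.Parser;

class ParserSettingsExample {
    // Returns the tag name of the first element in <body> under the given settings.
    static String firstBodyTagName(String html, ParseSettings settings) {
        Parser parser = Parser.htmlParser().settings(settings);
        Document doc = parser.parseInput(html, "");
        return doc.body().child(0).tagName();
    }

    // Reports whether the first element in <body> carries a tracked source range.
    static boolean firstBodyChildIsTracked(String html, boolean trackPosition) {
        Parser parser = Parser.htmlParser().setTrackPosition(trackPosition);
        Element first = parser.parseInput(html, "").body().child(0);
        return first.sourceRange().isTracked();
    }

    public static void main(String[] args) {
        String html = "<SPAN>text</SPAN>";
        System.out.println(firstBodyTagName(html, ParseSettings.htmlDefault));
        System.out.println(firstBodyTagName(html, ParseSettings.preserveCase));
        System.out.println(firstBodyChildIsTracked(html, true));
    }
}
```

Both setters return the Parser, so they can be chained onto htmlParser() or xmlParser() before the first parse.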
public boolean isContentForTagData(java.lang.String normalName)
public static Document parse(java.lang.String html, java.lang.String baseUri)
html - HTML to parse
baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
public static java.util.List<Node> parseFragment(java.lang.String fragmentHtml, Element context, java.lang.String baseUri)
fragmentHtml - the fragment of HTML to parse
context - (optional) the element that this HTML fragment is being parsed for (i.e. for inner HTML). This provides stack context (for implicit element creation).
baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
public static java.util.List<Node> parseFragment(java.lang.String fragmentHtml, Element context, java.lang.String baseUri, ParseErrorList errorList)
fragmentHtml - the fragment of HTML to parse
context - (optional) the element that this HTML fragment is being parsed for (i.e. for inner HTML). This provides stack context (for implicit element creation).
baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
errorList - list to add errors to
public static java.util.List<Node> parseXmlFragment(java.lang.String fragmentXml, java.lang.String baseUri)
fragmentXml - the fragment of XML to parse
baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
public static Document parseBodyFragment(java.lang.String bodyHtml, java.lang.String baseUri)
Parse a fragment of HTML into the body of a Document.
bodyHtml - fragment of HTML
baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
public static java.lang.String unescapeEntities(java.lang.String string, boolean inAttribute)
string - HTML escaped string
inAttribute - if the string is to be unescaped in strict mode (as attributes are)
public static Parser htmlParser()
public static Parser xmlParser()
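The static helpers and the xmlParser() factory listed above might be used as in this short sketch, which assumes jsoup is on the classpath. The class name `StaticParseHelpersExample`, the helper method names, and the sample inputs are illustrative assumptions.

```java
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;
import org.jsoup.parser.Parser;

import java.util.List;

class StaticParseHelpersExample {
    // Parses an HTML fragment into a full Document and counts <p> elements in its body.
    static int bodyParagraphCount(String bodyHtml) {
        Document doc = Parser.parseBodyFragment(bodyHtml, "https://example.com/");
        return doc.body().select("p").size();
    }

    // Parses an XML fragment and returns the number of top-level nodes.
    static int xmlNodeCount(String fragmentXml) {
        List<Node> nodes = Parser.parseXmlFragment(fragmentXml, "");
        return nodes.size();
    }

    // Decodes HTML entities using the lenient, non-attribute mode.
    static String unescape(String escaped) {
        return Parser.unescapeEntities(escaped, false);
    }

    // Parses a complete XML document with an instance from xmlParser().
    static String xmlText(String xml) {
        return Parser.xmlParser().parseInput(xml, "").text();
    }

    public static void main(String[] args) {
        System.out.println(bodyParagraphCount("<p>One<p>Two"));
        System.out.println(xmlNodeCount("<item id=\"1\"/><item id=\"2\"/>"));
        System.out.println(unescape("&lt;p&gt; &amp; more"));
        System.out.println(xmlText("<note><to>Ann</to></note>"));
    }
}
```

The static methods create a fresh parser per call, so they avoid the thread-safety concern that applies to a shared Parser instance.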