Programmer’s manual

Simple example:

Say you have an XML document in the variable $xmlstr, and you want to add a mogrified='true' attribute to all a tags that have an href attribute, while munging cite attributes to point to a different URL. Just require() the file tagmogrify.php and do this:

function addattribute(&$tag) {
    $tag->setAttribute("mogrified","true");
}
function mungecite(&$attribute) {
    $attribute->nodeValue = "http://munged.com/";
}
$m = new TagMogrifier("utf-8",
    $fatal_errors=false,
    $entity_declarations="");
$m->add_tag_match("string:a","string:href",null);
$m->add_attribute_match(null,"string:cite",null);
$m->add_tag_mogrificator("addattribute");
$m->add_attribute_mogrificator("mungecite");
print $m->run($xmlstr);

In this example, a TagMogrifier instance is created, a tag match expression and an attribute match expression are added, mogrificator functions for tags and attributes are registered, and then the XML in $xmlstr is fed to the mogrifier. The constructor arguments show that the mogrifier expects XML in the UTF-8 encoding, and that errors encountered during parsing will not terminate the program (but will produce warnings).

Here, addattribute is a tag mogrificator function, while mungecite is an attribute mogrificator function. Tag mogrificators receive a DOMElement every time a tag matches an expression added with add_tag_match(). Attribute mogrificators receive a DOMAttr (PHP's attribute node class) every time an attribute matches an expression added with add_attribute_match().

add_tag_match() and add_attribute_match() take up to three arguments: a tag name expression, an attribute name expression, and an attribute value expression. For a tag or attribute to match (and hence be mogrified by the mogrificator functions you added), it needs to satisfy all three expressions. Valid expressions for tag names, attribute names and attribute values are:

  • anything: null. null always matches.
  • an exact match on a string: "string:abcde" would match “abcde”
  • a case-insensitive exact match on a string: "istring:abcde" would match “abCDe”
  • a regular expression match: "regexp:/abcdefg/" would run a PCRE match using /abcdefg/ as the pattern
  • a function match: "function:a_function_name" would invoke a_function_name, passing it the tag name, the attribute name or the attribute value being tested; if the function returns true, the match is considered positive
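
To make the expression forms above concrete, here is a minimal sketch of how a single expression could be evaluated against a value. The function expression_matches() is hypothetical and is not TagMogrifier's actual implementation; in particular, treating unknown or malformed expressions as non-matching is an assumption.

```php
<?php
// Hypothetical helper, NOT part of tagmogrify.php: it only illustrates
// the five expression forms listed above.
function expression_matches($expression, $value) {
    if ($expression === null) {
        return true;                    // null always matches
    }
    // Split at the first colon only, so the rest may itself contain colons
    // (for example a URL inside a regular expression).
    $parts = explode(":", $expression, 2);
    if (count($parts) !== 2) {
        return false;                   // assumption: malformed expressions never match
    }
    list($kind, $rest) = $parts;
    switch ($kind) {
        case "string":                  // exact, case-sensitive
            return $value === $rest;
        case "istring":                 // exact, case-insensitive
            return strcasecmp($value, $rest) === 0;
        case "regexp":                  // $rest is a full PCRE pattern with delimiters
            return preg_match($rest, $value) === 1;
        case "function":                // $rest names a function taking the value
            return (bool) call_user_func($rest, $value);
        default:
            return false;               // assumption: unknown kinds never match
    }
}
```

Splitting on the first colon only is what lets a pattern such as "regexp:|^http://rudd-o.com|" keep the colon in its URL.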

Examples:

  • add_tag_match(null,null,null): matches all tags
  • add_tag_match("string:a",null,null): matches all a tags
  • add_tag_match(null,"string:href",null): matches all tags with an ‘href’ attribute
  • add_tag_match("string:a","string:href","regexp:|^http://rudd-o.com|"): matches all hyperlinks whose href begins with http://rudd-o.com (the site itself and any page below it)
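
The examples above do not cover the function form, so here is a sketch of a predicate that could be used with it. The name is_external_link and the choice of rudd-o.com as the "internal" host are assumptions made for illustration, as is the idea that the predicate receives the attribute value as its only argument.

```php
<?php
// Hypothetical predicate, intended for use as "function:is_external_link".
// Assumption: the mogrifier passes the attribute value as the only argument.
function is_external_link($href) {
    $host = parse_url($href, PHP_URL_HOST);
    if ($host === null || $host === false) {
        return false;   // relative or unparsable URLs are treated as internal
    }
    // Host names are case-insensitive, so compare without regard to case.
    return strcasecmp($host, "rudd-o.com") !== 0;
}
```

With this function defined, add_tag_match("string:a","string:href","function:is_external_link") would be expected to match only hyperlinks that leave the site.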

You can specify several matches instead of one by calling add_tag_match() or add_attribute_match() repeatedly with the desired arguments. If any one of the specified matches succeeds, the corresponding (tag or attribute) mogrificator function is called. It is called only once for each node, even if several matches succeed.
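
The "any match, one call per node" rule can be sketched as follows. This is not the library's code: apply_matches() is a hypothetical function, matches are modelled as plain predicates, and the nodes are toy objects standing in for DOM nodes.

```php
<?php
// Hypothetical sketch, NOT TagMogrifier's code. Each match is modelled as a
// predicate; the mogrificator runs at most once per node.
function apply_matches(array $nodes, array $matches, $mogrificator) {
    foreach ($nodes as $node) {
        foreach ($matches as $match) {
            if (call_user_func($match, $node)) {
                call_user_func($mogrificator, $node);
                break;  // stop at the first positive match: one call per node
            }
        }
    }
}

// Toy nodes: plain objects standing in for DOM elements.
$link  = (object) array("name" => "a",   "href" => "http://rudd-o.com/", "hits" => 0);
$image = (object) array("name" => "img", "href" => null,                 "hits" => 0);

$matches = array(
    function ($node) { return $node->name === "a"; },   // like ("string:a",null,null)
    function ($node) { return $node->href !== null; },  // like (null,"string:href",null)
);

// $link satisfies both matches but is mogrified once; $image satisfies neither.
apply_matches(array($link, $image), $matches, function ($node) { $node->hits++; });
```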

Remember that successful attribute matches invoke the attribute mogrificators you have added, while successful tag matches invoke the tag mogrificators. Also remember that tag mogrificators receive tag nodes (DOMElement objects) while attribute mogrificators receive attribute nodes (DOMAttr objects).

If the mogrifier is fed invalid XML via the run() method, it normally bombs out with a fatal error. If you initialized it with $fatal_errors=false, it will instead print a warning and return false, so you can handle this condition in the caller. To silence the warnings, you can usually prefix the run() call with an at sign (PHP's error-suppression operator), like this: @$instance->run($xmlstr);.
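
One way to handle that false return value in the caller is sketched below. The helper run_or_fallback() and its $fallback parameter are hypothetical, and the sketch assumes the mogrifier was created with $fatal_errors=false.

```php
<?php
// Hypothetical wrapper around run(); the function name and the $fallback
// parameter are assumptions made for this illustration.
function run_or_fallback($mogrifier, $xmlstr, $fallback) {
    // The @ operator silences the warning printed for invalid XML.
    $result = @$mogrifier->run($xmlstr);
    // Compare with === so an empty-string result is not mistaken for failure.
    if ($result === false) {
        return $fallback;
    }
    return $result;
}
```

For instance, print run_or_fallback($m, $xmlstr, $xmlstr); would print the unmodified input whenever it cannot be parsed.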