Regular expressions for HTML
Regular expressions can be used to find pieces of HTML, which is useful for cleaning up markup or working inside a CMS. Here are a few I’ve been looking for.
To find all tags on a page
</?\w+((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)/?>
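As a rough illustration, the pattern above can be tried out with Python's standard re module. This is only a sketch: the names TAG_RE, find_tags and sample are made up for the example, and since attribute names are matched with \w+, tags with hyphenated attributes (data-foo, for instance) will probably be missed.

```python
import re

# The "all tags" pattern from above, unchanged. A raw triple-quoted string
# is used so both quote characters can appear without escaping.
TAG_RE = re.compile(
    r"""</?\w+((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)/?>"""
)


def find_tags(html):
    """Return every opening, closing or self-closing tag found in html."""
    return [match.group(0) for match in TAG_RE.finditer(html)]


# Hypothetical sample markup, for illustration only.
sample = '<div class="box"><p>Hello <br/>world</p></div>'

for tag in find_tags(sample):
    print(tag)
# <div class="box">
# <p>
# <br/>
# </p>
# </div>
```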
To find a class, id or style attribute
\sid="[^"]*"|\sclass="[^"]*"|\sstyle="[^"]*"
- \sid=" - the \s represents any whitespace character (tab, space, form feed, line feed). Checking for whitespace is important for some attributes, as it ensures that you are not picking up fragments of other attributes. For instance, align=" would match both the align and valign attributes.
- [^"]* - this matches any character except the double quote (") and continues until it finds a double quote. This is because the [^"] rule is followed by an asterisk (*).
- " - picks up the closing double quote to complete the regex.
- |\sclass="[^"]*"|\sstyle="[^"]*" - the vertical line character (|) signifies an either/or rule. Therefore, it will find all id, class and style attributes that begin and end with double quotes.
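To see the effect of the leading \s, here is a small Python sketch using the standard re module. The names ATTR_RE, strip_attrs and the sample strings are invented for this example, and it assumes the attributes are written with double quotes, as the explanation above does.

```python
import re

# id, class and style attributes, each preceded by whitespace.
ATTR_RE = re.compile(r'\sid="[^"]*"|\sclass="[^"]*"|\sstyle="[^"]*"')


def strip_attrs(html):
    """Remove double-quoted id, class and style attributes from html."""
    return ATTR_RE.sub("", html)


# Hypothetical sample markup, for illustration only.
sample = '<td id="c1" class="wide" valign="top" style="color:red">x</td>'
print(strip_attrs(sample))
# <td valign="top">x</td>

# Why the \s matters: without it, align=" also matches inside valign=".
cell = '<td valign="top" align="left">'
print(re.findall(r'align="[^"]*"', cell))
# ['align="top"', 'align="left"']
print(re.findall(r'\salign="[^"]*"', cell))
# [' align="left"']
```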
Replacing tags in Dreamweaver
In the following example:
<th>Column Header 1</th>
We want to wrap the contents of the th tag in a link. Use parentheses in the search pattern to capture the contents, then use $1 in the replacement to insert them:
<th[^>]*>([^<]*)</th>
<th><a title="Sort By $1" href="#">$1</a></th>
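The same find and replace can be sketched outside Dreamweaver. The example below uses Python's re.sub, where the captured group is written \1 rather than $1; the names TH_RE, REPLACEMENT and linkify_headers are made up for illustration.

```python
import re

# Search pattern from above: the parentheses capture the header text.
TH_RE = re.compile(r"<th[^>]*>([^<]*)</th>")

# Python refers to the first captured group as \1 (Dreamweaver uses $1).
REPLACEMENT = r'<th><a title="Sort By \1" href="#">\1</a></th>'


def linkify_headers(html):
    """Wrap the text of each th element in a link."""
    return TH_RE.sub(REPLACEMENT, html)


print(linkify_headers("<th>Column Header 1</th>"))
# <th><a title="Sort By Column Header 1" href="#">Column Header 1</a></th>
```

Note that, as in the original replacement, any attributes on the th tag are dropped, because [^>]* sits outside the parentheses.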
Find image tags
<img[^>]*[>]
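A quick way to check the image pattern is to run it over a snippet in Python. This is a sketch only; IMG_RE, find_images and the sample markup are hypothetical names for the example.

```python
import re

# Image tag pattern from above: "<img", anything that is not ">", then ">".
IMG_RE = re.compile(r"<img[^>]*[>]")


def find_images(html):
    """Return every img tag found in html."""
    return IMG_RE.findall(html)


# Hypothetical sample markup, for illustration only.
sample = '<p><img src="a.png" alt="A"> text <img src="b.png"/></p>'
print(find_images(sample))
# ['<img src="a.png" alt="A">', '<img src="b.png"/>']
```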
Find links
<a href="[^>]*[>]
or
<a[\s]+[^>]*?href[\s]?=[\s\"\']+(.*?)[\"\']+.*?>([^<]+|.*?)?<\/a>
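The longer link pattern has two capturing groups, the href value and the link text, so it can be used to pull both out at once. Here is a minimal Python sketch; LINK_RE, find_links and the sample markup are assumptions made for the example, and it only covers simple links whose text contains no nested tags.

```python
import re

# Longer link pattern from above, unchanged.
# Group 1 is the href value, group 2 is the link text.
LINK_RE = re.compile(
    r"""<a[\s]+[^>]*?href[\s]?=[\s\"\']+(.*?)[\"\']+.*?>([^<]+|.*?)?<\/a>"""
)


def find_links(html):
    """Return a list of (href, text) pairs for each link in html."""
    return LINK_RE.findall(html)


# Hypothetical sample markup, for illustration only.
sample = '<a href="/one">One</a> and <a class="x" href=\'/two\'>Two</a>'
for href, text in find_links(sample):
    print(href, text)
# /one One
# /two Two
```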

