Why Regexes Are Cool (In a Nerdy Sort of Way)
I have used Dreamweaver as one of my main tools for developing websites. For me, as for most folks, the WYSIWYG feature is all but meaningless; it is the great set of scripting features that makes it worthwhile. Like most other text editors, Dreamweaver has a search and replace feature. Alone, that would be helpful, but nothing out of the ordinary. However, the application also lets you search and replace using regular expressions. This opens a whole new door into what can be done with pre-existing code that seemed impossible to work with. I am not going to go into the major details of regular expressions for two reasons: one, it is not the point of this article, and two, me teaching the fundamentals of regular expressions would be like the Hulk moderating an anger-management meeting. To get the skinny on the basics, Adobe has a very good tutorial on using regular expressions in Dreamweaver.
One of the jobs I had as a designer/developer was for a project that entailed maintaining over 1,500 HTML files. If a change needed to be made site-wide, you can imagine how much time it took. I was only using basic find and replace techniques at the time, which in retrospect was very clumsy and slow. If I had been exposed to regexes (short for regular expressions) back then, even the most aggressive site-wide changes would not have been an issue. Instead of spending hours upon hours of repetitive, mindless work altering code, we could have left it to the computer to do at an almost instantaneous rate. I can only imagine what could have been done in those hours taken up by basic data re-entry. With regexes, we no longer need to do mindless copy/pasting or deleting. For people with minds, mindless work is useless.
This might just be your favorite feature in Dreamweaver very soon.
Regular Uses For Regular Expressions
A perfect example where regular expressions are helpful is if you have HTML files littered with id and class attributes. If you are planning to re-write the CSS for this site with a different structure, removing these tag attributes would be necessary. Obviously, there are going to be a multitude of classes and ids, therefore a basic find and replace will not do the trick. This is where regular expressions can be helpful.
<div id="wrapper" class="box">
  <div id="navigation" class="greenBackground fullWidth"> </div>
  <div id="content" class="redBackground floatLeft"> </div>
  <div id="rightColumn" class="blueBackground floatRight" style="color:#f00;"> </div>
</div>
Obviously, stripping the id and class attributes would not be too much work at this size. However, under normal circumstances, there would be many more tags mixed in among actual content. As we all know, HTML starts to get messy very quickly and it is easy to miss a tag or two here and there. Multiply that by 20 or 30 pages and a task that seemed simple is going to take some time. With the relatively simple regular expression shown below, however, all id, class and style attributes and their corresponding values can be found and replaced with whatever you so desire (in this case, nothing).
\sid="[^"]*"|\sclass="[^"]*"|\sstyle="[^"]*"
Let’s break this regular expression down piece by piece. I suggest using the table of rules provided by Adobe as a reference.
- \sid=" – the \s represents any type of whitespace (space, tab, form feed or line feed). Checking for whitespace is important for some attributes, as it ensures that you are not picking up fragments of other attributes. For instance, align=" on its own would match inside both the align and valign attributes.
- [^"]* – this matches any character except the double quote (") and keeps matching until it reaches a double quote. This is because the [^"] rule is followed by an asterisk (*), which means zero or more occurrences.
- " – picks up the closing double quote to complete the match.
- |\sclass="[^"]*"|\sstyle="[^"]*" – the vertical line character (|) signifies an either/or rule. Therefore, the full expression will find all id, class and style attributes whose values begin and end with double quotes.
By putting this regular expression in the Find text box, leaving the Replace box blank and hitting the Replace All button, all id, class and style attributes are effectively removed, as they are replaced with nothing. There are no doubt many ways to accomplish this same feat, but this is how I decided to do it. As I said before, I do not pretend to be a regex guru, but this will do the trick. This basic example only skims the surface, but it shows the potential of what can be done using regular expressions.
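For anyone who wants to try the same find and replace outside Dreamweaver, here is a minimal Python sketch using the standard re module. The function name strip_attributes and the sample string are made up for illustration; only the pattern comes from the breakdown above. It also shows what the leading \s buys you in the align/valign case.

```python
import re

# The pattern from the article: a whitespace character, the attribute name,
# then everything up to the closing double quote.
ATTRIBUTE_PATTERN = re.compile(r'\sid="[^"]*"|\sclass="[^"]*"|\sstyle="[^"]*"')


def strip_attributes(html: str) -> str:
    """Remove id, class and style attributes (double-quoted values only)."""
    return ATTRIBUTE_PATTERN.sub("", html)


sample = '<div id="rightColumn" class="blueBackground floatRight" style="color:#f00;"> </div>'
cleaned = strip_attributes(sample)
print(cleaned)  # <div> </div>

# Without the leading \s, a pattern aimed at align also bites into valign.
loose = re.sub(r'align="[^"]*"', "", '<td valign="top">')
strict = re.sub(r'\salign="[^"]*"', "", '<td valign="top">')
print(loose)   # <td v>
print(strict)  # <td valign="top">
```

Like the Dreamweaver version, this only handles attribute values wrapped in double quotes; single-quoted or unquoted values would slip through.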
Say you have a table with table headers (th tags) that will eventually have sorting functions applied to them. For the time being, you just want to add empty links to each column header. Because each column obviously has a different header, a simple replace of any kind will not do the trick as we will need to maintain the title of the column header. This is where using variables is necessary.
<tr>
  <th>Column Header 1</th>
  <th>Column Header 2</th>
  <th>Column Header 3</th>
  <th>Column Header 4</th>
  <th>Column Header 5</th>
  <th >Column Header 6</th>
  <th>Column Header 7</th>
  <th>Column Header 8</th>
</tr>
Take the row of table headers above. You will notice that there is an extra space in the th tag for Column Header 6. This was added intentionally as we cannot assume the HTML is going to be perfect all the time. The regex below will catch each th from opening to closing tag.
The content of each tag, [^<]*, is put in parentheses in order to tell Dreamweaver to store that part of the string as a variable. The variable can then be used in the Replace box to re-insert the content wherever desired. Variables in regular expressions work by simple numeric indexing, counting the parenthesized groups from left to right. For instance, if you had a regex such as (omg)(wtf)(lol), $1 would be omg, $2 would be wtf and $3 would be lol. Simple enough. As can be seen from the replacement string below, we use the column header name both as the link text and as part of the link title.
<th><a title="Sort By $1" href="#">$1</a></th>
Here is the resulting code:
<tr>
  <th><a title="Sort By Column Header 1" href="#">Column Header 1</a></th>
  <th><a title="Sort By Column Header 2" href="#">Column Header 2</a></th>
  <th><a title="Sort By Column Header 3" href="#">Column Header 3</a></th>
  <th><a title="Sort By Column Header 4" href="#">Column Header 4</a></th>
  <th><a title="Sort By Column Header 5" href="#">Column Header 5</a></th>
  <th><a title="Sort By Column Header 6" href="#">Column Header 6</a></th>
  <th><a title="Sort By Column Header 7" href="#">Column Header 7</a></th>
  <th><a title="Sort By Column Header 8" href="#">Column Header 8</a></th>
</tr>
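The same idea can be tried in Python as a rough sketch. Two things here are my own assumptions rather than anything from Dreamweaver: the find pattern is one plausible way to match a th tag that may contain stray spaces, and the function name add_sort_links is invented for the example. Note also that Python's re module writes the stored variable as \1 where Dreamweaver's Replace box uses $1.

```python
import re

# Hypothetical find pattern: \b stops it from matching <thead>, [^>]* tolerates
# stray spaces or attributes in the opening tag, and ([^<]*) stores the header
# text as group 1.
TH_PATTERN = re.compile(r'<th\b[^>]*>([^<]*)</th>')

# Python spells the backreference \1; Dreamweaver's Replace box uses $1.
REPLACEMENT = r'<th><a title="Sort By \1" href="#">\1</a></th>'


def add_sort_links(html: str) -> str:
    """Wrap the text of every th cell in an empty sort link."""
    return TH_PATTERN.sub(REPLACEMENT, html)


row = '<th>Column Header 5</th> <th >Column Header 6</th>'
linked = add_sort_links(row)
print(linked)
```

Because the opening tag is rebuilt from scratch in the replacement, the stray space in the sixth header disappears along the way, and so would any attributes that were on the original th tags.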
Worth the Time
Even though regexes add so much more power to find and replace, it can take time to get your expression working correctly. The feature that makes using regular expressions in Dreamweaver worthwhile is that you can save and load your regexes at any time. So all the time you took to create a regex that clears out all the deprecated HTML from your pages is worth it, because you never have to do that job by hand again. I do not think any of us are going to miss that. To save your regular expressions, click the disk icon located directly above and right-aligned with the Find text field.
List of Regular Expression Resources
I have added some regexes to the collection, specifically in deprecated tag and non-standard character replacement.