A regular expression to strip all attributes from HTML tags.

Remove all attributes from an html tag
--------------------------------------

  <([a-z][a-z0-9]*)[^>]*?(/?)>

<         # Match '<' at beginning of tags
(         # Start Capture Group $1 - Tag Name
[a-z]     # Match 'a' through 'z'
[a-z0-9]* # Match 'a' through 'z' or '0' through '9' zero or more times
)         # End Capture Group
[^>]*?    # Match anything other than '>', zero or more times, non-greedy (won't eat the '/')
(/?)      # Capture Group $2 - '/' if it is there
>         # Match '>'
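
The annotated breakdown above can also be kept inside the pattern itself.
The following is a minimal sketch, assuming PHP's preg functions with the
x (extended) and i (case-insensitive) modifiers. The ~ delimiter and the
variable names are illustrative choices, not part of the original pattern.

```php
<?php
// Same pattern, written in extended (x) mode so the comments live
// inside the regex. With "~" as the delimiter, "/" needs no escaping.
$pattern = '~
    <             # literal "<" at the start of a tag
    (             # start capture group $1: tag name
      [a-z]       # first character must be a letter
      [a-z0-9]*   # then letters or digits, zero or more times
    )             # end capture group $1
    [^>]*?        # anything except ">", non-greedy
    (/?)          # capture group $2: optional "/"
    >             # literal ">"
~ix';

$html = '<p style="padding:0px;"><strong style="padding:0;margin:0;">hello</strong></p>';
$stripped = preg_replace($pattern, '<$1$2>', $html);
echo $stripped, "\n";
```

In extended mode, whitespace and #-comments outside character classes are
ignored, so the pattern behaves the same as the one-line form.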

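Quote the pattern as your language requires, enable case-insensitive
matching if tag names may be uppercase, and use the replacement text <$1$2>.
This strips everything after the tag name up to the end of the tag, whether
it ends in /> or just >. Closing tags such as </p> do not match the pattern
and are left unchanged.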

Example
-------

# Subject

  <p style="padding:0px;">
    <strong style="padding:0;margin:0;">hello</strong>
  </p>

# Result

  <p>
    <strong>hello</strong>
  </p>

PHP Example
-----------

  $text = '<p style="padding:0px;"><strong style="padding:0;margin:0;">hello</strong></p>';

  // The '/' inside the pattern is escaped because '/' is also the delimiter.
  echo preg_replace('/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i', '<$1$2>', $text);
  // <p><strong>hello</strong></p>
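
To illustrate how the pattern treats self-closing and uppercase tags, here
is a small sketch that wraps the replacement in a function. The function
name strip_tag_attributes is hypothetical, and the ~ delimiter is just one
possible choice.

```php
<?php
// Hypothetical helper; the name is illustrative and not from any library.
function strip_tag_attributes(string $html): string
{
    // "~" as delimiter means the "/" in the pattern needs no escaping.
    $result = preg_replace('~<([a-z][a-z0-9]*)[^>]*?(/?)>~i', '<$1$2>', $html);

    // preg_replace() returns null on error; fall back to the input then.
    return $result ?? $html;
}

echo strip_tag_attributes('<img src="a.png" alt="A" />'), "\n"; // <img/>
echo strip_tag_attributes('<a href="#top">up</a>'), "\n";        // <a>up</a>
echo strip_tag_attributes('<DIV CLASS="x">hi</DIV>'), "\n";      // <DIV>hi</DIV>
```

One limitation to keep in mind: because [^>]*? stops at the first '>', an
attribute value that itself contains '>' ends the match early.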