In the examples below, 'file' is the name of the file to be processed. Redirection of the output to a destination file is assumed and omitted for simplicity. (Each command is a single line; beware of line wrapping in narrow browser windows.)
Change single newlines to spaces -- leave double newlines alone (as paragraphs)
matt -v '(\n\n)|(\n)' -o '$1$(2 )' file
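For readers who want to see what the pattern and template do, here is a rough Python sketch of the same substitution. It is not how matt works internally; it reads `$1` as "emit group 1 if it matched" and `$(2 )` as "emit a space if group 2 matched", which is an assumption drawn from the example itself. The function name is hypothetical.

```python
import re


def join_single_newlines(text: str) -> str:
    """Turn single newlines into spaces, keeping double newlines as paragraph breaks."""
    # The alternation tries "\n\n" first, so a paragraph break is matched whole
    # and passed through; a lone "\n" falls to the second branch and becomes a space.
    return re.sub(r"(\n\n)|(\n)", lambda m: m.group(1) or " ", text)


if __name__ == "__main__":
    print(join_single_newlines("first line\nsecond line\n\nnew paragraph\ncontinues"))
```

The order of the alternatives matters: with `(\n)` listed first, every newline would match the single-newline branch and paragraph breaks would be lost.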
Discard HTML/XML tags
Uses two passes: removing comments first avoids confusion.
matt -sav '<!--.*-->' file | matt -sav '<.*>'
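The reason for the two passes can be illustrated with a small Python sketch. It assumes shortest-match behaviour and that `.` may span newlines; whether matt's flags mean exactly that is not stated here, so treat those choices as assumptions. `strip_markup` is a hypothetical name.

```python
import re

# Assumed semantics: non-greedy matching, with '.' allowed to cross newlines.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
TAG = re.compile(r"<.*?>", re.DOTALL)


def strip_markup(markup: str) -> str:
    """Remove comments first, then any remaining tags."""
    without_comments = COMMENT.sub("", markup)
    return TAG.sub("", without_comments)


if __name__ == "__main__":
    sample = "<p>Hi <!-- a <b> note --> there</p>"
    print(repr(strip_markup(sample)))
    # Tag removal alone stops at the first '>' inside the comment:
    print(repr(TAG.sub("", "<!-- a <b> note -->")))
```

A comment that contains `>` would otherwise be cut at the first `>`, leaving its tail in the output, which is the confusion the first pass avoids.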
Determine tagnames used in HTML/XML file
matt '<([^/!][^ >]*)' -o '$1\n' file|sort|uniq
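As a cross-check of what the pattern selects, the same expression can be tried in Python. This is only a sketch: `tag_names` is a hypothetical helper, and the `sorted(set(...))` step stands in for the `sort|uniq` pipeline.

```python
import re


def tag_names(markup: str) -> list[str]:
    """Return the distinct tag names that open in the markup, sorted."""
    # '<' followed by anything except '/' (closing tag) or '!' (comment, doctype),
    # then everything up to the next space or '>'.
    names = re.findall(r"<([^/!][^ >]*)", markup)
    return sorted(set(names))


if __name__ == "__main__":
    page = "<html><body><p>a</p><p class='x'>b</p><!-- c --></body></html>"
    print("\n".join(tag_names(page)))
```

Excluding `/` and `!` right after `<` is what keeps closing tags and comments out of the list.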
Extract links from HTML
matt -sai '<a href=\"(.*)\"[^>]*>(.*)</a>' -o '$2: $1\n' file
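A Python sketch of the same extraction follows. It assumes case-insensitive, shortest-match behaviour with `.` spanning newlines; these are guesses at what the command intends, not documented matt semantics. Like the pattern above, it only handles links whose first attribute is a double-quoted `href`. `extract_links` is a hypothetical name.

```python
import re

# Assumed semantics: ignore case, shortest match, '.' may cross newlines.
LINK = re.compile(r'<a href="(.*?)"[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL)


def extract_links(html: str) -> list[tuple[str, str]]:
    """Return (link text, URL) pairs, mirroring the '$2: $1' output order."""
    return [(m.group(2), m.group(1)) for m in LINK.finditer(html)]


if __name__ == "__main__":
    page = '<A HREF="http://example.com/" class="x">Example</A> and <a href="/b">B</a>'
    for text, url in extract_links(page):
        print(f"{text}: {url}")
```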
Generate HTML from plain text
Fixes entities, adds <P> tags at double newlines.
matt -v '(<)|(>)|(&)|(\n\n)' -o '$(1&lt;)$(2&gt;)$(3&amp;)$(4<p>\n\n)' file
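The idea of doing all four replacements in one pass can be sketched in Python as follows. This is an illustration under the assumption that "fixes entities" means escaping `<`, `>` and `&`; the function and table names are hypothetical.

```python
import re

# Each matched string maps to its replacement.
REPLACEMENTS = {
    "<": "&lt;",
    ">": "&gt;",
    "&": "&amp;",
    "\n\n": "<p>\n\n",
}


def text_to_html(text: str) -> str:
    """Escape HTML special characters and mark paragraph breaks with <p>."""
    # A single pass means the '&' in a freshly written '&lt;' is never escaped again.
    return re.sub(r"<|>|&|\n\n", lambda m: REPLACEMENTS[m.group(0)], text)


if __name__ == "__main__":
    print(text_to_html("a < b & c\n\nnext paragraph"))
```

Running the replacements as separate passes would need `&` to be handled first, otherwise the ampersands produced by the other substitutions would be escaped a second time.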
Display UTF-8 characters in a text file as their byte sequences
(append '|sort|uniq' to get a shorter list...)
as octal:
as hex:
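The following Python sketch shows what such a listing looks like. It is not a reconstruction of the octal and hex commands, and it assumes that only non-ASCII (multi-byte) characters are of interest. The function name and the `base` parameter are hypothetical.

```python
def utf8_byte_sequences(text: str, base: str = "hex") -> list[str]:
    """List the UTF-8 byte sequence of each non-ASCII character in the text."""
    fmt = "{:02x}" if base == "hex" else "{:03o}"
    sequences = []
    for ch in text:
        if ord(ch) > 0x7F:  # assumption: plain ASCII characters are skipped
            encoded = ch.encode("utf-8")
            sequences.append(" ".join(fmt.format(b) for b in encoded))
    return sequences


if __name__ == "__main__":
    sample = "caf\u00e9 \u20ac"
    print(utf8_byte_sequences(sample, base="octal"))
    print(utf8_byte_sequences(sample, base="hex"))
    # Analogous to appending '|sort|uniq': keep each sequence once, sorted.
    print(sorted(set(utf8_byte_sequences(sample + sample))))
```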
Find which shared libraries are used by which (assumed) executables
Scans all files in the current directory. (Template is cosmetic, to indent results.)
matt -8si 'lib[a-z0-9_]*\.so' -o ' $0\n' *
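The scan can be approximated in Python by searching each file's raw bytes for the same pattern. This is a sketch, not a description of matt's behaviour: it assumes case-insensitive matching on binary data, and unlike the command it reports each library name once per file. Both function names are hypothetical.

```python
import re
from pathlib import Path

# Same pattern as the command, applied to bytes; case-insensitivity is an assumption.
LIB = re.compile(rb"lib[a-z0-9_]*\.so", re.IGNORECASE)


def shared_libs_in(data: bytes) -> list[str]:
    """Return the distinct library names found in a blob of bytes, sorted."""
    return sorted({match.decode("ascii") for match in LIB.findall(data)})


def scan_directory(directory: str = ".") -> dict[str, list[str]]:
    """Map each regular file in the directory to the library names it mentions."""
    found = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            libs = shared_libs_in(path.read_bytes())
            if libs:
                found[path.name] = libs
    return found


if __name__ == "__main__":
    for name, libs in scan_directory(".").items():
        print(name)
        for lib in libs:
            print("  " + lib)  # indented, like the cosmetic template
```

As with the original command, this is a textual search: any file that merely mentions a library name is reported, whether or not it is an executable that links against it.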