The pattern to find in the web page is href="some_link" (as a regular expression: href="[^"]*"). Note that several such links may appear on the same line of HTML, and we want to extract them all, one per output line.
Here is a one-liner that prints each match on its own line (the -o option makes grep output only the matched text rather than the whole line):
cat page.htm | grep -o 'href="[^"]*"'
To keep only the link itself, pipe the result through sed, which strips everything outside the quotes:
cat page.htm | grep -o 'href="[^"]*"' | sed 's/.*"\([^"]*\)".*/\1/'
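To try this without a real page, here is a minimal sketch that builds a small test file and runs the same pipeline on it. The file name sample_page.htm, the URLs in it, and the extract_links helper are all made up for illustration; the sed expression is a slightly simpler variant that just trims the leading href=" and the trailing quote. It assumes a grep that supports -o (GNU and BSD grep do; it is not required by POSIX).

```shell
# Hypothetical sample page: file name and URLs are invented for this example.
cat > sample_page.htm <<'EOF'
<p><a href="http://example.com/a">A</a> and <a href="/b.html">B</a></p>
<p><a class="x" href="c.htm">C</a></p>
EOF

# Print one link target per line.
# grep -o emits each href="..." match on its own line, even when
# several matches share one input line; sed then trims the wrapper.
extract_links() {
    grep -o 'href="[^"]*"' "$1" | sed 's/^href="//; s/"$//'
}

links=$(extract_links sample_page.htm)
printf '%s\n' "$links"
```

On the sample file this prints http://example.com/a, /b.html and c.htm, one per line. Note that the pattern only catches double-quoted href values; single-quoted or unquoted attributes would need a different expression.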