How to select and extract an HTML element from curl output
When I curl a page, I get a result like the following:
<html> <body> <div> <dl> <dd> 10 times </dd> </dl> </div> </body> </html>
What I want is simply the text inside the <dd> element.
Is there a good way to achieve this?
If anyone has a suggestion, please let me know.
If you are unable to use an HTML parser for whatever reason, then for your simple HTML example you could use:
curl http://test.com | sed -rn 's@(^.*<dd>)(.*)(</dd>)@\2@p'
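Pipe the output of curl into sed and enable extended regular expressions with -r (GNU sed) or -E. The expression splits the line into three groups and replaces the match with the second group only, which is the text between <dd> and </dd>. The -n option together with the p flag prints only the line where the substitution succeeded.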
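To try the idea without a network request, here is a minimal sketch that feeds the sample markup from the question into sed. The variable names html and result are only illustrative, and the extra [[:space:]]* patterns are my own addition to trim the padding around the text. It assumes the opening and closing <dd> tags sit on the same line and that the line contains only one <dd> element.

```shell
# Sample markup from the question, standing in for the curl output.
html='<html> <body> <div> <dl> <dd> 10 times </dd> </dl> </div> </body> </html>'

# -n suppresses sed's default output; -E enables extended regular expressions.
# The single capture group holds the text between <dd> and </dd>; the
# [[:space:]]* patterns on either side drop the surrounding whitespace.
result=$(printf '%s\n' "$html" |
  sed -nE 's@^.*<dd>[[:space:]]*(.*[^[:space:]])[[:space:]]*</dd>.*$@\1@p')

printf '%s\n' "$result"
```

Against a real page, the printf stage would be replaced by something like curl -s http://test.com so that curl's progress meter stays out of the way. If the markup spans several lines or contains more than one <dd>, a regular expression like this will likely not be enough and an HTML parser is the safer choice.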