
How to extract the text of an HTML element from curl output

From: Macao View: 2838 Heisenberg 


When I curl some pages, I get a result like the following:

    <dd> 10 times </dd>

My desired result is simply 10 times.

Is there a good way to achieve this?

If anyone has a suggestion, please let me know.


Best answer

If you are unable to use an HTML parser for whatever reason, then for your simple HTML example you could use:

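    curl "$url" | sed -rn 's@(^.*<dd>)(.*)(</dd>)@\2@p'

Pipe the output of the curl command into sed; $url here is a placeholder for the page address. The -r option enables extended regular expressions in GNU sed (use -E on BSD/macOS sed; GNU sed accepts -E as well), and -n suppresses the default printing of every line. The expression splits each matching line into three groups and replaces the matched text with the second group only, and the p flag prints the result. Whitespace inside the tags is kept, so the sample line yields 10 times with a space on either side.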
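If the surrounding spaces are unwanted, the same substitution can be adapted to trim them. The following is only a minimal sketch, not a general HTML extractor: extract_dd is a hypothetical helper name, the sample input is the line from the question rather than real curl output, and it assumes the opening and closing dd tags sit on the same line with at least one non-space character between them.

```shell
# Hypothetical helper: print the text inside a single-line <dd>...</dd> pair,
# with leading and trailing whitespace inside the tags removed.
# -n suppresses automatic printing; -E enables extended regular expressions.
extract_dd() {
  sed -nE 's@^.*<dd>[[:space:]]*(.*[^[:space:]])[[:space:]]*</dd>.*$@\1@p'
}

# Sample input copied from the question; lines without <dd> print nothing.
printf '%s\n' '<dl>' '    <dd> 10 times </dd>' '</dl>' | extract_dd
```

With a real page, the helper would sit at the end of the pipeline, for example curl -s "$url" | extract_dd, where $url is again a placeholder. For anything beyond a one-line element like this, a real HTML parser remains the safer choice.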