[nmglug] How do I get the text content only from a website?

Andrew Farnsworth farnsaw at stonedoor.com
Tue Jul 22 09:09:31 PDT 2008


VA wrote:
> Does WGET have an option to obtain only the text of a webpage?
>
> The recursive downloading option allows me to create the website 
> locally, but I want only the content (the text) from all the HTML 
> files on the site, in one text (ASCII) file. 
>
> Any ideas?
>
> Thanks,
> Virginia
Use WGET in conjunction with HTML2TXT or HTML2RTF, depending on what you 
are really trying to do, and then just cat the converted files together.  
A bit of scripting around these tools should get you a single command 
line that will give you a complete

Andy
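
A rough sketch of the "convert each page, then concatenate" step described 
above, for anyone who wants to try it. This is not the HTML2TXT route Andy 
mentions: it swaps in Python's standard-library HTML parser so that nothing 
extra has to be installed. It assumes the site has already been mirrored 
into a local directory, for example with something like 
"wget --recursive --no-parent http://example.com/" (the URL is a 
placeholder). The names TextExtractor, html_to_text and site_to_text are 
made up for this illustration, and the text extraction is deliberately 
crude: it keeps every text node except the contents of script and style 
elements.

```python
import sys
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the text of one HTML document, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def site_to_text(mirror_dir, out_file):
    """Write the text of every .html/.htm file under mirror_dir to out_file.

    Returns the number of pages processed.
    """
    pages = sorted(
        p for p in Path(mirror_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in {".html", ".htm"}
    )
    with open(out_file, "w", encoding="utf-8") as out:
        for page in pages:
            # Assumption: pages are UTF-8; undecodable bytes are replaced.
            html = page.read_text(encoding="utf-8", errors="replace")
            out.write(html_to_text(html) + "\n")
    return len(pages)


if __name__ == "__main__":
    # Usage: python site_to_text.py MIRROR_DIR OUTPUT.txt
    if len(sys.argv) == 3:
        print(site_to_text(sys.argv[1], sys.argv[2]), "pages written")
```

Saved under a name such as site_to_text.py (also just an example), it 
would be run against the directory wget created, with the output file 
as the second argument. Pages are written in sorted path order; a real 
script would probably want a separator line between pages and better 
handling of character encodings.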