[nmglug] How do I get the text content only from a website?
js at jasonschaefer.com
Tue Jul 22 23:58:39 PDT 2008
I think this will work on your recursively downloaded site:
find . -name "*.html" -exec cat {} + | html2text -o converted.text
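The one-liner above feeds every page to html2text as one concatenated stream, so the output has no markers showing where one page ends and the next begins. If that matters, here is a rough sketch that converts each file separately and appends the results to a single text file. It assumes html2text is installed and accepts a file name as its argument; the function name site_to_text, the header line format, and the paths are only illustrative.

```shell
#!/bin/sh
# Sketch only: convert each HTML file on its own and collect the text in one file.
# Assumption: html2text is on PATH and takes an input file name as its argument.
# site_to_text is a made-up helper name, not part of wget or html2text.
site_to_text() {
    src_dir=$1     # top of the recursively downloaded site
    out_file=$2    # single text file to write

    : > "$out_file"    # start from an empty output file

    # find hands the file names to the inner shell as arguments, so names
    # containing spaces are not split apart.
    find "$src_dir" -name '*.html' -exec sh -c '
        for f do
            printf "\n===== %s =====\n" "$f"   # header line naming the source page
            html2text "$f"
        done
    ' sh {} + >> "$out_file"
}
```

Something like `site_to_text ./mirrored-site converted.text` would then write everything to converted.text, where ./mirrored-site stands in for whatever directory wget created. The pages come out in whatever order find lists them, which is not necessarily alphabetical.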
On Tue, Jul 22, 2008 at 10:01 AM, VA <virginia_2002us at yahoo.com> wrote:
> Does wget have an option to fetch only the text of a web page?
> The recursive download option lets me recreate the website locally,
> but I want only the content (the text) from all the HTML files on the
> site, in one plain-text (ASCII) file.
> Any ideas?