[nmglug] How do I get the text content only from a website?

Jason Schaefer js at jasonschaefer.com
Tue Jul 22 23:58:39 PDT 2008


Virginia,

I think this will work on your recursively downloaded site:
find . -name "*.html" -exec cat {} + | html2text -o converted.text
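
If you would rather convert the pages one at a time and see where each page's text starts, something along these lines might work. This is only a sketch: site_to_text is a made-up helper name, the "=====" separator is my own choice, and it assumes html2text is installed and reads HTML on standard input (as in the pipe above). It also assumes no file names contain newlines.

```shell
#!/bin/sh
# Sketch: convert every .html file under a directory to text, one page at a
# time, and append the results to a single output file.
# site_to_text is a hypothetical helper name, not part of wget or html2text.

site_to_text() {
    src_dir=$1                  # directory holding the downloaded site
    out_file=$2                 # single text file to write
    converter=${3:-html2text}   # assumed to read HTML on stdin, write text on stdout

    # Start with an empty output file.
    : > "$out_file"

    # Sort the list so pages appear in a stable order; reading line by line
    # keeps file names with spaces intact.
    find "$src_dir" -type f -name '*.html' | sort | while IFS= read -r page; do
        printf '===== %s =====\n' "$page" >> "$out_file"
        "$converter" < "$page" >> "$out_file"
        printf '\n' >> "$out_file"
    done
}

# Example call (the directory name is a placeholder):
# site_to_text ./www.example.com converted.text
```

The separator lines make it easy to search the combined file for the page a given piece of text came from. Keep the output file outside the site directory, or at least give it a name that does not end in .html, so it is not picked up as input.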

-Jason


On Tue, Jul 22, 2008 at 10:01 AM, VA <virginia_2002us at yahoo.com> wrote:
> Does wget have an option to obtain only the text of a webpage?
>
> The recursive downloading option allows me to create the website locally,
> but I want only the content (the text) from all the HTML files on the site,
> in one text (ASCII) file.
>
> Any ideas?
>
> Thanks,
> Virginia