[nmglug] How do I get the text content only from a website?

andres andres at paglayan.com
Tue Jul 22 09:16:16 PDT 2008


I'm not sure exactly what you want to do, but for scraping you can use
the Ruby hpricot and scrape libraries; they are very good at gathering
info from web pages.
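
Something along these lines might be a starting point (an untested
sketch, assuming the hpricot and open-uri gems are installed; the URLs
and the site.txt output file are just placeholders):

    require 'rubygems'
    require 'hpricot'
    require 'open-uri'

    # Placeholder list of pages; in practice you would collect the
    # URLs (or walk the files wget already downloaded) yourself.
    urls = ["http://example.com/", "http://example.com/about.html"]

    File.open("site.txt", "w") do |out|
      urls.each do |url|
        doc = Hpricot(open(url))    # fetch and parse the page
        out.puts doc.inner_text     # keep only the text, drop the markup
      end
    end

That should leave you with all of the page text concatenated into one
plain text file, which sounds like what you are after.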


    On Tue, 2008-07-22 at 09:01 -0700, VA wrote:

> Does WGET have an option to obtain only the text of a webpage?
> 
> 
> The recursive downloading option allows me to create the website
> locally, but I want only the content (the text) from all the HTML
> files on the site, in one text (ASCII) file.
> 
> Any ideas?
> 
> Thanks,
> Virginia
> 
> 
> 
> 
> _______________________________________________
> nmglug mailing list
> nmglug at nmglug.org
> https://nmglug.org/mailman/listinfo/nmglug

-- 
Andres Paglayan
CTO, StoneSoup LLC 
Ph: 505 629-4344
Mb: 505 690-2871
FWD: 65-5587
Testi. Codi. Vinci.