[nmglug] Links from March 17th meeting
Akkana Peck
akkana at shallowsky.com
Fri Mar 18 08:39:02 PDT 2022
Sorry I had to miss it. I had another meeting (which turned out to
be boring; I probably should have ditched it for NMGLUG).
Robert Citek writes:
> Here are some of the links that Mark mentioned:
[ ... ]
> https://github.com/unoconv/unoconv # deprecated, see unoserver
> https://github.com/unoconv/unoserver/
Why is unoconv deprecated?
I read the "Comparison with unoconv" section at the bottom of the
unoserver README, but I'm still not really clear why it's needed
(for the user; I do get why a clean rewrite is better for the
maintainer). I use unoconv a lot, and I haven't (knowingly) hit
any of those problems. The only problem I've seen is that, being
LibreOffice, it's slow, and unoconv's default timeout isn't long
enough on some processors, so I tend to run it with -T 10 so it
will wait up to 10 seconds for LO to start up.
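For anyone scripting this, the -T workaround might be wrapped roughly as
follows. This is only a minimal Python sketch, assuming unoconv is on the
PATH. The function names (unoconv_command, convert) are made up for
illustration, and the returned path assumes unoconv's default behavior of
writing the output next to the input with the new extension.

```python
import subprocess
from pathlib import Path


def unoconv_command(src, fmt="html", timeout=10):
    """Build the unoconv argument list (hypothetical helper).

    -T is how many seconds unoconv waits for LibreOffice to start;
    -f selects the output format.
    """
    return ["unoconv", "-T", str(timeout), "-f", fmt, str(src)]


def convert(src, fmt="html", timeout=10):
    """Run unoconv and return the path where the output is expected.

    Assumes the default of writing next to the input file.
    """
    subprocess.run(unoconv_command(src, fmt, timeout), check=True)
    return Path(src).with_suffix("." + fmt)
```

On a slower machine, raising the timeout argument is the only change needed,
e.g. convert("report.doc", timeout=30).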
For people who need to do a lot of Word-to-HTML conversions,
consider mammoth (a Python module that can be run as a command
as well as used in a program). It's a different approach from
unoconv: instead of producing horrible unmaintainable HTML that
tries to mimic every style of the Word document, it produces
clean, semantic HTML with tags like <em> and <strong>.
https://github.com/mwilliamson/python-mammoth
I use both mammoth and unoconv. For one-time conversions where I
want to preserve the formatting as much as possible, including
text colors, I use unoconv. But when someone sends me content for a
web page that I'm going to have to maintain for years, or if I need
to parse the page to use the contents in some other way, mammoth
produces much better output. Mammoth only understands .docx, not .doc,
so for .doc files I first use unoconv to convert .doc to .docx,
then run mammoth on the .docx. It's worth the extra step to get
the clean mammoth output.
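The two-step workflow above could be sketched in Python along these lines.
Treat it as an illustration under assumptions rather than a finished tool:
it assumes unoconv is on the PATH, that mammoth is installed (pip install
mammoth), and that unoconv writes the .docx next to the original file. The
helper names needs_unoconv and to_clean_html are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def needs_unoconv(src):
    """True for legacy .doc files, which mammoth cannot read directly."""
    return Path(src).suffix.lower() == ".doc"


def to_clean_html(src, timeout=10):
    """Return semantic HTML for a .doc or .docx file (hypothetical helper)."""
    src = Path(src)
    if needs_unoconv(src):
        # Step 1: .doc -> .docx with unoconv, waiting up to `timeout` seconds.
        subprocess.run(
            ["unoconv", "-T", str(timeout), "-f", "docx", str(src)],
            check=True,
        )
        src = src.with_suffix(".docx")  # assumed default output location

    # Step 2: .docx -> HTML with mammoth. Imported here so the rest of the
    # module still loads when mammoth is not installed.
    import mammoth

    with open(src, "rb") as docx_file:
        result = mammoth.convert_to_html(docx_file)

    # mammoth reports anything it could not map (e.g. unrecognized styles).
    for message in result.messages:
        print(message, file=sys.stderr)
    return result.value
```

For a single file, mammoth's own command line does the second step without
any Python: mammoth document.docx output.html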
There's also wvHtml, but I haven't used that in a while, and can't
remember exactly why I stopped using it.
...Akkana