Show HN: PlutoPrint – Generate PDFs and PNGs from HTML with Python
github.com
> PlutoBook depends on the following external libraries:
> Required: cairo, freetype, harfbuzz, fontconfig, expat, icu
> Optional: curl, turbojpeg, webp (enable additional features)
Asking your favorite LLM will give you da codez
PS: I'm not trying to discount this tool. I'm only pointing out an alternative that might be useful.
We're doing a very similar thing (custom lightweight engine) over at https://github.com/DioxusLabs/blitz. We have more of a focus on UI, but there's definitely overlap (we support rendering to image, but don't have pagination/fragmentation implemented).
Have you run the WPT tests against your engine to test spec conformance?
This is exactly what I was looking for a few months ago. I might revisit that project with it.
[1]: https://ahapdf.nyc3.cdn.digitaloceanspaces.com/samplers/logi... (PDF)
They would give a much better idea of its complex printing capabilities.
[1]: https://github.com/plutoprint/plutobook/blob/main/FEATURES.m...
This is the kind of thing that might be fixed with more people attempting to use it, or it could be another PITA like having to install an old wkhtmltopdf for Odoo to use.
What are the known issues or the unsupported css this library has?
* Invoices: Totals get pushed to a new page with no repeated <thead> header. This is a classic failure of CSS table rendering across page breaks. Properties like page-break-inside: avoid are notoriously inconsistent in browser print-to-PDF engines. Line items get split mid-row because the engine doesn't understand the semantic integrity of the data.
* Bills of Lading & Manifests: These documents are infamous for unpredictable page breaks. One page cuts a row in half, the next duplicates headers, the next drops content entirely. This often stems from complex flexbox or grid layouts that the PDF rendering engine struggles to paginate deterministically.
* Shipping Labels: A barcode or QR code shifting by a few pixels is often a DPI or scaling artifact. The browser rendering at a logical 96 DPI doesn't translate perfectly to a 300 or 600 DPI thermal printer format, introducing rounding errors that are catastrophic for scanners. Addresses drift outside the printable area because CSS margins (margin, padding) can be interpreted differently by the print media engine versus the screen engine.
* Digital Forms: This is a classic failure of absolute vs. relative positioning. When you overlay HTML form fields on a scanned PDF background (a common requirement), the HTML box model's flow layout simply cannot guarantee pixel-perfect alignment with the fixed grid of the underlying image. I've seen teams resort to printing, using white out, and hand filling forms because the software couldn't align (x, y) coordinates.
* Tickets & Passes: Scanner rejection due to incorrect sizing is often due to the browser engine's "print scaling" or "fit-to-page" logic, which can be difficult to disable and varies between environments (e.g., a local Docker container vs. an AWS Lambda function with different system fonts or libraries installed).
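The DPI point in the shipping-label item can be made concrete with a little arithmetic. This is a minimal sketch, assuming a barcode whose narrowest bar ("module") is 1 CSS px wide and a renderer that snaps each bar edge to the nearest whole printer dot. The function names are hypothetical, and real engines and printer drivers may round differently.

```python
from fractions import Fraction

CSS_DPI = 96  # CSS defines 1in = 96px


def px_to_dots(px, printer_dpi):
    """Exact length in printer dots for a length given in CSS px."""
    return Fraction(px) * printer_dpi / CSS_DPI


def bar_widths(module_px, modules, printer_dpi):
    """Width in whole dots of each module after snapping edges to the dot grid."""
    exact = px_to_dots(module_px, printer_dpi)
    edges = [round(exact * i) for i in range(modules + 1)]
    return [right - left for left, right in zip(edges, edges[1:])]


# 203 DPI is a common thermal-printer resolution; 1px maps to about 2.11 dots,
# so some bars come out one dot wider than their neighbours.
print(bar_widths(module_px=1, modules=8, printer_dpi=203))  # [2, 2, 2, 2, 3, 2, 2, 2]

# At an exact multiple of 96 DPI every bar gets the same width.
print(bar_widths(module_px=1, modules=8, printer_dpi=192))  # [2, 2, 2, 2, 2, 2, 2, 2]
```

Sizing the barcode in whole printer dots, rather than in CSS px, is what avoids the uneven bars.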
This always turns into a long tail of support tickets. The only truly reliable solution is to bypass the HTML/CSS rendering model entirely and build the document on a canvas with an absolute coordinate system. This means using libraries like FPDF (PHP), ReportLab (Python), or lower-level tools like iText/PDFBox (Java), where you aren't "converting" a document, you are drawing it. You place text at (x, y), draw a line from (x1, y1) to (x2, y2), and manage page breaks and object placement explicitly.
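To illustrate what "manage page breaks explicitly" means for the invoice case, here is a minimal sketch in plain Python. It only computes draw operations; the page size, margins, row height, and the `Op` and `layout_invoice` names are assumptions for illustration. In a real system each operation would be mapped onto the drawing calls of a library such as ReportLab or FPDF.

```python
from dataclasses import dataclass

PAGE_HEIGHT = 842  # A4 height in points (1pt = 1/72in), origin at bottom-left
MARGIN = 36
ROW_HEIGHT = 18


@dataclass
class Op:
    page: int
    x: float
    y: float
    text: str


def layout_invoice(rows, totals):
    """Place header, rows and totals at explicit (x, y) positions.

    The header is repeated on every page, and the last row is kept on the
    same page as the totals block so totals never end up alone on a page.
    """
    ops = []
    page = 0
    y = 0.0

    def start_page():
        nonlocal page, y
        page += 1
        y = PAGE_HEIGHT - MARGIN
        ops.append(Op(page, MARGIN, y, "HEADER"))
        y -= ROW_HEIGHT

    start_page()
    for i, row in enumerate(rows):
        is_last = i == len(rows) - 1
        # The last row needs room for itself plus the whole totals block.
        needed = ROW_HEIGHT * (1 + len(totals)) if is_last else ROW_HEIGHT
        if y - needed < MARGIN:
            start_page()
        ops.append(Op(page, MARGIN, y, row))
        y -= ROW_HEIGHT
    for line in totals:
        ops.append(Op(page, MARGIN, y, line))
        y -= ROW_HEIGHT
    return ops


ops = layout_invoice([f"item {i}" for i in range(41)], ["Subtotal", "Total"])
for op in ops[-4:]:
    print(op.page, op.y, op.text)
```

Because every break decision is an explicit comparison against the remaining space, the same input always produces the same pages, which is the determinism described above.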
It's not cheap. The initial build cost is high because every layout is effectively a small "programmatic CAD project". You can't just "throw HTML at it". But the payoff in reliability is immense. It becomes a set-and-forget system that produces identical documents every time, which stops the endless firefighting.
Yes, two years later it can be painful to update when the original developer is gone. But I would take that trade-off any day over constantly battling with imprecise, non-deterministic tools. In twenty years of building systems where documents are mission-critical, "close enough" rendering was almost never good enough.
It needs JavaScript support so charting libraries work, but they mention working toward that in the roadmap.
It's more like PrinceXML than a browser. This is great: Prince is the gold standard for HTML print output and, last time I looked, the only engine to fully support Paged Media Level 3. Normal browsers don't seem to care as much about full print CSS support, so Prince has a monopoly here and is not cheap.
Printing those things is really difficult. All the time I get split cells (with some rows not printed) and all kinds of problems (like broken word wrap, etc.).
Building an HTML/CSS rendering engine is no easy job. Congratulations! I'm curious: how were you able to pull this off? What documentation was helpful, and what was your inspiration? I'm in awe and want to learn more about this initiative.
At first, my plan was simple. I wanted to make an HTML rendering library. But soon, I realized it could be even more useful if it focused on paged output so I could make PDFs directly. C and C++ do not have an HTML-to-PDF library that is not a full web engine. I started coding and thought I could finish in a year by working a few hours each day. But reality came fast. HTML and CSS are much harder than SVG, and even small things caused big problems.
I studied KHTML and WebKit to see how real HTML and CSS engines work. The official specs were very helpful. Slowly, everything started to come together. It felt like discovering a hidden world behind the web pages we see every day.
The hardest part has been TableLayout. Tables look simple, but handling row and column spans, nested tables, alignment, page breaks, and box calculations was very hard. I spent many hours fixing layout bugs that only appeared in some situations. It was frustrating, humbling, and also very satisfying when it worked.
I am still learning and improving. I hope other people enjoy PlutoPrint and PlutoBook as much as I do.
Quick question:
1. I see you've hand-written both the CSS and HTML parsers yourself. Why not use existing parsers? Was minimizing dependencies one of your goals?
2. Does the project recognize headers/footers and other such @page CSS rules?
3. Fragmentation (pagination) logic has a huge set of challenges (at least from what I read about Chrome implementing fragmentation). Did you come across this? https://developer.chrome.com/docs/chromium/renderingng-fragm....
Was fragmentation logic really that difficult to implement?
1. The documentation for HTML and CSS parsers is pretty straightforward and easier to implement, so I thought it was better to write them myself.
2. It fully supports margin boxes (headers and footers) using properties like @top-left and @bottom-center inside @page rules. You can see more here: https://github.com/plutoprint/plutobook/blob/main/FEATURES.m...
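For readers unfamiliar with margin boxes, the following is a minimal sketch of what such a stylesheet looks like, wrapped in a small Python helper that builds the HTML string to hand to the renderer. The `@top-left` and `@bottom-center` rules are standard CSS Paged Media and are the ones named above. The helper name and the invoice text are hypothetical, and whether `counter(pages)` is supported should be checked against the FEATURES document rather than assumed.

```python
PAGE_CSS = """
@page {
  size: A4;
  margin: 20mm;
  /* Margin boxes: content placed in the page margin on every page. */
  @top-left { content: "Invoice 0001"; font-size: 9pt; }
  @bottom-center {
    content: "Page " counter(page) " of " counter(pages);
    font-size: 9pt;
  }
}
"""


def build_document(body_html: str) -> str:
    """Wrap body markup in a full HTML document carrying the @page rules."""
    return (
        "<!DOCTYPE html><html><head><meta charset='utf-8'>"
        f"<style>{PAGE_CSS}</style></head>"
        f"<body>{body_html}</body></html>"
    )


html = build_document("<h1>Invoice</h1><p>Line items go here.</p>")
print(html[:60])
```

The resulting string is ordinary HTML, so the header and footer live in the stylesheet rather than being repeated in the body markup.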
3. Yes, I did come across this. Fragmentation logic is as difficult as it sounds. Right now PlutoBook works with a single, consistent page size throughout a document and does not support named pages, which simplifies things a lot.
Feel free to contact me via email if you have more questions.
First of all, big congratulations on pulling this off! Creating an HTML rendering app that exports to PDF is no simple task—it’s a massive effort. Thank you for sharing this, I’ll definitely check it out. If you could also provide ready-to-use executables (especially for Windows), that would be a huge help.
Just a small suggestion:
Since you’re well-versed in both C++ and HTML technologies, would you be interested in contributing to wkhtmltopdf? It’s one of the most widely used tools for generating PDFs, especially in Odoo ERP for production websites. Your contribution there would be a tremendous benefit to the community.
Thanks again!
There's also https://github.com/odoo/paper-muncher from Odoo S.A. (an open core ERP vendor) that seems to go into a good direction.
Shouldn’t this be a URL like https://example.com?
Also, is there support for creating a linkable table of contents?