Product Launch
anonymous
2 points
1 comments
Posted about 2 months ago | Active about 2 months ago
Show HN: Turn any webpage into structured data via LLM codegen
github.com | LLM | web scraping | data extraction | AI
Discussion (1 comments)
about 2 months ago
The regeneration loop was probably the most interesting part to work on: you need very strict constraints on what “good” output looks like, and a precise description of what went wrong when codegen fails. I found Pydantic annotations especially useful for both.
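To make the idea concrete, here is a minimal sketch of a regeneration loop driven by Pydantic validation. It is not Hikugen's actual code: the `Article` schema, the `generate_code` callback (a stand-in for the LLM call), and the convention that generated code defines an `extract(html)` function are all assumptions made for illustration.

```python
from typing import Callable, Optional

from pydantic import BaseModel, Field, ValidationError


class Article(BaseModel):
    # Hypothetical target schema: the field constraints define "good" output.
    title: str = Field(min_length=1, description="Headline of the page")
    points: int = Field(ge=0, description="Non-negative score")


def describe_errors(exc: ValidationError) -> str:
    # Turn validation errors into per-field feedback for the next prompt.
    lines = []
    for err in exc.errors():
        loc = ".".join(str(part) for part in err["loc"])
        lines.append(f"{loc}: {err['msg']}")
    return "\n".join(lines)


def extract_with_regeneration(
    html: str,
    generate_code: Callable[[str, Optional[str]], str],
    max_attempts: int = 3,
) -> Article:
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        # generate_code is a hypothetical wrapper around the LLM call.
        code = generate_code(html, feedback)
        namespace: dict = {}
        try:
            # WARNING: exec runs the generated code unsandboxed. Sketch only.
            exec(code, namespace)
            raw = namespace["extract"](html)
            return Article(**raw)
        except ValidationError as exc:
            # The schema says exactly which fields are wrong, and why.
            feedback = describe_errors(exc)
        except Exception as exc:
            # Syntax errors, a missing extract(), runtime failures, etc.
            feedback = f"{type(exc).__name__}: {exc}"
    raise RuntimeError(f"no valid extraction after {max_attempts} attempts: {feedback}")
```

The schema does double duty here: it defines what counts as a valid result, and its error messages become the specific issue handed back to the model on the next attempt.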
about 2 months ago
How does Hikugen sandbox, constrain, and audit the LLM-generated Python extraction code to prevent arbitrary code execution (e.g., filesystem writes, network egress, or imports outside stdlib)? And does it enforce a deterministic execution environment (via AST rewriting, syscall filtering, seccomp profiles, or a WASM/Python sandbox) to guarantee that regenerated scrapers cannot drift into unsafe or non-reproducible states?
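For readers unfamiliar with the AST-based option mentioned in the question, here is a rough sketch of a static audit that rejects generated code before it runs. The thread does not say what Hikugen does, so this is not a description of it; the allowlist, the banned-call list, and the `audit` helper are hypothetical. A check like this is also easy to bypass (for example through `getattr` tricks), so it would complement process-level isolation such as seccomp or a WASM runtime rather than replace it.

```python
import ast
from typing import List

# Hypothetical policy: modules the generated scraper may import.
ALLOWED_MODULES = {"re", "json", "html", "datetime"}
# Hypothetical policy: builtins the generated scraper may not call by name.
BANNED_CALLS = {"eval", "exec", "open", "compile", "__import__"}


def audit(source: str) -> List[str]:
    """Return a list of policy violations found in the source (empty if none)."""
    problems: List[str] = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] not in ALLOWED_MODULES:
                    problems.append(f"line {node.lineno}: import '{alias.name}' not allowed")
        elif isinstance(node, ast.ImportFrom):
            root = (node.module or "").split(".")[0]
            # Relative imports (level > 0) are rejected outright.
            if node.level or root not in ALLOWED_MODULES:
                problems.append(f"line {node.lineno}: import from '{node.module}' not allowed")
        elif isinstance(node, ast.Call):
            # Only catches direct calls by name, e.g. open(...), not aliases.
            if isinstance(node.func, ast.Name) and node.func.id in BANNED_CALLS:
                problems.append(f"line {node.lineno}: call to '{node.func.id}' not allowed")
    return problems
```

The returned messages could double as regeneration feedback, in the same way schema validation errors do.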