I Fed 24 Years of My Blog Posts to a Markov Model

Posted20 days agoActive17 days ago

zdw

303 points

122 comments

susam.netTech DiscussionstoryHigh profile

informativeneutral

Debate

20/100

Markov ModelConversational UIBlog Analysis

Key topics

Markov Model

Conversational UI

Blog Analysis

The fascinating experiment of feeding 24 years of blog posts to a Markov model has sparked a lively debate about the capabilities of traditional statistical models versus modern AI transformers. Commenters are weighing in on whether Markov chains can hold their own against large language models (LLMs), with some arguing that LLMs are essentially sophisticated Markov chains, while others insist they're fundamentally different. The discussion is highlighting the nuances of how these models work, with some pointing out that LLMs' ability to efficiently compute probabilities is the real breakthrough. As one commenter quipped, the comparison is "the question today," with transformers emerging as a clear leap forward in AI capabilities.

Snapshot generated from the HN discussion

Discussion Activity

Very active discussion

First comment

50m

Peak period

6-12h

Avg / period

10.8

Comment distribution119 data points

Loading chart...

Based on 119 loaded comments

Key moments

01Story posted
Dec 13, 2025 at 3:19 PM EST
20 days ago
Step 01
02First comment
Dec 13, 2025 at 4:10 PM EST
50m after posting
Step 02
03Peak activity
37 comments in 6-12h
Hottest window of the conversation
Step 03
04Latest activity
Dec 16, 2025 at 3:24 PM EST
17 days ago
Step 04

Generating AI Summary...

Analyzing up to 500 comments to identify key contributors and discussion patterns

Discussion (122 comments)

Showing 119 comments of 122

swyx

20 days ago

2 replies

now i wonder if you can compare vs feeding into a GPT style transformer of a similar Order of Magnitude in param count..

0_____0

20 days ago

1 reply

I thought for a moment your comment was the output of a Markov chain trained on HN

bitwize

20 days ago

No mention of Rust or gut bacteria. Definitely not.

fragmede

20 days ago

That's the question today. Turns out transformers really are a leap forwards in terms of AI, whereas Markov chains, scaled up to today's level of resources and capacity, will still output gibberish.

atum47

20 days ago

4 replies

I usually have this technical hypothetical discussions with ChatGpt, I can share if you like, me asking him this: aren't LLMs just huge Markov Chains?! And now I see your project... Funny

pavel_lishin

20 days ago

1 reply

> I can share if you like

Respectfully, absolutely nobody wants to read a copy-and-paste of a chat session with ChatGPT.

atum47

20 days ago

1 reply

When you say nobody you mean you, right? You can't possible be answering for every single person in the world.

I was having a discussion about similarities between Markov Chains and LLMs and short after I found this topic on HN, when I wrote "I can share if you like" was as a proof about the coincidence.

fwip

20 days ago

[delayed]

empiko

20 days ago

4 replies

LLMs are indeed Markov chains. The breakthrough is that we are able to efficiently compute well performing probabilities for many states using ML.

famouswaffles

20 days ago

2 replies

LLMs are not Markov Chains unless you contort the meaning of a Markov Model State so much you could even include the human brain.

sophrosyne42

20 days ago

1 reply

Well LLMs aren't human brains, unless you contort the definition of matrix algebra so much you could even include them.

ben_w

19 days ago

1 reply

QM and GR can be written as matrix algebra, atoms and electrons are QM, chemistry is atoms, biology is chemistry, brains are biology.

An LLM could be implemented with a Markov chain, but the naïve matrix is ((vocab size)^(context length))^2, which is far too big to fit in this universe.

Like, the Bekenstein bound means writing the transition matrix for an LLM with just 4k context (and 50k vocabulary) at just one bit resolution, the first row (out of a bit more than 10^18795 rows) ends up with a black hole ~10^9800 times larger than the observable universe.

sophrosyne42

19 days ago

Yes, sure enough, but brains are not ideas, and there is no empirical or theoretical model for ideas in terms of brain states. The idea of unified science all stemming from a single ultimate cause is beautiful, but it is not how science works in practice, nor is it supported by scientific theories today. Case in point: QM models do not explain the behavior of larger things, and there is no model which gives a method to transform from quantum to massive states.

The case for brain states and ideas is similar to QM and massive objects. While certain metaphysical presuppositions might hold that everything must be physical and describable by models for physical things, science, which should eschew metaphysical assumptions, has not shown that to be the case.

chpatrick

20 days ago

2 replies

Not sure why that's contorting, a markov model is anything where you know the probability of going from state A to state B. The state can be anything. When it's text generation the state is previous text to text with an extra character, which is true for both LLMs and oldschool n-gram markov models.

wizzwizz4

20 days ago

1 reply

A GPT model would be modelled as an n-gram Markov model where n is the size of the context window. This is slightly useful for getting some crude bounds on the behaviour of GPT models in general, but is not a very efficient way to store a GPT model.

chpatrick

20 days ago

1 reply

I'm not saying it's an n-gram Markov model or that you should store them as a lookup table.

srean

19 days ago

You say state can be anything,no restrictions at all. Let me sell you a perfect predictor then :) The state is the next token.

famouswaffles

20 days ago

1 reply

You are just proving my point.

Yes, technically you can frame an LLM as a Markov chain by defining the "state" as the entire sequence of previous tokens. But this is a vacuous observation under that definition, literally any deterministic or stochastic process becomes a Markov chain if you make the state space flexible enough. A chess game is a "Markov chain" if the state includes the full board position and move history. The weather is a "Markov chain" if the state includes all relevant atmospheric variables.

The problem is that this definition strips away what makes Markov models useful and interesting as a modeling framework. A “Markov text model” is a low-order Markov model (e.g., n-grams) with a fixed, tractable state and transitions based only on the last k tokens. LLMs aren’t that: they model using un-fixed long-range context (up to the window). k is not negotiable. It's a constant, not a variable. Once you make it a variable, any process can be described as markovian, and the word is useless.

chpatrick

20 days ago

2 replies

Sure many things can be modelled as Markov chains, which is why they're useful. But it's a mathematical model so there's no bound on how big the state is allowed to be.

famouswaffles

20 days ago

1 reply

>Sure many things can be modelled as Markov chains

Again, no they can't, unless you break the definition. K is not a variable. It's as simple as that. There's no bound to how big the state can be, but the state cannot be flexible. The markov model uses k tokens, not k tokens sometimes, n tokens other times and whatever you want it to be the rest of the time.

chpatrick

20 days ago

1 reply

It's not n sometimes, k tokens some other times. LLMs have fixed context windows, you just sometimes have less text. They're pure functions from a fixed size block of text to a probability distribution of the next character.

famouswaffles

20 days ago

2 replies

1. A context limit is not a Markov order. An n-gram model’s defining constraint is: there exists a small constant k such that the next-token distribution depends only on the last k tokens, full stop. An LLM’s defining behavior is the opposite: within its window it can condition on any earlier token, and which tokens matter can change drastically with the prompt (attention is content-dependent). “Window size = 8k/128k” is not “order k” in the Markov sense; it’s just a hard truncation boundary.

2. “Fixed-size block” is a padding detail, not a modeling assumption. Yes, implementations batch/pad to a maximum length. But the model is fundamentally conditioned on a variable-length prefix (up to the cap), and it treats position 37 differently from position 3,700 because the computation explicitly uses positional information. That means the conditional distribution is not a simple stationary “transition table” the way the n-gram picture suggests.

3. “Same as a lookup table” is exactly the part that breaks. A classic n-gram Markov model is literally a table (or smoothed table) from discrete contexts to next-token probabilities. A transformer is a learned function that computes a representation of the entire prefix and uses that to produce a distribution. Two contexts that were never seen verbatim in training can still yield sensible outputs because the model generalizes via shared parameters; that is categorically unlike n-gram lookup behavior.

I don't know how many times I have to spell this out for you. LLMs do not follow the markov property in any sense of the imagination. You are just plain wrong.

chpatrick

20 days ago

I think you're confusing Markov chains and "Markov chain text generators". A Markov chain is a mathematical structure where the probabilities only depend on the current state and not the previous path. It doesn't say anything about whether the probabilities are computed or stored, it just exists.

lelanthran

18 days ago

> An n-gram model’s defining constraint is: there exists a small constant k such that the next-token distribution depends only on the last k tokens, full stop.

I don't necessarily agree with GP, but I also don't think that a markov chain and markov generator definitions include the word "small".

That constant can be as large as you need it to be.

sigbottle

20 days ago

1 reply

The etymology of the "markov property" is that the current state does not depend on history.

And in classes, the very first trick you learn to skirt around history is to add Boolean variables to your "memory state". The issue obviously being that this is exponential.

I don't think LLMs embody the markov property at all, even if you can make everything eventually follow the markov property by just "considering every single possible state". Of which there are (size of token set)^(length) states at minimum because of the KV cache.

chpatrick

20 days ago

1 reply

The KV cache doesn't affect it. LLMs are stateless and don't take any other input than a fixed block of text. They don't have memory, which is the requirement for a Markov chain.

sigbottle

20 days ago

1 reply

Have you ever actually worked with a basic markov problem?

The markov property states that your state is a transition of probabilities entirely from the previous state.

These states, inhabit a state space. The way you encode "memory" if you need it, e.g. say you need to remember if it rained the last 3 days, is by expanding said state space.

If you define it that way, then yes, a LLM is a "markov chain" of state space size (# tokens)^(context length), at minimum. That's not a helpful abstraction and defeats the original purpose of the markov observation.

Are deliberately missing the point or what?

chpatrick

20 days ago

> Sure, a LLM is a "markov chain" of state space size (# tokens)^(context length), at minimum.

Okay, so we're agreed.

cwyers

20 days ago

1 reply

Yeah, there's only two differences between using Markov chains to predict words and LLMs:

* LLMs don't use Markov chains, * LLMs don't predict words.

arboles

20 days ago

* Markov chains have been used to predict syllables or letters since the beginning, and an LLMs tokenizer could be used for Markov chains

* The R package markovchain[1] may look like it's using Markov chains, but it's actually using the R programming language, zeros and ones.

[1] https://cran.r-project.org/web/packages/markovchain/index.ht...

arboles

20 days ago

1 reply

Markov models with more than 3 words as "context window" produce very unoriginal text in my experience (corpus used had almost 200k sentences, almost 3 million words). This is because 4 words form a token, not a context window. Markov models lack what LLMs have.

If you use a syllable-level token in Markov models the model can't form real words beyond the second syllable, and you have no way of making it make more sense other than increasing the size the token size, which exponentially decreases originality. This is the simplest way I can explain it.

arboles

17 days ago

I was really sleepy when I wrote this

srean

19 days ago

1 reply

They are definitely not Markov Chains they may, however, be Markov Models. There's a difference between MC and MM.

empiko

19 days ago

1 reply

What do you mean? The states are fully observable (current array of tokens), and using an LLM we calculate the probabilities of moving between them. What is not MC about this?

srean

19 days ago

1 reply

I suggest getting familiar with the difference between a Markov Chain and a Markov Model. The former is a substantial restriction of the latter. The classic by Kemeny and Snell is a good readable reference.

MC have constant finite context and their state is the most recent k tuple of emitted alphabets.

empiko

19 days ago

1 reply

LLMs definitely also have finite context length. And if we consider padding, it is also constant. The k is huge compared to most Markov chains used historically, but it doesn't make it less finite.

srean

19 days ago

1 reply

That's not correct. Even a toy like an exponential weighted moving averaging produces unbounded context (of diminishing influence).

empiko

19 days ago

1 reply

What do you mean? I can only input k tokens into my LLM to calculate the probs. That is the definition of my state. In the exact way that N-gram LMs use N tokens, but instead of using ML models, they calculate the probabilities based on observed frequencies. There is no unbounded context anywhere.

srean

19 days ago

That's different.

You can certainly feed k-grams one at a time to, estimate the the probability distribution over next token and use that to simulate a Markov Chain and reinitialize the LLM (drop context). In this process the LLM is just a look up table to simulate your MC.

But an LLM on its own doesn't drop context to generate, it's transition probabilities change depending on the tokens.

roarcher

20 days ago

1 reply

...are you under the impression that you have an exclusive relationship with "him"? Everyone else has access to ChatGPT too.

atum47

20 days ago

1 reply

Yes. Yes I was. Thank you for the wake up call. I was under the impression that he was talking only to me.

roarcher

20 days ago

It's just baffling when people paste (or in your case, offer to paste) ChatGPT transcripts on here. If we wanted ChatGPT's input we could ask it ourselves.

It's like saying "hey guys, want me to Google this and paste the results page here"? It contributes nothing to the conversation.

atum47

20 days ago

2 replies

Don't know what happened. I stumbled onto a funny coincidence - me talking to a LLM about its similarities with MC - decided to share on a post about using MC to generate text. Got some nasty comments and a lot of down votes. Even though my comment sparked a pretty interesting discussion.

Hate to be that guy, but I remember this place being nicer.

pavel_lishin

19 days ago

Nobody was being nasty. roarcher explained why people reacted the way they did.

roarcher

20 days ago

Ever since LLMS became popular, there's been an epidemic of people pasting ChatGPT output onto forums (or in your case, offering to). These posts are always received similarly to yours, so I find it odd that you're surprised by the reaction.

Everyone has access to ChatGPT. If we wanted its input we could ask it ourselves. Your offer is akin to "Hey everyone, want me to Google this and paste the results page here?"

These posts are low-effort and add nothing to the conversation, yet the people who do it seem to expect everyone to be impressed by their contribution. If you can't understand why people bristle at this, I'm not sure what to tell you.

lacunary

20 days ago

2 replies

I recall a Markov chain bot on IRC in the mid 2000s. I didn't see anything better until gpt came along!

nurettin

20 days ago

1 reply

Yes, I made one using bitlbee back in the 2000s, good times!

pavel_lishin

20 days ago

1 reply

I made one for Hipchat at a company. I can't remember if it could emulate specific users, or just channels, but both were definitely on my roadmap at the time.

lloydatkinson

20 days ago

I'm hoping someone can find it so I can bookmark it but I once read a story about a company that let multiple markov chain bots loose in a Slack channel. A few days later production went down because one of them ran a Slack command that deployed or destroyed their infrastructure.

frumiousirc

19 days ago

1 reply

Perhaps you are thinking of megahal https://homepage.kranzky.com/megahal/Index.html or if a bit later in the millennium, cobe https://teichman.org/blog/

lacunary

19 days ago

ah, probably so, looks like there were eggdrop scripts for megahal, thanks!

vunderba

20 days ago

7 replies

I did something similar many years ago. I fed about half a million words (two decades of mostly fantasy and science fiction writing) into a Markov model that could generate text using a “gram slider” ranging from 2-grams to 5-grams.

I used it as a kind of “dream well” whenever I wanted to draw some muse from the same deep spring. It felt like a spiritual successor to what I used to do as a kid: flipping to a random page in an old 1950s Funk & Wagnalls dictionary and using whatever I found there as a writing seed.

idiotsecant

20 days ago

1 reply

Did it work?

vunderba

20 days ago

1 reply

So that's the key difference. A lot of people train these Markov models with the expectation that they're going to be able to use the generated output in isolation.

The problem with that is either your n-gram level is too low in which case it can't maintain any kind of cohesion, or your n-gram level is too high and it's basically just spitting out your existing corpus verbatim.

For me, I was more interested in something that could potentially combine two or three highly disparate concepts found in my previous works into a single outputted sentence - and then I would ideate upon it.

So I haven't opened the program in a long time so I just spun it up and generated a few outputs:

  A giant baby is navel corked which if removed causes a vacuum.

I'm not sure what the original pieces of text were based on that particular sentence but it starts making me think about a kind of strange void harkonnen with heart plugs that lead to weird negatively pressurized areas. That's the idea behind the dream well.

kqr

19 days ago

> A giant baby is navel corked which if removed causes a vacuum.

Very The Age of Wire and String.

echelon

20 days ago

1 reply

What would the equivalent be with LLMs?

I spend all of my time with image and video models and have very thin knowledge when it comes to running, fine tuning, etc. with language models.

How would one start with training an LLM on the entire corpus of one's writings? What model would you use? What scripts and tools?

Has anyone had good results with this?

Do you need to subsequently add system prompts, or does it just write like you out of the box?

How could you make it answer your phone, for instance? Or discord messages? Would that sound natural, or is that too far out of domain?

ipaddr

20 days ago

1 reply

Simplest way pack all text into a prompt.

You could use a vector database.

You could train a model from scratch.

Probably easiest to use OpenAI tools. Upload documents. Make custom model.

How do you make it answer your phone? You could use twillio api + script + llm + voice model. Want natural use a service.

echelon

20 days ago

1 reply

I think you're absolutely right about the easiest approach. I hope you don't mind me asking for a bit more difficulty.

Wouldn't fine tuning produce better results so long as you don't catastrophically forget? You'd preserve more context window space, too, right? Especially if you wanted it to memorize years of facts?

Are LoRAs a thing with LLMs?

Could you train certain layers of the model?

dannyw

20 days ago

A good place to start with your journey is this guide from Unsloth:

https://docs.unsloth.ai/get-started/fine-tuning-llms-guide

Tallain

20 days ago

1 reply

Curious if you've heard of or participated in NaNoGenMo[0] before. With such a corpus at your fingertips could be a fun little project; obviously, pure Markov generation wouldn't be quite sufficient but a good starting point maybe.

[0]: https://nanogenmo.github.io/

vunderba

20 days ago

Hey that's neat! I hadn't heard of it. It says you need to publish the novel and the source at the end - so I guess as part of the submission you'd include the RNG seed.

The only thing I'm a bit wary of is the submission size - a minimum of 50,000 words. At that length, It'd be really difficult to maintain a cohesive story without manual oversight.

bitwize

20 days ago

Terry Davis, pbuh, did something very similar!

davely

20 days ago

I gave a talk in 2015 that did the same thing with my tweet history (about 20K at the time) and how I used it as source material for a Twitter bot that could reply to users. [1]

It was pretty fun!

[1] https://youtu.be/rMmXdiUGsr4

boznz

20 days ago

[delayed]

wrp

19 days ago

There was an MS-DOS tool by James Korenthal called Babble[0], which did something similar. It apparently worked according to a set of grammatical transformers rather than by generating n-grams, so it was more akin to the "cut-up" technique[1]. He reported that he got better output from smaller, more focused corpora. Its output was surprisingly interesting.

[0] https://archive.org/details/Babble_1020, https://vetusware.com/download/Babble%21%202.0/?id=11924

[1] https://en.wikipedia.org/wiki/Cut-up_technique

hexnuts

20 days ago

1 reply

I just realized, one of the things that people might start doing is making a gamma model of their personality. I won't even approach who they were as a person, but it will give their Descendants (or bored researchers) a 60% approximation of who they were and their views. (60% is pulled from nowhere to justify my gamma designation, since there isn't a good scale for personality mirror quality for LLMs as far as I'm aware.)

jacquesm

20 days ago

"Dixie can't meaningfully grow as a person. All that he ever will be is burned onto that cart;"

"Do me a favor, boy. This scam of yours, when it's over, you erase this god-damned thing."

sebastianmestre

20 days ago

4 replies

LLMs usually do some sort of tokenization step prior to learning parameters. So I decided to try out order-1 Markov models over text tokenized with byte pair encoding (BPE).

Trained on TFA I got this:

> I Fed by the used few 200,000 words. All comments were executabove. This value large portive comment then onstring takended to enciece of base for the see marked fewer words in the...

Then I bumped up the order to 2

> I Fed 24 Years of My Blog Posts to a Markov Model > By Susam Pal on 13 Dec 2025 > > Yesterday I shared a little program calle...

It just reproduced the entire article verbatim. This makes sense as BPE removes any pair of repeated tokens, making order-2 Markov transitions fully deterministic.

I've heard that in NLP applications, it's very common to run BPE only up to a certain number of different tokens, so I tried that out next.

Before limiting, BPE was generating 894 tokens. Even adding a slight limit (800) stops it from being deterministic.

> I Fed 24 years of My Blog Postly coherent. We need to be careful about not increasing the order too much. In fact, if we increase the order of the model to 5, the generated text becomes very dry and factual

It's hard to judge how coherent the text is vs the author's trigram approach because the text I'm using to initialize my model has incoherent phrases in it anyways.

Anyways, Markov models are a lot of fun!

countWSS

20 days ago

1 reply

the trick to prevent 'dry' output that quotes verbatim is to make the 5 words limit flexible: if there is only one path, reduce it to 4.

Tallain

20 days ago

I have a pet tool I use for conlang work for writing/worldbuilding that is built on Markov chains and I am smacking my forehead right now at how obvious this seems in hindsight. This is great advice, thank you.

andai

20 days ago

1 reply

Nice :) I did something similar a few days ago. What I ended up with was a 50/50 blend of hilarious nonsense, and verbatim snippets.There seemed to be a lot of chains where there was only one possible next token.

I'm considering just deleting all tokens that have only one possible descendant, from the db. I think that would solve that problem. Could increase that threshold to, e.g. a token needs to have at least 3 possible outputs.

However that's too heavy handed: there's a lot of phrases or grammatical structures that would get deleted by that. What I'm actually trying to avoid is long chains where there's only one next token. I haven't figured out how to solve that though.

vunderba

20 days ago

That's where a dynamic n-gram comes into play. Train the markov model from 1 to 5 n-grams, and then scale according to the number of potential paths available.

yard2010

19 days ago

2 replies

Reading this I get this weird feeling that something there is trying to communicate, which is equally horrifying as the alternative - we are alone, our minds are trying to find order in chaos, there is no meaning except what we create.

sebastianmestre

19 days ago

I had the same feeling while testing the code. It might be caused by seeing the increasingly coherent output of the different models, makes you feel like it's getting smarter.

travisjungroth

19 days ago

The alternative to something trying to communicate through a Markov model isn’t that we’re alone. Just because there’s no life on Mars doesn’t mean there’s no other life in the universe.

samus

19 days ago

1 reply

[delayed]

sebastianmestre

19 days ago

Pretty sure that OpenAI uses BPE in their GPT models

Aperocky

20 days ago

1 reply

Here's a quick custom markov page you can have fun with, (all in client) https://aperocky.com/markov/

npm package of the markov model if you just want to play with it on localhost/somewhere else: https://github.com/Aperocky/weighted-markov-generator

anthk

19 days ago

Hailo from CPAN (Perl) it's much lighter than any NPM solution.

monoidl

20 days ago

1 reply

I think this is more correctly described as a trigram model than a Markov model, if it would naturally expand to 4-grams when they were available, etc, the text would look more coherent

Iirc there was some research on "infini-gram", that is a very large ngram model, that allegedly got performance close to LLMs in some domains a couple years back

Legend2440

19 days ago

1 reply

Google made some very large ngram models around twenty years ago. This being before the era of ultra-high-speed internet, it was distributed as a set of 6 DVDs.

It achieved state-of-the-art performance at tasks like spelling correction at the time. However, unlike an LLM, it can't generalize at all; if an n-gram isn't in the training corpus it has no idea how to handle it.

https://research.google/blog/all-our-n-gram-are-belong-to-yo...

tqian

19 days ago

I have this DVD set in my basement. Technically, there are still methods for estimating the probability of unseen ngrams. Backoff (interpolating with lower grams) is an option. You can also impose prior distributions like a Bayesian so that you can make "rational" guesses.

Ngrams are surprisingly powerful for how little computation they require. They can be trained in seconds even with tons of data.

andai

20 days ago

1 reply

In 2020, a friend and I did this with our mutual WhatsApp chat history.

Except instead we fine-tuned GPT-2 instead. (As was the fashion at the time!)

We used this one, I think https://github.com/minimaxir/gpt-2-simple

I think it took 2-3 hours on my friend's Nvidia something.

The result was absolutely hilarious. It was halfway between a markov chain and what you'd expect from a very small LLM these days. Completely absurd nonsense, yet eerily coherent.

Also, it picked up enough of our personality and speech patterns to shine a very low resolution mirror on our souls...

###

Andy: So here's how you get a girlfriend:

1. Start making silly faces

2. Hold out your hand for guys to swipe

3. Walk past them

4. Ask them if they can take their shirt off

5. Get them to take their shirt off

6. Keep walking until they drop their shirt

Andy: Can I state explicitly this is the optimal strategy

Tepix

20 days ago

That‘s funny! Now imagine you‘re using Signal on iOS instead of WhatsApp. You cannot do this with your chat history because Signal won‘t let you access your own data outside of their app.

GarnetFloride

20 days ago

1 reply

The one author that I think we have a good chance of recreating would be Barbara Cartwright. She wrote 700+ romance novels all pretty much the same. It should be possible to generate another of her novels given that large a corpus.

LanceH

20 days ago

I'm not sure how we'd know. My wife sometimes buys and rereads a novel she's already finished.

user_7832

19 days ago

2 replies

I can’t believe no one’s mentioned the Harry Potter fanfic written by a Markov Chain. If you’re familiar with HP, I highly recommend reading Harry Potter and the Portrait of What Looked Like a Large Pile of Ash.

Here’s a link: https://botnik.org/content/harry-potter.html

mattacular

19 days ago

2 replies

Genuine question: Why would anyone want to read that? I glanced at the first sentence and decided not to go any further.

It is hollow text. It has no properties of what I'd want to get out of even the worst book produced by human minds.

grahamnorton39

19 days ago

While hollow, it is also bad (and absurd) enough to be quite entertaining. It’s from an era where this wasn’t far off the state of the art for coming up with machine-generated text—context that makes it quite a bit funnier than if it were generated by an LLM today.

That said, it’s obviously not to everyone’s tastes!

user_7832

19 days ago

It's purely meant to be an absurdist read. It obviously makes no sense, yet is close enough to actual language patterns for (some, at least) people to find it hilarious. I had tears in my eyes from laughing way too hard when I first read it.

cluckindan

19 days ago

That reads as coherently as a pile of Zen koans.

Peteragain

19 days ago

1 reply

So, are current LLMs better because artificial neural networks are better predictors than Markov models, or because of the scale of the training data? Just putting it out there..

microtonal

19 days ago

2 replies

Markov models usually only predict the next token given the two preceding tokens (trigram model) because the data gets so exceptionally sparse beyond that, that it becomes impossible to make probability estimations (despite back-off, smoothing, etc.).

I recommend you to read Bengio et al.’s 2003 paper which describes this issue in more detail and introduces distributional representations (embeddings) in an RNN to avoid this sparsity.

While we are using transformers and sentence pieces now, this paper aptly describes the motivation underpinning modern models.

https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf

Peteragain

19 days ago

1 reply

Thanks for the reference and I stand corrected. And yes I had looked at it a long time ago and will give it another read. But I think it is saying that RNNs are a means of approximating a statistical property of a collection of text. That property is what we today think of as "completion"? That is, glorified auto complete, and not "distributed representations" of the world. Would you agree?

microtonal

19 days ago

1 reply

distributed representations

Distributional representations, not distributed.

https://en.wikipedia.org/wiki/Distributional_semantics#Distr...

Peteragain

17 days ago

This is the problem. I am arguing there are no distributed representations (cf Harnad's original symbol grounding problem paper, Hinton, and others). There are "distributional representations" by definition (cf the Wikipedia entry)

lelanthran

18 days ago

> Markov models usually only predict the next token given the two preceding tokens (trigram model) because the data gets so exceptionally sparse beyond that

Of course, that's because it is a probability along a single dimension with a chain-length along that one dimension while LLMs and NNs use multiple dimensions (They are meshed, not chained).

I really want to know what the result would look like with a few more dimensions resulting in a markov mesh type structure rather than a chain structure.

srean

19 days ago

2 replies

There seems to be quite a bit of argument about what is a Markov Model, if LLMs are one.

Markov Models have state and emit tokens based on its current state and undergoes a state transition. A statistical/probabilistic analogue of a state machine.

For a Markov Model to be non-vacuous, non-vapid discussion point, however, one needs to specify very precisely the relationships allowed between state and tokens/observations, whether it's hidden or visible, discrete or continuous ...

The simplest such model is that the state is a specified, computable function of the last k observations. One such function is the identity function.

One can make the state a specified, computable function of k previous states and k most recent tokens/observations.

The functions may be specified only upto a class of computable functions, finite or infinite in size.

You can make the context length a computable function of the k most recent observations (therefore of varying length), but you have to ensure that the contexts are always full for this model to be well defined. Context length can be a computable function of current state and k most recent observations.

On and on.

NuclearPM

19 days ago

1 reply

Learnable? What does that mean?

srean

19 days ago

1 reply

There are various notions of this.

The most basic/naive one is where one can estimate the unknown parameters of the model given example token streams generated by the model.

TomatoCo

19 days ago

1 reply

So learnable, in this context, rhymes with reverse-engineerable?

igorkraw

19 days ago

Another term used is identifiable (although learnable and identifiable are not synonyms, I think identifiability is one precondition for learnability).

Identifiability means that out of all possible models, you can learn the correct one given enough samples.causal identifiability has some other connotations

See here https://causalai.net/r80.pdf as a good start (a nose in a causal graph is Markov given its parents, and a k-step Markov chain is a k-layer causal dag)

canjobear

19 days ago

1 reply

This is right in terms of the rigorous statistical sense of “Markov model”. But in practice in the world of NLP and chatbots through the 90s and 2000s, “Markov model” was usually used to refer to Markov chains (ie you only condition on the previous k words). Hence the term “Hidden Markov Model” to refer to what you’re calling a Markov model.

srean

18 days ago

I covered hidden Markov Models briefly in my comment.

It depends on whether the state is visible in the observations or not. Hidden or not is an orthogonal axis of variation compared to the other variations mentioned in the comment.

In a non-hidden model there is no ambiguity or uncertainty about what the current state is.

ikhatri

20 days ago

When I was in college my friends and I did something similar with all of Donald Trump’s tweets as a funny hackathon project for PennApps. The site isn’t up anymore (RIP free heroku hosting) but the code is still up on GitHub: https://github.com/ikhatri/trumpitter

pessimizer

19 days ago

You could literally buy this at Egghead software for $3 from the bargain bin in 1992. I know, because I did. I fed it 5 years worth of my juvenile rants, and laughed at how pompous I sounded through a blender.

https://archive.org/details/Babble_1020

https://forum.winworldpc.com/discussion/12953/software-spotl...

manthangupta109

20 days ago

Damn interesting!

keithalewis

19 days ago

https://math.uchicago.edu/~shmuel/Network-course-readings/MC...

kazinator

20 days ago

[delayed]

anthk

20 days ago

Megahal/Hailo (cpanm -n hailo for Perl users) can still be fun too.

Usage:

      hailo -t corpus.txt -b brain.brn

Where "corpus.txt" should be a file with one sentence per line. Easy to do under sed/awk/perl.

      hailo -b brain.brn

This spawns the chatbot with your trained brain.

anthk

19 days ago

Quick test for Perl users (so anyone there with a Unix-like). Run these as a NON root user:

       cpanm -n local::lib

       cpanm -n Hailo

       ~/perl5/bin/hailo -t corpus.txt -b brain.brn

       ~/perl5/bin/hailo -b brain.brn

As corpus.txt, you can use a Perl/sed command for instance with book from Gutenberg.

frizlab

19 days ago

> Also, these days, one hardly needs a Markov model to generate gibberish; social media provides an ample supply.

delfugal

19 days ago

Should call it Trump Speech Generator. Loads of gibberish.

msapaydin

19 days ago

https://cdn.cs50.net/ai/2023/x/lectures/6/src6/markov/# This is a nice Markov text generator.

effnorwood

18 days ago

It laughed and gave him a kiss

hilti

20 days ago

First of all: Thank you for giving.

Giving 24 years of your experience, thoughts and life time to us.

This is special in these times of wondering, baiting and consuming only.

rumgewieselt

19 days ago

I love the design of the website more than the Markov model. Good Job!

OuterVale

20 days ago

Really fascinating how you can get such intriguing output from such a simple system. Prompted me to give it a whirl with the content on my own site.

https://vale.rocks/micros/20251214-0503

litver

19 days ago

"I Fed 24 Years of My Blog Posts to a Markov Model" you're not the first who did it. Already dozens of LLMs did it.

3 more comments available on Hacker News

View full discussion on Hacker News

ID: 46257607Type: storyLast synced: 12/16/2025, 8:05:44 PM

Want the full context?

Jump to the original sources

Read the primary article or dive into the live Hacker News thread when you're ready.

Open link View on HN