PreGPT Search | Not Hacker News!

Discussion (160 comments)

Showing 160 comments

johng

about 1 month ago

3 replies

I don't know how this works under the hood but it seems like no matter how it works, it could be gamed quite easily.

cryzinger

about 1 month ago

1 reply

If it's just using Google search "before <x date>" filtering I don't think there's a way to game it... but I guess that depends on whether Google uses the date that it indexed a page versus the date that a page itself declares.

madars

about 1 month ago

1 reply

Date displayed in Google Search results is often the self-described date from the document itself. Take a look at this "FOIA + before Jan 1, 1990" search: https://www.google.com/search?q=foia&tbs=cdr:1,cd_max:1/1/19...

None of these documents were actually published on the web by then, incl., a Watergate PDF bearing date of Nov 21, 1974 - almost 20 years before PDF format got released. Of course, WWW itself started in 1991.

Google Search's date filter is useful for finding documents about historical topics, but unreliable for proving when information actually became publicly available online.

littlestymaar

about 1 month ago

1 reply

Are you sure it works the same way for documents that Google indexed at the time of publication? (Because obviously for things that existed before Google, they had to accept the publication date at face value).

madars

about 1 month ago

1 reply

Yes, it works the same way even for content Google indexed at publication time. For example, here are chatgpt.com links that Google displays as being from 2010-2020, a period when Google existed but ChatGPT did not:

https://www.google.com/search?q=site%3Achatgpt.com&tbs=cdr%3...

So it looks like Google uses inferred dates over its own indexing timestamps, even for recently crawled pages from domains that didn't exist during the claimed date range.

littlestymaar

about 1 month ago

Interesting, thanks.

I wonder why they do that when they could use time of first indexing instead.

qwertygnu

about 1 month ago

1 reply

True, but there's probably many ways to do this and unless AI content starts falsifying tons of its metadata (which I'm sure would have other consequences), there's definitely a way.

Plus other sites that link to the content could also give away it's date of creation, which is out of the control of the AI content.

layman51

about 1 month ago

1 reply

I have heard of a forum (I believe it was Physics Forums) which was very popular in the older days of the internet where some of the older posts were actually edited so that they were completely rewritten with new content. I forget what the reasoning behind it was, but it did feel shady and unethical. If I remember correctly, the impetus behind it was that the website probably went under new ownership and the new owners felt that it was okay to take over the accounts of people who hadn't logged on in several years and to completely rewrite the content of their posts.

I believe I learned about it through HN, and it was this blog post: https://hallofdreams.org/posts/physicsforums/

It kind of reminds me of why some people really covet older accounts when they are trying to do a social engineering attack.

joshuaissac

about 1 month ago

> website probably went under new ownership

According to the article, it was the founder himself who was doing this.

CGamesPlay

about 1 month ago

1 reply

"Gamed quite easily" seems like a stretch, given that the target is definitionally not moving. The search engine is fundamentally searching an immutable dataset that "just" needs to be cleaned.

johng

about 1 month ago

1 reply

How? They have an index from a previous date and nothing new will be allowed since that date? A whole copy of the internet? I don't think so.... I'm guessing, like others, it's based on the date the user/website/blog lists in the post. Which they can change at any time.

fragmede

about 1 month ago

Yes they do. It's called common crawl, and is available from your chosen hyperscaler vendor.

1gn15

about 1 month ago

1 reply

Does this filter out traditional SEO blogfarms?

JKCalhoun

about 1 month ago

1 reply

Yeah, might prefer AI-slop to marketing-slop.

al_borland

about 1 month ago

1 reply

They are the same. I was looking for something and tried AI. It gave me a list of stuff. When I asked for its sources, it linked me to some SEO/Amazon affiliate slop.

All AI is doing is making it harder to know what is good information and what is slop, because it obscures the source, or people ignore the source links.

venturecruelty

about 1 month ago

I've started just going to more things in person, asking friends for recommendations, and reading more books (should've been doing all of these anyway). There are some niche communities online I still like, and the fediverse is really neat, but I'm not sure we can stem the Great Pacific Garbage Patch-levels of slop, at this point. It's really sad. The web, as we know and love it, is well and truly dead.

swyx

about 1 month ago

4 replies

somebody said once we are mining "low-background tokens" like we are mining low-background (radiation) steel post WW2 and i couldnt shake the concept out of my head

(wrote up in https://www.latent.space/i/139368545/the-concept-of-low-back... - but ironically repeating something somebody else said online is kinda what i'm willingly participating in, and it's unclear why human-origin tokens should be that much higher signal than ai-origin ones)

jeffchuber

about 1 month ago

1 reply

that was me swyx

rollulus

about 1 month ago

2 replies

Multiple people have coined the idea repeatedly, way before you. The oldest comment on HN I could find was in December 2022 by user spawarotti: https://news.ycombinator.com/item?id=33856172

threeducks

about 1 month ago

2 replies

Here is an even older comment chain about it from 2020: https://news.ycombinator.com/item?id=23895706

Apparently, comparing low-background steel to pre-LLM text is a rather obvious analogy.

pseidemann

about 1 month ago

As well as that people often do think alike.

If you have a thought, it's likely it's not new.

rollulus

about 1 month ago

Oh wow, great find! That’s really early days.

jeffchuber

about 1 month ago

1 reply

i didnt claim to invent it.

i claimed swyx heard it through me - which he did

swyx

about 1 month ago

you did!!

jrjfjgkrj

about 1 month ago

11 replies

every human generation built upon the slop of the previous one

but we appreciated that, we called it "standing on the shoulders of giants"

walrusted

about 1 month ago

1 reply

the only structure you can build with slop is a burial mound

Dilettante_

about 1 month ago

1 reply

What is unhardened cement but slop?

walrusted

about 1 month ago

while it makes the sound of slop, but the similarity effectively ends there. you cant build a load bearing beam from slop but you can build a burial mound from both.

hoppp

about 1 month ago

1 reply

You can't build on slop because slop is a slippery slope

Dilettante_

about 1 month ago

Maybe we'll have to slurp the slop so we don't slip on the slope.

kgwgk

about 1 month ago

1 reply

Nothing conveys better the idea of a solid foundation to build upon than the word ‘slop’.

DeepSeaTortoise

about 1 month ago

Every foundation needs some time to settle.

- Sir, this is an elevator.

teiferer

about 1 month ago

1 reply

Because the pyramids, the theory of general relativity and the Linux kernel are all totally comparable to ChatGPT output. /s

Why is anybody still surprised that the AI bubble made it that big?

jrjfjgkrj

about 1 month ago

2 replies

for every theory of relativity the is the religious non-sense and superstitions of the medieval ages or today

JumpCrisscross

about 1 month ago

2 replies

> for every theory of relativity the is the religious non-sense and superstitions of the medieval ages or today

If Einstein came up with relativity by standing on "the religious non-sense and superstitions of the medieval ages," you'd have a point.

scotty79

about 1 month ago

1 reply

If we have billions of AIs one might pick the correct learning materials. Same way human Einstein did.

teiferer

about 1 month ago

1 reply

Depends what you mean with "AI". Actual intelligence? Yeah, maybe. LLM? No, they don't actually reason, an activity that Einstein did a lot of and without which coming up with his theory would have been impossible. He didn't just go on a random walk like what LLM temperature is doing.

scotty79

about 1 month ago

1 reply

Who knew that such a huge amount of software development problems can be solved without ever "actually" reasoning.

mrbungie

about 1 month ago

Well, depends on the software project itself and where you are in its development lifecycle but:

- (1) A lot of developing can be just chores around managing scaffolds and repeatable work, and due to this macros, autogenerated code and other tools have been a thing at many layers for a long time; and

- (2) I remember copy-pasting from Google/StackOverflow (i.e. mostly search + pattern matching with some minimal reasoning) being criticized as a low-effort mode of development during the 2010s, before ChatGPT and AI assisted coding tools took over that part.

So yes, I'd argue a huge amount of software development problems can be solved without ever actually reasoning from first principles, AI tools just made that more visible.

Ajakks

about 1 month ago

1 reply

Well, "spooky action at a distance" is something that believing outright at the time, literally required faith - much like the faith required of scientist today, everything comes from nothing for no reason and without cause and as consequence - everything everywhere... Ohhh!!! and like 11 dimensions with maybe infinite universes, too -> oh yeah, and the universe is infinite, maybe.

They have so many ways of saying "God" without saying God.

teiferer

about 1 month ago

1 reply

> much like the faith required of scientist today

You might be missing the point of science.

It's ultimately an endeavor of finding testable descriptions of the world in the face of being fallible. It's not about the "why". It's about "how" the world is. No faith required. "Why" the world is is a philosophical question and perhaps a religious one. But that has nothing to do with testable theories.

Any scientific theory gains credibility by providing ways to test it. Each such experiment that fails to disprove the theory increases confidence in the theory's validity. There is no faith required for any of that and no god either. If you can predict that conditions A and B lead to C happening, and I can try it and see that indeed C is happening, then you have science going on, without any faith.

Ajakks

26 days ago

I think you misunderstood science today.

Scientists only do safe experiments that will 100% verify the findings they are paid to attain.

Nobody will do peers review of whatever study unless its controversial- in which case they will come out of the woodwork to "discover" how correct/not THAT wrong their peers were/are still.

It is nonsense that we consider what comes from such a system to be "knowledge" - Modern science is many things, faithless science validating experimental theory -> that is not a real thing.

flir

about 1 month ago

I know we're just pointlessly abusing the analogy here, but... mediaeval cathedrals are a greater work of artifice than pyramids.

bigiain

about 1 month ago

1 reply

> we called it "standing on the shoulders of giants"

We do not see nearly so far though.

Because these days we are standing on the shoulders of giants that have been put into a blender and ground down into a slippery pink paste and levelled out to a statistically typical 7.3mm high layer of goo.

_kb

about 1 month ago

2 replies

The secret is you then have to heat up that goo. When the temperature gets high enough things get interesting again.

gilleain

about 1 month ago

You get Flubber?

pseidemann

about 1 month ago

Just simulate some evolution here and there.

shevy-java

about 1 month ago

1 reply

This sounds like an Alan Kay quote. He meant that in regards to useful inventions. AI-generated spam just decreases the quality. We'd need a real alternative to this garbage from Google but all the other search engines are also bad. And their UI is also horrible - not as bad as Google, but also bad. Qwant just tries to copy/paste Google for instance (though interestingly enough, sometimes it has better results than Google - but also fewer in general, even ignornig false positive results).

visarga

about 1 month ago

Deep Research reports I think are above average internet quality, they collect hundreds of sources, synthesize and contrast them & provide backlinks. Almost like a generative wikipedia.

I think all we can expect from internet information is a good description of the distribution of materials out there, not truth. This is totally within the capabilities of LLMs. For additional confidence run 3 reports on different models.

rebuilder

about 1 month ago

That's because the things we built on weren't slop

Mistletoe

about 1 month ago

How to make fire or kill a woolly mammoth was not slop come on.

ben_w

about 1 month ago

There's a reason this is comedy:

  Listen, lad. I built this kingdom up from nothing. When I started here, all there was was swamp. Other kings said I was daft to build a castle on a swamp, but I built it all the same, just to show 'em. It sank into the swamp. So, I built a second one. That sank into the swamp. So, I built a third one. That burned down, fell over, then sank into the swamp, but the fourth one... stayed up! And that's what you're gonna get, lad: the strongest castle in these islands.

While this is religious:

  [24] “Everyone then who hears these words of mine and does them will be like a wise man who built his house on the rock. [25] And the rain fell, and the floods came, and the winds blew and beat on that house, but it did not fall, because it had been founded on the rock. [26] And everyone who hears these words of mine and does not do them will be like a foolish man who built his house on the sand. [27] And the rain fell, and the floods came, and the winds blew and beat against that house, and it fell, and great was the fall of it.”

Humans build not on each other's slop, but on each other's success.

Capitalism, freedom of expression, the marketplace of ideas, democracy: at their best these things are ways to bend the wisdom of the crowds (such as it is) to the benefit of all; and their failures are when crowds are not wise.

The "slop" of capitalism is polluted skies, soil and water, are wage slaves and fast fashion that barely lasts one use, and are the reason why workplace health and safety rules are written in blood. The "slop" of freedom of expression includes dishonest marketing, libel, slander, and propaganda. The "slop" of democracy is populists promising everything to everyone with no way to deliver it all. The "slop" of the marketplace of ideas is every idiot demanding their own un-informed rambling be given the same weight as the considered opinions of experts.

None of these things contributed our social, technological, or economic advancement, they are simply things which happened at the same time.

AI has stuff to contribute, but using it to make an endless feed of mediocrity is not it. The flood of low-effort GenAI stuff filling feeds and drowning signal with noise, as others have said: just give us your prompt.

pseidemann

about 1 month ago

You may have one point.

The industrial age was built on dinosaur slop, and they were giant.

groestl

about 1 month ago

We have two optimization mechanisms though which reduce noise with respect to the optimization functions: evolution and science. They are implicitly part of "standing on the shoulders of giants", you pick the giant to stand on (or it is picked for you).

Whether or not the optimization functions align with human survival, and thus our whole existence is not a slop, we're about to find out.

mwidell

about 1 month ago

2 replies

Low background steel is no longer necessary.

"...began to fall in 1963, when the Partial Nuclear Test Ban Treaty was enacted, and by 2008 it had decreased to only 0.005 mSv/yr above natural levels. This has made special low-background steel no longer necessary for most radiation-sensitive uses, as new steel now has a low enough radioactive signature."

https://en.wikipedia.org/wiki/Low-background_steel

juvoly

about 1 month ago

2 replies

Interesting. I guess that analogously, we might find that X years after some future AI content production ban, we could similarly start ignoring the low background token issue?

actionfromafar

about 1 month ago

3 replies

We used a rather low number of atmospheric bombs, while we are carpet bombing the internet every day with AI marketing copy.

MadnessASAP

about 1 month ago

2 replies

The eternal September has finally ended. We've now entered the AI winter. It promises to be long, dark, and full of annoyances.

embedding-shape

about 1 month ago

1 reply

"Winter" in AI (or cryptocurrency, or any at all) ecosystems denote a period of low activity, and a focus on fundamentals instead of driven by hype.

What we're seeing now is something more like the peak of summer. If it ends up being a bubble, and it burtst, some months after that will be "AI Winter" as investors won't want to continue chucking money at problems anymore, and it'll go back to "in the background research" again, as it was before.

MadnessASAP

about 1 month ago

It was a continuation of the nuclear analogy, a nuclear winter following a large scale nuclear exchange.

Also that winter comes after September (fall)

anticensor

about 1 month ago

We instead entered eternal December.

xjm

about 1 month ago

We used a low number _and_ it was a while ago (it would be different if we used the same number spread out on the same time span)

SecretDreams

about 1 month ago

We're bombing the internet into extinction. But we were way before AI. It got real bad during the SEO/monetization phase. AI was just the final nail.

piker

about 1 month ago

What’s the half-life of a viral meme?

doe88

about 1 month ago

Can't wait, in fifty years we will have our data clean again.

alansaber

about 1 month ago

Since synthetic data for training is pretty ubiquitous seems like a novelty

anticensor

about 1 month ago

1 reply

You should call it Predecember, referring to the eternal December.

unfunco

about 1 month ago

1 reply

September?

littlestymaar

about 1 month ago

2 replies

ChatGPT was released exactly 3 years ago (on the 30th of November) so December it is in this context.

permo-w

about 1 month ago

2 replies

surely that would be eternal November then

littlestymaar

about 1 month ago

3 replies

No, being released on Nov 30th means November was still before the slop era.

retsibsi

about 1 month ago

1 reply

In the end the analogy doesn't really work, because 'eternal September' referred to what used to be a regular, temporary thing (an influx of noobs disrupting the online culture, before eventually leaving or assimilating) becoming the new normal. 'Eternal {month associated with ChatGPT}' doesn't fit because LLM-generated content was never a periodic phenomenon.

hackable_sand

about 1 month ago

AI R&D certainly was periodic. Good thing we put a stop to that!

AlecSchueler

about 1 month ago

Yes, and this site is for everything before the slop era, hence eternal November.

permo-w

about 1 month ago

to be honest, GPT-3, which was pretty solid and extremely capable of producing webslop, had been out for a good while before ChatGPT, and GPT-2 even had been being used for blogslop years before. maybe ChatGPT was the beginning of when the public became aware of it, but it was going on well beforehand. and, as the sibling commenter points out, the analogy doesn't quite fit structurally either

123malware321

about 1 month ago

1 reply

everything is dead after november passes

permo-w

about 1 month ago

oof I only caught the meaning of this on the third read. yeah. probably

anticensor

about 1 month ago

aka 0 December

themanmaran

about 1 month ago

1 reply

The low-background steel of the internet

https://en.wikipedia.org/wiki/Low-background_steel

HelloUsername

about 1 month ago

1 reply

As mentioned half a year ago at https://news.ycombinator.com/item?id=44239481

thm

about 1 month ago

1 reply

As mentioned 7 months ago https://news.ycombinator.com/item?id=43811732

Ginger-Pickles

about 1 month ago

As mentioned in this thread :P https://news.ycombinator.com/item?id=46103662

tkgally

about 1 month ago

4 replies

Somewhat related, the leaderboard of em-dash users on HN before ChatGPT:

https://www.gally.net/miscellaneous/hn-em-dash-user-leaderbo...

maplethorpe

about 1 month ago

4 replies

They should include users who used a double hyphen, too -- not everyone has easy access to em dashes.

venturecruelty

about 1 month ago

1 reply

Oof, I feel like you'll accidentally capture a lot of getopt_long() fans. ;)

Kinrany

about 1 month ago

Excluding those with asymmetrical whitespace around might be enough

gblargg

about 1 month ago

1 reply

Does AI use double hyphens? I thought the point was to find who wasn't AI that used proper em dashes.

jader201

about 1 month ago

2 replies

Anytime I do this — and I did it long before AI did — they are always em dashes, because iOS/macOS translates double dashes to em dashes.

I think there may be a way to disable this, but I don’t care enough to bother.

If people want to think my posts are AI generated, so be it.

JumpCrisscross

about 1 month ago

2 replies

> Anytime I do this — and I did it long before AI did — they are always em dashes

It depends if you put the space before and after the dashes--that, to be clear, are meant to be there--or if you don't.

oniony

about 1 month ago

2 replies

I cannot remember ever reading a book where there was a space around the dashes.

kuschku

about 1 month ago

3 replies

That depends on the language — whereas German puts spaces around —, English afaik usually doesn’t.

(Similarly, French puts spaces before and after . ? !, while English and German only put spaces afterwards.)

greenicon

about 1 month ago

3 replies

In German you use en-dashes with spaces, whereas in English it’s em-dashes without spaces. Some people dislike em-dashes in English though and use en-dashes with spaces as well.

bloak

about 1 month ago

In British English en-dashes with spaces is more common than em-dashes without spaces, I think, but I don't have any data for that, just a general impression.

JumpCrisscross

about 1 month ago

> whereas in English it’s em-dashes without spaces

Didn't know! Woot, I win!

Why does AI have a preference for doing it differently?

dragonwriter

about 1 month ago

In English, typically em-dashes are set without spaces or with thin spaces when used to separate appositives/parentheticals (though that style isn't universal even in professional print, there are places that aet them open, and en-dashes set open can also be used in this role); when representating an interruption, they generally have no space before but frequently have space following. And other uses have other patterns.

iLoveOncall

about 1 month ago

French doesn't put one before the period.

bratwurst3000

about 1 month ago

french does "," and "." like the british and germans the rest is space befor space after

LoganDark

about 1 month ago

1 reply

Technically, there are supposed to be hair spaces around the dashes, not regular spaces. They're small enough to be sometimes confused for kerning.

cachius

about 1 month ago

Em dashes used as parenthetical dividers, and en dashes when used as word joiners, are usually set continuous with the text. However, such a dash can optionally be surrounded with a hair space, U+200A, or thin space, U+2009 or HTML named entities &hairsp; and   These spaces are much thinner than a normal space (except in a monospaced (non-proportional) font), with the hair space in particular being the thinnest of horizontal whitespace characters.

https://en.wikipedia.org/wiki/Whitespace_character#Hair_spac...

Typographers usually add space to the left side of the following marks:

    : ; ” ’ ! ? / ) ] } * ¿ › » @ ® ™ ℓ ° ¡ ' " † + = ÷ - – —

And they usually add space to the right of these:

    “ ‘ / ( [ { > ≥ < ≤ £ $ ¢ € ‹ « √ μ # @ + = ÷ - – —

https://www.smashingmagazine.com/2020/05/micro-typography-sp...

1. (letterpress typography) A piece of metal type used to create the narrowest space. 2. (typography, US) The narrowest space appearing between letters and punctuation.

https://en.wiktionary.org/wiki/hair_space

Now I'd like to see how the metal type looks like, but ehm... it's difficult googling it. Also a whole collection of space types and what they're called in other languages.

fragmede

about 1 month ago

1 reply

What, no love for our friend the en-dash?

- vs – vs —

chickensong

about 1 month ago

2 replies

I once spent a day debugging some data that came from an English doc written by someone in Japan that had been pasted into a system and caused problems. Turned out to be an en-dash issue that was basically invisible to the eye. No love for en-dash!

1718627440

about 1 month ago

1 reply

This issue also exists with (so called) "smart" quotes.

fragmede

about 1 month ago

2 replies

Which, the iOS keyboard “helpfully” uses for you.

mh-

about 1 month ago

Pretty much the first thing I turn off on a new laptop (it's in the keyboard settings on iOS too.)

withinboredom

about 1 month ago

Especially when you're sending some quick scratch code in a slack message.

ben_w

about 1 month ago

Similar.

Compiler error while working on some ObjC. Nothing obviously wrong. Copy-pasted the line, same thing on the copy. Typed it out again, no issue with the re-typed version. Put the error version and the ok version next to each other, apparently identical.

I ended up discovering I'd accidentally lent on the option key while pressing the "-"; Monospace font, Xcode, m-dash and minus looked identical.

teiferer

about 1 month ago

There is also the difference in using space around em-dashes.

bigiain

about 1 month ago

1 reply

That would false positive me. I have used double dashes to delimit quote attribution for decades.

Like this:

"You can't believe everything you read on the internet." -- Abraham Lincoln, personal correspondence, 1863

dragonwriter

about 1 month ago

That's literally a standard use of em-dash being approximated by a double hyphen, though.

SoftTalker

about 1 month ago

1 reply

Double-hyphen is an en-dash. Triple-hyphen is an em-dash.

dragonwriter

about 1 month ago

Double hyphen is replaced in some software with an en-dash (and in those, a triple hyphen is often replaced with an em-dash), and in some with an em-dash; its usually used (other than as input to one of those pieces of software) in places where an em-dash would be appropriate, but in contexts where both an em-dash set closed and an en-dash set open might be used, it is often set open.

So, it’s not unambiguously s substitute for either is essentially its own punctuation mark used in ASCII-only environments with some influence from both the use of em-dashed and that of en-dashes in more formal environments.

a5c11

about 1 month ago

2 replies

Apparently, it's not only em-dash that's distinctive. I've went through comments of the leader, and spot he also uses the backtick "’" instead of the apostrophe.

baiwl

about 1 month ago

1 reply

Just to be clear this is done automatically by macOS or iOS browsers when configured properly.

a5c11

about 1 month ago

Never happened to me. And I'm using Mac and iPhone.

kuschku

about 1 month ago

I (~100 in the leaderboard, regardless of how you sort) also frequently use ’ (unicode apostrophe) instead of ' :D

lxgr

about 1 month ago

Amazing! But no love for en dashes?

Ajakks

about 1 month ago

I have used a dash - like that for almost 20 years, 100% of the time I ought to use a semi-colon and about half of the time for commas - it let's me just keep talking about things, the comma is harder pause. I've recently started seriously writing at a literary level, and I have fallen in love with the em dash - it has a fantastic function within established professional writing, where it is used often - its why the AI uses it so much.

GaryBluto

about 1 month ago

1 reply

Why use this when you can use the before: syntax on most search engines?

aDyslecticCrow

about 1 month ago

2 replies

doesn't actually do anything anymore in Google or bing.

Thorrez

about 1 month ago

Searching Google for

chatgpt

chatgpt before:2022-01-01

give me quite different results. In the 2nd query, most results have a date listed next to them in the results page, and that date is always prior to 2022. So the date filtering is "working". However, most of the dates are actually Google making a mistake and misinterpreting some unimportant date it found on the page as the date the page was created. At least one result is a Youtube video posted before 2022, that edited its title after Chatgpt was released to say Chatgpt.

Disclosure: I work at Google, but not on search.

GaryBluto

about 1 month ago

I use it frequently to find older websites to browse. Works relatively well for most search terms. If you want something from 2005 or before I find -inurl:https works well.

permo-w

about 1 month ago

7 replies

besides for training future models, is this really such a big deal? most of the AI-gened text content is just replacing content-farm SEO-spam anyway. the same stuff that any half-awares person wouldn't have read in the past is now slightly better written, using more em dashes and instances of the word "delve". if you're consistently being caught out by this stuff then likely you need to improve your search hygiene, nothing so drastic as this

the only place I've ever had any issue with AI content is r/chess, where people love to ask ChatGPT a question and then post the answer as if they wrote it, half the time seemingly innocently, which, call me racist, but I suspect is mostly due to the influence of the large and young Indian contingent. otherwise I really don't understand where the issue lies. follow the exact same rules you do for avoiding SEO spam and you will be fine

system2

about 1 month ago

1 reply

Yes indeed, it is a problem. Now the old good sites have turned into AI-slop sites because they can't fight the spammers by writing slowly with humans.

permo-w

about 1 month ago

if a potential defense is to simply the spammers, then the site was previously just as likely to start hiring content-farm human slop writers as they are now likely to use AI, i.e. the site probably wasn't that great in the first place and had equal potential to deteriorate, AI or no

Cadwhisker

about 1 month ago

1 reply

In the past, I'd find one wrong answer and I could easily spot the copies. Now there's a dozen different sites with the same wrong answer, just with better formatting and nicer text.

finaard

about 1 month ago

The trick is to only search for topics where there are no answers, or only one answer leading to that blog post you wrote 10 years ago and forgot about.

darkwater

about 1 month ago

2 replies

> besides for training future models, is this really such a big deal? most of the AI-gened text content is just replacing content-farm SEO-spam anyway.

Yes, it is because of the other side of the coin. If you are writing human-generated, curated content, previously you would just do it in your small patch of Internet, and probably SEs (Google...) will pick it up anyway because it was good quality content. You just didn't care about SEO-driven shit anyway. Now you nicely hand-written content is going to be fed into LLM training and it's going to be used - whatever you want it or not - in the next generation of AI slop content.

permo-w

about 1 month ago

this is basically the equivalent of saying that content-farm writers might read your content and bastardise it into seo slop. okay, sure, it's true, but it was always true and AI doesn't change it significantly

visarga

about 1 month ago

It's not slop if it is inspired from good content. Basically you need to add your original spices into the soup to make it not slop, or have the LLM do deep research kind of work to contrast among hundreds of sources.

Slop did not originate from AI itself, but from the feed ranking Algorithm which sets the criteria for visibility. They "prompt" humans to write slop.

AI slop is just an extension of this process, and it started long before LLMs. Platforms optimizing for their own interest at the expense of both users and creators is the source of slop.

never_inline

about 1 month ago

3 replies

A colleague sent me a confident ChatGPT formatted bug report.

It misidentified what the actual bug was.

But the tone was so confident, and he replied to my later messages using chat gpt itself, which insisted I was wrong.

I don't like this future.

crazygringo

about 1 month ago

It's not the future. Tell him not to do that. If it happens again, bring it to the attention of his manager. Because that's not what he's being paid for. If he continues to do it, that's grounds for firing.

What you're describing is not the future. It's a fireable offense.

blitzar

about 1 month ago

I have dozens of these over the years - many of the people responsible have "Head of ..." or "Chief ..." job titles now.

artursapek

about 1 month ago

Did you call his ass out for being lazy and wasting your time?

pajamasam

about 1 month ago

SEO-spam was often at least somewhat factual and not complete generated garbage. Recipe sites, for example, usually have a button that lets you skip the SEO stuff and get to the actual recipe.

Also, the AI slop is covering almost every sentence or phrase you can think of to search. Before, if I used more niche search phrases and exact searches, I was pretty much guaranteed to get specific results. Now, I have to wade through pages and pages of nonsense.

zwnow

about 1 month ago

Yes it is a big deal. I cant find new artists without having a fear of their art being AI generated, same for books and music. I also cant post my stuff to the internet anymore because I know its going to be fed into LLM training data. The internet is dead to me mostly and thankfully I lost almost all interest of being on my computer as much as I used to be.

Aurornis

about 1 month ago

> the only place I've ever had any issue with AI content is r/chess, where people love to ask ChatGPT a question and then post the answer as if they wrote it, half the time seemingly innocently

Some of the science, energy, and technology subreddits receive a lot of ChatGPT repost comment. There are a lot of people who think they’ve made a scientific or philosophical breakthrough with ChatGPT and need to share it with the world.

Even the /r/localllama subreddit gets constant AI spam from people who think they’ve vibecoded some new AI breakthrough. There have been some recent incidents where someone posted something convincing and then others wasted a lot of time until realizing the code didn’t accomplish what the post claimed it did.

Even on HN some of the “Show HN” posts are AI garbage from people trying to build portfolios. I wasted too much time trying to understand one of them until I realized they had (unknowingly?) duplicated some commits from upstream project and then let the LLM vibe code a README that sounded like an amazing breakthrough. It was actually good work, but it wasn’t theirs. It was just some vibecoding tool eventually arriving at the same code as upstream and then putting the classic LLM written, emoji-filled bullet points in the README

pknerd

about 1 month ago

2 replies

Something generated by humans does not mean high quality.

Krssst

about 1 month ago

2 replies

Yes, but AI-generated is always low quality so it makes sense to filter it out.

IshKebab

about 1 month ago

I wouldn't say always... Especially because you probably only noticed the bad slop. Usually it is crap though.

josephjrobison

about 1 month ago

Grokipedia would like a word

a5c11

about 1 month ago

At least when reading a human-made material you can spot author's uncertainty in some topics. Usually, when someone doesn't have knowledge of something, he doesn't try to describe that. AI, however, will try to convince you that pigs can fly.

zkmon

about 1 month ago

1 reply

Most of college courses and school books haven't changed in decades. Some reputed college keep courses for Pascal and Fortran instead of Python or Java, just because, it might affect their reputation of being classical or pure or to match their campus buildings style.

fastasucan

about 1 month ago

Or because the core knowledge stay the same no matter how it is expressed.

defraudbah

about 1 month ago

2 replies

ChatGPT also returns content only created before ChatGPT release, which is why I still have to google damn it!

stinos

about 1 month ago

1 reply

Is that still the case? And even if so how is it going to avoid keeping it like that in the future? Are they going to stop scraping new content, or are they going to filter it with a tool which recognizes their own content?

defraudbah

about 1 month ago

it's a known problem in ML, I think grok solved it partially and chatGPT uses another model on top to search web like suggested below. Hence MLOps field appeared, to solve models management

I find it a bit annoying to navigate between hallucinations and outdated content. Too much invalid information to filter out.

fragmede

about 1 month ago

Click the globe icon below the input box to enable web searching by ChatGPT.

phplovesong

about 1 month ago

1 reply

The slop is getting worse, as there is so much llm generated shit online, now new models are getting trained on the slop. Slop training slop, and slop. We have gone full circle just in a matter of a few years.

muixoozie

about 1 month ago

1 reply

I was replaying Cyberpunk 2077 and trying to think of all the ways one might have dialed up the dystopia to 11 (beyond what the game does). And pervasive AI slop was never on my radar. Kinda reminds me of the foreword in Neuromancer bringing attention to the fact the book was written before cellphones became popular. It's already fucking with my mind. I recently watched Frankenstein 2025 and 100% thought gen ai had a role in the CGI only to find out the director hates it so much he rather die than use it. I've been noticing little things in old movies and anime where I thought to myself (if I didn't know this was made before gen ai, I would have thought this was generated for sure). One example (https://www.youtube.com/watch?v=pGSNhVQFbOc&t=412) cityscape background in this a outro scene with buildings built on top of buildings gave me ai vibes (really the only thing in this whole anime), yet this came out ~1990. So I can already recognize a paranoia / bias in myself and really can't reliably tell what's real.. Probably also other people have this and why some non-zero number of people always thinks every blog post that comes out was written by gen ai.

Cyan488

about 1 month ago

I had the same experience, watching a nature documentary on a streaming service recently. It was... not so good, at least at the beginning. I was wondering if this was a pilot for AI generated content on this streaming service.

Actually, it came out in 2015 and was just low budget.

dinkblam

about 1 month ago

google results were already 90% SEO crap long before ChatGPT

just use Kagi and block all SEO sites...

progman32

about 1 month ago

Not affiliated, but I've been using kagi's date range filter to similar effect. The difference in results for car maintenance subjects is astounding (and slightly infuriating).

RomanPushkin

about 1 month ago

For that purpose I do not update my book on LeanPub about Ruby. I just know one day people gonna read it more, because human-written content would be gold.

ETH_start

about 1 month ago

I'm grateful that I published a large body of content pre-ChatGPT so that I have proof that I'm not completely inarticulate without AI.

ricardo81

about 1 month ago

FWIW Mojeek (an organic search engine in the classic sense) can do this with the before: operator.

https://www.mojeek.com/search?q=britney+spears+before%3A2010...

cryptozeus

about 1 month ago

technically you can ask chatgpt to return the same result by asking it to filter by year

tobr

about 1 month ago

For images, https://same.energy is a nice option that, being abandoned but still functioning since a few years, seems to naturally not have crawled any AI images. And it’s all around a great product.

tejaallu

about 1 month ago

ccccc

tejaallu

about 1 month ago

gdh

voiper1

about 1 month ago

Of course my first thought was: Let's use this as a tool for AI searches (when I don't need recent news).

Resources