Google Flags Immich Sites as Dangerous
Posted 3 months ago · Active 3 months ago
immich.app · Tech · Story · High profile
heated, negative · Debate · 85/100
Key topics
Google Safe Browsing
Self-Hosting
Web Security
Google flagged Immich, a self-hosted photo management platform, as a 'dangerous site', sparking a heated discussion on HN about the implications of such flagging and the potential for abuse of power by Google.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 53m after posting
Peak period: 55 comments (6-12h window)
Average per period: 17.8 comments
Comment distribution: 160 data points
Key moments
- 01 Story posted: Oct 22, 2025 at 4:53 PM EDT (3 months ago)
- 02 First comment: Oct 22, 2025 at 5:46 PM EDT (53m after posting)
- 03 Peak activity: 55 comments in the 6-12h window (hottest window of the conversation)
- 04 Latest activity: Oct 25, 2025 at 10:13 PM EDT (3 months ago)
ID: 45675015 · Type: story · Last synced: 11/27/2025, 3:36:12 PM
Want the full context? Read the primary article or dive into the live Hacker News thread.
https://old.reddit.com/r/immich/comments/1oby8fq/immich_is_a...
I had my personal domain I use for self-hosting flagged. I've had the domain for 25 years and it's never had a hint of spam, phishing, or even unintentional issues like compromised sites / services.
It's impossible to know what Google's black box is doing, but, in my case, I suspect the flagging was the result of failing to use a large email provider. I use MXRoute for locally hosted services and network devices because they do a better job of giving me simple, hard limits for sending accounts. That way, if anything I host ever gets compromised, the damage in terms of spam will be limited to (for example) 10 messages every 24h.
I invited my sister to a shared Immich album a couple days ago, so I'm guessing that GMail scanned the email notifying her, used the contents + some kind of not-google-or-microsoft sender penalty, and flagged the message as potential spam or phishing. From there, I'd assume the linked domain gets pushed into another system that eventually decides they should blacklist the whole domain.
The thing that really pisses me off is that I just received an email in reply to my request for review, and the whole thing is a gaslighting extravaganza: "Google systems indicate your domain no longer contains harmful links or downloads. Keep yourself safe in the future by blah blah blah blah."
Umm. No! It's actually Google's crappy, non-deterministic, careless detection that's flagging my legitimate resources as malicious. Then I have to spend my time running it down and double-checking everything before submitting a request to have the false positive on Google's end fixed.
Convince me that Google won't abuse this to make self hosting unbearable.
It seems like the flagging was a result of the same login-page detection that the Immich blog post is referencing? What makes you think it's tied to self-hosted email?
In my case, the Google Search Console explicitly listed the exact URL for a newly created shared album as the cause.
https://photos.example.com/albums/xxxxxxxx-xxxx-xxxx-xxxx-xx...
I wish I had taken a screenshot. That URL is not going to be guessed randomly, and it was only transmitted once, to one person, via e-mail. The sending was done via MXRoute and the recipient was using GMail (legacy Workspace).
The only possible way for Google to have gotten that URL to start the process would have been by scanning the recipient's e-mail. What I was trying to say is that the only way it makes sense to me is if Google via GMail categorized that email as phishing and that kicked off the process to add my domain to the block list.
So, if email categorization / filtering is being used as a heuristic for discovering URLs for the block list, it's possible Google is discriminating against domains that use smaller email hosts that Google doesn't trust as much as itself, Microsoft, etc.
All around it sucks and Google shouldn't be allowed to use non-deterministic guesswork to put domains on a block list that has a significant negative impact. If they want to operate a clown show like that, they should at least be liable for the outcomes IMO.
It's scary how much control Google has over which content people can access on the web - or even on their local network!
https://news.ycombinator.com/item?id=45538760
Doesn't that effectively let anyone host anything there?
It's more like sites.google.com.
SQLite used to have a limit of 999 query parameters, which was much easier to hit. It's now a roomy 32k.
We probably should have been partitioning the data instead of inserting it twice, but I never got around to fixing that.
COPY is likely a better option if you have access to the host, or provider-specific extensions like aws_s3 if you have those. I'm sure a data engineer would be able to suggest a better ETL architecture than "shove everything into postgres", too.
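For anyone curious what the COPY route looks like from application code, here is a minimal sketch assuming a Node/TypeScript stack with the pg and pg-copy-streams packages; the photos table, its columns, and the row shape are made up purely for illustration.

```ts
import { Pool } from "pg";
import { from as copyFrom } from "pg-copy-streams";
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";

// Hypothetical row shape and table, for illustration only.
interface PhotoRow {
  id: string;
  takenAt: string; // ISO timestamp
  path: string;
}

// Stream rows into Postgres with COPY instead of a huge multi-row INSERT,
// which sidesteps the bind-parameter limit entirely.
async function bulkLoad(pool: Pool, rows: PhotoRow[]): Promise<void> {
  const client = await pool.connect();
  try {
    const copyStream = client.query(
      copyFrom("COPY photos (id, taken_at, path) FROM STDIN")
    );
    // COPY's default text format is tab-separated, newline-terminated.
    // Real data containing tabs, newlines, or backslashes would need escaping.
    const source = Readable.from(
      rows.map((r) => `${r.id}\t${r.takenAt}\t${r.path}\n`)
    );
    await pipeline(source, copyStream);
  } finally {
    client.release();
  }
}
```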
is even funnier :D
>Some phones will silently strip GPS data from images when apps without location permission try to access them.
That strikes me as the right thing to do?
And wait. Uh oh. Does this mean my Syncthing-Fork app (which itself would never strike me as needing location services) might have my phone's images' location data stripped before they make their way to my backup system?
EDIT: To answer my last question: My images transferred via Syncthing-Fork on a GrapheneOS device to another PC running Fedora Atomic have persisted the GPS data as verified by exiftool. Location permissions have not been granted to Syncthing-Fork.
Happy I didn't lose that data. But it would appear that permission to your photo files may expose your GPS locations regardless of the location permission.
Looking now, I can't even find that setting anymore on my current phone. But the photos still do have the GPS data intact.
Yep, and it's there for very good reasons. However, if you don't know about it, it can be quite surprising and challenging to debug.
Also, it's annoying when your phone's permissions optimiser runs and removes the location permission from e.g. Google Photos, and you realise a few months later that your photos no longer have their location.
What happens is that when an application without location permissions tries to get photos, the corresponding OS calls strip the geo location data when passing them. The original photos still have it, but the application doesn't, because it doesn't have access to your location.
This was done because most people didn't know that photos contain their location, and people got burned by stalkers and scammers.
Every kind of permission should fail the same way, informing the user about the failure, and asking if the user wants to give the permission, deny the access, or use dummy values. If there's more than one permission needed for an operation, you should be able to deny them all, or use any combination of allowing or using dummy values.
Try to get an iPhone user to send you an original copy of a photo with all metadata. Even if they want to do it most of them don't know how.
I don't disagree that months should be 1-indexed, but I would not make that assumption solely based on days/years being 1-indexed, since 0-indexing those would be psychotic.
I don't think adding counterintuitive behavior to your data to save a "- 1" here and there is a good idea, but I guess this is just legacy from the ancient times.
Can't wait for it to be stable and widely available, it's just too good.
> month values start at 1, which is different from legacy Date where months are represented by zero-based indices (0 to 11)
[0] https://tc39.es/proposal-temporal/docs/
[1] https://tc39.es/proposal-temporal/docs/plaindate.html#month
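A quick illustration of the difference; Temporal is still a TC39 proposal, so the import below assumes the @js-temporal/polyfill package rather than native support.

```ts
import { Temporal } from "@js-temporal/polyfill"; // polyfill until Temporal ships natively

// Legacy Date: years and days are 1-based, but months are 0-based.
const legacy = new Date(2025, 0, 15); // January 15, 2025
console.log(legacy.getMonth()); // 0, even though this is January

// Temporal: months are 1-based, matching how dates are written.
const plain = Temporal.PlainDate.from({ year: 2025, month: 1, day: 15 });
console.log(plain.month);      // 1
console.log(plain.toString()); // "2025-01-15"
```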
For example, the first day of the first month of the first year is 1.1.1 AD (at least in the Gregorian calendar), so we could just go with a 0-indexed 0.0.0 AD.
I've now read the entire Cursed Knowledge list & - while I found some of them to be invaluable insights & absolutely love the idea of projects maintaining a public list of this nature to educate - there are quite a few red flags in this particular list.
Before mentioning them: some excellent & valuable, genuinely cursed items: Postgres NOTIFY (albeit adapter-specific), npm scripts, bcrypt string lengths & especially the horrifically cursed Cloudflare fetch: all great knowledge. But...
> Secure contexts are cursed
> GPS sharing on mobile is cursed
These are extremely sane security features. Do we think keeping users secure is cursed? It honestly seems crazy to me that they published these items in the list with a straight face.
> PostgreSQL parameters are cursed
Wherein their definition of "cursed" is that PG doesn't support running SQL queries with more than 65535 separate parameters! It seems to me that any sane engineer would expect the limit to be lower than that. The suggestion that making an SQL query with that many parameters is normal seems problematic.
> JavaScript Date objects are cursed
JavaScript is zero-indexed by convention. This one's not a huge red flag, but it is pretty funny for a programmer to find this problematic.
> Carriage returns in bash scripts are cursed
Non-default local git settings can break your local git repo. This has nothing to do with bash, and everyone knows git has footguns.
Also the full story here seemed to be
1. Person installs git on Windows with autocrlf enabled, automatically converting all LF to CRLF (very cursed in itself in my opinion).
2. Does their thing with git on the Windows' side (clone, checkout, whatever).
3. Then runs the checked out (and now broken due to autocrlf) code on Linux instead of Windows via WSL.
The biggest footgun here is autocrlf, but I don't see how this whole situation is the problem of any Linux tooling.
TL;DR - if your repo will contain bash scripts, use .gitattributes to make sure they have LF line endings.
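A minimal .gitattributes along those lines (the patterns are just examples; adjust them to the repo's layout):

```gitattributes
# Always check out shell scripts with LF, regardless of core.autocrlf
*.sh   text eol=lf
# Keep Windows-only scripts as CRLF if the repo has any
*.bat  text eol=crlf
*.ps1  text eol=crlf
```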
If git didn't have this setting, then after checking out a bash file with LFs in it, there are many Windows editors that would not be able to edit that file properly. That's a limitation of those editors & nobody should be using those pieces of software to edit bash files. This is a problem that is entirely out of scope for a VCS & not something Git should ever have tried to solve.
In fact, having git solve this disincentivizes Windows editors from solving it correctly.
Well, bash could also handle CRLF nicely. There's no gain from interpreting CR as a non-space character.
(The same is valid for every language out there and all the spacey things, like zero-width space, non-breaking space, and vertical tabs.)
This is just a list of things that can catch devs off guard.
> JavaScript date objects are 1 indexed for years and days, but 0 indexed for months.
This mix of 0 and 1 indexing in calendar APIs goes back a long way. I first remember it coming from Java but I dimly recall Java was copying a Taligent Calendar API.
Dark-grey text on black is cursed. (Their light theme is readable.)
Also, you can do bulk inserts in Postgres using arrays. Take a look at unnest. Standard bulk inserts are cursed in every database; I'm with the devs here that it's not worth fixing them in Postgres just for compatibility.
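A rough sketch of the unnest approach with node-postgres (table and column names are hypothetical): each column becomes a single array parameter, so a batch of any size uses three bind parameters instead of rows × columns.

```ts
import { Pool } from "pg";

// Hypothetical row shape, for illustration only.
interface PhotoRow {
  id: string;
  takenAt: string; // ISO timestamp
  path: string;
}

// One bind parameter per column (each an array) instead of rows * columns
// parameters, so the 65535 limit stops mattering for large batches.
async function bulkInsertWithUnnest(pool: Pool, rows: PhotoRow[]): Promise<void> {
  await pool.query(
    `INSERT INTO photos (id, taken_at, path)
     SELECT * FROM unnest($1::uuid[], $2::timestamptz[], $3::text[])`,
    [
      rows.map((r) => r.id),
      rows.map((r) => r.takenAt),
      rows.map((r) => r.path),
    ]
  );
}
```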
It's unclear exactly what conditions cause a site to get blocked by Safe Browsing. My nextcloud.something.tld domain has never been flagged, but I've seen support threads of other people having issues, and the domain name is the best guess.
https://photos.example.com/albums/xxxxxxxx-xxxx-xxxx-xxxx-xx...
Then suddenly the domain is banned even though there was never a way to discover that URL besides GMail scanning messages. In my case, the server is public so my siblings can access it, but there's nothing stopping Google from banning domains for internal sites that show up in emails they wrongly classify as phishing.
Think of how Google and Microsoft destroyed self hosted email with their spam filters. Now imagine that happening to all self hosted services via abuse of the safe browsing block lists.
https://photos.example.com/albums/xxxxxxxx-xxxx-xxxx-xxxx-xx...
That's not going to be gleaned from a CT log or guessed randomly. The URL was only transmitted once to one person via e-mail. The sending was done via MXRoute and the recipient was using GMail (legacy Workspace).
The only possible way for Google to have gotten that URL to start the process would have been by scanning the recipient's e-mail.
I've read almost everything linked in this post and on Reddit and, with what you pointed out considered, I'd say the most likely thing that got my domain flagged is having a redirect to a default-styled login page.
The thing that really frustrates me, if that's the case, is that it has a large impact on non-customized self-hosted services and Google makes no effort to avoid the false positives. Something as simple as guidance for self-hosted apps to use a custom login screen to differentiate themselves from each other would make a huge difference.
Of course, it's beneficial to Google if they can make self-hosting as difficult as possible, so there's no incentive to fix things like this.
Also, when you say banned, you're speaking of the "red screen of death", right? Not a broader ban from the domain using Google Workspace services, yeah?
Yes.
> I would love for someone to attempt this in as controlled of a manner as possible.
I'm pretty confident they scanned a URL in GMail to trigger the blocking of my domain. If they've done something as stupid as tying GMail phishing detection heuristics into the safe browsing block list, you might be able to generate a bunch of phishy looking emails with direct links to someone's login page to trigger the "red screen of death".
I'm guessing Google's phishing analysis must be going off the rails seeing all of these login prompts saying "immich" when there's an actual immich cloud product online.
If I were tasked with automatically finding phishing pages, I too would struggle to find a solution to differentiate open-source, self-hosted software from phishing pages.
I find it curious that this is happening to Immich so often while none of my own self-hosted services have ever had this problem, though. Maybe this is why so many self-hosted tools have you configure a name/descriptor/title/whatever for your instance, so they can say "log in to <my amazing photo site>" rather than "log in to Product"? Not that Immich doesn't offer such a setting.
Possible scenario:
- A self-hosted project has a demo instance with a default login page (demo.immich.app, demo.jellyfin.org, demo1.nextcloud.com) that is classified as "primary" by Google's algorithms
- Any self-hosted instance with the same login page (branding, title, logo, meta html) becomes a candidate for deceptive/phishing by their algorithm. And immich.cloud has a lot of preview envs falling in that category.
BUT in Immich's case, its _demo_ login page has its own big banner, so it is already quite different from the others. Maybe there's no "original" at all. The algorithm/AI just got lost among thousands of identical-looking login pages and now considers every other instance deceptive...
Normally I see the PSL in context of e.g. cookies or user-supplied forms.
Yes. For instance in circumstances exactly as described in the thread you are commenting in now and the article it refers to.
Services like Google's bad-site warning system may use it to indicate that it shouldn't consider a whole domain harmful if it considers a small number of its subdomains to be so, where otherwise it would. It is no guarantee, of course.
For example, if users are supposed to log in on the base account in order to access content on the subdomains, then using the public suffix list would be problematic.
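To make the PSL point concrete, here is a tiny sketch using the npm psl package (which bundles a snapshot of the Public Suffix List); the hostnames are just examples of how a reputation system might compute the registrable domain it keys on.

```ts
import psl from "psl"; // ships with a bundled copy of the Public Suffix List

// The "registrable" domain (eTLD+1) is what a reputation system could key on
// instead of the full hostname.
console.log(psl.get("photos.example.co.uk")); // "example.co.uk"
console.log(psl.get("foo.bar.example.com"));  // "example.com"

// github.io is on the PSL, so each user site counts as its own registrable domain.
console.log(psl.get("alice.github.io"));      // "alice.github.io"
```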
I'm not sure how people who haven't already hit this very issue are supposed to know about it beforehand, though; it's one of those things you don't really come across until you're hit by it.
It's fun learning new things so often, but I had never once heard of the public suffix list.
That said, I do know the other best practices mentioned elsewhere.
Which then links to: https://github.com/publicsuffix/list/wiki/Guidelines#submitt...
Fairly obvious and typical webpage > documentation flow I think, doesn't seem too hard to find.
Google from the '90s to 2010 was nothing like Google in 2025. There is a reason they removed "Don't be evil" ... being evil and authoritarian makes more money.
Looking at you Manifest V2 ... pour one out for your homies.
This is the first thing I disable in Chrome, Firefox and Edge. The only safe thing they do is safely send all my browsing history to Google or Microsoft.
This feature is there for my mother-in-law, who never saw a popup ad she didn't like. You might think I'm kidding; I am not. I periodically had to go into her Android device and dump twenty apps she had manually installed from the Play Store because they were in a ring of apps promoting each other.
Well, if the legal system used the same "Guilty until proven innocent" model, we would definitely "catch more bad actors than false positive good actors".
That's a tricky one, isn't it.
A better analogy, unfortunately for all the reasons it's unfortunate, is police: acting on the partial knowledge in the field to try to make the not-worst decision.
Google needs to be held liable for the damages they do in cases like this or they will continue to implement the laziest solutions as long as they can externalize the costs.
Many Google employees are in here, so I don't expect them to agree with you.
I wish this comment were top ranked so it would be clear immediately from the comments what the root issue was.
For example:
At this point, if someone else on that hosting provider gets that IP address assigned, your subdomain is now hosting their content. I had this happen to me once with PDF books being served through a subdomain on my site. Of course it's my mistake for not removing the A record (I forgot), but I'll never make that mistake again.
10 years of my domain having a good history may have been tainted in an irreparable way. I don't get warnings visiting my site, but traffic has slowly gotten worse since around that time, despite me posting more and more content. The correlation isn't guaranteed, especially with AI taking away so much traffic, but it's something I do think about.
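A small sketch of how one might audit for the dangling-record situation described above, using Node's built-in dns/promises; the hostnames and IP addresses are made up.

```ts
import { resolve4 } from "node:dns/promises";

// IPs you currently control; anything else is a candidate dangling record.
const ownedIps = new Set(["203.0.113.10", "203.0.113.11"]);

// Example subdomains to audit (hypothetical).
const subdomains = ["photos.example.com", "old-demo.example.com"];

async function auditDanglingRecords(): Promise<void> {
  for (const host of subdomains) {
    try {
      const ips = await resolve4(host);
      const strays = ips.filter((ip) => !ownedIps.has(ip));
      if (strays.length > 0) {
        console.warn(`${host} points at IPs you no longer control: ${strays.join(", ")}`);
      }
    } catch {
      // NXDOMAIN or resolution failure: nothing to clean up for this host.
    }
  }
}

void auditDanglingRecords();
```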
This is very clearly just bad code from Google.
God I hate the web. The engineering equivalent of a car made of duct tape.
Kind of. But do you have a better proposition?
End of random rant.
But then you would lose platform independence, the main selling point of this atrocity.
Having all those APIs in a sandbox that mostly just works on billions of devices is pretty powerful, and a potential successor to HTML would have to beat that to be adopted.
The best thing that could happen, as far as I can see, is that a sane subset crystallizes that people start to use predominantly, with the rest becoming legacy, maintained only to keep it working.
I have dreamed of a fresh rewrite of the web since university (and the web was way slimmer back then), but I've become a bit more pragmatic, and I think I now understand the massive problem of solving trusted human communication better. It ain't easy in the real world.
This all just drives a need to come up with ever more tacked-on protection schemes because browsers have big targets painted on them.
And that's before realizing it's already a bad idea with existing devices because they were never designed for giving untrusted actors direct access.
Anyway, in your scenario the controller would essentially be a one-off, and you'd be better off writing a native app to interface with it for the one computer this experiment will run on.
Not unlike the programming language or the app (growing until it half-implements LISP or half-implements an email client), the browser will grow until it half-implements an operating system.
For everyone else, there's already w3m.
You have sites now that let you debug microcontrollers on your browser, super cool.
Same thing but with firmware updates in the browser. Cross platform, replaced a mess of ugly broken vendor tools.
Your microcontrollers should use open standards for their debugging interface and not force people to use the vendor website.
You remove that, and videoconferencing (for business or person to person) has to rely on downloading an app, meaning whoever is behind the website has to release for 10-15 OSes now. Some already do, but not everyone has that budget so now there's a massive moat around it.
> But do we need e.g serial port or raw USB access straight from a random website
Being able to flash an IoT (e.g. ESP32) device from the browser is useful for a lot of people. For the "normies", there was also Stadia allowing you to flash their controller to be a generic Bluetooth/USB one on a website, using WebUSB. Without it, Google would have had to release an app for multiple OSes, or more likely, would have just left the devices as paperweights. Also, you can use FIDO/U2F keys directly now, which is pretty good.
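For a sense of how little ceremony the browser side involves, here is a sketch of the Web Serial handshake a browser-based flasher typically starts with (Web Serial is a sibling of WebUSB commonly used for ESP32 work). It is Chromium-only, needs HTTPS and a user gesture, the vendor-ID filter is just an example, and the cast stands in for the w3c-web-serial typings.

```ts
// Web Serial must be triggered from a user gesture on a secure (HTTPS) page.
async function connectToBoard(): Promise<void> {
  const serial = (navigator as unknown as { serial?: any }).serial;
  if (!serial) {
    console.log("Web Serial not supported in this browser");
    return;
  }

  // The browser shows a device picker; the page never sees devices the user
  // didn't explicitly select.
  const port = await serial.requestPort({
    filters: [{ usbVendorId: 0x10c4 }], // e.g. a common USB-to-UART bridge vendor ID
  });

  await port.open({ baudRate: 115200 });

  // Read one chunk of whatever the device prints on boot, then clean up.
  const reader = port.readable.getReader();
  const { value } = await reader.read();
  console.log("received", value);
  reader.releaseLock();
  await port.close();
}
```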
Browsers are the modern Excel, people complain that they do too much and you only need 20%. But it's a different 20% for everyone.
Same as your camera/microphone/location.
But do we need audio, images, Canvas, WebGL, etc.? The web could just be plain text and we'd still get most of the "useful" content; add images and you get the vast majority of it.
But the idea that the web is a rich environment that has all of these bells and whistles is a good thing imo. Yes there’s attack surface to consider, and it’s not negligible. However, the ability to connect so many different things opens up simple access to things that would otherwise require discrete apps and tooling.
One example that kind of blew my mind is that I wanted a controller overlay for my Twitch stream. After a short bit of looking, there isn’t even a plugin needed in OBS (streaming software). Instead, you add a Web View layer and point it to GamePad Viewer[1] and you’re done.
Serial and USB are possibly a boon for very specific users with very specific accessibility needs. Also, IIRC some of the early iPhone jailbreaks worked via websites on a desktop with your iPhone plugged into USB. Sure, these are niche and could probably be served just as well or better with native apps, but the web also makes the barrier to entry so much lower.
[1]: https://gamepadviewer.com/
Yes. Regards, CIA, Mossad, FSB etc.
WebUSB I don't use and wouldn't miss right now, but... the main potential use case is security, and it sounds somewhat reasonable:
"Use in multi-factor authentication
WebUSB in combination with special purpose devices and public identification registries can be used as key piece in an infrastructure scale solution to digital identity on the internet."
https://en.wikipedia.org/wiki/WebUSB
I think the giant downside is that they've written a rootkit that runs on everything, and to try to make up for that, they want to make it so that only sites they allow can run.
It's not really very powerful at all if nobody can use it, at that point you are better off just not bothering with it at all.
The Internet may remain, but the Web may really be dead.
What do you mean, you can run whatever you want on localhost, and it's quite easy to host whatever you want for whoever you want too. Maybe the biggest modern added barrier to entry is that having TLS is strongly encouraged/even needed for some things, but this is an easily solved problem.
But people do use it, like the both of us right now?
People also use maps, do online banking, play games, start complex interactive learning environments, collaborate in real time on documents etc.
All of that works right now.
I don't see how that solves the issue the PSL tries to fix. I was a script kiddie hosting Neopets phishing pages on free cPanel servers from <random>.ripway.com back in 2007. Browsers were way less capable then.
It's not even broken, as the edge cases are addressed by ad-hoc solutions.
OP is complaining about global infrastructure not having a pristine design. At best it's a complaint about a desirable trait. It's hardly a reason to pull the junior-developer card and mindlessly advocate for throwing everything out and starting over.
We live in a world where whatever FAANG adopts is the de facto standard. "Accessible" these days means Google/Gmail/Facebook/Instagram/TikTok works. Everything else is usually forced to follow along.
People will adopt whatever gives them access to their daily dose of doomscrolling and then complain about a rather crucial part of their lives, like online banking, not working.
> And of course, if the new solution completely invalidates old sites, it just won't get picked up.
Old sites don't matter, only high-traffic sites riddled with dark patterns matter. That's the reality, even if it is harsh.
Try the '90s! We had to fight off ActiveX plugins left and right in good olde Internet Explorer! Yarr! ;-)
This might be what's needed to break out of the current local optimum.
https://www.uzbl.org/
[1] https://www.uzbl.org/
which is still much too new to be able to shut down the PSL, of course. But maybe in 2050.
528 more comments available on Hacker News