Show HN: Veritas – Detecting Hidden Bias in Everyday Writing
Effectively, if you're going to attempt to detect bias, you have to handle the Paradox of Tolerance. Otherwise, for instance, efforts to detect intolerance will be accused of being biased against intolerance, and people who wish to remain intolerant will push you to "fix" it.
Another test case: make sure your detector does not flag factual information about evolution or climate change as "biased" just because there's a "side" that denies it. Not all "sides" are valid.
How we’re tackling it:
Model Training: We train Veritas on examples that draw a hard line between factual but unpopular truths and genuinely biased framing. For issues like climate change or evolution, the model is designed to recognize them as evidence-based consensus, not “opinions with two sides.” We also run expert reviews on edge cases so it doesn’t mistake denialism for a valid counterpoint.
User Education: Every analysis Veritas produces comes with context — not just a yes/no label. It explains why something is or isn’t bias, referencing categories like gendered language, academic elitism, or cultural assumptions. We’re also preparing orientation guides for testers, so they know up front this is an academic tool, not a political scorekeeper.
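To make both of those concrete, here's a rough sketch of what a labeled training example and the contextual analysis might look like. The field names and values below are illustrative only, not our actual schema:

```python
# Illustrative only -- hypothetical field names, not Veritas's real schema.

# One labeled training example: the label separates biased framing from a
# factual-but-unpopular claim, and the rationale records *why* the label was
# assigned, which is what expert review checks on edge cases.
biased_example = {
    "text": "So-called 'climate scientists' keep pushing their agenda.",
    "label": "biased",
    "bias_categories": ["loaded_language", "source_denigration"],
    "rationale": "Scare quotes and 'agenda' frame an evidence-based consensus "
                 "as a partisan opinion.",
    "topic": "climate_change",
    "consensus_fact_involved": True,
}

# A contrasting example: unpopular with some audiences, but not biased.
unbiased_example = {
    "text": "Human activity is the dominant cause of recent warming.",
    "label": "not_biased",
    "bias_categories": [],
    "rationale": "States the scientific consensus without loaded framing.",
    "topic": "climate_change",
    "consensus_fact_involved": True,
}

# The analysis a user sees: not a bare yes/no, but the verdict plus the
# context explaining it.
example_analysis = {
    "verdict": "biased",
    "categories": ["loaded_language"],
    "explanation": "'So-called' casts doubt on a well-supported scientific "
                   "consensus; this is framing, not a second valid side.",
}
```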
The Paradox of Tolerance is real, and our stance is this: Veritas doesn’t silence perspectives, but it will highlight when language is exclusionary, misrepresentative, or factually distorted.
Two things I’d love your input on:
What’s the most effective way to show users that “unpopular facts ≠ bias” — would examples, quick demos, or documentation be strongest?
Do you think it’s helpful for us to explicitly tag certain topics as “consensus facts,” or is it better to just let the model’s handling speak for itself?
> What’s the most effective way to show users that “unpopular facts ≠ bias” — would examples, quick demos, or documentation be strongest?

I think you'd need several of those.
You may want to have a general introduction to the basic idea of "if you're trying to model the world, your model should match the world in a fashion that has predictive value". Giving a short version of Carl Sagan's "The Dragon in My Garage" might help, for instance, as an example of how people may try to make their view unfalsifiable rather than simply recognize that it's false.
If you want to get people passionately interested in your tool, you could take the tack of "help your users learn to convince people of correct things", in addition to helping them learn for themselves. The advantage is that many people care more about convincing others, and are more self-aware about that need, than they are about needing to be correct themselves. The disadvantage is that you might not want that framing.
For some people, it might help to have a more advanced version that cites things like Newtonian mechanics (imperfect, but accurate enough within its domain for practical everyday purposes) and relativity (more accurate, unnecessary for most everyday purposes, but needed for e.g. GPS). Unfortunately, those kinds of examples don't have the same impact or resonance for everyone.
I'd suggest giving examples, but choosing those examples from things where 1) there's an obvious objectively correct answer and 2) anyone who reacts to that example with anger rather than learning is very obviously outside your target audience. That is, for instance, why I cited evolution as an example. I don't know what process would reliably help a young-earth creationist understand that their model does not match reality and will not help them understand or operate in the world, but it probably isn't your tool. And there are fewer people who will react to such an example with anger, which is important because that anger gets in the way of processing and understanding reality.
Perhaps, when someone has seen a bunch of examples they agree with first, they might be more capable of hearing an example that's further outside their comfort zone.
> Do you think it’s helpful for us to explicitly tag certain topics as “consensus facts,” or is it better to just let the model’s handling speak for itself?
No matter what you do, you're going to anger people who are not interested in truth or in having their BS called out. When someone has a vested interest in believing, or convincing others of, something that's at odds with the world, ultimately the very concepts of correctness and epistemology become their enemy, because they cannot be correct by any means other than invalidating the concept of "correct" and trying to operate in a world in which words are just vibes that produce vibes in other people.
Whatever you do, if you do a good job, you're going to end up frustrating such people. Hopefully you frustrate them very effectively. In an ideal world, there'd be a path to convincing people of the merit of choosing to be correct rather than incorrect. If you can find a way to do that, please do, seriously, but it would be understandable if you cannot. Frankly, if you substantially moved the needle on that problem, you'd deserve a Nobel Prize.
Trying to be fair to AI here: one way AI might be able to help is that it's time-consuming to systematically invalidate bad arguments (constructing correct arguments, and deconstructing why other arguments are invalid, is harder than vibing and gish gallops), and it's also time-consuming to provide the level of detail and nuance needed to be both accurate and precise. (E.g. "vaccines have been proven to work" is short but imprecise; "vaccines substantially reduce viral load, reduce the severity of infection, decrease the likelihood of spread and the viral load passed on to others, and with sufficient efficacy and near-universal immunization they can decrease spread enough to lead to eradication" is precise and doesn't fit in a tweet.) If your AI is capable of saying "this is incorrect, here is a detailed explanation of why", and only does that when something is actually incorrect rather than helping people convince anyone of anything, that might help.
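As a sketch of that gating idea, something like the following shape might work; assess_claim and write_rebuttal are hypothetical placeholders for whatever assessment and generation steps you actually use, and the threshold is arbitrary:

```python
# Sketch of the "only rebut what's actually incorrect" gate. The functions and
# threshold are hypothetical placeholders, not a real API.
from dataclasses import dataclass, field

@dataclass
class Assessment:
    verdict: str      # "incorrect", "correct", or "unclear"
    confidence: float # 0.0 - 1.0
    evidence: list = field(default_factory=list)  # citations/reasoning behind the verdict

def assess_claim(claim: str) -> Assessment:
    """Placeholder: evaluate the claim against evidence."""
    raise NotImplementedError

def write_rebuttal(claim: str, assessment: Assessment) -> str:
    """Placeholder: produce the detailed, nuanced explanation of why it's wrong."""
    raise NotImplementedError

def respond(claim: str, min_confidence: float = 0.9) -> str:
    assessment = assess_claim(claim)
    if assessment.verdict == "incorrect" and assessment.confidence >= min_confidence:
        # Only here does the tool produce the long-form "here is why this is
        # incorrect" explanation, with the detail that doesn't fit in a tweet.
        return write_rebuttal(claim, assessment)
    # Otherwise decline: the point is to call out actual errors, not to act as
    # a general-purpose persuasion engine for whatever the user wants to argue.
    return "No factual error identified with sufficient confidence."
```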
With that in mind: you'd want to make sure your training data has clear examples of correct things that people nonetheless try to argue against, the common ways people argue against them, and refutations of how those arguments typically progress. And for things that are more subjective, you'd want clear identification of the different perspectives. But you also don't want to overfit the AI to the data; it needs to learn to identify bad arguments and cluster perspectives for things it hasn't seen.
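One cheap way to check that last point: hold out entire topics when you evaluate, so you're measuring whether the model recognizes bad-argument patterns it hasn't seen rather than whether it memorized the training topics. A rough sketch, assuming each example carries a "topic" field:

```python
# Rough sketch of a topic-disjoint split; the "topic" field is hypothetical.
import random

def topic_disjoint_split(examples, holdout_fraction=0.2, seed=0):
    """Split so that no topic in the eval set ever appears in training."""
    topics = sorted({ex["topic"] for ex in examples})
    rng = random.Random(seed)
    rng.shuffle(topics)
    n_holdout = max(1, int(len(topics) * holdout_fraction))
    holdout_topics = set(topics[:n_holdout])
    train = [ex for ex in examples if ex["topic"] not in holdout_topics]
    held_out = [ex for ex in examples if ex["topic"] in holdout_topics]
    return train, held_out

# If accuracy drops sharply on the held-out topics relative to the training
# topics, the model has likely memorized topics rather than learned to
# recognize bad arguments and cluster perspectives in general.
```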
Happy to talk about this further, including the branching tree of directions this could take; please feel free to reach out by email.
All of my contact info is on my profile.