What Free Privacy-Focused AI Chatbots Don’t Use My Data for Training?

Thread starter: WWGD
Summary:
Finding a free, privacy-focused AI chatbot that does not use user data for training is challenging, and concerns are raised about the privacy policies of existing services such as OpenAI's ChatGPT and Bing's Copilot. Users are advised to consider running large language models (LLMs) locally to keep control over their data, as many companies have faced scrutiny for data breaches and privacy violations. Specific cases, such as OpenAI's legal battles over user chat retention, highlight the risks of trusting public statements about data privacy. The discussion also emphasizes the importance of robust input and output sanitization to prevent data leaks and prompt injection. Ultimately, users seeking privacy should carefully evaluate their options and consider local solutions for better data security.
WWGD (Science Advisor, Homework Helper)
Hi, I am trying to develop an idea, and in that respect it helps me to bounce it against an AI (whose advice/input may be incorrect). But I've had trouble finding the privacy policies of Sider, Bing's Copilot, and others. Does anyone know of any AI, preferably free, that will not use my input to train LLMs or LRMs?
 
You can download various LLMs and run them locally. That's probably your best bet.
 
I wouldn't trust any of their posts or public statements on privacy concerns.

If they violate it how will you know?

There are a few cases that illustrate the privacy issue:

- OpenAI was accused by the New York Times of plagiarizing their writing. The Times' lawyers got court approval to request all user chats from OpenAI, and further that OpenAI should retain all chats going forward. OpenAI said no and tried to fight it, but lost. Finally the order was amended to cover chats from April to September 2025. This means any one of us may be in that select subset of chats, and there goes your privacy.

- Data breaches from companies that say they protect your data and privacy, but then use terrible methods to keep it private.

- Marissa Mayer gave an interview when she was the executive in charge of search products at Google. She said Google follows the mantra "Don't be evil." She talked about how Google tested their product with user data, until the reporter asked whether that was user data, and she replied yes. Oops, so much for data privacy.
 
jedishrfu said:
I wouldn't trust any of their posts or public statements on privacy concerns.

If they violate it how will you know?
Back in the early days of ChatGPT it was possible to prompt it to spit out active AWS keys. Secrets are only secret if you don't share them. :)
 
QuarkyMeson said:
You can download various LLMs and run them locally. That's probably your best bet.
Thank you. How would I technically implement that?
 
QuarkyMeson said:
Back in the early days of ChatGPT it was possible to prompt it to spit out active AWS keys. Secrets are only secret if you don't share them. :)
Thanks. Has that been addressed? I understand there was an effort to this effect: sanitizing both input and output to restrict prompt injection, i.e., filtering the input to restrict and disallow troublesome requests.
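To make concrete what I mean by filtering, here's a toy sketch (the patterns are purely illustrative, not any vendor's actual guardrail pipeline):

Python:
import re

# Toy sketch of input/output filtering -- illustrative patterns only.
BLOCKED_INPUT = [
    r"ignore (all )?previous instructions",  # classic prompt-injection phrasing
    r"reveal your system prompt",
]
SECRET_PATTERNS = [
    (re.compile(r"sk-[A-Za-z0-9]{20,}"), "[REDACTED API KEY]"),  # OpenAI-style key shape
    (re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"), "[REDACTED KEY BLOCK]"),
]

def sanitize_input(prompt: str) -> str:
    """Refuse prompts that match known-troublesome patterns."""
    for pattern in BLOCKED_INPUT:
        if re.search(pattern, prompt, re.IGNORECASE):
            raise ValueError("request refused by input filter")
    return prompt

def sanitize_output(text: str) -> str:
    """Redact anything in the model's output that looks like a secret."""
    for pattern, replacement in SECRET_PATTERNS:
        text = pattern.sub(replacement, text)
    return text

Though I gather real guardrails need far more than keyword lists, since injections can be endlessly paraphrased.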
 
jedishrfu said:
I wouldn't trust any of their posts or public statements on privacy concerns.

If they violate it how will you know?

There are a few cases that illustrate the privacy issue:
- OpenAI was accused by the New York Times of plagiarizing their writing. The Times' lawyers got court approval to request all user chats from OpenAI, and further that OpenAI should retain all chats going forward. OpenAI said no and tried to fight it, but lost. Finally the order was amended to cover chats from April to September 2025. This means any one of us may be in that select subset of chats, and there goes your privacy.

- Data breaches from companies that say they protect your data and privacy, but then use terrible methods to keep it private.

- Marissa Mayer gave an interview when she was the executive in charge of search products at Google. She said Google follows the mantra "Don't be evil." She talked about how Google tested their product with user data, until the reporter asked whether that was user data, and she replied yes. Oops, so much for data privacy.
There have been cases of search-string injection. Other than sanitizing input and output, I don't know if anything has been or can be done to prevent it. The fact that they don't filter SEO spam doesn't seem to abide by their mantra either (or is it "womantra" now? ;)). Edit: It's kind of depressing how just about every aspect of web searching can be made subject to injection, though a good chunk of it can be eliminated by designing things more carefully.
 
WWGD said:
Thank you. How would I technically implement that?
Here is a link to a git repo I've used to play around with this: llama.cpp

This is basically the framework you'll need to run GGUF files locally; the GGUF files are the models themselves. If you're familiar with Docker, getting up and running can be easy; if not, there is a bit of a learning curve.
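If you'd rather drive it from Python, there are also bindings (llama-cpp-python) that wrap the same engine. A minimal sketch, assuming you've already downloaded a GGUF model somewhere (the path and parameters are placeholders):

Python:
# pip install llama-cpp-python
from llama_cpp import Llama

# Load a GGUF model stored on disk; everything runs locally,
# so your prompts never leave the machine.
llm = Llama(
    model_path="./models/example-model.gguf",  # hypothetical path to your download
    n_ctx=2048,  # context window size in tokens
)

# Plain completion, entirely offline.
result = llm(
    "Q: What is a GGUF file? A:",
    max_tokens=128,
    stop=["Q:"],  # stop before the model starts a new question
)
print(result["choices"][0]["text"])

Once that runs, nothing you type leaves your machine, which is the whole point for privacy.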

WWGD said:
Thanks. Has that been addressed? I understand there was an effort to this effect: sanitizing both input and output to restrict prompt injection, i.e., filtering the input to restrict and disallow troublesome requests.

I still sometimes see secrets in generated code; I don't know if they still work. I think after bitcoin miners were using AWS keys regurgitated by ChatGPT to spin up a bunch of miners on someone else's dime, Amazon did something on their end to prevent that, though I'm not sure what (this happened to the company I was working for at the time; the fraudulent charges were 50k and 80k). I do know one change Amazon did make: they now basically crawl git repos and the web for any mention of active keys and disable them. (Accidentally posting secrets to a public repo is not the most uncommon occurrence in the world.)
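The scanning half of that is simple in principle, since AWS access key IDs have a recognizable shape. A toy sketch of the kind of sweep a secret scanner might do over a checkout (real scanners also verify hits against the live API, which I'm skipping):

Python:
import re
from pathlib import Path

# AWS access key IDs are 20 characters starting with "AKIA".
KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")

def scan_tree(root: str) -> None:
    """Walk a directory tree and flag anything shaped like an AWS key ID."""
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        for match in KEY_ID.finditer(text):
            # Print only a prefix so the scan itself doesn't leak the key.
            print(f"{path}: possible AWS key ID {match.group()[:8]}...")

scan_tree(".")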
 
From Lumo itself:
Lumo and ChatGPT are both conversational AI assistants, but they’re built with different priorities and ecosystems in mind. Below is a balanced look at the main areas where they differ, which can help you decide which one aligns better with your needs.

1. Core Philosophy

  • Privacy-first design – Lumo: zero-access encryption means Proton's servers never see the raw content of your chats; your data stays encrypted end-to-end. ChatGPT: OpenAI retains conversation data (unless you opt out via the API settings) for model improvement and safety monitoring.
  • Integration with Proton services – Lumo: seamlessly connects to Proton Mail, Drive, VPN, Pass, and other Proton products, letting you manage email, files, passwords, and secure browsing from the same interface. ChatGPT: no native ties to Proton services; integrations rely on third-party plugins or custom API work.
  • Model routing – Lumo: uses several specialized sub-models that are automatically selected based on the task (e.g., summarisation, code assistance, creative writing), which can give sharper results for certain domains. ChatGPT: primarily a single large language model (GPT-4/3.5) that handles all tasks, which works well overall but isn't tuned per-task.

2. Feature Set

  • Multilingual support – Lumo: 11 languages with full UI localisation. ChatGPT: over 30 languages, but quality varies more widely across less-common languages.
  • Voice input – Lumo: built-in voice entry on iOS (widgets) and Android. ChatGPT: voice is available only through the official app or third-party wrappers.
  • File handling – Lumo: direct file uploads (PDF, CSV, code snippets) with in-chat extraction, summarisation, and analysis. ChatGPT: can reference uploaded files in the web UI, but the depth of analysis is more limited.
  • Web search – Lumo: the optional "Lumo Plus" tier adds live web search, unlimited usage, and extended features. ChatGPT: ChatGPT Plus (or Enterprise) gives priority access and higher usage caps, but the free tier already includes web-search-augmented responses via the browsing tool (if enabled).
  • Pricing – Lumo: free tier with basic encrypted chat and limited usage; Lumo Plus at $12.99/month (or $9.99/month billed annually) for unlimited usage plus web search; included with Proton Visionary/Lifetime plans. ChatGPT: free tier with limited daily usage; ChatGPT Plus at $20/month for GPT-4 access and faster response times; Enterprise plans for higher volume and dedicated instances.

3. Performance & Capabilities

  • General conversation & creativity – Both models produce fluent, human‑like text. ChatGPT (especially GPT‑4) tends to have a slight edge in breadth of world knowledge because its training data extends later into 2023 and includes more diverse internet sources.
  • Domain‑specific tasks – Lumo’s task‑routing can yield sharper results for things like code debugging, data‑frame summarisation, or legal‑style drafting, because each sub‑model is fine‑tuned for that niche.
  • Safety & bias mitigation – Both systems employ extensive moderation layers. Lumo’s tighter integration with Proton’s privacy policies means fewer data‑driven personalization signals, which can reduce inadvertent bias stemming from user profiling.

4. Ecosystem & Extensibility

  • Proton ecosystem – If you already use Proton Mail, Drive, VPN, or Pass, Lumo becomes a natural hub. You can, for example, ask Lumo to draft an encrypted email, attach a PDF from Drive, and encrypt the whole thread—all without leaving the chat.
  • OpenAI ecosystem – ChatGPT benefits from a massive developer community, many third‑party plugins (e.g., Wolfram Alpha, Zapier), and easy API access for custom integrations.

5. When One Might Suit You Better

  • You prioritize end-to-end privacy – Lumo ✔️; ChatGPT ❌ (data retained unless opted out).
  • You're deep in the Proton suite – Lumo ✔️; ChatGPT ❌.
  • You need the absolute latest knowledge (post-2023 events) – Lumo may need Lumo Plus with web search enabled; ChatGPT ✔️ (with browsing enabled).
  • You want the most polished, broad-knowledge model out of the box – Lumo is good, but ChatGPT-4 may feel slightly richer; ChatGPT ✔️.
  • You need specialized assistance (code, data tables, legal drafts) without extra prompting – Lumo ✔️ (task routing); ChatGPT ❌ (requires more explicit prompts).
  • You want a large library of third-party plugins – Lumo is limited to Proton-centric extensions; ChatGPT ✔️ (many community plugins).

6. Bottom Line

Both assistants are strong, but they serve different philosophies:
  • Lumo shines when privacy, security, and tight integration with Proton services are top priorities. Its modular model approach can give you a crisp experience for specific tasks, and the free tier already offers solid encrypted chat.
  • ChatGPT excels when you need the widest knowledge base, extensive plugin ecosystem, and the most up‑to‑date information (especially with browsing enabled). It’s also a good fit if you’re building custom applications that rely on OpenAI’s API.

What would you like to explore next?

  • A deeper dive into a particular feature (e.g., how Lumo handles file analysis)?
  • Guidance on setting up Lumo Plus for web‑search‑enhanced answers?
  • A side‑by‑side demo of a typical workflow in each assistant?
Let me know which angle interests you!
 
  • #10
jack action said:
From Lumo itself:
Excellent, thank you so much @jack action.
 
  • #11
Well, here's a counterpoint in the news as of yesterday for OpenAI vs. the NY Times and other plaintiffs:

https://arstechnica.com/tech-policy...ver-20-million-private-chatgpt-conversations/

OpenAI is ordered to hand over 20 million full chat sessions, down from 120 million, all of which were covered by OpenAI's public promise of privacy.

So much for Lumo in a future court confrontation over plagiarism, unless they discard chats after a week or a month.

—-

This reminds me of the Chinese Wall approach to reverse engineering a public API accurately while avoiding plagiarism challenges.

One engineer writes an API spec by studying the source code to understand how the API functions. They convey that spec to development engineers who have no access to the source code, so those developers are free to build a comparable API library in their own way.

At IBM, we did something similar with the XPG/4 internationalization API, which was owned by AT&T, available in source form for Unix, and used on IBM's AIX-based RISC/6000.

IBM wanted a royalty-free version of XPG/4 for OS/2 and had to demonstrably show that its implementation was free of any AT&T code.

Programmers would sign an OCO declaration for the code, and IBM would run code scans to make sure there was compliance and no shared code.

OCO meant “original code only.”

The hardest API was the printf() function, where XPG/4 allowed a format string to reorder variables when internationalizing the printf() text.

We might code printf("Today's date: %2d / %2d / %4d ", month, day, year);

But other locales would need day, month, year. So the developer would promote the format string to an editable message string stored in a locale-specific messages file.

C:
printf( DATEMSG, month, day , year);

In EN_US, DATEMSG would be:

"Today's date: %1$2d / %2$2d / %3$04d"

vs. a German locale:

"Heutiges Datum: %2$2d . %1$2d . %3$04d"

NOTE: the use of the positional parameters %1 = month and %2 = day, so the XPG/4 printf can properly reorder the arguments to fit the message.

One further note: XPG/4 actually provided a more flexible and efficient date/time-based API for this common case.

My example above is for illustration purposes only.
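If you want to play with the reordering idea without digging up an XPG/4 toolchain, Python's numbered format fields give the same effect; this is an analogue only, not the actual C API:

Python:
# Same reordering idea as XPG/4 positional printf, via numbered fields.
month, day, year = 11, 14, 2025

# The argument order never changes; each locale's string decides the layout.
EN_US = "Today's date: {0:2d} / {1:2d} / {2:04d}"    # month / day / year
DE_DE = "Heutiges Datum: {1:2d} . {0:2d} . {2:04d}"  # day . month . year

print(EN_US.format(month, day, year))
print(DE_DE.format(month, day, year))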
 
  • #12
jedishrfu said:
So much for Lumo in a future court confrontation over plagiarism, unless they discard chats after a week or a month.
Lumo's chats are encrypted, thus only the users have access to them. The only thing a court can force them to do is to share the encrypted texts and let the other party have fun decrypting them.
 
  • #13
jack action said:
Lumo's chats are encrypted, thus only the users have access to them. The only thing a court can force them to do is to share the encrypted texts and let the other party have fun decrypting them.
WhatsApp was also end-to-end encrypted, but they were hacked.
 
  • #14
jack action said:
Lumo's chats are encrypted, thus only the users have access to them. The only thing a court can force them to do is to share the encrypted texts and let the other party have fun decrypting them.

I know they say that, but do you really believe it?

Encryption and decryption degrade performance, and, as happens in companies, someone may decide to skip a step. Also, claims of encryption might actually be valid only over the internet, not internally.

At one time, I had considered an app that worked on your data locally but kept an encrypted version in the cloud for easy access by all devices. The devices had the decryption keys, not the host.

I felt users would feel more secure knowing that. The trouble was that it hampered the design and the overall performance when so much work was done locally.
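The shape of that design is easy to sketch; here it is in Python with the cryptography package (the sync call is hypothetical):

Python:
# pip install cryptography
from cryptography.fernet import Fernet

# The key is generated and kept on the device; the host only
# ever receives ciphertext.
device_key = Fernet.generate_key()
f = Fernet(device_key)

note = b"my private draft"
ciphertext = f.encrypt(note)   # this is all the cloud would store
# upload_to_cloud(ciphertext)  # hypothetical sync call

# Any device holding device_key can fetch the blob and decrypt locally.
assert f.decrypt(ciphertext) == note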

Your data has to be decrypted at some point to be processed. Or, as in the Google case, customer data gets kept for future testing even after the company publicly stated that wasn't done.

When things go south, the failure might be traced to a specific block of data. Engineers would study your data to understand what went wrong. They might use it to test their fix, and it would likely become part of their future regression test suite, preserved for all posterity.
 
  • #15
WWGD said:
WhatsApp was also end-to-end encrypted, but they were hacked.
A quick search returns nothing of the sort; do you have a source? I found individual accounts that were hacked with spyware, but nothing about the encryption on the servers' data.

jedishrfu said:
At one time, I had considered an app that worked on your data locally but kept an encrypted version in the cloud for easy access by all devices. The devices had the decryption keys, not the host.
jedishrfu said:
Your data has to be decrypted at some point to be processed.
I know. I made such an app on a website. The problem was that I was doing the encryption/decryption locally with JavaScript. It all relies on the administrator's good nature: one sneaky change in the JavaScript code and they can download a copy of the decrypted messages without the user's knowledge. With an app coming from a "store" (Google Play, App Store, Microsoft Store, etc.), the problem is less serious, as any change/update to the code has to go out as a store update that users receive. This "flaw" has already been discussed in relation to Protonmail (and everyone else), where the mobile app version is considered "safer" than the website version.

But with a browser, anything goes. There should be some sort of key management and encryption/decryption process handled by the browser itself but not accessible via JavaScript, something similar to email encryption. (The Web Crypto API's non-extractable keys go part of the way, though the JavaScript that calls them is still served by the site.)
 
  • #16
Yes, but end-to-end doesn't cover what happens at the host.

Some companies will hide behind that claim, and only after a data breach will they sheepishly admit to bad IT practices.

Sadly, every company has a Dennis Nedry, but very few have any Dilophosaurus pets.
 
