Announcement

**Chris In Milwaukee** · 09-25-2025, 12:46 PM

A recent study by OpenAI and Georgia Tech study suggests that large language models hallucinate because when they’re being created, they’re incentivized to guess rather than admit they simply don’t know the answer. So until that anomaly is resolved, I also do not believe it to be reliable for our use case.

OpenAI Realizes It Made a Terrible Mistake

https://futurism.com/openai-mistake-hallucinations

OpenAI claims to have figured out what's driving "hallucinations," or AI models' strong tendency to make up answers that are incorrect.

https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4aaa5/why-language-models-hallucinate.pdf

Now, for general knowledge questions like, "how much does aircraft fuel weigh per gallon", or "what are the torque specs on an AN4 bolt", where there is a solid known-quantity dataset, it can provide much value in my opinion (with verified answers).

~Chris

**svyolo** · 09-25-2025, 02:20 PM

I have had good luck asking extremely esoteric technical questions. I always preface the question with "don't guess, respond only with confirmed data".

**Battson** · 10-02-2025, 07:11 PM

Originally posted by Chris In Milwaukee View Post

A recent study by OpenAI and Georgia Tech study suggests that large language models hallucinate because when they’re being created, they’re incentivized to guess rather than admit they simply don’t know the answer. So until that anomaly is resolved, I also do not believe it to be reliable for our use case.

OpenAI Realizes It Made a Terrible Mistake

https://futurism.com/openai-mistake-hallucinations

OpenAI claims to have figured out what's driving "hallucinations," or AI models' strong tendency to make up answers that are incorrect.

https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4aaa5/why-language-models-hallucinate.pdf

Now, for general knowledge questions like, "how much does aircraft fuel weigh per gallon", or "what are the torque specs on an AN4 bolt", where there is a solid known-quantity dataset, it can provide much value in my opinion (with verified answers).

~Chris

Even today - you can already train and configure this stuff out, right. In future, forget about such basic problems.

We use these tools at work for compliance based stuff, and we have some leading AI engineers involved. You can program the models to admit they are wrong or accept they don't know, instead of guessing. It's important to separate the "great unwashed" models like GPT-5 which are aimed at general populace and general question / answer work - where close enough is good enough - from specific AI tools trained on specific subjects, which are configured specifically to meet those needs.

Even today, if the AI tool could direct you to the relevant reference documents, it's already saving hours and hours of time in a given week of full-time building.

We're in the days of emergence, for AI. In future, these teething issues will quickly be resolved.

There will be those who adapt and adopt, then there will be those who are left behind.

**jaredyates** · 10-02-2025, 07:34 PM

I hope you're right, but I could kind of see it going either way. There was a time when Google searches were quite useful, and then people got greedy and spent a lot of effort on search engine optimization so that they could sell more, and basically killed most of the good function of the Google search. It was an escalating cat and mouse game. Just try to look up a recipe with Google if you want to see what I mean. You'll find all kinds of garbage and maybe a recipe hidden somewhere in the midst of it all. People have a great pattern of making technological tools less usable.

I was using ChatGPT the other day to research some dishwasher repair parts and told it repeatedly that I only wanted information about that one specific model in lots of different ways, and even told it that I would probably die if it gave me wrong information. But yet it continued to give me wrong information based on dishwashers generally and not that specific dishwasher. So next I asked it to help me write my obituary about how I died because terrible AI had led me astray. It was great for that.

The problem that I haven't been able to resolve is that I don't feel like we're big enough of a community to be able to create the kinds of resources that you're encountering at work. At least not without a financial benefactor or a generous effort-donor. And attempting to deploy the consumer-facing resources that are available, at least the ones that I know of, hasn't been able to yield the quality result that I think we need for this type of mission.

I use LLMs daily for all kinds of things, and I am always seeing a need to better find Bearhawk information. It's not that I'm not adopting, but rather that this is a very consequential use case, and we need a high quality product.

In my experience, LLM use still regularly requires higher brain function from the human user to be able to maintain dominance as the thought leader, using the computer as the thought partner. It's still very easy for folks to get that part reversed.

**gregc** · 2025-10-03T03:29:14

Originally posted by Battson View Post

We're in the days of emergence, for AI. In future, these teething issues will quickly be resolved.
There will be those who adapt and adopt, then there will be those who are left behind.

That's not really how technology matures. Here is one generally accepted model - there are others. Many technologies never make it to actual productivity.

ai-hype-2025.jpg

**gregc** · 2025-10-03T03:54:53

Originally posted by jaredyates View Post

The problem that I haven't been able to resolve is that I don't feel like we're big enough of a community to be able to create the kinds of resources that you're encountering at work. At least not without a financial benefactor or a generous effort-donor.

Money isn't the primary requirement, it's data. And it simply doesn't exist in sufficient quantity to train a LLM on home built aircraft construction. A reasonable estimate for the data required to train a large language model specifically for experimental aviation (a much broader topic) would likely fall in the range of hundreds of gigabytes to several terabytes of high-quality, domain-specific text data.

An average technical document is around 1 MB (500-page PDF with text). If we gather 100,000 such documents, that's 100 GB. For Bearhawk construction, I know of 3 or 4 documents that might qualify. If anyone can point me to the other 99,997 sources, I'll get started. ;-)

**jaredyates** · 2025-10-03T04:44:12

Greg, is there a path where the training data isn't exclusively homebuilding content, but rather, is it possible to deploy a larger, more broadly-trained LLM that can actually follow instructions, telling us when it deviates from the predefined sources?

Perhaps one ideal final destination is a model that can take all of the text that we know about Bearhawks and then synthesize new instructions and ideas. Meanwhile, I'd be happy with a chatbot that can search a specified dataset, like this forum (plus, say VAF), the Beartracks archive, 43.13, manufacturer documentation (lights, magnetos, props, engines, avionics) and that sort of thing. Imagine if we could have a list of sources under the search bar, with check boxes that allowed selection of each source, plus citations in the answers about where the sources are coming from. If we wanted to search about how to rivet, that could rightfully draw from a much broader data source than something very Bearhawk-specific, like if we want to know how to connect up our elevator trim cables. We could select to include results from the Dynon documentation or the Garmin documentation. The human searcher could specify how broadly they want to seek the answers based on what the human knows about the question's context. Failing that, at least the human could be made aware when the source might not be applicable, since the answers could be cited and contextualized. For example, maybe if we were searching about fuel flow and knew we were getting answers from VAF, at least we would know that their low-wing planes might have totally different considerations and we should keep that in mind when knowing the answers were from VAF.

When I started trying to tinker with this, the path forward seemed to be a RAG and/or custom vector database that would have to be created based on the training data. Basically, taking that which is presented in HTML, and turning it into a source that the LLM could more readily work with. My first stumbling block was automating that process, especially for forum content. Some sort of robot would need to periodically revisit the data source to collect the fresh material. Maybe daily would be acceptable? In any case, I knew I wasn't going to be doing it manually. I do know that ChatGPT has periodically scraped our forum on its own, because it gives me answers and links that point back here. I'm not sure how often that happens or how fresh the training data is in that case. But for a purpose-built tool I think the expectation would be high for recent data.

The second problem I found was that even if we made a custom database, using the larger model to access it was going to require some type of paid API access. The cost of the usage would vary depending on how much it was used. I don't have the computer nerd chops to ensure that this usage is only truly human Bearhawk people. Between other robots, builders of other types, and perhaps even human content creators, I didn't want to make something so useful that it attracted a lot of non-Bearhawk users in a pay-per-use situation. This might be on the order of pennies per query, which doesn't sound like much at first, but thousands of pennies are hundreds of dollars.

I don't want to say that either of these two hurdles were insurmountable, rather just that even if I did invest all of the effort to overcome them and it became a super-awesome Bearhawk tool, how big is the user community in the end? This is what I mean about our community being too small to support a custom-made solution. If you are Target and you want to make a chatbot for your millions of users, or even if you are Vans or Doug and want to make one for your tens of thousands of users, it's a different scope than our dozens or at most, half-thousands.

I'd love to have more discussion about what a really-useful AI based information management and sourcing tool could look like for Bearhawk people. I can't promise that it would ever end up being deployed, but if we have a few folks with more know-how, it is possible that we could make something better than what we currently have.

There are third-party options too, like DocBot, but I'm wary of counting on more third parties, especially in such a dynamic landscape. It's bad enough having to count on vbulletin, wordpress, and the others that we already use.

Announcement

GPT Builder Assistant

Comment

Comment

Comment

Comment

Comment

Comment

Comment