What You Should Know About LLMs

So, let’s start with the steps that a large language model like ChatGPT has to go through to give you an answer to a question. Again, like search engines, LLMs first have to gather the data. Then they need to save the data in a format that they’re able to access, and then they need to give you an answer at the end, which is kind of like ranking.

If we start with gathering the data, this is the bit that’s closest to the search engines that we know and love. LLMs are basically accessing web pages and crawling the internet, and if they haven’t visited a web page or gotten a piece of information from another source, they just don’t know that answer. They’re at a disadvantage here because search engines have been recording this information for decades, whereas LLMs have only just started.

So they’ve got a lot of catching up to do. There are a lot of corners of the internet that they haven’t been able to visit yet. One piece of information that they can gather and that search engines can’t access is chat data. So when you’re using these platforms, they’re gathering data about what you’re putting in and how you’re interacting with them, and that feeds into the training of their models.

So one thing to be aware of when you’re working with platforms like ChatGPT is that if you put private data in there, it’s not necessarily private after you’ve done that. So you might want to look at your settings or look at using the APIs, because the providers tend to promise they don’t train on API data.

If we move on to the second stage, saving that information, this is what we refer to as indexing in search. This is where things diverge a little bit, but there are still quite a lot of parallels.

So in the early days of search engines, the index, the data that they had saved, wasn’t updated live the way we’re used to now. We couldn’t be sure that as soon as something came out onto the internet it would appear in a search engine somewhere. It was more that they would update once every few months, because those index updates were costly in terms of time and money. We’re in a similar situation with large language models at the moment.

You may have noticed that every so often they say, “Okay, we’ve updated things. The information the model has now runs up to April,” or something like that. That’s because when they want to put more information into the models, they actually have to retrain the whole thing. So again, it’s very costly for them to do. Both of those limitations, the gaps in gathering and the infrequent updates, feed into the answers that you’re getting at the end.

I’m sure you’ve seen this. You might be working with ChatGPT, and it hasn’t happened to see the information that you’re asking about, or the information it does have is out of date.
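
To make those two limitations concrete, here is a minimal Python sketch. It is a toy illustration, not how ChatGPT or any real model works: the names `Page`, `SnapshotModel`, `crawled`, and `cutoff` are all hypothetical, and "training" is reduced to keeping the newest crawled page per topic up to a cutoff date. It only shows why a page that was never gathered, or that was published after the last retrain, can’t show up in an answer.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Page:
    topic: str
    text: str
    published: date
    crawled: bool = True  # False = a corner of the web the crawler never reached


class SnapshotModel:
    """Toy stand-in for an LLM: its knowledge is frozen at training time."""

    def __init__(self, pages, cutoff):
        self.cutoff = cutoff
        self.knowledge = {}
        # "Training": walk pages oldest to newest and keep the latest crawled
        # page per topic that was published on or before the cutoff.
        for page in sorted(pages, key=lambda p: p.published):
            if page.crawled and page.published <= cutoff:
                self.knowledge[page.topic] = page

    def answer(self, topic):
        page = self.knowledge.get(topic)
        if page is None:
            return "I don't know."
        return f"{page.text} (as of {page.published.isoformat()})"


# Hypothetical example data (assumption, not from the talk).
web = [
    Page("pricing", "Plan costs $10", date(2023, 1, 5)),
    Page("pricing", "Plan costs $15", date(2023, 9, 1)),
    Page("niche-forum", "Obscure fix", date(2022, 6, 1), crawled=False),
]

model = SnapshotModel(web, cutoff=date(2023, 4, 30))
print(model.answer("pricing"))      # out of date: the September page is past the cutoff
print(model.answer("niche-forum"))  # never gathered, so no answer

# Picking up the newer page means rebuilding the whole snapshot.
retrained = SnapshotModel(web, cutoff=date(2023, 12, 31))
print(retrained.answer("pricing"))
```

In this sketch the only way to get the September price into an answer is to build a new `SnapshotModel` with a later cutoff, which mirrors the point above that adding information means retraining rather than updating live.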
