
How to make your data sources AI-ready: Step-by-step

Generative AI has revolutionized chatbot training. What once took hours is now completed in minutes. BUT (there's always a but), the effectiveness of a Generative AI-trained chatbot depends heavily on the quality of its data sources. So, what makes a "good" data source for a GenAI chatbot, and what can you do to prepare yours? Let's find out.

From manual grind to AI that shines

Have you ever trained an NLP chatbot from scratch? If so, you're familiar with the grind 🥱.

It's a looong, time-consuming process where you have to train the chatbot on each scenario individually, crafting numerous training phrases for every query, only to achieve a performance level that is OK at best.

Enter generative AI 🌟

This game-changing technology has not only made chatbots more intelligent and conversational but has also introduced a streamlined approach to chatbot training. Connecting the chatbot directly to one or more data sources is like downloading a universe of knowledge straight into its "brain".

However, this new method of training chatbots depends heavily on the quality of the chosen data sources. Whether you opt for a chatbot that leverages GenAI through integration with GPT-4 or choose a chatbot provider with an in-house GenAI model, the chatbot's responses will only be as intelligent and accurate as the information it's trained on.

Fortunately, we've built many chatbots using GenAI, and we know preparing your data sources is key.

So, if your business is just at the threshold of acquiring a GenAI chatbot, following these steps will significantly enhance your readiness to deploy a chatbot that will amaze your customers.

Disclaimer: The tips shared in this blog post apply specifically to how our in-house generative AI model EbbotGPT handles data sources in the Ebbot platform. While a lot of the advice is pretty universal, some of it might not line up exactly with how other AI models handle data sources.

Step 1: Inventory your assets. 🧠

The first step is to identify your potential data sources.

Your website, help center, FAQ section, knowledge base, or other documents could be gold mines of information for training a chatbot.

A wise approach is to consider the resources your human service agents use when helping customers. Compile a list and prioritize these sources by importance.

Your goal should be to compile various data sources that mirror the breadth and complexity of queries your chatbot will encounter. It's not just about the amount of data; the relevance and quality of the information are key.

Keep in mind 1 👇

When two sources offer conflicting information on the same topic, the chatbot won’t be able to tell which one is correct. It's essential to organize and clean up your sources.

For example: If one page says your store closes at 6 PM and another says 8 PM, the chatbot might get confused and give out the wrong closing time. A quick review and update of your information can prevent this mix-up.
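A quick review like this can be partly automated. The sketch below is only an illustration, not how EbbotGPT works: it assumes you have already exported your page texts into a dictionary, and the URLs, the texts, and the function name `find_conflicting_closing_times` are all hypothetical. It looks for "closes at ..." statements and reports them when the pages disagree.

```python
import re
from collections import defaultdict

# Hypothetical page snippets keyed by URL; real content would come from your own sources.
pages = {
    "example.com/contact": "Our store closes at 6 PM on weekdays.",
    "example.com/faq": "The store closes at 8 PM Monday to Friday.",
    "example.com/about": "We opened our first store in 2015.",
}

# Matches phrases such as "closes at 6 PM" or "closes at 7:30 pm".
CLOSING_TIME = re.compile(r"closes at (\d{1,2}(?::\d{2})?\s*[AP]M)", re.IGNORECASE)


def find_conflicting_closing_times(pages):
    """Return {closing_time: [urls]} when the pages disagree, else an empty dict."""
    seen = defaultdict(list)
    for url, text in pages.items():
        for match in CLOSING_TIME.findall(text):
            # Normalize "6 pm" and "6PM" to the same key.
            normalized = re.sub(r"\s+", "", match).upper()
            seen[normalized].append(url)
    return dict(seen) if len(seen) > 1 else {}


conflicts = find_conflicting_closing_times(pages)
for closing_time, urls in conflicts.items():
    print(closing_time, "->", ", ".join(urls))
```

A pattern this narrow only catches one kind of conflict, so treat it as a starting point for a manual review rather than a replacement for it.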

Keep in mind 2 👇

The more context, the better. This means that the richer the details you provide, the more likely it is that the chatbot will pinpoint and use the right info to answer questions.

The second example in the image lays out more context for the chatbot, increasing the likelihood that it will find and utilize the correct information when forming its answers.

Step 2: Identify knowledge gaps 🔎

With your list of potential data sources in hand, the next step is to ensure they cover all the bases. This means verifying that the content is not only accurate but comprehensive.

Tip: Turn this step into a team activity. Using only your data sources, try to answer your 10 most frequent customer questions and pinpoint any missing information.
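If you want a rough first pass before the team sits down, the exercise can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the questions, the keywords you expect an answer to contain, the source texts, and the helper name `find_knowledge_gaps`. A simple keyword match cannot judge whether an answer is actually good, so it only flags questions that no source seems to touch at all.

```python
# Hypothetical frequent questions, each mapped to keywords an answer would need to mention.
top_questions = {
    "What are your opening hours?": ["opening hours"],
    "How do I return an item?": ["return"],
    "Do you ship internationally?": ["international", "shipping"],
}

# Hypothetical source texts, keyed by a short name for each source.
sources = {
    "faq": "Our opening hours are 9 AM to 6 PM. You can return items within 30 days.",
    "shipping-page": "Shipping within Sweden takes 2 to 4 business days.",
}


def find_knowledge_gaps(questions, sources):
    """Return the questions for which no single source mentions every keyword."""
    gaps = []
    for question, keywords in questions.items():
        covered = any(
            all(keyword.lower() in text.lower() for keyword in keywords)
            for text in sources.values()
        )
        if not covered:
            gaps.append(question)
    return gaps


gaps = find_knowledge_gaps(top_questions, sources)
print(gaps)
```

In this made-up example, the international shipping question is flagged because no source mentions both keywords.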

Step 3: Exclude the excess 🚫

Deciding which information the chatbot should NOT learn from is just as important as choosing the right sources.

Similar to identifying knowledge gaps, comb through your materials to filter out content that could hinder the chatbot's efficiency. This can mean excluding certain web pages, help articles, or complex technical content not suited for your target audience.

Good to know 💡

Keep in mind that the chatbot pulls information from every page it can access, not just the ones that are actively linked or currently in use. It cannot "know" if there are pages that you have stopped using but are still published. So, you'll need to go through ALL published pages to make sure the chatbot has the right info to work with.
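One way to make that review manageable is to list every published URL and sort it into "keep" and "skip" piles with a few wildcard patterns. This is a minimal sketch under stated assumptions: the URLs, the patterns, and the function name `split_sources` are hypothetical, and how you actually exclude pages depends on your chatbot platform.

```python
from fnmatch import fnmatchcase

# Hypothetical list of published pages, for example exported from a sitemap.
published_urls = [
    "https://example.com/help/returns",
    "https://example.com/help/shipping",
    "https://example.com/campaigns/summer-2021",
    "https://example.com/developers/api-reference",
]

# Hypothetical patterns for content the chatbot should not learn from.
exclude_patterns = [
    "*/campaigns/*",   # old campaign pages that are still published
    "*/developers/*",  # technical content not suited for the target audience
]


def split_sources(urls, patterns):
    """Split URLs into (keep, skip) based on shell-style wildcard patterns."""
    keep, skip = [], []
    for url in urls:
        if any(fnmatchcase(url, pattern) for pattern in patterns):
            skip.append(url)
        else:
            keep.append(url)
    return keep, skip


keep, skip = split_sources(published_urls, exclude_patterns)
print("Keep:", keep)
print("Skip:", skip)
```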

Step 4: Pick the right formats 📄

This step is more of a "good to know": GenAI models handle some formats better than others.

Formats like PDFs and tables can be challenging for AI to interpret correctly, while plain text, CSV, or JSON formats are more AI-friendly.
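As an illustration, a small table can be turned into JSON records or plain-text lines with Python's standard library. The pricing table, its column names, and the helper names below are made up for the example, and whether JSON or plain sentences works better for your chatbot is something to test rather than assume.

```python
import csv
import io
import json

# Hypothetical pricing table, exported as CSV text.
table_csv = """plan,price_per_month,support
Basic,10 EUR,Email
Pro,25 EUR,Email and chat
"""


def table_to_records(csv_text):
    """Parse CSV text into a list of dictionaries, one per table row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def records_to_sentences(records):
    """Flatten each row into one self-contained plain-text line."""
    return [
        "; ".join(f"{column}: {value}" for column, value in row.items())
        for row in records
    ]


records = table_to_records(table_csv)
print(json.dumps(records, indent=2))  # JSON version of the table
for line in records_to_sentences(records):
    print(line)  # plain-text version, one row per line
```

Each output line repeats the column names, so a row still makes sense when it is read in isolation from the rest of the table.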

When using a website as a data source, the structure of the site also matters. AI models can more easily locate the right information if it's organized clearly.

Step 5: Regularly update your data sources 🔁

It's a common oversight to think the job is done once the chatbot is up and running. But, like a human service agent, your chatbot's "brain" (i.e., your data sources) needs regular reviews and updates.
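A simple way to keep reviews regular is to record when each source was last checked and flag anything older than a chosen limit. The sketch below assumes a hand-maintained record of review dates; the source names, the dates, the 180-day limit, and the function name `find_stale_sources` are all hypothetical choices for the example.

```python
from datetime import date, timedelta

# Hypothetical record of when each data source was last reviewed.
last_reviewed = {
    "faq": date(2024, 3, 1),
    "returns-policy": date(2023, 6, 15),
    "opening-hours": date(2022, 11, 30),
}


def find_stale_sources(last_reviewed, today, max_age_days=180):
    """Return the names of sources last reviewed more than max_age_days ago."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(name for name, reviewed in last_reviewed.items() if reviewed < cutoff)


# A fixed date keeps the example reproducible; in practice you would pass date.today().
stale = find_stale_sources(last_reviewed, today=date(2024, 4, 11))
print("Due for review:", stale)
```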

Ready, set, prep!

Generative AI has indeed made chatbot training simpler.

But, and here's the final "but," as groundbreaking as the technology is, it doesn't cut out the need for manual effort, especially in the early stages of your chatbot project.

Much like priming a surface before painting, reviewing your data sources before embarking on a GenAI chatbot project will ensure you're well-prepared, ultimately leading to a far superior outcome.

About Ebbot 🤖

Ebbot is a platform for service automation at scale with generative AI — fast, easy and secure.

In the Ebbot platform, you can easily create chatbots that use generative AI to generate answers based on the client's data, powered by our in-house LLM EbbotGPT, which is fine-tuned for service. Integrate the chatbot with the client's other systems to fully automate complex tasks without human involvement. When required, human agents can step in to live-chat and collaborate seamlessly with the chatbot for an excellent customer experience that leaves no customer unserved.

Today Ebbot is used by 200+ companies to create excellent service experiences, both externally to serve customers and internally to support employees.

Matilda Elfman
April 11, 2024