Best Practices for Training Your AI

This page highlights the best approaches for training your AI.

Section A: Files (PDF, TXT, DOCX)

This section explains how to format and structure your content in PDF, Word (DOCX), and Text (TXT) formats so it can be properly trained into your AI bot for high-quality responses.

🧱 Structuring Your Content

1. One Topic per Paragraph

  • Stick to one idea per paragraph

  • Keep it under 800 characters (including spaces)

  • Always use complete sentences and correct punctuation

✅ Good:

“The pricing model includes three plans: Basic ($29), Pro ($99), and Enterprise ($199). Each offers different features depending on user needs.”

❌ Bad:

“Pricing has options. Go to website.”
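If you want to check a document before uploading it, a short script can flag paragraphs that go over the 800-character guideline. The sketch below is only an illustration, not part of the platform: it assumes the content has been exported to plain text with paragraphs separated by blank lines, and the function names are hypothetical.

```python
# Hypothetical pre-upload check (not a platform feature): flag paragraphs
# longer than the 800-character guideline, counting spaces.
MAX_PARAGRAPH_CHARS = 800


def split_paragraphs(text: str) -> list[str]:
    """Treat blocks of text separated by blank lines as paragraphs."""
    blocks = (block.strip() for block in text.split("\n\n"))
    return [block for block in blocks if block]


def long_paragraphs(text: str, limit: int = MAX_PARAGRAPH_CHARS) -> list[tuple[int, int]]:
    """Return (paragraph number, length) for every paragraph over the limit."""
    return [
        (number, len(paragraph))
        for number, paragraph in enumerate(split_paragraphs(text), start=1)
        if len(paragraph) > limit
    ]


if __name__ == "__main__":
    sample = "The pricing model includes three plans.\n\n" + "x" * 900
    for number, length in long_paragraphs(sample):
        print(f"Paragraph {number} has {length} characters; consider splitting it.")
```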

2. ✂️ Splitting Long Topics

If a topic is too long, break it into two or more logical parts using the same heading or theme.

✅ Good:

Main Heading (Part 1): First part (around 400 characters) with one context

Main Heading (Part 2): Second part (around 400 characters) with a different but related context

❌ Bad:

Main Heading: Very long explanation covering multiple ideas in one block without splitting or clarity.
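The splitting approach can be sketched as a small helper that groups whole sentences into parts of roughly 400 characters and labels them “(Part 1)”, “(Part 2)”, and so on. This is a hypothetical illustration, not how the platform splits content internally. The sentence detection is naive (it breaks after `.`, `!` or `?`), so abbreviations such as “e.g.” can cause extra breaks.

```python
import re


def split_topic(heading: str, text: str, target: int = 400) -> list[str]:
    """Group whole sentences into parts of roughly `target` characters.

    Sentences are never cut in half, so a part can exceed `target` when a
    single sentence is longer than that. The heading label is not counted.
    """
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    parts: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > target:
            parts.append(current)  # close the current part before it grows too long
            current = sentence
        else:
            current = candidate
    if current:
        parts.append(current)
    return [f"{heading} (Part {n}): {part}" for n, part in enumerate(parts, start=1)]


if __name__ == "__main__":
    long_text = "First idea. " * 40 + "Related idea. " * 30
    for part in split_topic("Pricing", long_text):
        print(part[:60], "...")
```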

📄 Format-Specific Guidelines

✅ PDF, DOCX, TXT – Formatting Rules

  • Use clear section headings to define topics

  • Avoid decorative styles (fonts/colors don’t help the AI)

  • Use simple formatting: paragraphs, bullet lists, numbered points

❌ Don’t Include:

  • Images

  • Tables with merged cells

  • Decorative graphics or background colors

  • Page numbers or headers/footers that repeat unnecessarily


📏 Writing Style Guidelines

| Rule | Description | Example |
|---|---|---|
| Be Specific | Mention exact processes, values, names | “We offer 24/7 live chat support.” |
| Use Examples | Add clear examples wherever possible | “For example, the ‘Basic’ plan includes...” |
| Write Directly | Use short, clear sentences | “To reset your password, click ‘Forgot Password’.” |
| Make Things Clear | Never write “etc.” or “many things” | “Supports up to 100 users.” not “Supports many users.” |


❌ What to Avoid

| Mistake | Why It's a Problem |
|---|---|
| Mixing topics in one paragraph | Reduces precision during training |
| Listing unrelated items together | Affects context accuracy |
| Including images or charts | The AI does not process them |
| Embedding FAQs in the main file | FAQs must be added separately for best training |


🧠 What to Do When You're Not Sure About Format

When unsure about structure, follow this fallback rule:

“One paragraph = One point = Under 800 characters”

If you have longer paragraphs, the system will break them into parts. Just make sure they’re well-written.


🪜 How to Make Answers Better

Keep in mind that each paragraph becomes a reference block. It’s better to have 30 well-written, focused paragraphs than 5 long paragraphs.

✅ Clear and focused content → Fast and accurate answers

❌ Bulky, unstructured content → Confused or generic bot responses


Section B: FAQ Training

This section explains how to prepare FAQs in a structured and clear format for training AI bots or building knowledge bases. It ensures your questions and answers are professional, consistent, and easy for the system to understand.

🧱 Structure & Format Guidelines:

  • Question Format:

    • Maximum length: 200 characters

    • Always write the full question clearly (not just a keyword or phrase)

    • Start with a capital letter, and write the full form of any abbreviation first

  • Answer Format:

    • Maximum length: 500 characters

    • Use complete, grammatically correct sentences

    • Keep it direct, factual, and useful
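Before typing FAQs into the dashboard or uploading them, you could check each entry against these limits. The sketch below is hypothetical: the `FAQ` class and `faq_problems` function are illustrative names, and treating a question that does not end in “?” as incomplete is an extra heuristic rather than a rule stated in this guide.

```python
from dataclasses import dataclass

MAX_QUESTION_CHARS = 200  # limits stated in this guide
MAX_ANSWER_CHARS = 500


@dataclass
class FAQ:
    """Hypothetical container for one question/answer pair."""
    question: str
    answer: str


def faq_problems(faq: FAQ) -> list[str]:
    """Return a list of problems; an empty list means the entry passes."""
    question, answer = faq.question.strip(), faq.answer.strip()
    problems: list[str] = []
    if not question or not answer:
        problems.append("question and answer must both be filled in")
    if len(question) > MAX_QUESTION_CHARS:
        problems.append(f"question is {len(question)} characters (max {MAX_QUESTION_CHARS})")
    if len(answer) > MAX_ANSWER_CHARS:
        problems.append(f"answer is {len(answer)} characters (max {MAX_ANSWER_CHARS})")
    if question and not question[0].isupper():
        problems.append("question should start with a capital letter")
    if question and not question.endswith("?"):
        # Heuristic: a full question usually ends with a question mark.
        problems.append("question should be a full question ending in '?'")
    return problems
```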


📊 Quick Summary Table:

| Aspect | Guideline |
|---|---|
| Max Question Length | 200 characters |
| Max Answer Length | 500 characters |
| Writing Tone | Clear, professional, factual |
| Abbreviation Use | Full form first, abbreviation in parentheses |
| File Upload Limit | Max 200 FAQs per category (split if needed) |


❌ What to Avoid:

  • Using only abbreviations (e.g., "CEO") without explanation

  • Mixing more than one question or topic in a single entry

  • Using vague or generic questions

  • Submitting over 200 FAQs in one CSV without splitting

  • Including emojis, incomplete sentences, or slang


✅ Examples of Good and Bad Practices:

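  • ❌ Bad: Who is ceo of sml isuzu? (lowercase, and the abbreviation has no full form)

  • ❌ Bad: Who is the Chief Executive officer of sml isuzu? (inconsistent capitalization of the title and company name)

  • ✅ Good: Who is the Chief Executive Officer (CEO) of SML Isuzu?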
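A rough automated version of the abbreviation rule could look for known abbreviations and confirm that each one is introduced as “Full Form (ABBR)”. This is a sketch under assumptions: you maintain your own glossary (the `GLOSSARY` entries below are examples), and it checks abbreviation usage only, not the capitalization of names.

```python
import re

# Hypothetical glossary you would maintain yourself: abbreviation -> full form.
GLOSSARY = {
    "CEO": "Chief Executive Officer",
    "FAQ": "Frequently Asked Question",
}


def abbreviation_issues(question: str, glossary: dict[str, str] = GLOSSARY) -> list[str]:
    """Return abbreviations used without the 'Full Form (ABBR)' pattern."""
    issues = []
    for abbr, full_form in glossary.items():
        # Case-insensitive, so "ceo" written in lowercase is caught too.
        used = re.search(rf"\b{re.escape(abbr)}\b", question, flags=re.IGNORECASE)
        if used and f"{full_form} ({abbr})" not in question:
            issues.append(abbr)
    return issues
```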


📔 Two Ways to Train Using FAQs:

1. Type Manually in Dashboard

You can directly enter questions and answers into the dashboard interface.

2. Upload via CSV File

Use a .csv file with two columns, Question and Answer, to upload multiple FAQs at once:

Question,Answer
What is BotPenguin?,BotPenguin is a....
What is the use of BotPenguin?,BotPenguin can be used for....
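If your FAQs already live in a script or another system, Python's standard `csv` module can produce a file in this two-column layout and will quote any answer that contains a comma. This is a minimal sketch: the sample row is a placeholder, and UTF-8 encoding is an assumption rather than a documented requirement of the uploader.

```python
import csv
import io


def faqs_to_csv(faqs: list[tuple[str, str]]) -> str:
    """Build CSV text with a Question,Answer header and one FAQ per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # quotes fields that contain commas or quotes
    writer.writerow(["Question", "Answer"])
    writer.writerows(faqs)
    return buffer.getvalue()


if __name__ == "__main__":
    print(faqs_to_csv([("What is X?", "X is a placeholder answer, with a comma.")]))
```

To save the result, write the returned text to a file opened with `newline=""` so the row endings produced by `csv` are kept as-is.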


⚠️ Limitation on Number of FAQs per Category:

  • Upload a maximum of 200 FAQs per category per file.

  • For more than 200 FAQs, split them into multiple CSV files or categories.

This ensures faster processing and avoids system errors.
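Splitting a large FAQ list can be automated. The sketch below writes consecutive batches of at most 200 FAQs to separate files named `faqs_part1.csv`, `faqs_part2.csv`, and so on. The file names and the `write_faq_batches` helper are hypothetical, and you would still choose which category each file belongs to when uploading.

```python
import csv
from pathlib import Path

MAX_FAQS_PER_FILE = 200  # limit stated in this guide


def chunk(items: list, size: int = MAX_FAQS_PER_FILE) -> list[list]:
    """Split items into consecutive batches of at most `size` entries."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def write_faq_batches(faqs: list[tuple[str, str]], out_dir: Path, stem: str = "faqs") -> list[Path]:
    """Write one Question,Answer CSV per batch and return the file paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, batch in enumerate(chunk(faqs), start=1):
        path = out_dir / f"{stem}_part{number}.csv"
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(["Question", "Answer"])
            writer.writerows(batch)
        paths.append(path)
    return paths
```

For example, 450 FAQs would produce three files holding 200, 200, and 50 entries.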



Section C: Website Training

🧱 Structure & Format Guidelines:

  1. Use Webpages With Good Content

    Pages should have informative, structured, and complete content. Avoid empty, template-only, or irrelevant pages.

  2. Avoid Duplicate or Repeated Pages

    Each page must have unique value. No duplicates or near-duplicates.

  3. robots.txt Must Allow Scraping

    Confirm that the URL is not blocked by robots.txt. Scraping is only allowed if permitted.

  4. Add sitemap.xml in robots.txt

    Make sure your robots.txt includes a sitemap.xml link. This helps bots find your important pages faster.

  5. Skip Unnecessary Categories

    Avoid submitting categories that are not helpful to the chatbot’s goal.

    🔧 Tip: During the training setup, you can exclude irrelevant categories manually.
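Your tech team can check points 3 and 4 with Python's standard `urllib.robotparser` module. In this sketch the robots.txt content and the `example.com` URLs are made up, and the user agent `"*"` is an assumption, since this guide does not name the crawler's user agent. `site_maps()` requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content for illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Sitemap: https://www.example.com/sitemap.xml
"""


def check_robots(robots_txt: str, url: str, user_agent: str = "*") -> tuple[bool, list[str]]:
    """Return (whether url may be fetched, sitemap URLs listed in robots.txt)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed = parser.can_fetch(user_agent, url)
    sitemaps = parser.site_maps() or []  # site_maps() returns None when none are listed
    return allowed, sitemaps


if __name__ == "__main__":
    for page in ("https://www.example.com/pricing", "https://www.example.com/admin/users"):
        allowed, sitemaps = check_robots(SAMPLE_ROBOTS_TXT, page)
        print(page, "allowed" if allowed else "blocked", sitemaps)
```

To test a live site instead of sample text, call `parser.set_url(...)` with the site's robots.txt URL and then `parser.read()` in place of `parse()`.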


📊 Quick Summary Table:

| Checklist Item | Guideline Description |
|---|---|
| Content quality | Use clear and structured web pages |
| Duplicate pages | Avoid duplicates or highly similar content |
| robots.txt permission | Must be allowed in robots.txt |
| sitemap.xml reference | Should be included in robots.txt |
| Irrelevant categories | Do not include; exclude during selection or upload |


❌ What to Avoid:

  • Pages with little or poor-quality content

  • Pages blocked in robots.txt

  • Duplicate pages with only minor differences

  • Pages unrelated to the bot's function or scope

    Don’t include extra pages or categories that are not useful for the chatbot.

Example: Exclude categories such as “Chatbot Templates”, “Chatbot Features”, and “Platform Features”, as they are not relevant to the current use case.
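To spot near-duplicate pages before submitting them, you could compare the extracted text of each pair of pages. This sketch uses Python's standard `difflib.SequenceMatcher`; the 0.9 similarity threshold and the sample page texts are arbitrary assumptions, and comparing every pair becomes slow for large sites.

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicates(pages: dict[str, str], threshold: float = 0.9) -> list[tuple[str, str, float]]:
    """Return (url_a, url_b, similarity) for page pairs at or above `threshold`."""
    pairs = []
    for (url_a, text_a), (url_b, text_b) in combinations(pages.items(), 2):
        similarity = SequenceMatcher(None, text_a, text_b).ratio()  # 0.0 to 1.0
        if similarity >= threshold:
            pairs.append((url_a, url_b, round(similarity, 2)))
    return pairs


if __name__ == "__main__":
    pages = {
        "/pricing": "Our Pro plan costs $99 per month.",
        "/pricing-copy": "Our Pro plan costs $99 per month!",
        "/support": "Contact support by live chat.",
    }
    print(near_duplicates(pages))
```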


Section D: CSV and Google Sheets

🧱 Structure & Format Guidelines

✅ General Rules

  • Use Flat Tables: Each row should be one complete record.

  • No Merged Cells: Avoid merged headers or cells.

  • Label Every Cell: No empty cells where values should be explicitly stated.

  • Avoid Contradictions: All column data must logically align.

  • Be Consistent: Keep column types and formats uniform.


📋 File-Specific Formatting

CSV Files (Flat Table Format)

User ID,Name,Email,City,Registration Date
101,Alice Smith,[email protected],New York,2024-01-15
102,Bob Jones,[email protected],Los Angeles,2024-03-22

  • Comma-separated, no nested headers

  • Each column = one field/property

  • Avoid trailing commas or inconsistent row lengths
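These rules can be checked automatically before upload. The sketch below uses Python's standard `csv` module to report rows whose number of fields differs from the header (which is how a trailing comma shows up) and cells left empty. The function name is hypothetical, and it treats the first row as the header.

```python
import csv
import io


def csv_problems(text: str) -> list[str]:
    """Report rows whose length differs from the header, and empty cells."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    header, problems = rows[0], []
    for row_number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(f"row {row_number}: {len(row)} fields, header has {len(header)}")
        for column, value in zip(header, row):
            if not value.strip():
                problems.append(f"row {row_number}: empty '{column}'")
    return problems


if __name__ == "__main__":
    sample = "User ID,Name,City\n101,Alice Smith,\n102,Bob Jones,Los Angeles,\n"
    for problem in csv_problems(sample):
        print(problem)
```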


📊 XLSX & Google Sheets

Ideal Format

| User ID | Name | Email | City | Registration Date |
|---|---|---|---|---|
| 101 | Alice Smith | [email protected] | New York | 2024-01-15 |
| 102 | Bob Jones | [email protected] | Los Angeles | 2024-03-22 |

Bad Format (Merged/Grouped Headers)

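| A | B | C |
|---|---|---|
| Category | Subcategory | Item |
| Electronics | Phones | iPhone 13 |
| | | Samsung S21 |

Here “Electronics” and “Phones” are merged across both item rows, so the Samsung S21 row has no Category or Subcategory of its own.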

Corrected Format

| Category | Subcategory | Item |
|---|---|---|
| Electronics | Phones | iPhone 13 |
| Electronics | Phones | Samsung S21 |
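If a sheet already uses merged cells, exporting it to CSV typically leaves the value only in the first row of each merged block and blanks below it. A small fill-down step can restore the repeated parent values. This is a sketch under that assumption; `fill_down` is a hypothetical helper, and you should review the result, since it also fills cells that were left blank by mistake.

```python
def fill_down(rows: list[list[str]], columns: list[int]) -> list[list[str]]:
    """Copy the last non-empty value downward in the given column positions."""
    filled: list[list[str]] = []
    last_seen: dict[int, str] = {}
    for row in rows:
        new_row = list(row)  # copy so the input rows are not modified
        for index in columns:
            if new_row[index].strip():
                last_seen[index] = new_row[index]
            elif index in last_seen:
                new_row[index] = last_seen[index]
        filled.append(new_row)
    return filled


if __name__ == "__main__":
    exported = [
        ["Category", "Subcategory", "Item"],
        ["Electronics", "Phones", "iPhone 13"],
        ["", "", "Samsung S21"],  # blanks left behind by merged cells
    ]
    for row in fill_down(exported, columns=[0, 1]):
        print(",".join(row))
```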


✅ Good Table Format (Simple and Clean)

Use a clear, single-record-per-row format to keep data clean and ready for training:

| Model | Price | Engine Displacement |
|---|---|---|
| Model A | 99999 | 100cc |
| Model B | 88999 | 125cc |


❌ Bad Format Example – Confusing Data in Same Row

Avoid mixing multiple products or entities in one row:

| Model | Price | Product | Engine | Mileage |
|---|---|---|---|---|
| Model A | 99999 | Model B | 125cc | 50 |
| Model B | 88999 | Model C | 150cc | 40 |


✅ Fixed Format – One Row per Product

Every product now has its own row, with all corresponding data:

| Model | Price | Engine | Mileage |
|---|---|---|---|
| Model A | 999999 | 125cc | 50 |
| Model B | 889999 | 150cc | 40 |
| Model C | 788899 | 100cc | 60 |


✍️ Writing Style Guidelines for Tables

  • Use clear column headers (e.g., “User ID” not “UID”)

  • Fill all values—no blanks in categories/subcategories

  • Repeat parent values for child entries instead of leaving cells empty

  • Use consistent date formats (e.g., YYYY-MM-DD)

  • No contradictory data (e.g., conflicting models/prices in same row)

  • Avoid multiple entities per row
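Two of these guidelines, consistent YYYY-MM-DD dates and no contradictory data, lend themselves to a quick automated check. The sketch below is hypothetical: it assumes rows were read with `csv.DictReader`, and it treats two different prices for the same model as a contradiction worth reviewing.

```python
import re
from datetime import datetime

ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_iso_date(value: str) -> bool:
    """True only for real calendar dates written as YYYY-MM-DD."""
    if not ISO_DATE.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")  # rejects dates like 2024-02-30
    except ValueError:
        return False
    return True


def conflicting_values(rows: list[dict[str, str]], key: str, field: str) -> dict[str, set[str]]:
    """Map each key (e.g. a model name) to its distinct `field` values, keeping only conflicts."""
    seen: dict[str, set[str]] = {}
    for row in rows:
        seen.setdefault(row[key], set()).add(row[field])
    return {name: values for name, values in seen.items() if len(values) > 1}


if __name__ == "__main__":
    rows = [
        {"Model": "Model A", "Price": "99999"},
        {"Model": "Model A", "Price": "88999"},  # contradicts the row above
        {"Model": "Model B", "Price": "88999"},
    ]
    print(conflicting_values(rows, "Model", "Price"))
    print(is_iso_date("2024-01-15"), is_iso_date("15/01/2024"))
```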


⚡ Quick Summary Table

| Use Case | What to Do | Example/Table |
|---|---|---|
| User Database | Use flat structure, one user per row | ✅ CSV/XLSX example |
| Product Categories | Repeat parent-child structure explicitly in each row | ✅ Category/Subcategory table |
| Model Specs | Avoid mixing models in a single row | ✅ Model/Price table |
| Bad Formats | Never use merged cells or leave categories blank | ❌ Bad nested row table |


FAQs:

Here are some common questions about training your AI:

What if I have lots of short sections (e.g., 200 characters)?

No problem. If your file contains many short pieces (around 200 characters), the system can handle them.

However, short content may not have enough context for the best results. We use internal techniques to improve accuracy, but make sure every short paragraph is:

  • Clear

  • Complete

  • Related to one topic only

What if my page is allowed in robots.txt but has no useful content?

Don’t include it. Only submit pages that add value to training.

Is it okay to submit the same URL twice?

No. Repeating the same URL may lead to redundancy and errors.

Can I include formulas in cells?

You can, but we recommend converting formulas to their plain calculated values before upload for consistent behavior.

What happens if there are contradictions?

The bot might give inconsistent answers or ignore data entirely. Always keep one logical entity per row.

Can I skip repeating the same category/subcategory values?

No. Always fill each row completely—even if data repeats. Empty cells cause confusion.

Can I use merged headers in Excel?

No. All headers must be single-row and flat. Merged headers break parsing logic.

Can I use emojis in my answers?

No. Avoid emojis, slang, or overly casual tone. Keep the language clear and professional.

Is there a limit to how many FAQs I can upload at once?

Yes. You can upload up to 200 FAQs per category in one CSV file. For more, split them into multiple files.

Can I use abbreviations like CEO or FAQ in the question?

If you need to include abbreviations, always provide their full form first, followed by the abbreviation in parentheses. For example: Chief Executive Officer (CEO).

Do I need technical knowledge to follow these steps?

Not necessarily. Just follow this guide, and ask your tech team to help check robots.txt and sitemaps if needed.


If everything is correctly formatted but you're still having trouble, reach out to our support team: [email protected]
