# Best Practices for Training Your AI

## Interactive Guide

{% embed url="<https://app.arcade.software/share/6ir8iJcPaoA6Qp2iu2tm>" %}

## Section A: Files (PDF, TXT, DOCX)

This section explains how to format and structure your content in PDF, Word (DOCX), and Text (TXT) formats so it can be properly trained into your AI bot for high-quality responses.

#### 🧱 Structuring Your Content

**1. One Topic per Paragraph**

* Stick to one idea per paragraph
* Keep it **under 800 characters** (including spaces)
* Always use complete sentences and correct punctuation

> ✅ Good:
>
> “The pricing model includes three plans: Basic ($29), Pro ($99), and Enterprise ($199). Each offers different features depending on user needs.”

> ❌ Bad:
>
> “Pricing has options. Go to website.”
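
The 800-character limit is easy to check before uploading. The following is a minimal sketch, assuming the content is available as plain text and that paragraphs are separated by blank lines; the function name `find_long_paragraphs` is illustrative and not part of the product.

```python
import re

MAX_PARAGRAPH_CHARS = 800  # limit stated in this guide (spaces included)


def find_long_paragraphs(text: str, limit: int = MAX_PARAGRAPH_CHARS) -> list[tuple[int, int]]:
    """Return (paragraph_number, length) for every paragraph over the limit."""
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [(i, len(p)) for i, p in enumerate(paragraphs, start=1) if len(p) > limit]


sample = "Short paragraph about pricing.\n\n" + "x" * 801
print(find_long_paragraphs(sample))  # [(2, 801)]
```

Any paragraph this reports is a candidate for splitting into smaller parts.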

**2. ✂️ Splitting Long Topics**

If a topic is too long, break it into two or more logical parts using the same heading or theme.

> ✅ Good:
>
> **Main Heading (Part 1):** First part (around 400 characters) with one context
>
> **Main Heading (Part 2):** Second part (around 400 characters) with a different but related context

> ❌ Bad:
>
> **Main Heading:** Very long explanation covering multiple ideas in one block without splitting or clarity.
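
One possible way to split a long topic is to group whole sentences into parts of roughly 400 characters and repeat the heading on each part. This is only a sketch: the sentence detection is a simple heuristic, and `split_into_parts` and `label_parts` are hypothetical helper names.

```python
import re


def split_into_parts(text: str, target: int = 400) -> list[str]:
    """Group whole sentences into parts of at most `target` characters."""
    # Heuristic: a sentence ends with '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    parts, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > target:
            parts.append(current)  # close the current part
            current = sentence     # start a new part with this sentence
        else:
            current = candidate
    if current:
        parts.append(current)
    # Note: a single sentence longer than `target` is kept whole.
    return parts


def label_parts(heading: str, parts: list[str]) -> list[str]:
    """Prefix every part with the same heading and a part number."""
    return [f"{heading} (Part {i}): {part}" for i, part in enumerate(parts, start=1)]
```

Splitting on sentence boundaries keeps each part readable on its own, which matches the one-context-per-part advice above.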

#### 📄 Format-Specific Guidelines

**✅ PDF, DOCX, TXT – Formatting Rules**

* Use **clear section headings** to define topics
* Avoid decorative styles (fonts/colors don’t help the AI)
* Use simple formatting: paragraphs, bullet lists, numbered points

**❌ Don’t Include:**

* Images
* Tables with merged cells
* Decorative graphics or background colors
* Page numbers or headers/footers that repeat unnecessarily

***

#### 📏 Writing Style Guidelines

| Rule              | Description                            | Example                                                |
| ----------------- | -------------------------------------- | ------------------------------------------------------ |
| Be Specific       | Mention exact processes, values, names | “We offer 24/7 live chat support.”                     |
| Use Examples      | Add clear examples wherever possible   | “For example, the ‘Basic’ plan includes...”            |
| Write Directly    | Use short, clear sentences             | “To reset your password, click ‘Forgot Password’.”     |
| Make Things Clear | Never write “etc.” or “many things”    | “Supports up to 100 users.” not “Supports many users.” |

***

#### ❌ What to Avoid

| Mistake                          | Why It's a Problem                              |
| -------------------------------- | ----------------------------------------------- |
| Mixing topics in one paragraph   | Reduces precision during training               |
| Listing unrelated items together | Affects context accuracy                        |
| Including images or charts       | AI doesn’t process them                         |
| Embedding FAQs in the main file  | FAQs must be added separately for best training |

***

#### 🧠 What to Do When You're Not Sure About Format

When unsure about structure, follow this fallback rule:

> “One paragraph = One point = Under 800 characters”

If a paragraph is longer than that, the system will break it into parts automatically. Just make sure it is well written.

***

#### 🪜 How to Make Answers Better

Keep in mind that each paragraph becomes a reference block. It’s better to have **30 well-written, focused paragraphs** than 5 long paragraphs.

> ✅ Clear and focused content → Fast and accurate answers
>
> ❌ Bulky, unstructured content → Confused or generic bot responses

***

## Section B: FAQ Training

This section explains how to prepare FAQs in a structured and clear format for training AI bots or building knowledge bases. It ensures your questions and answers are professional, consistent, and easy for the system to understand.

#### 🧱 Structure & Format Guidelines:

* **Question Format:**
  * Maximum length: 200 characters
  * Always write the full question clearly (not just a keyword or phrase)
  * Start with a capital letter, and write the full form of any abbreviation first
* **Answer Format:**
  * Maximum length: 500 characters
  * Use complete, grammatically correct sentences
  * Keep it direct, factual, and useful
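
These limits can be checked with a few lines of code before entries are uploaded. The sketch below assumes each FAQ is a question/answer pair of strings; `check_faq` is an illustrative name, and the question-mark check is an added heuristic for "write the full question", not a documented rule.

```python
MAX_QUESTION_CHARS = 200  # limit stated in this guide
MAX_ANSWER_CHARS = 500    # limit stated in this guide


def check_faq(question: str, answer: str) -> list[str]:
    """Return a list of problems found in one FAQ entry (empty if none)."""
    problems = []
    q, a = question.strip(), answer.strip()
    if not q:
        problems.append("question is empty")
    else:
        if len(q) > MAX_QUESTION_CHARS:
            problems.append(f"question is {len(q)} characters (max {MAX_QUESTION_CHARS})")
        if q[0].islower():
            problems.append("question starts with a lowercase letter")
        if not q.endswith("?"):
            # Heuristic only: a full question normally ends with '?'.
            problems.append("question does not end with '?'")
    if not a:
        problems.append("answer is empty")
    elif len(a) > MAX_ANSWER_CHARS:
        problems.append(f"answer is {len(a)} characters (max {MAX_ANSWER_CHARS})")
    return problems


print(check_faq("who is ceo", "x" * 501))
```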

***

#### 📊 Quick Summary Table:

| Aspect              | Guideline                                    |
| ------------------- | -------------------------------------------- |
| Max Question Length | 200 characters                               |
| Max Answer Length   | 500 characters                               |
| Writing Tone        | Clear, professional, factual                 |
| Abbreviation Use    | Full form first, abbreviation in parentheses |
| File Upload Limit   | Max 200 FAQs per category (split if needed)  |

***

#### ❌ What to Avoid:

* Using only abbreviations (e.g., "CEO") without explanation
* Mixing more than one question or topic in a single entry
* Using vague or generic questions
* Submitting over 200 FAQs in one CSV without splitting
* Including emojis, incomplete sentences, or slang

***

#### ✅ Examples of Good and Bad Practices:

* ❌ *Who is ceo of sml isuzu?*
* ❌ *Who is the Chief Executive officer of sml isuzu?*
* ✅ *Who is the Chief Executive Officer (CEO) of SML Isuzu?*

***

#### 📔 Two Ways to Train Using FAQ:

#### 1. Type Manually in Dashboard

You can directly enter questions and answers into the dashboard interface.

<figure><img src="/files/1Wnixve0YwI0qAdb6jhU" alt=""><figcaption></figcaption></figure>

#### 2. Upload via CSV File

Use a `.csv` file in the following format to upload multiple FAQs at once:

<figure><img src="/files/SKKZ2tr3GJp1kvwDfh2X" alt=""><figcaption></figcaption></figure>

The CSV file should follow this format:

| Question                       | Answer                         |
| ------------------------------ | ------------------------------ |
| What is BotPenguin?            | BotPenguin is a....            |
| What is the use of BotPenguin? | BotPenguin can be used for.... |
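
If the FAQs are generated from another system, a CSV library is a safer way to produce the file than joining strings by hand, because answers that contain commas or quotes are escaped correctly. This is a minimal sketch using Python's standard `csv` module; it assumes the first row holds the `Question` and `Answer` column names as in the format above, and `faqs_to_csv` is an illustrative name.

```python
import csv
import io


def faqs_to_csv(faqs: list[tuple[str, str]]) -> str:
    """Serialize (question, answer) pairs into two-column CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Question", "Answer"])  # header row, as in the format above
    writer.writerows(faqs)                   # quoting is handled by the csv module
    return buffer.getvalue()


# Placeholder content for illustration only.
text = faqs_to_csv([("What is the refund policy?", "Refunds are available, on request, within 30 days.")])
print(text)
```

To save the result, write `text` to a file opened with `newline=""` so line endings are not altered.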

***

#### ⚠️ Limitation on Number of FAQs per Category:

* Upload a **maximum of 200 FAQs per category** per file.
* For more than 200 FAQs, **split them into multiple CSV files** or categories.

This ensures faster processing and avoids system errors.
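
Splitting a large FAQ list into batches of 200 can be sketched as follows. The helper name `split_into_batches` is hypothetical; each batch would then be saved as its own CSV file or assigned to its own category.

```python
MAX_FAQS_PER_FILE = 200  # limit stated in this guide


def split_into_batches(faqs: list, batch_size: int = MAX_FAQS_PER_FILE) -> list[list]:
    """Split a list of FAQs into consecutive batches of at most `batch_size`."""
    return [faqs[i:i + batch_size] for i in range(0, len(faqs), batch_size)]


batches = split_into_batches(list(range(450)))
print([len(batch) for batch in batches])  # [200, 200, 50]
```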


***

## Section C: Website Training

#### 🧱 Structure & Format Guidelines:

1. **Use Webpages With Good Content**

   Pages should have informative, structured, and complete content. Avoid empty, template-only, or irrelevant pages.
2. **Avoid Duplicate or Repeated Pages**

   Each page must have unique value. No duplicates or near-duplicates.
3. **robots.txt Must Allow Scraping**

   Confirm that the URL is not blocked by `robots.txt`. Pages that `robots.txt` disallows cannot be scraped for training.
4. **Add sitemap.xml in robots.txt**

   Make sure your `robots.txt` includes a link to your `sitemap.xml` (a `Sitemap:` line). This helps bots find your important pages faster.
5. **Skip Unnecessary Categories**

   Avoid submitting categories that are not helpful to the chatbot’s goal.

   > 🔧 Tip: During the training setup, you can exclude irrelevant categories manually.
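
Points 3 and 4 can be checked with Python's standard `urllib.robotparser` module. The sketch below parses an example `robots.txt` from a string; the domain, paths, and the `*` user agent are assumptions for illustration, since this guide does not name the crawler's user agent. `site_maps()` requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content (hypothetical site).
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Is a given page allowed for the (assumed) user agent "*"?
pricing_allowed = parser.can_fetch("*", "https://www.example.com/pricing")
admin_allowed = parser.can_fetch("*", "https://www.example.com/admin/users")

# Sitemap URLs declared in robots.txt, or None if there are none.
sitemaps = parser.site_maps()

print(pricing_allowed, admin_allowed, sitemaps)
```

For a live site, `parser.set_url("https://your-domain/robots.txt")` followed by `parser.read()` fetches the real file instead of parsing a string.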

***

#### 📊 Quick Summary Table:

| Checklist Item        | Guideline Description                              |
| --------------------- | -------------------------------------------------- |
| Content quality       | Use clear and structured web pages                 |
| Duplicate pages       | Avoid duplicates or highly similar content         |
| robots.txt permission | Must be allowed in robots.txt                      |
| sitemap.xml reference | Should be included in robots.txt                   |
| Irrelevant categories | Do not include; exclude during selection or upload |

***

#### ❌ What to Avoid:

* Pages with little or poor-quality content
* Pages blocked in `robots.txt`
* Duplicate pages with only minor differences
* Pages unrelated to the bot's function or scope

* Extra pages or categories that are not useful for the chatbot

> Example: Exclude categories such as “Chatbot Templates”, “Chatbot Features”, and “Platform Features”, as they are not relevant to the current use case.

<figure><img src="/files/6ThadmvSdxruxmL13ZvS" alt=""><figcaption></figcaption></figure>
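
If you prepare the URL list yourself, duplicates and irrelevant categories can be filtered out beforehand. This is a rough sketch using the standard `urllib.parse` module; the normalization rules (lowercase host, no trailing slash, no fragment) and the path prefixes are assumptions for illustration, not how the platform itself compares URLs.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Lowercase scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def select_training_urls(urls: list[str], excluded_prefixes: tuple[str, ...] = ()) -> list[str]:
    """Keep the first occurrence of each URL and skip excluded path prefixes."""
    seen, selected = set(), []
    for url in urls:
        clean = normalize(url)
        if any(urlsplit(clean).path.startswith(prefix) for prefix in excluded_prefixes):
            continue  # irrelevant category
        if clean not in seen:
            seen.add(clean)
            selected.append(clean)
    return selected


urls = [
    "https://www.example.com/pricing/",
    "https://WWW.example.com/pricing",                  # same page, different spelling
    "https://www.example.com/chatbot-templates/sales",  # hypothetical excluded category
    "https://www.example.com/about#team",
]
print(select_training_urls(urls, excluded_prefixes=("/chatbot-templates",)))
```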

***

## Section D: CSV and Google Sheets

#### 🧱 **Structure & Format Guidelines**

#### ✅ General Rules

* **Use Flat Tables:** Each row should be one complete record.
* **No Merged Cells:** Avoid merged headers or cells.
* **Fill Every Cell:** Do not leave a cell empty where a value is expected; state every value explicitly.
* **Avoid Contradictions:** All column data must logically align.
* **Be Consistent:** Keep column types and formats uniform.

***

#### 📋 File-Specific Formatting

**CSV Files (Flat Table Format)**

```
User ID,Name,Email,City,Registration Date
101,Alice Smith,alice@example.com,New York,2024-01-15
102,Bob Jones,bob@example.com,Los Angeles,2024-03-22
```

* Comma-separated, no nested headers
* Each column = one field/property
* Avoid trailing commas or inconsistent row lengths
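
Inconsistent row lengths and trailing commas can be detected with the standard `csv` module before upload. The following is a minimal sketch that treats the first row as the header; `find_malformed_rows` is an illustrative name.

```python
import csv
import io


def find_malformed_rows(csv_text: str) -> list[tuple[int, int]]:
    """Return (row_number, field_count) for rows whose length differs from the header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []
    expected = len(rows[0])  # the header defines the expected number of fields
    return [
        (number, len(row))
        for number, row in enumerate(rows[1:], start=2)
        if row and len(row) != expected  # blank lines are ignored
    ]


data = (
    "User ID,Name,Email,City,Registration Date\n"
    "101,Alice Smith,alice@example.com,New York,2024-01-15\n"
    "102,Bob Jones,bob@example.com,Los Angeles,2024-03-22,\n"  # trailing comma
)
print(find_malformed_rows(data))  # [(3, 6)]
```

A trailing comma shows up as one extra (empty) field, which is why the third row reports six fields instead of five.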

***

#### 📊 **XLSX & Google Sheets**

✅ **Ideal Format**

| User ID | Name        | Email               | City        | Registration Date |
| ------- | ----------- | ------------------- | ----------- | ----------------- |
| 101     | Alice Smith | <alice@example.com> | New York    | 2024-01-15        |
| 102     | Bob Jones   | <bob@example.com>   | Los Angeles | 2024-03-22        |

❌ **Bad Format (Merged/Grouped Headers)**

| A           | B           | C           |
| ----------- | ----------- | ----------- |
| Category    | Subcategory | Item        |
| Electronics | Phones      | iPhone 13   |
|             |             | Samsung S21 |

✅ **Corrected Format**

| Category    | Subcategory | Item        |
| ----------- | ----------- | ----------- |
| Electronics | Phones      | iPhone 13   |
| Electronics | Phones      | Samsung S21 |
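
Going from the bad format to the corrected one amounts to copying the last seen parent value down into blank cells. Here is a sketch of that fill-down step, assuming the rows are already loaded as dictionaries (for example with `csv.DictReader`); `fill_down` is a hypothetical helper name.

```python
def fill_down(rows: list[dict], columns: list[str]) -> list[dict]:
    """Replace blank values in `columns` with the last non-blank value above them."""
    filled, last_seen = [], {}
    for row in rows:
        fixed = dict(row)  # do not modify the caller's data
        for column in columns:
            value = (fixed.get(column) or "").strip()
            if value:
                last_seen[column] = value
            elif column in last_seen:
                fixed[column] = last_seen[column]
        filled.append(fixed)
    return filled


rows = [
    {"Category": "Electronics", "Subcategory": "Phones", "Item": "iPhone 13"},
    {"Category": "", "Subcategory": "", "Item": "Samsung S21"},
]
for row in fill_down(rows, ["Category", "Subcategory"]):
    print(row)
```

Only apply this to columns where a blank really means "same as above"; a blank in a value column should be filled in by hand.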

***

**✅ Good Table Format (Simple and Clean)**

Use a clear, single-record-per-row format to keep data clean and ready for training:

| Model   | Price | Engine Displacement |
| ------- | ----- | ------------------- |
| Model A | 99999 | 100cc               |
| Model B | 88999 | 125cc               |

***

**❌ Bad Format Example – Confusing Data in Same Row**

Avoid mixing multiple products or entities in one row:

| Model   | Price | Product | Engine | Mileage |
| ------- | ----- | ------- | ------ | ------- |
| Model A | 99999 | Model B | 125cc  | 50      |
| Model B | 88999 | Model C | 150cc  | 40      |

***

**✅ Fixed Format – One Row per Product**

Every product now has its own row, with all corresponding data:

| Model   | Price  | Engine | Mileage |
| ------- | ------ | ------ | ------- |
| Model A | 999999 | 125cc  | 50      |
| Model B | 889999 | 150cc  | 40      |
| Model C | 788899 | 100cc  | 60      |

***

#### ✍️ **Writing Style Guidelines for Tables**

* Use **clear column headers** (e.g., “User ID” not “UID”)
* Fill all values—**no blanks** in categories/subcategories
* **Repeat parent values** for child entries instead of leaving cells empty
* Use consistent **date formats** (e.g., YYYY-MM-DD)
* **No contradictory data** (e.g., conflicting models/prices in same row)
* Avoid multiple entities per row
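
Two of these rules, no blank values and a consistent date format, lend themselves to an automated check. The sketch below assumes rows are dictionaries and that dates should be in `YYYY-MM-DD` form, as in the example above; the function names are illustrative.

```python
from datetime import datetime

ISO_DATE = "%Y-%m-%d"


def is_iso_date(value: str) -> bool:
    """True only for valid, zero-padded YYYY-MM-DD dates."""
    try:
        # The round trip rejects loosely formatted values such as '2024-1-5'.
        return datetime.strptime(value, ISO_DATE).strftime(ISO_DATE) == value
    except ValueError:
        return False


def find_table_problems(rows: list[dict], date_column: str) -> list[tuple[int, str, str]]:
    """Return (row_number, column, problem) for blank cells and badly formatted dates."""
    problems = []
    for number, row in enumerate(rows, start=1):
        for column, value in row.items():
            if not str(value).strip():
                problems.append((number, column, "blank"))
        date_value = str(row.get(date_column, "")).strip()
        if date_value and not is_iso_date(date_value):
            problems.append((number, date_column, "not YYYY-MM-DD"))
    return problems


rows = [
    {"User ID": "101", "City": "New York", "Registration Date": "2024-01-15"},
    {"User ID": "102", "City": "", "Registration Date": "22/03/2024"},
]
print(find_table_problems(rows, "Registration Date"))
```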

***

#### ⚡ Quick Summary Table

| Use Case           | What to Do                                           | Example/Table                |
| ------------------ | ---------------------------------------------------- | ---------------------------- |
| User Database      | Use flat structure, one user per row                 | ✅ CSV/XLSX example           |
| Product Categories | Repeat parent-child structure explicitly in each row | ✅ Category/Subcategory table |
| Model Specs        | Avoid mixing models in a single row                  | ✅ Model/Price table          |
| Bad Formats        | Never use merged cells or leave categories blank     | ❌ Bad nested row table       |

***

## FAQs

Here are some common questions about training your AI:

<details>

<summary><strong>What if I have lots of short sections (e.g., 200 characters)?</strong></summary>

No problem. If your file contains many short pieces (around 200 characters), the system can handle them.

However, short content may not have enough context for the best results. We use internal techniques to improve accuracy, but make sure every short paragraph is:

* Clear
* Complete
* Related to one topic only

</details>

<details>

<summary><strong>What if my page is allowed in robots.txt but has no useful content?</strong></summary>

Don’t include it. Only submit pages that add value to training.

</details>

<details>

<summary><strong>Can I include category pages with just links?</strong></summary>

Not recommended. Pages with only navigation links and no real content should be excluded.

</details>

<details>

<summary><strong>Is it okay to submit the same URL twice?</strong></summary>

No. Repeating the same URL may lead to redundancy and errors.

</details>

<details>

<summary><strong>Can I include formulas in cells?</strong></summary>

You can, but it’s recommended to **flatten formula outputs** before upload for consistent behavior.

</details>

<details>

<summary><strong>What happens if there are contradictions?</strong></summary>

The bot might give inconsistent answers or ignore data entirely. Always keep one logical entity per row.

</details>

<details>

<summary><strong>Can I skip repeating the same category/subcategory values?</strong></summary>

No. Always fill each row completely—even if data repeats. Empty cells cause confusion.

</details>

<details>

<summary><strong>Can I use merged headers in Excel?</strong></summary>

No. All headers must be single-row and flat. Merged headers break parsing logic.

</details>

<details>

<summary><strong>Can I use emojis in my answers?</strong></summary>

No. Avoid emojis, slang, or overly casual tone. Keep the language clear and professional.

</details>

<details>

<summary><strong>Is there a limit to how many FAQs I can upload at once?</strong></summary>

Yes. You can upload up to **200 FAQs per category** in one CSV file. For more, split them into multiple files.

</details>

<details>

<summary><strong>Can I use abbreviations like CEO or FAQ in the question?</strong></summary>

If you need to include abbreviations, always provide their full form first, followed by the abbreviation in parentheses. For example: *Chief Executive Officer (CEO)*.

</details>

<details>

<summary><strong>Do I need technical knowledge to follow these steps?</strong></summary>

Not necessarily. Just follow this guide, and ask your tech team to help check `robots.txt` and sitemaps if needed.

</details>

***

If everything is correctly formatted but you're still having trouble, reach out to our support team: [**support@botpenguin.com**](mailto:support@botpenguin.com)

