# Best Practices for Training Your AI

## Interactive Guide

{% embed url="<https://app.arcade.software/share/6ir8iJcPaoA6Qp2iu2tm>" %}

## Section A: Files (PDF, TXT, DOCX)

This section explains how to format and structure your content in PDF, Word (DOCX), and Text (TXT) formats so it can be properly trained into your AI bot for high-quality responses.

#### 🧱 Structuring Your Content

**1. One Topic per Paragraph**

* Stick to one idea per paragraph
* Keep it **under 800 characters** (including spaces)
* Always use complete sentences and correct punctuation

> ✅ Good:
>
> “The pricing model includes three plans: Basic ($29), Pro ($99), and Enterprise ($199). Each offers different features depending on user needs.”

> ❌ Bad:
>
> “Pricing has options. Go to website.”
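
Before uploading, it can help to check paragraph lengths automatically. The following is a minimal sketch, not part of the product: it assumes your content is plain text with paragraphs separated by blank lines, and the function name `find_long_paragraphs` is hypothetical.

```python
MAX_PARAGRAPH_CHARS = 800  # limit stated in this guide (spaces included)


def find_long_paragraphs(text: str, limit: int = MAX_PARAGRAPH_CHARS) -> list[tuple[int, int]]:
    """Return (paragraph_number, length) for every paragraph over the limit.

    Assumes paragraphs are separated by blank lines.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [
        (number, len(paragraph))
        for number, paragraph in enumerate(paragraphs, start=1)
        if len(paragraph) > limit
    ]


sample = "Short paragraph about pricing.\n\n" + "x" * 801
print(find_long_paragraphs(sample))  # [(2, 801)]
```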

**2. ✂️ Splitting Long Topics**

If a topic is too long, break it into two or more logical parts using the same heading or theme.

> ✅ Good:
>
> **Main Heading (Part 1):** First part (around 400 characters) with one context
>
> **Main Heading (Part 2):** Second part (around 400 characters) with a different but related context

> ❌ Bad:
>
> **Main Heading:** Very long explanation covering multiple ideas in one block without splitting or clarity.
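
If you prefer to split long topics programmatically, one possible approach is sketched below. It is an illustration under stated assumptions, not the platform's own splitting logic: sentences are detected naively (a `.`, `!`, or `?` followed by whitespace), the 400-character target comes from the example above, and the names `split_into_parts` and `label_parts` are hypothetical.

```python
import re


def split_into_parts(text: str, target: int = 400) -> list[str]:
    """Greedily pack whole sentences into parts of at most `target` characters.

    A single sentence longer than `target` is kept whole in its own part.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    parts: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > target:
            parts.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        parts.append(current)
    return parts


def label_parts(heading: str, parts: list[str]) -> list[str]:
    """Prefix every part with the shared heading and a part number."""
    return [f"{heading} (Part {i}): {part}" for i, part in enumerate(parts, start=1)]
```

Review the result by hand: a sentence-based split keeps sentences intact, but it cannot tell whether each part still covers a single idea.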

#### 📄 Format-Specific Guidelines

**✅ PDF, DOCX, TXT – Formatting Rules**

* Use **clear section headings** to define topics
* Avoid decorative styles (fonts/colors don’t help the AI)
* Use simple formatting: paragraphs, bullet lists, numbered points

**❌ Don’t Include:**

* Images
* Tables with merged cells
* Decorative graphics or background colors
* Page numbers or headers/footers that repeat unnecessarily

***

#### 📏 Writing Style Guidelines

| Rule              | Description                            | Example                                                |
| ----------------- | -------------------------------------- | ------------------------------------------------------ |
| Be Specific       | Mention exact processes, values, names | “We offer 24/7 live chat support.”                     |
| Use Examples      | Add clear examples wherever possible   | “For example, the ‘Basic’ plan includes...”            |
| Write Directly    | Use short, clear sentences             | “To reset your password, click ‘Forgot Password’.”     |
| Make Things Clear | Never write “etc.” or “many things”    | “Supports up to 100 users.” not “Supports many users.” |

***

#### ❌ What to Avoid

| Mistake                          | Why It's a Problem                              |
| -------------------------------- | ----------------------------------------------- |
| Mixing topics in one paragraph   | Reduces precision during training               |
| Listing unrelated items together | Affects context accuracy                        |
| Including images or charts       | AI doesn’t process them                         |
| Embedding FAQs in the main file  | FAQs must be added separately for best training |

***

#### 🧠 What to Do When You're Not Sure About Format

When unsure about structure, follow this fallback rule:

> “One paragraph = One point = Under 800 characters”

If you have longer paragraphs, the system will break them into parts. Just make sure they’re well-written.

***

#### 🪜 How to Make Answers Better

Keep in mind that each paragraph becomes a reference block. It’s better to have **30 well-written, focused paragraphs** than 5 long paragraphs.

> ✅ Clear and focused content → Fast and accurate answers
>
> ❌ Bulky, unstructured content → Confused or generic bot responses

***

## Section B: FAQs Training

This section explains how to prepare FAQs in a structured and clear format for training AI bots or building knowledge bases. It ensures your questions and answers are professional, consistent, and easy for the system to understand.

#### 🧱 Structure & Format Guidelines:

* **Question Format:**
  * Maximum length: 200 characters
  * Always write the full question clearly (not just a keyword or phrase)
  * Start with a capital letter, and write the full form of a term before its abbreviation
* **Answer Format:**
  * Maximum length: 500 characters
  * Use complete, grammatically correct sentences
  * Keep it direct, factual, and useful

***

#### 📊 Quick Summary Table:

| Aspect              | Guideline                                    |
| ------------------- | -------------------------------------------- |
| Max Question Length | 200 characters                               |
| Max Answer Length   | 500 characters                               |
| Writing Tone        | Clear, professional, factual                 |
| Abbreviation Use    | Full form first, abbreviation in parentheses |
| File Upload Limit   | Max 200 FAQs per category (split if needed)  |
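
The limits in the table can be checked before upload. The sketch below is illustrative only: the 200 and 500 character limits come from this guide, while the function name `check_faq` and the two heuristics (a question should end with `?` and start with a capital letter) are assumptions, not rules enforced by the dashboard.

```python
MAX_QUESTION_CHARS = 200  # limits from the summary table above
MAX_ANSWER_CHARS = 500


def check_faq(question: str, answer: str) -> list[str]:
    """Return a list of problems found in one FAQ entry (empty list means OK)."""
    problems = []
    q, a = question.strip(), answer.strip()
    if len(q) > MAX_QUESTION_CHARS:
        problems.append(f"question is {len(q)} characters (max {MAX_QUESTION_CHARS})")
    if len(a) > MAX_ANSWER_CHARS:
        problems.append(f"answer is {len(a)} characters (max {MAX_ANSWER_CHARS})")
    # Heuristics (assumptions): a full question ends with "?" and starts capitalized.
    if not q.endswith("?"):
        problems.append("question should be a full question ending with '?'")
    if q and not q[0].isupper():
        problems.append("question should start with a capital letter")
    if not a:
        problems.append("answer is empty")
    return problems


print(check_faq("who is ceo", "An answer."))
```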

***

#### ❌ What to Avoid:

* Using only abbreviations (e.g., "CEO") without explanation
* Mixing more than one question or topic in a single entry
* Using vague or generic questions
* Submitting over 200 FAQs in one CSV without splitting
* Including emojis, incomplete sentences, or slang

***

#### ✅ Examples of Good and Bad Practices:

* ❌ *Who is ceo of sml isuzu?*
* ❌ *Who is the Chief Executive officer of sml isuzu?*
* ✅ *Who is the Chief Executive Officer (CEO) of SML Isuzu?*

***

#### 📔 Two Ways to Train Using FAQ:

#### 1. Type Manually in Dashboard

You can directly enter questions and answers into the dashboard interface.

<figure><img src="https://1745791824-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FAPDb8cKQtGlIAfgHjcsQ%2Fuploads%2FttmmlDsz3GtTUQOV5iTr%2Fimage.png?alt=media&#x26;token=01d0c65a-328a-451a-917b-91ad906a54ca" alt=""><figcaption></figcaption></figure>

#### 2. Upload via CSV File

Use a `.csv` file in the following format to upload multiple FAQs at once:

<figure><img src="https://1745791824-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FAPDb8cKQtGlIAfgHjcsQ%2Fuploads%2FD4s2TjKKHI07Xfp5MzIh%2Fimage.png?alt=media&#x26;token=fa6bc916-5dd7-4500-b4dd-9d2a85352310" alt=""><figcaption></figcaption></figure>

The CSV file should use the following format:

| Question                       | Answer                         |
| ------------------------------ | ------------------------------ |
| What is BotPenguin?            | BotPenguin is a....            |
| What is the use of BotPenguin? | BotPenguin can be used for.... |
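
Answers often contain commas or quotation marks, which break a hand-typed CSV. A minimal sketch using Python's standard `csv` module, which quotes such fields automatically, is shown here. It assumes the file needs a `Question,Answer` header row as in the table above; the function name `faqs_to_csv` and the sample entry are hypothetical.

```python
import csv
import io


def faqs_to_csv(faqs: list[tuple[str, str]]) -> str:
    """Serialize (question, answer) pairs as CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)  # quotes fields containing commas or quotes
    writer.writerow(["Question", "Answer"])
    writer.writerows(faqs)
    return buffer.getvalue()


# Placeholder entry for illustration only.
faqs = [("What is Example Bot?", "Example Bot is a placeholder product, used here for illustration.")]
print(faqs_to_csv(faqs))
```

To save the result, write the returned text to a file opened with `newline=""` so that line endings are not altered.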

***

#### ⚠️ Limitation on Number of FAQs per Category:

* Upload a **maximum of 200 FAQs per category** per file.
* For more than 200 FAQs, **split them into multiple CSV files** or categories.

This ensures faster processing and avoids system errors.
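
Splitting a large FAQ list into files of at most 200 entries can be sketched as follows. The batch size comes from the limit above; the function name `split_into_batches` is hypothetical.

```python
def split_into_batches(faqs: list, batch_size: int = 200) -> list[list]:
    """Split a list of FAQ entries into consecutive batches of at most batch_size."""
    return [faqs[i:i + batch_size] for i in range(0, len(faqs), batch_size)]


# 450 placeholder entries become three batches.
entries = [(f"Question {n}?", f"Answer {n}.") for n in range(450)]
print([len(batch) for batch in split_into_batches(entries)])  # [200, 200, 50]
```

Each batch can then be saved as its own CSV file or assigned to its own category.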

***

## Section C: Website Training

#### 🧱 Structure & Format Guidelines:

1. **Use Webpages With Good Content**

   Pages should have informative, structured, and complete content. Avoid empty, template-only, or irrelevant pages.
2. **Avoid Duplicate or Repeated Pages**

   Each page must have unique value. No duplicates or near-duplicates.
3. **robots.txt Must Allow Scraping**

   Confirm that the URL is not blocked by `robots.txt`. Scraping is only allowed if permitted.
4. **Add sitemap.xml in robots.txt**

   Make sure your `robots.txt` includes a `sitemap.xml` link. This helps bots find your important pages faster.
5. **Skip Unnecessary Categories**

   Avoid submitting categories that are not helpful to the chatbot’s goal.

   > 🔧 Tip: During the training setup, you can exclude irrelevant categories manually.
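
Points 3 and 4 can be verified with Python's standard `urllib.robotparser` module. This is a rough sketch under assumptions: the `robots.txt` content and `example.com` URLs are made up for illustration, and the `*` user agent is used because this guide does not name the crawler's user agent. `site_maps()` requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (not taken from any real site).
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Allow: /

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Is a given page open to crawlers matching "*"?
print(parser.can_fetch("*", "https://example.com/pricing"))      # True
print(parser.can_fetch("*", "https://example.com/admin/users"))  # False

# Sitemap URLs declared in robots.txt (None if there are none).
print(parser.site_maps())  # ['https://example.com/sitemap.xml']
```

To check a live site instead, call `parser.set_url(...)` with the site's `robots.txt` address followed by `parser.read()`.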

***

#### 📊 Quick Summary Table:

| Checklist Item        | Guideline Description                              |
| --------------------- | -------------------------------------------------- |
| Content quality       | Use clear and structured web pages                 |
| Duplicate pages       | Avoid duplicates or highly similar content         |
| robots.txt permission | Must be allowed in robots.txt                      |
| sitemap.xml reference | Should be included in robots.txt                   |
| Irrelevant categories | Do not include; exclude during selection or upload |

***

#### ❌ What to Avoid:

* Pages with little or poor-quality content
* Pages blocked in `robots.txt`
* Duplicate pages with only minor differences
* Pages unrelated to the bot's function or scope

Don’t include extra pages or categories that are not useful for the chatbot.

> Example: Exclude categories such as “Chatbot Templates”, “Chatbot Features”, and “Platform Features” when they are not relevant to the current use case.

<figure><img src="https://1745791824-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FAPDb8cKQtGlIAfgHjcsQ%2Fuploads%2FKjI9UJb2xe3nv89wv9Ce%2Fimage.png?alt=media&#x26;token=21faef78-df49-4ab1-81de-460bddc37b7c" alt=""><figcaption></figcaption></figure>

***

## Section D: CSV and Google Sheets

#### 🧱 **Structure & Format Guidelines**

#### ✅ General Rules

* **Use Flat Tables:** Each row should be one complete record.
* **No Merged Cells:** Avoid merged headers or cells.
* **Fill Every Cell:** Do not leave a cell empty where a value is expected; state every value explicitly.
* **Avoid Contradictions:** All column data must logically align.
* **Be Consistent:** Keep column types and formats uniform.

***

#### 📋 File-Specific Formatting

**CSV Files (Flat Table Format)**

```
User ID,Name,Email,City,Registration Date
101,Alice Smith,alice@example.com,New York,2024-01-15
102,Bob Jones,bob@example.com,Los Angeles,2024-03-22
```

* Comma-separated, no nested headers
* Each column = one field/property
* Avoid trailing commas or inconsistent row lengths
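
Trailing commas and inconsistent row lengths are easy to miss by eye. The sketch below shows one way to detect them with the standard `csv` module. The function name `find_row_problems` is hypothetical, and the check assumes the first row is the header.

```python
import csv
import io


def find_row_problems(csv_text: str) -> list[str]:
    """Report rows whose field count differs from the header or that contain blank cells."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    header, problems = rows[0], []
    for row_number, row in enumerate(rows[1:], start=2):
        if not row:
            continue  # skip completely blank lines
        if len(row) != len(header):
            problems.append(f"row {row_number}: expected {len(header)} fields, found {len(row)}")
        elif any(not cell.strip() for cell in row):
            problems.append(f"row {row_number}: contains an empty cell")
    return problems


bad = "User ID,Name\n101,Alice,\n102,\n"
print(find_row_problems(bad))
# ['row 2: expected 2 fields, found 3', 'row 3: contains an empty cell']
```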

***

#### 📊 **XLSX & Google Sheets**

✅ **Ideal Format**

| User ID | Name        | Email               | City        | Registration Date |
| ------- | ----------- | ------------------- | ----------- | ----------------- |
| 101     | Alice Smith | <alice@example.com> | New York    | 2024-01-15        |
| 102     | Bob Jones   | <bob@example.com>   | Los Angeles | 2024-03-22        |

❌ **Bad Format (Merged/Grouped Headers)**

| A           | B           | C           |
| ----------- | ----------- | ----------- |
| Category    | Subcategory | Item        |
| Electronics | Phones      | iPhone 13   |
|             |             | Samsung S21 |

✅ **Corrected Format**

| Category    | Subcategory | Item        |
| ----------- | ----------- | ----------- |
| Electronics | Phones      | iPhone 13   |
| Electronics | Phones      | Samsung S21 |
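
When a sheet was exported with merged or grouped cells, the blanks can be repaired by copying the parent value downward. A minimal sketch follows; the function name `fill_down` is hypothetical, and the approach assumes every blank in the chosen columns means "same as the row above".

```python
def fill_down(rows: list[list[str]], columns: list[int]) -> list[list[str]]:
    """Copy the last non-empty value downward into blank cells of the given columns."""
    filled = []
    last_seen: dict[int, str] = {}
    for row in rows:
        new_row = list(row)  # leave the input rows unmodified
        for col in columns:
            if new_row[col].strip():
                last_seen[col] = new_row[col]
            elif col in last_seen:
                new_row[col] = last_seen[col]
        filled.append(new_row)
    return filled


rows = [
    ["Electronics", "Phones", "iPhone 13"],
    ["", "", "Samsung S21"],
]
print(fill_down(rows, columns=[0, 1]))
# [['Electronics', 'Phones', 'iPhone 13'], ['Electronics', 'Phones', 'Samsung S21']]
```

Apply this only to columns where a blank really does inherit from the row above; a genuinely missing value should be filled in by hand.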

***

**✅ Good Table Format (Simple and Clean)**

Use a clear, single-record-per-row format to keep data clean and ready for training:

| Model   | Price | Engine Displacement |
| ------- | ----- | ------------------- |
| Model A | 99999 | 100cc               |
| Model B | 88999 | 125cc               |

***

**❌ Bad Format Example – Confusing Data in Same Row**

Avoid mixing multiple products or entities in one row:

| Model   | Price | Product | Engine | Mileage |
| ------- | ----- | ------- | ------ | ------- |
| Model A | 99999 | Model B | 125cc  | 50      |
| Model B | 88999 | Model C | 150cc  | 40      |

***

**✅ Fixed Format – One Row per Product**

Every product now has its own row, with all corresponding data:

| Model   | Price  | Engine | Mileage |
| ------- | ------ | ------ | ------- |
| Model A | 99999  | 125cc  | 50      |
| Model B | 88999  | 150cc  | 40      |
| Model C | 788899 | 100cc  | 60      |

***

#### ✍️ **Writing Style Guidelines for Tables**

* Use **clear column headers** (e.g., “User ID” not “UID”)
* Fill all values—**no blanks** in categories/subcategories
* **Repeat parent values** for child entries instead of leaving cells empty
* Use consistent **date formats** (e.g., YYYY-MM-DD)
* **No contradictory data** (e.g., conflicting models/prices in same row)
* Avoid multiple entities per row
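
Date consistency is one rule that is simple to check automatically. The sketch below tests whether a value is a valid date in the YYYY-MM-DD format suggested above; the function name `is_iso_date` is hypothetical.

```python
from datetime import datetime


def is_iso_date(value: str) -> bool:
    """Return True if value is a valid calendar date written as YYYY-MM-DD."""
    try:
        parsed = datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    # strptime also accepts unpadded forms such as "2024-1-5"; require zero padding.
    return parsed.strftime("%Y-%m-%d") == value


for value in ["2024-01-15", "15/01/2024", "2024-1-5", "2024-02-30"]:
    print(value, is_iso_date(value))
```

Only the first value passes: the second uses a different format, the third is not zero-padded, and the fourth is not a real date.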

***

#### ⚡ Quick Summary Table

| Use Case           | What to Do                                           | Example/Table                |
| ------------------ | ---------------------------------------------------- | ---------------------------- |
| User Database      | Use flat structure, one user per row                 | ✅ CSV/XLSX example           |
| Product Categories | Repeat parent-child structure explicitly in each row | ✅ Category/Subcategory table |
| Model Specs        | Avoid mixing models in a single row                  | ✅ Model/Price table          |
| Bad Formats        | Never use merged cells or leave categories blank     | ❌ Bad nested row table       |

***

## FAQs:

Here are some common questions asked when training an AI bot:

<details>

<summary><strong>What if I have lots of short sections (e.g., 200 characters)?</strong></summary>

No problem. If your file contains many short pieces (around 200 characters), the system can handle them.

However, short content may not have enough context for the best results. We use internal techniques to improve accuracy, but make sure every short paragraph is:

* Clear
* Complete
* Related to one topic only

</details>

<details>

<summary><strong>What if my page is allowed in robots.txt but has no useful content?</strong></summary>

Don’t include it. Only submit pages that add value to training.

</details>

<details>

<summary><strong>Can I include category pages with just links?</strong></summary>

Not recommended. Pages with only navigation links and no real content should be excluded.

</details>

<details>

<summary><strong>Is it okay to submit the same URL twice?</strong></summary>

No. Repeating the same URL may lead to redundancy and errors.

</details>

<details>

<summary><strong>Can I include formulas in cells?</strong></summary>

You can, but it’s recommended to **flatten formula outputs** before upload for consistent behavior.

</details>

<details>

<summary><strong>What happens if there are contradictions?</strong></summary>

The bot might give inconsistent answers or ignore data entirely. Always keep one logical entity per row.

</details>

<details>

<summary><strong>Can I skip repeating the same category/subcategory values?</strong></summary>

No. Always fill each row completely—even if data repeats. Empty cells cause confusion.

</details>

<details>

<summary><strong>Can I use merged headers in Excel?</strong></summary>

No. All headers must be single-row and flat. Merged headers break parsing logic.

</details>

<details>

<summary><strong>Can I use emojis in my answers?</strong></summary>

No. Avoid emojis, slang, or overly casual tone. Keep the language clear and professional.

</details>

<details>

<summary><strong>Is there a limit to how many FAQs I can upload at once?</strong></summary>

Yes. You can upload up to **200 FAQs per category** in one CSV file. For more, split them into multiple files.

</details>

<details>

<summary><strong>Can I use abbreviations like CEO or FAQ in the question?</strong></summary>

If you need to include abbreviations, always provide their full form first, followed by the abbreviation in parentheses. For example: *Chief Executive Officer (CEO)*.

</details>

<details>

<summary><strong>Do I need technical knowledge to follow these steps?</strong></summary>

Not necessarily. Just follow this guide, and ask your tech team to help check `robots.txt` and sitemaps if needed.

</details>

***

If everything is correctly formatted but you're still having trouble, reach out to our support team: [**support@botpenguin.com**](mailto:support@botpenguin.com)
