Introduction

In this article I want to document my project, “Building a Chatbot API Using FastAPI and a GPT-2 Model”. The chatbot is designed to complete messages based on user input.

Project Overview

The chatbot API consists of the following components:

  • FastAPI: A modern Python web framework for building APIs.
  • GPT-2: A pre-trained transformer-based language model used for text completion.

The project is organized as follows:
chatbot-api/
├── app/
│   ├── api/
│   │   └── v1/
│   │       ├── __init__.py
│   │       └── chat.py
│   ├── models/
│   │   ├── __init__.py
│   │   └── gpt2.py
│   ├── schemas/
│   │   ├── __init__.py
│   │   └── chat.py
│   ├── services/
│   │   ├── __init__.py
│   │   └── chat_service.py
│   └── main.py
├── .gitignore
├── requirements.txt
├── Dockerfile
└── Readme.md
  1. main.py

This file is where the FastAPI application is initialized and the chatbot API router is registered. Because the router is included with the /api/v1 prefix, the chat endpoint is served at /api/v1/chat.

from fastapi import FastAPI
from app.api.v1.chat import router as chat_router

app = FastAPI()

app.include_router(chat_router, prefix="/api/v1")
  2. chat.py

Defines the API route that handles chatbot requests. Any error raised during generation is returned to the client as an HTTP 500.

from fastapi import APIRouter, HTTPException
from app.schemas.chat import ChatRequest, ChatResponse
from app.services.chat_service import generate_response

router = APIRouter()

@router.post("/chat", response_model=ChatResponse)
async def chat(request: ChatRequest):
    try:
        response_text = await generate_response(request.message)
        return ChatResponse(response=response_text)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
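A quick way to exercise this route without a running server is FastAPI's TestClient. Below is a minimal smoke-test sketch (not part of the repository; it assumes httpx, which TestClient depends on, is installed):

from fastapi.testclient import TestClient
from app.main import app

client = TestClient(app)

def test_chat_returns_completion():
    # Loads GPT-2 on the first call, so this test is slow but exercises the full stack
    resp = client.post("/api/v1/chat", json={"message": "Hello"})
    assert resp.status_code == 200
    assert "response" in resp.json()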
  3. schemas/chat.py

Defines the request and response models using Pydantic.

from pydantic import BaseModel

class ChatRequest(BaseModel):
    message: str

class ChatResponse(BaseModel):
    response: str
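Because these are Pydantic models, FastAPI validates incoming JSON automatically: a request without a message field is rejected with a 422 before the route handler runs. The same validation can be seen directly in Python:

from pydantic import ValidationError

from app.schemas.chat import ChatRequest

ChatRequest(message="Hello")  # valid

try:
    ChatRequest()  # missing the required "message" field
except ValidationError as e:
    print(e)  # reports that "message" is required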
  4. gpt2.py

Implements the GPT-2 model for text completion. The code is also configured to use the GPU (via the MPS backend) instead of the CPU for faster computations on a Mac with an Apple Silicon chip (M1/M2/M3). On a device without MPS support, it falls back to the CPU.

from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch

class GPT2Model:
    def __init__(self):
        self.tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
        self.model = GPT2LMHeadModel.from_pretrained("gpt2")

        # Use Apple's Metal (MPS) backend when available, otherwise fall back to the CPU
        self.device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
        self.model.to(self.device)

    def generate(self, text: str) -> str:
        input_ids = self.tokenizer.encode(text, return_tensors="pt").to(self.device)
        outputs = self.model.generate(
            input_ids,
            max_length=200,  # total length (prompt + completion) in tokens
            do_sample=True,  # sample instead of greedy decoding
            temperature=1.2,  # >1 flattens the distribution for more varied output
            top_k=40,  # sample only from the 40 most likely tokens
            top_p=0.95,  # nucleus sampling: keep the smallest set covering 95% probability
            repetition_penalty=1.1,  # discourage repeating earlier tokens
            pad_token_id=self.tokenizer.eos_token_id,  # GPT-2 has no pad token; avoids a warning
        )
        response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
        return response
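The model class can also be tried on its own; a quick sketch (the first from_pretrained call downloads the GPT-2 weights):

from app.models.gpt2 import GPT2Model

model = GPT2Model()  # downloads and loads the weights; takes a while on first run
print(model.generate("Once upon a time in a mystical forest,"))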
  5. chat_service.py

Handles chatbot response generation.

from app.models.gpt2 import GPT2Model

async def generate_response(user_input: str) -> str:
    # NOTE: this creates a new GPT2Model (and reloads the weights) on every request
    gpt2 = GPT2Model()
    response = gpt2.generate(user_input)

    return response
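Since generate_response rebuilds the model per call, a simple optimization (a sketch of my own, not in the repository) is to cache a single instance so the weights load only once:

from functools import lru_cache

from app.models.gpt2 import GPT2Model

@lru_cache(maxsize=1)
def get_model() -> GPT2Model:
    # Construct the model on first use, then reuse the same instance for all requests
    return GPT2Model()

async def generate_response(user_input: str) -> str:
    return get_model().generate(user_input)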
  6. Dockerfile

A Dockerfile to containerize the chatbot API.

FROM python:3.10

WORKDIR /app

COPY requirements.txt .

RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8000

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

Installation and Setup

Cloning and Manual Setup

  1. Clone the repository:
git clone https://github.com/gemm123/chatbot-ai
cd chatbot-ai
  2. Create a virtual environment and install the dependencies:
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
  3. Run the FastAPI application:
uvicorn app.main:app --reload
  4. Access the API documentation at http://127.0.0.1:8000/docs

Using Docker

  1. Build the Docker image:
docker build -t chatbot-api .
  2. Run the Docker container:
docker run -p 8000:8000 chatbot-api
  3. Access the API documentation at http://127.0.0.1:8000/docs

Testing the API

Send a POST request to the /api/v1/chat endpoint with the following JSON body:

{
 "message": "Once upon a time in a mystical forest,"
}

The API will respond with a continuation of the message:

{
 "response": "Once upon a time in a mystical forest, only one form could stand strong…"
}
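The same request can be sent from Python (assuming the requests package is installed and the server is running locally):

import requests

resp = requests.post(
    "http://127.0.0.1:8000/api/v1/chat",
    json={"message": "Once upon a time in a mystical forest,"},
)
print(resp.json()["response"])

Since sampling is enabled (do_sample=True), the continuation will differ from run to run.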

Conclusion

This chatbot API demonstrates how to integrate FastAPI with a pre-trained language model like GPT-2 to build an AI-driven application for message completion. The full project repository is available at https://github.com/gemm123/chatbot-ai/