Introduction

In this article I want to document my project, “Building a Chatbot API Using FastAPI and a GPT-2 Model”. The chatbot is designed to complete messages based on user input.

Project Overview

The chatbot API consists of the following components:

  • FastAPI: A modern Python web framework for building APIs.
  • GPT-2: A pre-trained transformer-based language model used for text completion.

The project is organized as follows:
chatbot-api/
├── app/
│   ├── api/
│   │   └── v1/
│   │       ├── __init__.py
│   │       └── chat.py
│   ├── models/
│   │   ├── __init__.py
│   │   └── gpt2.py
│   ├── schemas/
│   │   ├── __init__.py
│   │   └── chat.py
│   ├── services/
│   │   ├── __init__.py
│   │   └── chat_service.py
│   └── main.py
├── .gitignore
├── requirements.txt
├── Dockerfile
└── Readme.md
  1. main.py

This file is where the FastAPI application is initialized and the chatbot API router is registered. Because the router is included with the /api/v1 prefix, the chat endpoint is served at /api/v1/chat.

from fastapi import FastAPI
from app.api.v1.chat import router as chat_router

app = FastAPI()

app.include_router(chat_router, prefix="/api/v1")
  2. chat.py

Defines the API route that handles chatbot requests. Any error raised during generation is returned to the client as an HTTP 500.

from fastapi import APIRouter, HTTPException
from app.schemas.chat import ChatRequest, ChatResponse
from app.services.chat_service import generate_response

router = APIRouter()

@router.post("/chat", response_model=ChatResponse)
async def chat(request: ChatRequest):
    try:
        response_text = await generate_response(request.message)
        return ChatResponse(response=response_text)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
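A quick way to exercise this route without a running server is FastAPI's TestClient. Below is a minimal smoke-test sketch (not part of the repository; it assumes httpx, which TestClient depends on, is installed):

from fastapi.testclient import TestClient
from app.main import app

client = TestClient(app)

def test_chat_returns_completion():
    # Loads GPT-2 on the first call, so this test is slow but exercises the full stack
    resp = client.post("/api/v1/chat", json={"message": "Hello"})
    assert resp.status_code == 200
    assert "response" in resp.json()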
  3. schemas/chat.py

Defines the request and response models using Pydantic.

from pydantic import BaseModel

class ChatRequest(BaseModel):
    message: str

class ChatResponse(BaseModel):
    response: str
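Because these are Pydantic models, FastAPI validates incoming JSON automatically: a request without a message field is rejected with a 422 before the route handler runs. The same validation can be seen directly in Python:

from pydantic import ValidationError

from app.schemas.chat import ChatRequest

ChatRequest(message="Hello")  # valid

try:
    ChatRequest()  # missing the required "message" field
except ValidationError as e:
    print(e)  # reports that "message" is required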
  4. gpt2.py

Implements the GPT-2 model for text completion. The code is also configured to use the GPU (via the MPS backend) instead of the CPU for faster computations on a Mac with an Apple Silicon chip (M1/M2/M3). On a device without MPS support, it falls back to the CPU.

from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch

class GPT2Model:
    def __init__(self):
        self.tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
        self.model = GPT2LMHeadModel.from_pretrained("gpt2")

        # Use Apple's Metal (MPS) backend when available, otherwise fall back to the CPU
        self.device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
        self.model.to(self.device)

    def generate(self, text: str) -> str:
        input_ids = self.tokenizer.encode(text, return_tensors="pt").to(self.device)
        outputs = self.model.generate(
            input_ids,
            max_length=200,  # total length (prompt + completion) in tokens
            do_sample=True,  # sample instead of greedy decoding
            temperature=1.2,  # >1 flattens the distribution for more varied output
            top_k=40,  # sample only from the 40 most likely tokens
            top_p=0.95,  # nucleus sampling: keep the smallest set covering 95% probability
            repetition_penalty=1.1,  # discourage repeating earlier tokens
            pad_token_id=self.tokenizer.eos_token_id,  # GPT-2 has no pad token; avoids a warning
        )
        response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
        return response
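The model class can also be tried on its own; a quick sketch (the first from_pretrained call downloads the GPT-2 weights):

from app.models.gpt2 import GPT2Model

model = GPT2Model()  # downloads and loads the weights; takes a while on first run
print(model.generate("Once upon a time in a mystical forest,"))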
  5. chat_service.py

Handles chatbot response generation.

from app.models.gpt2 import GPT2Model

async def generate_response(user_input: str) -> str:
    # NOTE: this creates a new GPT2Model (and reloads the weights) on every request
    gpt2 = GPT2Model()
    response = gpt2.generate(user_input)

    return response
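Since generate_response rebuilds the model per call, a simple optimization (a sketch of my own, not in the repository) is to cache a single instance so the weights load only once:

from functools import lru_cache

from app.models.gpt2 import GPT2Model

@lru_cache(maxsize=1)
def get_model() -> GPT2Model:
    # Construct the model on first use, then reuse the same instance for all requests
    return GPT2Model()

async def generate_response(user_input: str) -> str:
    return get_model().generate(user_input)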
  6. Dockerfile

A Dockerfile to containerize the chatbot API.

FROM python:3.10

WORKDIR /app

COPY requirements.txt .

RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8000

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

Installation and Setup

Cloning and Manual Setup

  1. Clone the repository:
git clone https://github.com/gemm123/chatbot-ai
cd chatbot-ai
  2. Create a virtual environment and install the dependencies:
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
  3. Run the FastAPI application:
uvicorn app.main:app --reload
  4. Access the API documentation at http://127.0.0.1:8000/docs

Using Docker

  1. Build the Docker image:
docker build -t chatbot-api .
  2. Run the Docker container:
docker run -p 8000:8000 chatbot-api
  3. Access the API documentation at http://127.0.0.1:8000/docs

Testing the API

Send a POST request to the /api/v1/chat endpoint with the following JSON body:

{
 "message": "Once upon a time in a mystical forest,"
}

The API will respond with a continuation of the message:

{
 "response": "Once upon a time in a mystical forest, only one form could stand strong…"
}
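The same request can be sent from Python (assuming the requests package is installed and the server is running locally):

import requests

resp = requests.post(
    "http://127.0.0.1:8000/api/v1/chat",
    json={"message": "Once upon a time in a mystical forest,"},
)
print(resp.json()["response"])

Since sampling is enabled (do_sample=True), the continuation will differ from run to run.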

Conclusion

This chatbot API demonstrates how to integrate FastAPI with a pre-trained language model like GPT-2 to build an AI-driven application for message completion. The full project repository is available at https://github.com/gemm123/chatbot-ai/