Getting Started with GPT-4o on Google Colab: A Step-by-Step Guide

May 16, 2024

On May 13, 2024, OpenAI demonstrated its latest AI model, GPT-4o, showcasing real-time voice conversation alongside text and image understanding. Artificial intelligence (AI) is transforming many industries, and OpenAI’s GPT-4o model is at the forefront of this shift.

According to OpenAI, GPT-4o matches GPT-4 Turbo-level performance on traditional benchmarks for text, reasoning, and coding, while setting new high-water marks on multilingual, audio, and vision capabilities.

This guide walks you through setting up and using GPT-4o on Google Colab. Whether you want to solve simple math problems, generate longer text, or analyze images, this tutorial will get you started.

1. Setup

Before you start coding, make sure you have a Google account (for Colab) and an OpenAI API key with access to GPT-4o. The steps below set up your environment and then explore some of GPT-4o’s features.

Step 1: Install OpenAI Python Package

First, you need to install the OpenAI Python package. Open a new Colab notebook and run the following command:

!pip install --upgrade openai --quiet

Step 2: Import Libraries and Set Up API Key

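Next, import the necessary libraries and set up your API key. Rather than pasting the key into the notebook, store it in Colab’s Secrets panel (the key icon in the left sidebar) under the name openai, grant the notebook access, and read it with the userdata module.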

import json
from openai import OpenAI
from google.colab import userdata

# Model used throughout this guide
MODEL = "gpt-4o"

# Read the API key from the Colab secret named "openai"
client = OpenAI(api_key=userdata.get('openai'))
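If you also run the notebook outside Colab, `google.colab` is not available. A minimal sketch that tries the Colab secret first and then falls back to the `OPENAI_API_KEY` environment variable (the variable the OpenAI client reads by default when no `api_key` is passed). The helper name `get_api_key` is hypothetical:

```python
import os


def get_api_key(secret_name: str = "openai") -> str:
    """Return an OpenAI API key from Colab Secrets, falling back to an env var.

    Assumptions: the Colab secret is named "openai" (as in this guide) and the
    fallback environment variable is OPENAI_API_KEY.
    """
    try:
        # Only importable inside a Colab runtime.
        from google.colab import userdata
        key = userdata.get(secret_name)
        if key:
            return key
    except Exception:
        # Not in Colab, secret missing, or notebook access not granted.
        pass

    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("No API key found in Colab Secrets or OPENAI_API_KEY")
    return key
```

You would then create the client with `client = OpenAI(api_key=get_api_key())`.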

Step 3: Create Your First Completion

Let’s create a simple completion to get a feel for how GPT-4o works. We’ll ask the model to solve a basic math problem.

completion = client.chat.completions.create(
  model=MODEL,
  messages=[
    {"role": "system", "content": "You are a helpful assistant. Help me with my math homework!"},
    {"role": "user", "content": "Hello! Could you solve 4+5?"}
  ]
)

print("Assistant: " + completion.choices[0].message.content)
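The Chat Completions API is stateless: each request only contains the messages you send, so a follow-up such as “now multiply the result by 2” needs the earlier turns resent. A minimal sketch, assuming the `client` from Step 2; `ChatSession` is a hypothetical helper, not part of the `openai` package:

```python
class ChatSession:
    """Keep a running message history so follow-up questions have context."""

    def __init__(self, client, model="gpt-4o", system_prompt=None):
        self.client = client
        self.model = model
        self.messages = []
        if system_prompt:
            self.messages.append({"role": "system", "content": system_prompt})

    def ask(self, text):
        # Add the new question, then send the whole history.
        self.messages.append({"role": "user", "content": text})
        completion = self.client.chat.completions.create(
            model=self.model,
            messages=self.messages,
        )
        reply = completion.choices[0].message.content
        # Store the answer so the next question can refer to it.
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

For example, `session = ChatSession(client, MODEL, "You are a helpful assistant. Help me with my math homework!")`, then `session.ask("Could you solve 4+5?")` followed by `session.ask("Now multiply the result by 2.")`.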

Step 4: Ask More Complex Questions

You can also ask GPT-4o open-ended questions, for example about who created it and when its training data ends. Keep in mind that a model’s answers about itself, such as its training cutoff, are not always accurate.

completion = client.chat.completions.create(
  model=MODEL,
  messages=[
    {"role": "user", "content": "What is your name and who created you? What is your training cutoff date?"}
  ]
)

print("Assistant: " + completion.choices[0].message.content)
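Longer answers consume more tokens. Each response carries a `usage` object (`prompt_tokens`, `completion_tokens`, `total_tokens`), and `choices[0].finish_reason` is `"length"` when the reply stopped at a token limit. A small sketch that collects these fields; `summarize_usage` is a hypothetical helper:

```python
def summarize_usage(completion):
    """Collect token counts and the finish reason from a Chat Completions response.

    Hypothetical helper; it only reads fields the response object already has.
    """
    usage = completion.usage
    finish_reason = completion.choices[0].finish_reason
    return {
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "total_tokens": usage.total_tokens,
        "finish_reason": finish_reason,
        # "length" means the reply hit the token limit and may be incomplete.
        "truncated": finish_reason == "length",
    }
```

Calling `print(summarize_usage(completion))` after the request above shows how much of the budget the answer used.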

2. JSON Mode

JSON mode (response_format={"type": "json_object"}) constrains GPT-4o to reply with a valid JSON object, which is useful whenever your code needs to parse the output as structured data.
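JSON mode also requires the word “JSON” to appear somewhere in your messages; otherwise the API rejects the request, which is why the system prompt in the next step says “always responds in JSON”. A minimal sketch that checks this before sending; `json_mode_request` is a hypothetical helper:

```python
def json_mode_request(model, messages):
    """Build keyword arguments for a JSON-mode Chat Completions request.

    Raises early if no message mentions JSON, instead of waiting for the
    API to reject the request.
    """
    mentions_json = any(
        isinstance(m.get("content"), str) and "json" in m["content"].lower()
        for m in messages
    )
    if not mentions_json:
        raise ValueError('JSON mode needs the word "JSON" in a system or user message')
    return {
        "model": model,
        "response_format": {"type": "json_object"},
        "messages": messages,
    }
```

Usage: `client.chat.completions.create(**json_mode_request(MODEL, messages))`.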

Step 1: Create a JSON Response

Let’s create a JSON response to generate a weekly workout routine.

completion = client.chat.completions.create(
  model=MODEL,
  response_format={"type": "json_object"},
  messages=[
    {"role": "system", "content": "You are a trainer who always responds in JSON"},
    {"role": "user", "content": "Create a weekly workout routine for me"}
  ]
)

print(completion.choices[0].message)
json.loads(completion.choices[0].message.content)
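JSON mode should return valid JSON, but a reply cut off at the token limit can still be incomplete. A sketch of a guarded parse that also insists on a top-level object; `parse_json_reply` is a hypothetical helper:

```python
import json


def parse_json_reply(content: str) -> dict:
    """Parse a JSON-mode reply, raising a clear error if it is not a JSON object."""
    try:
        data = json.loads(content)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model reply was not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object at the top level")
    return data
```

For the workout example: `routine = parse_json_reply(completion.choices[0].message.content)`.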

3. Image Understanding

GPT-4o can also understand images. You can pass an image either as a base64-encoded data URL or as a publicly accessible URL; we’ll start with base64.

Step 1: Encode Image

First, encode an image to base64.

from IPython.display import Image, display
import base64

# Path to an image uploaded to the Colab session; replace with your own file
IMAGE_PATH = "/content/IMG-20240118-WA0023.jpg"

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

base64_image = encode_image(IMAGE_PATH)
display(Image(IMAGE_PATH))
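The data URL should declare a MIME type that matches the file (a `.jpg` is `image/jpeg`, not `image/png`). A sketch that infers it from the file extension with the standard `mimetypes` module; `image_to_data_url` is a hypothetical helper:

```python
import base64
import mimetypes


def image_to_data_url(image_path: str) -> str:
    """Encode a local image as a data URL whose MIME type matches its extension."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot infer an image MIME type for {image_path!r}")
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

In the next step you could then pass `{"url": image_to_data_url(IMAGE_PATH)}` as the `image_url` value.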

Step 2: Analyze Image

Send the encoded image to GPT-4o for analysis.

response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a helpful assistant that responds in Markdown."},
        {"role": "user", "content": [
            {"type": "text", "text": "What is the colour of the flower?"},
            # The file is a JPEG, so declare image/jpeg in the data URL
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}}
        ]}
    ],
    temperature=0.0,
)

print(response.choices[0].message.content)

Step 3: Analyze URL Image

You can also analyze images directly from URLs.

response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a helpful assistant that responds in Markdown."},
        {"role": "user", "content": [
            {"type": "text", "text": "What is the colour of the flower?"},
            {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/3/37/Ranunculus_repens_1_%28cropped%29.JPG/192px-Ranunculus_repens_1_%28cropped%29.JPG"}}
        ]}
    ],
    temperature=0.0,
)

print(response.choices[0].message.content)
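Both steps use the same message shape: a text part followed by one or more `image_url` parts. The optional `detail` field (`"low"`, `"high"`, or `"auto"`) controls the resolution the model uses; `"low"` costs fewer tokens. A sketch of a builder that accepts several images at once; `build_vision_message` is a hypothetical helper:

```python
def build_vision_message(question, image_urls, detail="auto"):
    """Build a user message with one text part and one part per image.

    Each URL can be a public https URL or a base64 data URL.
    """
    if detail not in ("low", "high", "auto"):
        raise ValueError(f"Unsupported detail level: {detail!r}")
    parts = [{"type": "text", "text": question}]
    for url in image_urls:
        parts.append({"type": "image_url", "image_url": {"url": url, "detail": detail}})
    return {"role": "user", "content": parts}
```

The returned dict goes into the `messages` list after the system message.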

4. Function Calling

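GPT-4o can decide to call functions that you define, based on the user’s input. The model does not run the function itself: it returns the function name and JSON-encoded arguments, your code executes the function, and you send the result back so the model can write its final answer. This is particularly useful for connecting the model to external data sources or services.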
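The round trip is easiest to see as the message history it produces. A sketch of that history for the Lakers question used below; the tool-call ID and the score are placeholder values, not real API output:

```python
import json

# Placeholder ID; the API generates a real one for each tool call.
tool_call_id = "call_example_1"

round_trip = [
    # 1. The user's question.
    {"role": "user", "content": "What's the score of the Lakers game?"},
    # 2. The model answers with a tool call instead of text.
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": tool_call_id,
                "type": "function",
                "function": {
                    "name": "get_nba_game_score",
                    # Arguments arrive as a JSON-encoded string, not a dict.
                    "arguments": json.dumps({"team": "Lakers"}),
                },
            }
        ],
    },
    # 3. Your code runs the function and sends back its result.
    {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "content": json.dumps({"team": "Lakers", "score": "102"}),
    },
]
# 4. Sending round_trip in a second request lets the model write the final answer.
```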

Step 1: Define a Function

Define a mock function that returns hard-coded scores for an NBA game. In a real application, this function would query a live sports data source.

def get_nba_game_score(team):
    if "lakers" in team.lower():
        return json.dumps({"team": "Lakers", "score": "102", "opponent": "Warriors", "opponent_score": "98"})
    elif "bulls" in team.lower():
        return json.dumps({"team": "Bulls", "score": "89", "opponent": "Celtics", "opponent_score": "95"})
    else:
        return json.dumps({"team": team, "score": "N/A", "opponent": "N/A", "opponent_score": "N/A"})

Step 2: Initialize Conversation and Call Function

Create a conversation in which the model can request this function, then send the function’s result back to the model so it can answer the question.

def function_calling():
    messages = [{"role": "user", "content": "What's the score of the Lakers game?"}]

    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_nba_game_score",
                "description": "Get the current score of an NBA game for a given team",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "team": {"type": "string", "description": "The name of the NBA team, e.g. Lakers, Bulls"},
                    },
                    "required": ["team"],
                },
            },
        }
    ]

    response = client.chat.completions.create(
        model=MODEL,
        messages=messages,
        tools=tools,
        tool_choice="auto",
    )

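    response_message = response.choices[0].message
    tool_calls = response_message.tool_calls

    # If the model answered directly without requesting a tool, return its text.
    if not tool_calls:
        return response_message.content

    available_functions = {"get_nba_game_score": get_nba_game_score}
    # Keep the assistant message containing the tool call(s) in the history.
    messages.append(response_message)

    for tool_call in tool_calls:
        function_name = tool_call.function.name
        function_to_call = available_functions[function_name]
        # The model supplies the arguments as a JSON string.
        function_args = json.loads(tool_call.function.arguments)

        function_response = function_to_call(team=function_args.get("team"))

        # Send the result back, linked to the tool call that requested it.
        messages.append(
            {"tool_call_id": tool_call.id, "role": "tool", "content": function_response}
        )

    # Ask the model to turn the function result into a natural-language answer.
    second_response = client.chat.completions.create(
        model=MODEL,
        messages=messages,
    )

    return second_response.choices[0].message.content

print(function_calling())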
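The loop above hard-codes `team=` and assumes the function name is always known. A more general sketch passes every model-supplied argument as a keyword and reports problems back to the model as JSON instead of crashing. The helper `run_tool_call` and its `registry` argument are hypothetical:

```python
import json


def run_tool_call(tool_call, registry):
    """Run one tool call requested by the model and build the "tool" reply message.

    `registry` maps function names to Python callables.
    """
    name = tool_call.function.name
    func = registry.get(name)
    if func is None:
        result = {"error": f"Unknown function: {name}"}
    else:
        try:
            # The model sends arguments as a JSON string; it may be empty.
            args = json.loads(tool_call.function.arguments or "{}")
            result = func(**args)
        except (json.JSONDecodeError, TypeError) as exc:
            # Malformed JSON or arguments that don't match the signature.
            result = {"error": str(exc)}
    # Tool message content must be a string.
    if not isinstance(result, str):
        result = json.dumps(result)
    return {"role": "tool", "tool_call_id": tool_call.id, "content": result}
```

Inside the `for tool_call in tool_calls:` loop, the body could then shrink to `messages.append(run_tool_call(tool_call, available_functions))`.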

Congratulations!

You’ve now learned how to set up and use GPT-4o on Google Colab. This guide covered basic text completions, JSON responses, image processing, and function calling. These capabilities can be extended to build sophisticated AI applications across various domains. Happy coding!