Gemini AI API Simplified: A Beginner’s Guide in Python

RAHULRAJ P V
9 min read · Dec 14, 2023

In the ever-evolving landscape of AI and machine learning, Gemini AI API has emerged as a game-changer, empowering developers to create intelligent applications that can understand both text and images. Released just days ago on December 13, 2023, this API opens up exciting possibilities for enhancing user experiences, automating tasks, and solving complex problems. In this article, we’ll take you on a journey through the capabilities and usage of Gemini AI API, equipping you with the knowledge to harness its potential.

Using the Gemini AI API with Python is a straightforward process. You can interact with the API using Python code to generate text-based responses based on your prompts, which can include both text and image data. Below are step-by-step instructions on how to use the Gemini AI API using Python:

Note: Before you begin, make sure you have obtained access to the Gemini AI API and have an API key ready.

1. Install Required Libraries:

Start by installing the required libraries. You can use the google-generativeai package to interact with the Gemini AI API.

!pip install -q -U google-generativeai

2. Import Libraries and Configure API:

Next, import the necessary libraries:

import pathlib
import textwrap

import google.generativeai as genai

# Used to securely store your API key
from google.colab import userdata

from IPython.display import display
from IPython.display import Markdown


def to_markdown(text):
  text = text.replace('•', ' *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))

Let's break down the code and explain the purpose of each component:

  1. import pathlib: This line imports the pathlib module, which is used for working with file paths and directories. It provides a convenient way to manage file and directory paths.
  2. import textwrap: The textwrap module is imported, which is a part of the Python standard library. It's used for formatting text by wrapping long lines and adding line breaks.
  3. import google.generativeai as genai: This line imports the genai module from the google.generativeai package. This package provides access to the Gemini AI API, allowing you to use Google's generative AI models for various text and image generation tasks.
  4. from google.colab import userdata: This line imports the userdata module from the google.colab package. It's used for securely storing your API key. Colab is often used for running Python code in Google Colaboratory, and this module helps manage user data securely.
  5. from IPython.display import display: This import statement brings in the display function from the IPython.display module. This function is used to display various types of content, such as text, images, and Markdown, in Jupyter Notebook or IPython environments.
  6. from IPython.display import Markdown: Similar to the previous import, this line imports the Markdown class from the IPython.display module. It allows you to render text as Markdown-formatted content.
  7. def to_markdown(text): This line defines a Python function called to_markdown, which takes a text as input.
  8. text = text.replace('•', ' *'): Within the to_markdown function, this line replaces any occurrences of the bullet character ('•') with an asterisk ('*'). This is a specific text formatting task to create bullet points in Markdown.
  9. return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True)): This line uses the textwrap.indent function to add a greater-than symbol ('>') to each line of the text, effectively indenting it. This formatting is done to create block quotes in Markdown. Finally, the Markdown content is returned, which can be displayed using the display function mentioned earlier.
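To see what this formatting does outside a notebook, here is a small stand-alone sketch of the same transformation using only the standard library (the final `Markdown(...)` rendering step is left out, since it only matters inside IPython):

```python
import textwrap

text = "• First point\n• Second point"

# Replace bullet characters with Markdown asterisks
text = text.replace('•', ' *')

# Prefix every line with '> ' to turn the text into a Markdown block quote
quoted = textwrap.indent(text, '> ', predicate=lambda _: True)

print(quoted)
```

Each line comes out prefixed with `> `, which Markdown renders as a block quote containing bullet points.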

Now, we have to configure the API with your API key. You can get an API key from the following website: Gemini-API

After opening this site, you can create your own API key. You can either create an API key in a new project or obtain one from an existing project.

Then, copy the key and paste it into a secret key in Google Colab.

import google.generativeai as genai

# Configure the API with your API key
genai.configure(api_key='YOUR_API_KEY')

Here, replace ‘YOUR_API_KEY’ with your actual key. A safer option is to store the key as a secret using the key icon on the left side of Google Colab, and then reference it in the code by the name you gave the secret.

At this point, you should have your API key configured.
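Putting the two pieces together, a minimal configuration cell in Colab might look like the sketch below. `GOOGLE_API_KEY` here is an assumed secret name; use whatever name you chose when storing your key:

```python
import google.generativeai as genai
from google.colab import userdata

# 'GOOGLE_API_KEY' is the name given to the secret in Colab's key panel;
# replace it with whatever name you used when storing your key.
genai.configure(api_key=userdata.get('GOOGLE_API_KEY'))
```

This keeps the key out of your notebook's source, so you can share the notebook without leaking credentials.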

3. Choose Your Gemini Model:

Next, select the Gemini model variant that suits your needs. For example, if you want to work with text and images, you can choose the gemini-pro-vision model:

model = genai.GenerativeModel('gemini-pro-vision')

Here, you can use either gemini-pro or gemini-pro-vision. For plain text generation, gemini-pro is enough, but for image-based prompts the vision model is needed. You can explore other model options based on your specific requirements.
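The choice boils down to whether your prompt contains an image. As an illustrative (hypothetical) helper, not part of the google-generativeai API, you could encode that rule in a small function:

```python
# Hypothetical helper: choose a model name based on whether the prompt
# includes an image. Not part of the google-generativeai package itself.
def pick_model(has_image: bool) -> str:
    return 'gemini-pro-vision' if has_image else 'gemini-pro'

print(pick_model(has_image=False))  # gemini-pro
print(pick_model(has_image=True))   # gemini-pro-vision
```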

4. Construct Your Prompt:

Your prompt is the input you provide to the model, and it can include both text and image data. Depending on your use case, you can create a prompt using Python variables. Here’s an example of a text prompt:

A. For Generating the text from text inputs

For text-only prompts, use the gemini-pro model:

model = genai.GenerativeModel('gemini-pro')

The `generate_content` method can accommodate a wide range of use cases, such as multi-turn chat and multimodal input, depending on the capabilities of the underlying model. The available models are designed to accept text and images as input and produce text as output. In the simplest scenario, you can provide a prompt string to the `GenerativeModel.generate_content` method:

response = model.generate_content("What is the Gemini.AI?")

This code generates a response using the Gemini AI model. The question, “What is the Gemini.AI?” is sent to the model, and the response is stored in the ‘response’ variable.

In simple cases, you can access the response text using `response.text`. To display the text in Markdown format, use the `to_markdown` function.

to_markdown(response.text)

The code `to_markdown(response.text)` is using the `to_markdown` function to format the text contained in the `response` object as Markdown. It takes the text from `response.text`, processes it to apply Markdown formatting, and returns the formatted text. This can be useful when you want to display the response in a nicely formatted Markdown style. For this prompt, I got the following output.

If the API didn’t provide a result, you can check the `GenerateContentResponse.prompt_feedback` to see if it was blocked due to safety concerns related to the prompt. This allows you to determine if the prompt you provided might have triggered safety measures, leading to the absence of a response.

response.prompt_feedback

Gemini AI has the capability to generate several potential responses for a given prompt. These potential responses are referred to as “candidates,” and you can examine them to choose the most appropriate one to use as the final response.

You can access and view these response candidates using the `GenerateContentResponse.candidates` attribute. This feature allows you to evaluate different response options and select the one that best fits your requirements or context.

response.candidates

B. Generate text from image and text inputs

Gemini offers a powerful multimodal model known as “gemini-pro-vision.” This model has the unique capability to process both text and image inputs simultaneously. To interact with this model and generate content, you can use the GenerativeModel.generate_content API. It's specifically designed to work seamlessly with multimodal prompts, where you provide both text and image inputs. The result you receive from this API will be in the form of text output, making it a versatile tool for a wide range of applications that require a combination of text and image understanding.

First, we have to open an image:

import PIL.Image

image = PIL.Image.open('image.jpg')
image

Here, we use the Python Imaging Library (PIL) to open and load an image named ‘image.jpg’ into a variable called ‘image’. You can use whatever image you want.

Now, load the gemini-pro-vision model so we can pass the image to its generate_content method:

model = genai.GenerativeModel('gemini-pro-vision')

Then, pass the image as the prompt:

response = model.generate_content(image)

to_markdown(response.text)

The code `response = model.generate_content(image)` generates content using the `image` as input with the specified model. Then, `to_markdown(response.text)` is used to format the text within the `response` object as Markdown. This allows you to present the generated content in a structured and readable Markdown format. After running the code, I got the following answer, which correctly describes the image given as input to the model.

To provide both text and images in a prompt, pass a list containing the strings and images:

response = model.generate_content(["Write a short description of what you see in the image", image], stream=True)
response.resolve()

Here, the code response = model.generate_content(["Write a short description", image], stream=True) constructs a multimodal prompt by passing a list containing both a text string ("Write a short description") and an image (image) to the generate_content method of the specified model. The stream=True parameter indicates that the response should be streamed.

Then, response.resolve() is called to retrieve the final response from the model. This method resolves the response and provides you with the generated content, which can include text based on the prompt.

Why is stream used?

When stream=True is used as a parameter in the generate_content method, it enables streaming mode for the response. In this mode, the response is processed and delivered incrementally, allowing you to access and process parts of the response as they become available, rather than waiting for the entire response to be generated before accessing it.

This can be particularly useful in scenarios where the response may be large, and you want to start working with the data as soon as possible, without waiting for the entire content generation process to finish. Streaming mode can improve efficiency and reduce memory usage when dealing with large responses. So, when you set stream=True, it tells the API to provide the response in a streaming fashion, making it available in smaller portions as it's generated, rather than delivering the entire response at once.
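The consumption pattern is the same as iterating over any Python generator. The sketch below simulates it with a fake stream (no API call is made); with a real streamed response, you would iterate over `response` and read `chunk.text` from each chunk:

```python
# Stand-in for a streamed API response: yields text one piece at a time.
def fake_stream():
    for piece in ["Gemini ", "is a ", "multimodal model."]:
        yield piece

parts = []
for chunk in fake_stream():      # with the real API: for chunk in response:
    parts.append(chunk)          # with the real API: parts.append(chunk.text)
full_text = "".join(parts)
print(full_text)
```

Each chunk can be displayed or processed as soon as it arrives, which is what makes streaming feel responsive for long generations.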

Then we use :

to_markdown(response.text)

The code to_markdown(response.text) is used to format the text contained in the response object as Markdown. This allows you to present the generated text in a structured and visually appealing way, making it suitable for various display purposes, such as in a Markdown document or on a website.

By following these steps, you can effectively utilize the Gemini AI API in Python to generate intelligent and context-aware responses based on your prompts, which can include both text and image inputs. This technology opens up exciting possibilities for various applications and automation tasks.

I hope you now understand how to use the Gemini AI API for basic tasks. In the next article, we will discuss advanced use cases and applications in detail.

Link to Github code: https://github.com/rahulrajpv/Gemini.git

Link to Colab code : https://colab.research.google.com/drive/1HJ43PKyNoalWHf_foWFQCVO7vvAPNY3j?usp=sharing

References :

  1. https://github.com/rahulrajpv/Gemini/blob/main/Gemini%20API%20-%20Python%20code.ipynb
  2. Gemini API: Quickstart with Python | Google AI for Developers
  3. Google AI Studio
  4. Gemini API Overview | Google AI for Developers


RAHULRAJ P V

DUK MTech CSE AI '24 | IIM Kozhikode Research Intern | CSIR NPL Former Project Intern | MSc Physics | PGDDSA | Generative AI Learner🧠 | Film Enthusiast