Introduction
Today, AI shows up in almost every domain, from banking to healthcare. After ChatGPT showed what large language models can do, several other companies started putting their effort into building better transformer-based models with improved accuracy. In this article, we will see how we can use Google’s Gemini Pro model to analyze an image and give a medical diagnosis. It is going to be pretty exciting; let’s hop on.
Learning Objectives
- We will do a medical analysis on the uploaded image
- We will get hands-on experience by using Gemini Pro
- We will build a Streamlit-based application to see the results in an interactive environment
This article was published as a part of the Data Science Blogathon.
What is Gemini?
Gemini is a new series of foundational models built and introduced by Google. It is by far their largest set of models, larger than PaLM, and is built with a focus on multimodality from the ground up. This makes the Gemini models capable of handling different combinations of information types, including text, images, audio, and video. Currently, the API supports images and text. Gemini has reached state-of-the-art performance on many benchmarks, even beating ChatGPT and the GPT-4 Vision models in many of the tests.
Configuring Gemini Pro API Key
We will follow the steps below to create a Gemini Pro API key:
Step 1: Visit Google AI Studio and log in using your Google account.
Step 2: After logging in, click on ‘Create API key’.
Step 3: On the next screen, if you are creating a Google project for the first time, click on ‘Create an API key in the new project’.
Once you click on that button, it will generate an API key that can be used for our project here.
In the project folder, create a Python file named google_api_key.py, as shown below, to store the API key.
google_api_key='YOUR_API_KEY'
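Hardcoding the key in a source file makes it easy to commit it by accident. As an alternative, here is a minimal sketch that reads the key from an environment variable instead. The variable name GOOGLE_API_KEY and the helper load_api_key are assumptions made for illustration; they are not part of the original setup.

```python
import os


def load_api_key(env_var: str = "GOOGLE_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The variable name is an assumption for this sketch; use whatever
    name you export in your shell.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return key
```

With this helper, google_api_key.py could contain `google_api_key = load_api_key()` so that any file importing google_api_key keeps working unchanged.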
Configure the Gemini Pro Settings and Deploy as a Streamlit App
Before we start writing code, we need to understand the concept of a prompt. A prompt is a natural language request submitted to a language model to receive a response. Prompts can contain questions, instructions, contextual information, examples, and partial input for the model to complete or continue. After the model receives a prompt, it can generate text, embeddings, code, images, videos, music, and more, depending on the model being used.
We can find the detailed instructions here. We can also find some advanced strategies here. The key thing to remember is that if we want better results, we need to provide better prompts for the Gemini Pro model to understand.
We will give the below prompt to our model:
"""
You are a domain expert in medical image analysis. You are tasked with
examining medical images for a renowned hospital.
Your expertise will help in identifying or
discovering any anomalies, diseases, conditions or
any health issues that might be present in the image.
Your key responsibilities:
1. Detailed Analysis : Scrutinize and thoroughly examine each image,
focusing on finding any abnormalities.
2. Analysis Report : Document all the findings and
clearly articulate them in a structured format.
3. Recommendations : Based on the analysis, suggest remedies,
tests or treatments as applicable.
4. Treatments : If applicable, lay out detailed treatments
which can help in faster recovery.
Important Notes to remember:
1. Scope of response : Only respond if the image pertains to
human health issues.
2. Clarity of image : In case the image is unclear,
note that certain aspects are
'Unable to be correctly determined based on the uploaded image'
3. Disclaimer : Accompany your analysis with the disclaimer:
"Consult with a Doctor before making any decisions."
4. Your insights are invaluable in guiding clinical decisions.
Please proceed with the analysis, adhering to the
structured approach outlined above.
Please provide the final response with these 4 headings :
Detailed Analysis, Analysis Report, Recommendations and Treatments
"""
We could add more instructions to improve the performance. However, this should serve as a good starting point for now.
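Since we may keep adding instructions over time, it can help to assemble the prompt from structured pieces instead of editing one long string. The sketch below shows one way to do that. The helper build_prompt and its parameters are hypothetical, and the example uses a shortened version of the prompt above.

```python
def build_prompt(role, responsibilities, notes, headings):
    """Assemble a structured instruction prompt from its parts."""
    lines = [role.strip(), "", "Your key responsibilities:"]
    # Number each (name, detail) pair in the same "N. Name : detail" style as the prompt above
    lines += [
        f"{i}. {name} : {detail}"
        for i, (name, detail) in enumerate(responsibilities, start=1)
    ]
    lines += ["", "Important notes to remember:"]
    lines += [f"{i}. {note}" for i, note in enumerate(notes, start=1)]
    lines += [
        "",
        f"Please provide the final response with these {len(headings)} headings:",
        ", ".join(headings),
    ]
    return "\n".join(lines)


prompt = build_prompt(
    role="You are a domain expert in medical image analysis.",
    responsibilities=[
        ("Detailed Analysis", "Scrutinize each image for abnormalities."),
        ("Analysis Report", "Document the findings in a structured format."),
    ],
    notes=["Scope of response : Only respond if the image pertains to human health issues."],
    headings=["Detailed Analysis", "Analysis Report"],
)
print(prompt)
```

Adding a new responsibility or heading then becomes a one-line change to a list.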
Now, we will focus on the code for the Streamlit-based deployment.
Code:
import streamlit as st
from pathlib import Path
import google.generativeai as genai
from google_api_key import google_api_key
## Streamlit App
genai.configure(api_key=google_api_key)
# https://aistudio.google.com/app/u/1/prompts/recipe-creator
# Set up the model
generation_config = {
    "temperature": 1,
    "top_p": 0.95,
    "top_k": 0,
    "max_output_tokens": 8192,
}
safety_settings = [
    {
        "category": "HARM_CATEGORY_HARASSMENT",
        "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
        "category": "HARM_CATEGORY_HATE_SPEECH",
        "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
        "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
        "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
        "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
        "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
]
system_prompts = [
"""
You are a domain expert in medical image analysis. You are tasked with
examining medical images for a renowned hospital.
Your expertise will help in identifying or
discovering any anomalies, diseases, conditions or
any health issues that might be present in the image.
Your key responsibilities:
1. Detailed Analysis : Scrutinize and thoroughly examine each image,
focusing on finding any abnormalities.
2. Analysis Report : Document all the findings and
clearly articulate them in a structured format.
3. Recommendations : Based on the analysis, suggest remedies,
tests or treatments as applicable.
4. Treatments : If applicable, lay out detailed treatments
which can help in faster recovery.
Important Notes to remember:
1. Scope of response : Only respond if the image pertains to
human health issues.
2. Clarity of image : In case the image is unclear,
note that certain aspects are
'Unable to be correctly determined based on the uploaded image'
3. Disclaimer : Accompany your analysis with the disclaimer:
"Consult with a Doctor before making any decisions."
4. Your insights are invaluable in guiding clinical decisions.
Please proceed with the analysis, adhering to the
structured approach outlined above.
Please provide the final response with these 4 headings :
Detailed Analysis, Analysis Report, Recommendations and Treatments
"""
]
model = genai.GenerativeModel(model_name="gemini-1.5-pro-latest",
                              generation_config=generation_config,
                              safety_settings=safety_settings)
st.set_page_config(page_title="Visual Medical Assistant", page_icon="🩺",
                   layout="wide")
st.title("Visual Medical Assistant 👨⚕️ 🩺 🏥")
st.subheader("An app to help with medical analysis using images")
file_uploaded = st.file_uploader('Upload the image for Analysis',
                                 type=['png','jpg','jpeg'])
if file_uploaded:
    st.image(file_uploaded, width=200, caption='Uploaded Image')
submit = st.button("Generate Analysis")
# only call the model when the button is clicked and an image is present
if submit and file_uploaded:
    image_data = file_uploaded.getvalue()
    image_parts = [
        {
            # use the MIME type reported by the uploader (image/png or image/jpeg)
            "mime_type": file_uploaded.type,
            "data": image_data
        }
    ]
    # making our prompt ready
    prompt_parts = [
        image_parts[0],
        system_prompts[0],
    ]
    # generate response
    response = model.generate_content(prompt_parts)
    if response:
        st.title('Detailed analysis based on the uploaded image')
        st.write(response.text)
Here is a walkthrough of the code, section by section:
Imports -> We import the necessary libraries and the google_api_key.
genai.configure -> Here, we pass the API key created earlier.
generation_config and safety_settings -> Here, we define the Gemini model’s basic configuration and safety settings. Don’t worry; you can visit Google AI Studio and click on Get code to get all these code snippets.
system_prompts -> Here, we define our prompt for the model.
genai.GenerativeModel -> Here, we initialize our Gemini model.
st.set_page_config, st.title and st.subheader -> Here, we show some text on the Streamlit app.
st.file_uploader and st.image -> Notice how we store the uploaded image in the file_uploaded variable. We allow ‘png’, ‘jpg’ and ‘jpeg’ image types, so the upload will fail if you provide anything else. If the image is successfully uploaded, we display it in the browser.
st.button and model.generate_content -> We create a submit button with the text “Generate Analysis.” Once we click on it, the actual magic happens: we pass the image and the prompt to our Gemini model, and the model returns the response to us.
Then, we display the response in the browser.
I have saved this file as app.py
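One thing to watch for in the last step: the response may not always contain text, for example when the safety settings block the output. The sketch below wraps the text access in a small helper. It assumes that the response object raises ValueError from its .text accessor when no text part is available; the helper name response_text_or_notice and the fallback message are hypothetical.

```python
def response_text_or_notice(response, notice="No analysis was returned for this image."):
    """Return response.text, or a fallback notice if the text is unavailable.

    Assumption: accessing .text raises ValueError when the response
    holds no text part (for example after a safety block).
    """
    try:
        text = response.text
    except ValueError:
        return notice
    # Treat an empty or whitespace-only answer the same as a missing one
    return text if text.strip() else notice
```

In app.py, the display line could then become `st.write(response_text_or_notice(response))`, so the user sees a message instead of an error page.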
Seeing it in Action
We need to open a terminal and execute the following command to launch the Streamlit app. Make sure you change to the directory that contains app.py.
streamlit run app.py
Output:
Now, we will upload some pictures and look at the output. Let’s start with the analysis of an image of crooked teeth, which I downloaded from Google.
Let us upload this image by clicking on the browse files button.
Once the image is uploaded, click the Generate Analysis button. You will see a detailed analysis below:
I understand that the image might be a bit difficult to read, so I’ll share zoomed-in images of each heading to make it easier to understand.
Image 1:
Image 2:
Image 3:
We can conduct an in-depth analysis of the potential medical diagnosis simply by examining the image. Additionally, given that it pertains to a dental issue, the suggested course of action is to consult an orthodontist and undergo some dental X-rays. Furthermore, several treatment options, such as wearing braces and retainers, appear to be sensible choices in such cases.
Let us look at how the process looks end to end.
Similarly, let us try another example. Here we will upload the below image of a swollen ankle and check the medical analysis.
After uploading the image and clicking the Generate Analysis button, this is how the process will look:
Let us look at the zoomed-in images of the headings:
Image 1:
Image 2:
Image 3:
So, we can see an in-depth analysis of the possible medical diagnosis just by looking at the image. We can see how the model captures a swelling problem in the left foot. The model recommends consulting a doctor since it is hard to deduce much just by looking at this kind of swelling. However, we can see a few treatment options, like compression packs and elevating the left foot to reduce swelling, which seem logical in such scenarios.
We can play around and get more such images analysed.
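Because the prompt asks for four fixed headings, we could also split the response into sections and show each one separately instead of zooming into screenshots. The sketch below is one way to do that. It assumes each heading appears on a line of its own, optionally decorated with markdown characters such as #, * or a trailing colon; the actual output format of the model may differ, and the helper split_sections is hypothetical.

```python
import re

HEADINGS = ["Detailed Analysis", "Analysis Report", "Recommendations", "Treatments"]


def split_sections(text, headings=HEADINGS):
    """Split a response into {heading: body} using heading-only lines as boundaries."""
    # Matches lines such as "## Treatments" or "**Treatments:**"
    pattern = re.compile(
        r"^[#*\s]*(" + "|".join(re.escape(h) for h in headings) + r")[*:\s]*$",
        re.IGNORECASE,
    )
    canonical = {h.lower(): h for h in headings}
    sections, current = {}, None
    for line in text.splitlines():
        match = pattern.match(line)
        if match:
            # Start a new section under the canonical heading name
            current = canonical[match.group(1).lower()]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
        # Lines before the first heading are ignored
    return {name: "\n".join(body).strip() for name, body in sections.items()}
```

Each entry of the returned dictionary could then be rendered in its own st.expander in the Streamlit app.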
Use cases
Such applications are incredibly useful in remote locations where doctors are inaccessible. They are also beneficial in areas where patients are far from the clinic or hospital. While we cannot rely entirely on these systems, they can provide useful medical indicators and guidance. We can further refine our prompts and include home remedies as a segment. The Gemini Pro model performs best when we give it detailed, well-structured prompts.
Conclusion
In this article, we’ve explored the capabilities of Google’s Gemini Pro model for medical image analysis. We’ve demonstrated how to configure the API, create effective prompts, and deploy a Streamlit application for interactive results. The Gemini Pro model offers state-of-the-art performance, making it a powerful tool for remote medical diagnostics and clinical decision-making. While it shouldn’t replace professional medical advice, it provides valuable insights and can significantly enhance accessibility to medical evaluations in underserved areas. As AI technology advances, tools like Gemini Pro will play an increasingly critical role in healthcare innovation.
Key Takeaways
- In this article, we demonstrated how to use Gemini Pro to perform a medical examination of an image.
- We have discussed configuring the Gemini Pro API Key and how defining prompts can enhance model performance.
- Additionally, we have deployed the mini project using Streamlit, enabling us to experiment and observe the results.
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
Frequently Asked Questions
Q1. What is Gemini?
A. Gemini is a series of foundational models from Google. It focuses on multimodality and supports text and images. It includes models of varying sizes (Ultra, Pro, Nano). Unlike previous models like PaLM, Gemini can handle diverse information types.
Q2. What is a prompt?
A. A prompt is a natural language request submitted to a language model to receive a response. Prompts can contain questions, instructions, contextual information, examples, and partial input for the model to complete or continue. After the model receives a prompt, it can generate text, embeddings, code, images, videos, music, and more, depending on the model being used.
Q3. Where are such applications useful?
A. Such applications are very helpful in remote locations where doctors are inaccessible. They are also helpful in locations where the patient is far from the clinic or hospital.