How To Automate PowerPoint Slides From Excel Using Python and ChatGPT
Image source: @campaign_creators via Unsplash
PowerPoint has long been a staple for storytelling in professional settings. Whether you’re giving a TED Talk or a boardroom presentation at work, the popular presentation software can help you tell engaging stories that captivate your audiences.
Unfortunately, not every slide deck you create will be about telling a great story. This is especially true in the field of market research, where 100+ slide decks packed with charts and every possible cut of your dataset are all too common. In fact, we have a name for this kind of thing; a data dump.
These types of PowerPoint decks are common enough that even Forbes wrote about them.
“We’ve all sat through one of these “data dumps”—a data presentation packed with more facts and figures than a human being can absorb in a single sitting.”
- Brent Dykes via Forbes
Data dumps are generally not a good use of an analyst's or researcher's time and should be avoided as much as possible.
But sadly, many clients still ask for and expect this type of deliverable.
The good news is I have an epic automation to share with you today that will hopefully make your life a little easier if you’re ever tasked with working on a data dump in PowerPoint.
To achieve this miracle of automation, we’ll use a mix of AI (ChatGPT) and a handy Python script aptly named python-pptx. I’d also like to give a shout out to the script’s creator and maintainer, Steve Canny.
While Canny’s python-pptx library will provide the underlying code base to automate slide creation, we’ll also use ChatGPT to write an executable script that enables some additional functionality and customization. The final output will be a Python script that can automatically create a full slide deck with charts in seconds. Moreover, this slide deck can be tailored to your specific look and feel!
Some important caveats
Before we dive into the tutorial, I want to cover a few important caveats.
This automation shouldn’t replace the need for storytelling. There’s an art to giving great presentations, so you don’t want to overuse this. Not every presentation you give will be a data dump. So use this automation wisely, and ensure you know when and when not to use it.
Throughout this article, I’ve provided all the templates and scripts used as downloadable files. However, you will likely need to modify the PPTX template and code to suit your specific needs.
The coding side of this tutorial was achieved using GPT-4 on a paid OpenAI plan. However, you can probably achieve similar results on a free plan with GPT-3.5, or another LLM like CodeLlama.
Ok, let’s jump into the tutorial. And if you prefer to watch a tutorial, you can check out my video below. Otherwise, read on for the step by step walkthrough.
Step 1 - Preparation
To start, you’ll need to have Python installed locally on your computer. If you don’t have Python installed yet, check out this helpful article on DataCamp.
I also recommend using an IDE. I like to use Visual Studio Code and will use it throughout this tutorial. But if you prefer using the terminal on your machine or a different IDE, you won’t have any problem following along.
Finally, you must install the pandas and python-pptx Python libraries. You can install them by running the following command line in the terminal of your machine (if you’re running Python v3 just modify the below to pip3).
pip install python-pptx
pip install pandas
Step 2 - Prepare your dataset
For this tutorial, I’ll be working with a dataset I downloaded from Kaggle. It’s based on a survey of n=1,010 respondents and contains more than 150 questions ranging from music preferences to hobbies & interests.
However, we have to develop a standardized data structure for the auto-slides script to run. This means that every time you run the Python script to generate charts and slides, your data must always be in this format. The original Kaggle dataset is a raw, respondent-level format common in quantitative research. Unfortunately, this format isn’t ideal for what we’re trying to accomplish, so I needed to transform the original raw data into an aggregated structure.
There are a few ways you can streamline the process of aggregating this data, but I was able to get this done quickly by uploading the raw Kaggle dataset to ChatGPT and giving it the following prompt:
ChatGPT Prompt (using GPT-4)
The attached file is a raw respondent-level data file from a survey. In this file, each row (from row 2 onward) represents an individual respondent, while each column represents a variable (i.e. question) in the survey. The cell values for most questions are on a scale from 1 to 5.
The survey includes responses from n=1010 respondents and comprises 140 variables.
I want you to bulk-create aggregate tables of the sample proportion (as a % of total respondents) for every variable (from column B onward). For example, in the first variable (i.e. 'Music' in column B), the sample proportion (as a % of total respondents and ignoring blank values) is as follows:
1 - 0.80%
2 - 0.97%
3 - 3.63%
4 - 13.52%
5 - 81.07%I want you to help produce tables showing the sample proportion for all 140 variables in this data file. I also want the final output to be stacked, with each subsequent question placed below the previous one and all of the data spread across just two columns.
After ChatGPT helped me aggregate the data for all 140 questions, I created a simple six-column data structure, which you can view below.
Sample data structure for Python script
Here’s a summary of the data schema.
question_id - This unique identifier helps that script identify and separate questions in the dataset.
question_text - This is the question text from the survey. Note that if you're working with non-survey data, then you can use this field as a variable label.
response - This is the response option shown in the survey. Note that if you're using non-survey data, then you can use this field to cover different attributes of your data (e.g. date, month, country, etc)
value - This is the value for a particular data point. In the survey context, this is the selection rate (aka sample proportion) for the specific response option.
chart_type - This is where you can indicate the type of chart you want the script to use for the question. Note that there should only be one chart type per question_id. The script currently only supports bar (horizontal and vertical), line and pie charts, but the script can be modified to support more chart types.
chart_layout - This column tells the script which PPT custom layout to use. I've developed one custom slide layout that includes a chart placeholder, which I’ll cover in more detail later in this tutorial. Note that the script can be modified to support more layouts.
Later, when I write my prompt, I’ll describe this data schema to ChatGPT to recognize my dataset and import the variables into the correct placeholders within the PowerPoint template.
Step 3 - Prepare your PowerPoint template
Creating 100+ slides in seconds is great, but if the output isn’t in the template and format you need, you’ll just spend hours formatting your document. So, to make this automation truly valuable, we’ll need to templatize the output as well.
I downloaded this nice PPT template for free on Microsoft.com, but you can use your own template if you prefer.
Next, we’ll need to create a custom layout with a chart placeholder.
Open your template, click the View ribbon, and then Slide Master. On the left side, you’ll see one slide master containing 13 layouts.
From here, you can either create a new layout from scratch or duplicate an existing one. I like the second layout (called Title & Content), so I’ll duplicate it and rename it “Chart layout 1.” You can now edit this layout and make any changes you want, but you’ll need to insert a chart placeholder. To do this, delete the existing body text placeholder, click Insert Placeholder and select Chart.
Add a chart placeholder to your layout
After adding a chart placeholder, the slide layout should look something like this.
Now comes the tricky part. For the Python script to run, it needs to know exactly what slide layout and placeholder to port the data into from the Excel source file. The good news is that PowerPoint assigns a unique ID (known as an index value) to every placeholder in a PPT slide template. The bad news is that this information is not easily accessible within the PowerPoint interface. So, we’ll use a simple Python script to acquire the information we need.
Below is the script, you can also access the code on Github here. And take note of the line I’ve highlighted below:
from pptx import Presentation
# Load your presentation template
prs = Presentation("template.pptx")# Choose the slide layout index you're interested in
# Slide layouts are indexed starting from 0
layout_index = 0 # Change this to the index of the layout you're interested in# Get the layout
slide_layout = prs.slide_layouts[layout_index]# Print placeholder details for the chosen layout
print(f"Details for layout {layout_index}: {slide_layout.name}")
for placeholder in slide_layout.placeholders:
print(f"Placeholder index: {placeholder.placeholder_format.idx}, Type: {placeholder.placeholder_format.type}, Name: '{placeholder.name}'")
To run this script, name your PPT file “template.pptx” and make sure it’s in the same folder as the Python script. Now open VS Code, click File > Open Folder and navigate to the folder where the .py file and PPT template are. Then click Run Python File.
This script will only check one slide layout at a time. Since there are multiple layouts, you can manually adjust the value of layout_index (line 8) until you find the right layout. With its default value of 0, this returns the information of the first slide layout, called “Title Slide.”
If I adjust the layout index value to 1, it retrieves info for the 2nd slide layout, named “Title and Content.” Since I created my custom layout by duplicating the Title and Content slide (located directly after it), I can infer that the layout index value I’m looking for is 2.
I get the following output when I run this script for layout_index = 2.
Details for layout 2: Chart layout 1
Placeholder index: 0, Type: TITLE (1), Name: 'Title 1'
Placeholder index: 2, Type: DATE (16), Name: 'Date Placeholder 3'
Placeholder index: 3, Type: FOOTER (15), Name: 'Footer Placeholder 4'
Placeholder index: 4, Type: SLIDE_NUMBER (13), Name: 'Slide Number Placeholder 5'
Placeholder index: 10, Type: CHART (8), Name: 'Chart Placeholder 14'
Placeholder index: 11, Type: BODY (2), Name: 'Text Placeholder 13'
From the output above, I’m looking for the index value for the chart placeholder, which is Placeholder index: 10. Note this, as we’ll need this information later when we ask ChatGPT to write the script.
Now that we’ve created our PowerPoint template, it’s time to write the prompt.
Step 4 - Write and execute the prompt
Based on all my requirements, here’s the prompt I wrote.
ChatGPT Prompt (using GPT-4)
I want you to create an auto-slides generator script, written in Python, that automatically generates PowerPoint slides with charts from an Excel file. To do this, you can use the existing python-pptx library to automate the slide creation and the pandas library for chart creation and formatting.
When the auto-slides script is run, it should interrogate an XLS data file named “data.xls,” and a PowerPoint template file named “template.pptx.” Both of these files will be located in the same directory as the main Python script.
The “template.pptx” includes custom slide formatting and layouts to which the output file should adhere. I have created a custom slide layout named “Chart layout 1” which contains a chart placeholder. Note that the chart placeholder index value is 10.
The “data.xls” file contains the following 6-column structure:
1. question_id - This is a unique identifier for each variable (i.e. question) from the survey.
2. question_text - This is the original survey question shown to the respondents.
3. response - This shows the response option the respondent selected in the survey.
4. value - This is the sample proportion (aka selection rate) of all respondents in the survey.
5. chart_type - This is the chart type I want the data to be displayed as. Overall, I want the script to support three chart types: bar (both vertical and horizontal orientation), line, and pie charts.
6. chart_layout - This column determines which PowerPoint slide layout to use. I have created a custom slide layout named “Chart layout 1” which contains a chart placeholder. Note that the chart placeholder index value is 10.
Step 5 - Test the output
ChatGPT generated a simple Python script based on my prompt, and you can download it as well as the other working files below:
Save the Python script, PPTX template and XLS data file to a folder on your computer. Now open VS Code, click File > Open Folder and select the folder where all the working files are located. Then click Run Python File.
If everything is set up correctly, the script will generate a new PPTX file in your root directory called output_presentation.pptx.
Here’s what it will look like.
And there you have it! We’ve automatically generated 140 slides with charts.
However, based on the output I have a few ideas to make this even better. Here are some of the revisions I want to make.
1) Add support for a second slide layout
I want my script to support multiple custom layouts. To do this, I’ve created a second layout in my PPTX template called “Chart layout 2.” Then, in my data file, under column F (data_layout), I assigned some of the 140 variables to this new chart layout. Note that the placeholder index should be the same for both layouts. In my case, a placeholder index value of 10 will be valid for both layouts, so I don’t need to change this.
2) Add some custom formatting for the charts
The charts created from the script automatically inherited some formatting rules from the PPT template, such as the chart colours. However, I want to introduce my own custom formatting rules, including:
For bar charts, I want all bars to be one colour instead of multiple colours
For bar charts, I want to remove the chart axis
I want to remove the chart title (e.g. Series 1) for all chart types.
For all charts, I want to add data labels to each bar and format them as a whole number (e.g. show 50% as 50)
I want to remove the gridlines for all chart types
You can customize this list and create formatting rules to suit your needs.
3) Automatically generate simple text summaries for each slide
This one is a little more tricky. You can see from the screenshot above that I also included a text placeholder next to each chart. This can be useful if you want to add some observations about the data.
However, I was curious if I could also automate the text summaries for each slide and chart. I tried several different prompts, but ChatGPT struggled to give me something useful. Either the output was too structured and robotic, or ChatGPT would hallucinate some of the numbers mentioned in the summary.
I eventually found some success, but I had to give ChatGPT very specific instructions. Below is the prompt I used.
ChatGPT Prompt (using GPT-4)
The attached dataset contains the results of a survey. The data is aggregated, and the file contains the results for 140 questions, which are identified in column A (question_id). Each row corresponds to an individual response variable, so a single question can be spread out across multiple rows.
I would like you to help write a short one-sentence summary of the results for each question (140 in total).
I want you to follow the below format for the text summaries:
<value>% of respondents selected <most selected response> when asked <question text>
Here's an example for question_id = 1
20% of respondents selected strongly agree when asked, "I can't start my day without coffee."
When choosing which response option to use in the summary (e.g. strongly agree, very healthy, etc), please use the most selected response (i.e. the option with the highest value).
Please provide a CSV file with the text summaries of all 140 questions.
And here’s a snapshot of what ChatGPT gave me:
82% of respondents selected strongly agree when asked "I enjoy listening to music."
58% of respondents selected moderately paced music when asked "Do you prefer slow or fast paced music?"
32% of respondents selected neutral when asked "How much do you enjoy or not enjoy: Dance, Disco, Funk"
34% of respondents selected don't enjoy at all when asked "How much do you enjoy or not enjoy: Country"
28% of respondents selected neutral when asked "How much do you enjoy or not enjoy: Classical"
As you can see, this is fairly basic. I think with a little more effort, you might be able to get ChatGPT to generate more interesting and creative summaries. But I found this to be good enough for now.
You will also need to retrieve the index value for the text placeholder to which you want to port the summary. Simply follow the steps we used before to retrieve the index value for the chart placeholder. Based on my template, the index value for the text placeholder is 11. As before, we’ll need this value when prompting ChatGPT.
Now that I have my text summaries, I created a new column in my data.xls called text_summary, and I added the text summaries for all questions.
Updated schema for data.xls
Now, all I need to do is give ChatGPT the new requirements.
The follow-up prompt
Based on the new requirements listed above, here’s the follow-up prompt I gave ChatGPT.
ChatGPT Prompt (using GPT-4)
The script works well, but I want you to help add some new features.
1. I want to add support for a second chart layout. In the template.pptx, this layout is called "Chart layout 2," and in the data.xls file, the chart layout is indicated in the column titled "chart_layout." Note that this column determines which slide layout to use (i.e. Chart layout 1 or Chart layout 2). The chart placeholder index remains the same for both custom layouts (i.e. 10).
2. I want to add some custom formatting to the charts based on the following requirements:
2.1 For bar charts, I want all bars to be one colour instead of multiple colours. Let's use the colour HEX = #146082 as the default colour. And note that for Pie charts, you can continue to use multiple colours
2.2 For bar charts, remove the chart axis
2.3 Remove the chart title (e.g. Series 1) for all chart types.
2.4 For all charts, add data labels to each bar and format them as a whole number (e.g. show 50% as 50)
2.5 Remove the gridlines for all chart types3. I've added a new column to my data.xls file called "text_summary" in column G (see the attached data file). I would like to insert this as a text placeholder within the PowerPoint template. The placeholder index value I want the text_summary inserted into is 11.
Please update the Python script to incorporate these changes. Please do not make any unnecessary changes to the existing code unless such changes are needed to incorporate the latest requirements.
Based on this prompt, ChatGPT provided an updated script, which you can download below (as well as the updated template.pptx and data.xls).
You can also access all of the files via my Github repo here.
And here’s the updated slide output based on the revised code.
Conclusion
Using a combination of Python and AI, we automated the creation of a 140-slide PowerPoint deck, complete with charts and summaries, in a matter of seconds! So, if you’re in a bind and need to create a long, chart-heavy presentation, this automation may be just te thing you’re looking for.
But as I said earlier, there’s an art to giving great presentations, and that’s not something you can automate. So it’s important that you use this automation wisely, and that you understand when it’s appropriate to use.
I hope this article was helpful, and good luck!