EML to CSV via ChatGPT Python

Running Outlook on a Mac is ugly business.

For instance, I found out I couldn’t export a folder of emails in any functional way. What I could do was drag and drop selected emails into a folder, where they became .eml files.

I wanted a CSV of data from the emails for reporting purposes. I figured I’d try ChatGPT. My prompt was:

How can I convert a folder of eml files into a csv summary on a Mac?

It gave some instructions and the following code.

import os
import pandas as pd
from email import policy
from email.parser import BytesParser

# Folder containing .eml files
eml_folder = 'path/to/eml_folder'

# List to hold email data
email_data = []

# Loop through all .eml files in the folder
for filename in os.listdir(eml_folder):
    if filename.endswith('.eml'):
        filepath = os.path.join(eml_folder, filename)
        with open(filepath, 'rb') as file:
            msg = BytesParser(policy=policy.default).parse(file)
            sender = msg['From']
            subject = msg['Subject']
            date = msg['Date']
            content = msg.get_body(preferencelist=('plain')).get_content()

            # Append email data to list
            email_data.append({
                'Sender': sender,
                'Subject': subject,
                'Date': date,
                'Content': content
            })

# Create a DataFrame and save to CSV
df = pd.DataFrame(email_data)
df.to_csv('emails_summary.csv', index=False)

To make this work, I needed to understand how file paths worked, some basic terminal commands, and how to read the initial failure errors. It helped that I also understood how if/else statements work and that indent levels might impact whether certain programming languages will run.
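
If the file path part is the sticking point, a quick sanity check along these lines can help before running the full script. This is only a sketch and not part of the ChatGPT output: `list_eml_files` is a made-up helper name, and the folder you pass in is whatever folder you dragged the emails into.

```python
from pathlib import Path


def list_eml_files(folder):
    """Return the .eml files in `folder`, sorted by name (hypothetical helper)."""
    # expanduser() lets you write a path like '~/Desktop/emails'
    path = Path(folder).expanduser()
    if not path.is_dir():
        # Show the full path so it is obvious where Python actually looked
        raise FileNotFoundError(f"No such folder: {path.resolve()}")
    # Compare the suffix in lowercase so '.EML' files are not skipped
    return sorted(p for p in path.iterdir() if p.suffix.lower() == '.eml')
```

Calling `list_eml_files('test')` from the same directory as the script should list the emails. A `FileNotFoundError` here usually means the terminal is sitting in a different directory than expected.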

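It failed initially when some of the emails didn’t have body content (or there was some other complexity there). When get_body() can’t find a plain-text part it returns None, and calling .get_content() on None raises an AttributeError. While I probably could have asked ChatGPT to implement the fix, I did it myself and things worked the way I needed.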

import os
import pandas as pd
from email import policy
from email.parser import BytesParser

# Folder containing .eml files
eml_folder = 'test'

# List to hold email data
email_data = []

# Loop through all .eml files in the folder
for filename in os.listdir(eml_folder):
    if filename.endswith('.eml'):
        filepath = os.path.join(eml_folder, filename)
        with open(filepath, 'rb') as file:
            msg = BytesParser(policy=policy.default).parse(file)
            sender = msg['From']
            subject = msg['Subject']
            date = msg['Date']
            # get_body() returns None when there is no plain-text part
            body = msg.get_body(preferencelist=('plain',))
            if body is not None:
                content = body.get_content()
            else:
                content = 'not found'
            # Append email data to list
            email_data.append({
                'Sender': sender,
                'Subject': subject,
                'Date': date,
                'Content': content
            })

# Create a DataFrame and save to CSV
df = pd.DataFrame(email_data)
df.to_csv('emails_summary.csv', index=False)
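
To see the kind of failure the fix guards against without digging through real emails, here is a small sketch that builds an HTML-only message in memory and parses it the same way the script does. The addresses, subject, and the `plain_text_or_default` helper name are all made up for illustration.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser

# Build a made-up message that only has an HTML body
msg = EmailMessage()
msg['From'] = 'sender@example.com'
msg['Subject'] = 'HTML only'
msg.set_content('<p>Hello</p>', subtype='html')

# Round-trip through bytes, as if it had been read from an .eml file
parsed = BytesParser(policy=policy.default).parsebytes(msg.as_bytes())

# No text/plain part exists, so get_body() gives back None here;
# calling .get_content() on that result is what raises the AttributeError
body = parsed.get_body(preferencelist=('plain',))
print(body)


def plain_text_or_default(message, default='not found'):
    """Return the plain-text body, or `default` if there is none (hypothetical helper)."""
    part = message.get_body(preferencelist=('plain',))
    return part.get_content() if part is not None else default


print(plain_text_or_default(parsed))
```

Passing `preferencelist=('plain', 'html')` instead would fall back to the HTML body when there is no plain-text one, at the cost of getting raw markup in the CSV.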

I don’t write in Python with any regularity, but I can read and alter simple things like this easily enough. It’s also where I feel like the LLMs are probably most helpful. I know some stuff. I am not trying anything really complex. I’m just trying to get a simple task done quickly.
