Okay, so I was messing around with tennis data the other day, trying to get a handle on how the game’s changed over the years. I decided to look at stuff from 1968 onward, which is kind of a big deal year ’cause it’s when the Open Era started. You know, when professionals were finally allowed to play the Grand Slams alongside the amateurs.

First thing I did was try to find some data. I mean, where do you even start? I poked around a bunch of different websites and finally found some CSV files with match stats. It was a bit of a mess, to be honest: different formats, missing info, all that fun stuff.
Next, I pulled the data into Python. I use Python for pretty much everything data-related. I used the pandas library, which is awesome for working with tables of data. I had to do a lot of cleaning. I mean, a lot. There were rows with missing values, weird date formats, you name it. I spent a good chunk of time just getting the data into a usable state.
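
For a sense of what that kind of cleanup looks like, here’s a minimal sketch. The column names (`match_date`, `surface`, `minutes`, `winner_aces`, `loser_aces`) and the sample rows are made up for illustration, since the real files had their own layouts. It tries two date formats, tidies the surface labels, and drops rows with no usable date.

```python
import io

import pandas as pd

# Hypothetical sample rows; the columns and values are illustrative only.
RAW_CSV = """match_date,surface,minutes,winner_aces,loser_aces
1968-04-22,Clay,95,3,1
22/06/1968,grass,,5,4
not a date,Grass,120,7,2
1968-08-29,Grass ,150,,6
"""


def clean_matches(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with parsed dates, tidy surface labels, numeric stats."""
    df = raw.copy()

    # Try ISO dates first, then day/month/year; anything else becomes NaT.
    iso = pd.to_datetime(df["match_date"], format="%Y-%m-%d", errors="coerce")
    dmy = pd.to_datetime(df["match_date"], format="%d/%m/%Y", errors="coerce")
    df["match_date"] = iso.fillna(dmy)

    # Normalize labels like "grass" and "Grass " to "Grass".
    df["surface"] = df["surface"].str.strip().str.title()

    # Force the stat columns to numbers; unparseable entries become NaN.
    for col in ["minutes", "winner_aces", "loser_aces"]:
        df[col] = pd.to_numeric(df[col], errors="coerce")

    # A match with no usable date can't be placed on a timeline, so drop it.
    return df.dropna(subset=["match_date"]).reset_index(drop=True)


matches = clean_matches(pd.read_csv(io.StringIO(RAW_CSV)))
print(matches)
```

Missing stats are left as NaN here rather than filled in, so later averages skip them instead of counting made-up values.
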
Here are some of the things I checked:
- Average match length over time. I figured this would show if matches were getting longer or shorter.
- The number of aces per match. Has serving gotten way more powerful?
- Win percentages of the top players. Were the top players more dominant back in the day, or is it more competitive now?
- The distribution of match scores (like, how often do we see tiebreaks?).
- Surface changes. Like, are there fewer matches on grass now?
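
Most of the checks in that list boil down to a group-by on year. Here’s a rough sketch of the idea, assuming a cleaned table with hypothetical columns `year`, `surface`, `minutes`, `winner_aces`, `loser_aces`, and `score`. The rows are toy values, not real results, and the top-player win percentage is left out because it needs player-level data.

```python
import pandas as pd

# Toy rows for illustration only; not real match results.
matches = pd.DataFrame({
    "year": [1980, 1980, 1981, 1981],
    "surface": ["Grass", "Clay", "Grass", "Grass"],
    "minutes": [100, 120, 90, None],
    "winner_aces": [4, 2, 6, 8],
    "loser_aces": [2, 1, 3, 5],
    "score": ["6-4 6-3", "7-6 6-2", "6-3 6-4", "7-6 6-7 6-4"],
})


def yearly_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-year averages for match length, aces, tiebreaks, and grass share."""
    flagged = df.assign(
        total_aces=df["winner_aces"] + df["loser_aces"],
        # Assumes a tiebreak set shows up as 7-6 or 6-7 in the score string.
        has_tiebreak=df["score"].str.contains(r"7-6|6-7"),
        on_grass=df["surface"].eq("Grass"),
    )
    return flagged.groupby("year").agg(
        avg_minutes=("minutes", "mean"),        # NaN durations are skipped
        aces_per_match=("total_aces", "mean"),
        tiebreak_share=("has_tiebreak", "mean"),
        grass_share=("on_grass", "mean"),
    )


summary = yearly_summary(matches)
print(summary)
```

Taking the mean of a True/False column gives the share of matches where the flag is set, which is why the tiebreak and grass columns come out as fractions between 0 and 1.
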
I used Matplotlib and Seaborn to make some graphs. Those libraries are great for visualizing data and make some nice-looking charts.
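
A trend chart of that kind can be sketched in a few lines. This version uses plain Matplotlib only (no Seaborn), and the `yearly` table, its `aces_per_match` column, and the `plot_trend` helper are all hypothetical names with placeholder numbers.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; remove this line for interactive windows

import matplotlib.pyplot as plt
import pandas as pd

# Placeholder numbers for illustration, indexed by year.
yearly = pd.DataFrame(
    {"aces_per_match": [4.5, 11.0]},
    index=pd.Index([1980, 1981], name="year"),
)


def plot_trend(table: pd.DataFrame, column: str, ylabel: str):
    """Draw one per-year metric as a line chart and return the figure."""
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(table.index, table[column], marker="o")
    ax.set_xlabel("Year")
    ax.set_ylabel(ylabel)
    ax.set_title(f"{ylabel} by year")
    fig.tight_layout()
    return fig


fig = plot_trend(yearly, "aces_per_match", "Aces per match")
```

Calling `fig.savefig("aces_per_match.png")` afterwards writes the chart to disk. Returning the figure rather than showing it makes the same helper reusable for each metric.
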
It was actually pretty interesting to see how some things really changed. Like, you can definitely see that average match times shift over the years, and the number of aces goes up over time. Some things, surprisingly, don’t change that much.
Problems I hit
The biggest pain was just getting all the data together and making sure it was consistent. It took way longer than I expected. Also, some of the older data was just plain incomplete. There are some stats I just couldn’t track all the way back because they weren’t recorded reliably.
I’m still playing around with it, and I’m thinking of putting together some more polished visualizations and writing up my full analysis. For now, it’s just cool to dig into the history of the game and see it all laid out in the numbers. I did it!