Okay, here’s how I tackled that “jose reyes christina sanchez” thing I was messing around with.

Alright, so it all started when I stumbled across a dataset with those names, Jose Reyes and Christina Sanchez, in it. I honestly don’t remember exactly where I found it; probably some open data portal. I just thought, “Hey, those are pretty common names, I wonder what I can do with this?”
First thing I did was fire up my Jupyter Notebook. Classic, right? I imported Pandas because, you know, gotta wrangle that data. I loaded the data into a DataFrame and took a peek. It was a mess, like most raw data is. Missing values all over the place, inconsistent formatting… the usual suspects.
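Here’s a rough sketch of that first look. I don’t have the original file anymore, so the CSV below is a tiny made-up stand-in (the `name`, `location`, and `amount` columns are my assumption, not the real schema). It’s read from a string with `io.StringIO` so the snippet runs on its own; with a real file you’d just pass the path to `pd.read_csv`.

```python
import io

import pandas as pd

# Hypothetical stand-in for the original file; the real columns are unknown.
RAW_CSV = """name,location,amount
Jose Reyes,Austin,12.5
  jose reyes ,Austin,
Christina Sanchez,Denver,8.0
CHRISTINA SANCHEZ,,9.5
Maria Lopez,Austin,4.0
"""


def load_raw(csv_text: str) -> pd.DataFrame:
    """Load CSV text into a DataFrame (read_csv accepts any file-like object)."""
    return pd.read_csv(io.StringIO(csv_text))


df = load_raw(RAW_CSV)
print(df.head())        # first five rows
df.info()               # dtypes and non-null counts per column (prints directly)
print(df.isna().sum())  # number of missing values in each column
```

The `isna().sum()` line is the quickest way to see which columns need filling before doing anything else.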
Data Cleaning 101:
- I started by dealing with the missing data. For numerical columns, I usually fill them with the mean or median. For categorical stuff, it’s often the mode or just a placeholder value like “Unknown.” In this case, I think I ended up using a combination of both depending on the column.
- Next up was standardizing the text. You know, making sure all the “Jose Reyes” entries were consistent: no random capitalization or extra spaces. I used the `.str.lower()` and `.str.strip()` methods for that.
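Both cleaning steps fit in one small function. This is a minimal sketch, assuming hypothetical `name`, `location`, and `amount` columns: the median fill for numbers and the “Unknown” placeholder for categories mirror the choices described above, and the extra regex step that collapses doubled inner spaces is my own addition.

```python
import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of df.

    Assumes hypothetical columns: 'name' (free text), 'location'
    (categorical) and 'amount' (numeric).
    """
    out = df.copy()

    # Numeric column: fill gaps with the median (the mean is the other usual pick).
    out["amount"] = out["amount"].fillna(out["amount"].median())

    # Categorical column: fill with a placeholder. To use the most common value
    # instead, pass out["location"].mode().iloc[0] to fillna.
    out["location"] = out["location"].fillna("Unknown")

    # Names: trim outer spaces, lowercase, and collapse runs of inner whitespace
    # so "  jose   REYES " and "Jose Reyes" end up identical.
    out["name"] = (
        out["name"]
        .str.strip()
        .str.lower()
        .str.replace(r"\s+", " ", regex=True)
    )
    return out


raw = pd.DataFrame(
    {
        "name": ["Jose Reyes", "  jose   REYES ", "CHRISTINA SANCHEZ"],
        "location": ["Austin", None, "Denver"],
        "amount": [10.0, None, 30.0],
    }
)
cleaned = clean(raw)
print(cleaned)
```

Working on a copy keeps the raw DataFrame around, which is handy when you want to double-check what a fill actually changed.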
Okay, so the data was slightly less of a headache now. Next, I wanted to pull out some insights about the two people in question, Jose Reyes and Christina Sanchez. My first goal was to filter the full dataset down to just their records.
Filtering and Exploring:
- I created two new DataFrames, one for each person, by filtering the main DataFrame: `df_jose = df[df['name'] == 'jose reyes']` and `df_christina = df[df['name'] == 'christina sanchez']`. Super basic.
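Those one-liners are boolean-mask filters: the `==` comparison produces a True/False Series, and indexing with it keeps only the matching rows. Because the match is exact, it only works once the names have been lowercased and stripped. The tiny DataFrame here is made up, and the `isin` plus `groupby` variant is just an alternative way to get the same per-person split, not what I originally ran.

```python
import pandas as pd

# Hypothetical, already-cleaned data (names lowercased and stripped).
df = pd.DataFrame(
    {
        "name": ["jose reyes", "christina sanchez", "maria lopez", "jose reyes"],
        "location": ["Austin", "Denver", "Austin", "El Paso"],
    }
)

# Boolean masks: each comparison yields a True/False Series aligned with df's rows.
df_jose = df[df["name"] == "jose reyes"]
df_christina = df[df["name"] == "christina sanchez"]

# Alternative: keep both people in one frame, then split it with groupby.
df_pair = df[df["name"].isin(["jose reyes", "christina sanchez"])]
by_person = {name: group for name, group in df_pair.groupby("name")}
```

The `groupby` version scales better if the list of names ever grows beyond two.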
After that, I started actually digging into the data specific to each person. I looked at how their records were distributed across different categories, hunting for patterns or anything interesting. For example, if the dataset contained location info, I used `value_counts()` to see where Jose and Christina showed up most often and compared that with everyone else in the dataset. Maybe Jose is always in one place and Christina somewhere else?
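One way to make that comparison fair is to use shares instead of raw counts, since one person may simply have more rows. This sketch assumes a hypothetical `location` column and made-up rows: `value_counts(normalize=True)` turns counts into proportions, and lining the three Series up in one DataFrame puts each person next to the overall mix. `pd.crosstab(..., normalize="index")` does the same for every name at once.

```python
import pandas as pd

# Hypothetical cleaned data; the real dataset's columns may differ.
df = pd.DataFrame(
    {
        "name": ["jose reyes"] * 3 + ["christina sanchez"] * 2 + ["maria lopez"] * 3,
        "location": [
            "Austin", "Austin", "El Paso",   # jose reyes
            "Denver", "Denver",             # christina sanchez
            "Austin", "Denver", "El Paso",  # maria lopez
        ],
    }
)


def location_shares(frame: pd.DataFrame) -> pd.Series:
    """Fraction of a frame's rows at each location."""
    return frame["location"].value_counts(normalize=True)


# Columns align on location; places a person never appears get 0.0.
comparison = pd.DataFrame(
    {
        "jose": location_shares(df[df["name"] == "jose reyes"]),
        "christina": location_shares(df[df["name"] == "christina sanchez"]),
        "everyone": location_shares(df),
    }
).fillna(0.0)
print(comparison)

# Same idea for every person at once: each row sums to 1.
per_person = pd.crosstab(df["name"], df["location"], normalize="index")
print(per_person)
```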
Visualization Time:
- I used Matplotlib and Seaborn to make some plots. Nothing fancy, just bar charts, histograms, and maybe a scatter plot or two where it made sense. Visualizing the data is a must: it really helps you spot trends that the raw numbers hide.
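For the bar-chart side, here’s a plain Matplotlib sketch (via pandas’ `.plot.bar`) of records per location, one bar per person. The data and column names are hypothetical, and the `Agg` backend line is only there so the snippet also runs outside a notebook; in Jupyter you can drop it and the chart shows up inline.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical filtered data for the two people.
pair = pd.DataFrame(
    {
        "name": ["jose reyes"] * 3 + ["christina sanchez"] * 2,
        "location": ["Austin", "Austin", "El Paso", "Denver", "Denver"],
    }
)


def plot_location_counts(frame: pd.DataFrame) -> plt.Axes:
    """Grouped bar chart: rows of the crosstab are locations, bars are people."""
    counts = pd.crosstab(frame["location"], frame["name"])
    ax = counts.plot.bar(rot=0)
    ax.set_xlabel("location")
    ax.set_ylabel("records")
    ax.set_title("Records per location")
    return ax


ax = plot_location_counts(pair)
```

In Seaborn, `sns.countplot(data=pair, x="location", hue="name")` draws roughly the same chart straight from the long-format frame, without the crosstab step.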
Honestly, after all that, there wasn’t really any HUGE revelation. It was more of a practice run, a way to brush up on my Pandas skills and play around with data manipulation. But hey, every little bit helps, right?
Finally, once I was “done”, I saved the cleaned data back to a CSV file and also saved the Jupyter Notebook as a PDF for future reference. Who knows, maybe I’ll revisit this project later with fresh eyes or a better idea of what I want to achieve.
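Saving the cleaned data is a one-liner with `to_csv`. The sketch below writes a small made-up frame to a temporary directory and reads it back to confirm nothing changed; `index=False` keeps pandas from adding the row index as an extra unnamed column. For the PDF, `jupyter nbconvert --to pdf` is one route (it needs a LaTeX install, and `--to webpdf` is the browser-based alternative); the notebook name you pass it is whatever yours is called.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical cleaned data.
cleaned = pd.DataFrame(
    {
        "name": ["jose reyes", "christina sanchez"],
        "location": ["Austin", "Denver"],
        "amount": [10.0, 30.0],
    }
)


def save_clean(frame: pd.DataFrame, path: Path) -> Path:
    """Write frame to CSV without the index column and return the path."""
    frame.to_csv(path, index=False)
    return path


with tempfile.TemporaryDirectory() as tmp:
    out = save_clean(cleaned, Path(tmp) / "cleaned.csv")
    round_trip = pd.read_csv(out)  # read back before the temp dir is deleted

print(round_trip.equals(cleaned))
```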