Professional Highlights
My Work

Portfolio

Automotive Data Mining Project
Objective:
This project applies machine learning algorithms, specifically K-Means clustering and linear regression, to analyze the used car market. The goal is to understand the factors that influence car prices and use those insights to make a more informed purchase after a recent car breakdown. The project is divided into three parts:

Step 1: Data Cleaning
Data is cleaned and prepared for analysis by handling missing values, removing irrelevant data, and correcting inconsistencies.
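A minimal sketch of this kind of cleaning pass in pandas (the column names and sample values here are hypothetical, not the project's actual schema):

```python
import pandas as pd

# Hypothetical sample of raw used-car listings (illustrative columns/values).
raw = pd.DataFrame({
    "listing_id": [1, 2, 3, 3],               # listing 3 appears twice
    "price": [15000, None, 22000, 22000],     # one listing is missing a price
    "mileage": [60000, 45000, 30000, 30000],
})

def clean_listings(df):
    """Remove duplicate listings, drop rows missing key fields, coerce types."""
    df = df.drop_duplicates(subset="listing_id")
    df = df.dropna(subset=["price", "mileage"])
    df["price"] = df["price"].astype(float)
    df["mileage"] = df["mileage"].astype(float)
    return df.reset_index(drop=True)

cleaned = clean_listings(raw)
```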

Step 2: K-Means Analysis
A reusable K-Means clustering class is implemented in Python for analysis and visualization. Clustering is performed on variable combinations such as price vs. mileage and price vs. price drop to identify patterns and groupings in the data. Clustering mileage against price drop revealed three natural groupings; the most intriguing group consists of cars with low mileage but a significant price drop.
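A reusable clustering wrapper along these lines could look like the following sketch, built on scikit-learn's KMeans rather than the project's own implementation; the mileage/price-drop points are toy data:

```python
import numpy as np
from sklearn.cluster import KMeans

class CarClusterer:
    """Thin reusable wrapper around scikit-learn's KMeans (illustrative)."""
    def __init__(self, n_clusters=3):
        self.model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)

    def fit(self, X):
        self.model.fit(X)
        return self

    def labels(self):
        return self.model.labels_

# Toy [mileage (10k mi), price drop ($k)] points forming three loose groups.
X = np.array([[5, 1], [6, 2], [50, 2], [55, 3], [8, 20], [9, 22]], dtype=float)
clusterer = CarClusterer(n_clusters=3).fit(X)
```

Wrapping the estimator this way keeps the notebook code short: each variable pair is just another `fit` call on a fresh instance.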


Step 3: Regression Analysis
A regression analysis class is created for efficient exploration and visualization. Regression models are fitted on factors such as year, mileage, and accident history to assess their impact on car prices. A final regression model is trained to predict prices and compare those predictions with actual market values.
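As an illustration, a linear regression over year, mileage, and accident-count features can be fitted with scikit-learn (the data and feature values here are invented for the example, not the project's dataset):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative features: [year, mileage (thousands of miles), accident count].
X = np.array([
    [2015, 80, 1],
    [2018, 40, 0],
    [2020, 20, 0],
    [2012, 120, 2],
])
y = np.array([9000, 16000, 21000, 5000])  # hypothetical sale prices ($)

model = LinearRegression().fit(X, y)

# Predict the price of a hypothetical 2019 car with 30k miles, no accidents.
predicted = model.predict([[2019, 30, 0]])[0]
```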
Connecticut Fishing Data Analysis

Objective:
The main goal of this project is to uncover new fishing spots and insights into local fishing trends. The project is divided into three key stages: Data Collection, Data Cleaning, and Data Visualization.

Step 1: Data Collection

During the Data Collection phase, publicly available data is gathered with a web scraper that accesses and downloads town-level data for every town in Connecticut. The scraper first saves all the towns listed on the website to a list, then iterates through that list, entering each town into the site's search bar. Each downloaded data file is renamed and moved from the Windows download directory to the current working directory.
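The rename-and-move step at the end of each iteration can be sketched as a small helper (paths, file patterns, and names here are illustrative; the real scraper drives a browser with Selenium to trigger each download):

```python
import shutil
import tempfile
from pathlib import Path

def collect_download(download_dir, town, work_dir):
    """Move the newest downloaded CSV into the working directory, renamed by town.
    (Illustrative helper; assumes one new CSV appears per scraped town.)"""
    download_dir, work_dir = Path(download_dir), Path(work_dir)
    files = sorted(download_dir.glob("*.csv"), key=lambda p: p.stat().st_mtime)
    latest = files[-1]
    target = work_dir / f"{town.lower().replace(' ', '_')}.csv"
    shutil.move(str(latest), str(target))
    return target

# Demo with temp directories standing in for Downloads and the working dir.
downloads = Path(tempfile.mkdtemp())
workdir = Path(tempfile.mkdtemp())
(downloads / "export.csv").write_text("species,count\n")
result = collect_download(downloads, "New Haven", workdir)
```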

Step 2: Data Cleaning
In the Data Cleaning stage, all the collected data is combined into a single dataframe, then wrangled and cleaned to ensure accuracy and consistency. Blank and NA rows are removed, and to manage the large volume of data, it is filtered to the first 36 of the 90 fish species. A snippet of the distribution of all species found in the data can be seen below.
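The combine-and-filter stage might look like this pandas sketch (towns, species, and counts are made up, and only two species are kept here instead of 36):

```python
import pandas as pd

# Hypothetical per-town dataframes, as if loaded from the downloaded files.
town_frames = [
    pd.DataFrame({"town": "Mystic",
                  "species": ["Bass", "Trout", None],
                  "count": [4, 2, None]}),
    pd.DataFrame({"town": "Essex",
                  "species": ["Trout", "Pike"],
                  "count": [5, 1]}),
]

# Combine everything into one dataframe and drop blank/NA rows.
combined = pd.concat(town_frames, ignore_index=True).dropna()

# Keep only a subset of species to manage volume (2 here; 36 of 90 in the text).
top_species = combined.groupby("species")["count"].sum().nlargest(2).index
filtered = combined[combined["species"].isin(top_species)]
```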

Step 3: Data Visualization
Finally, in the Visualization stage, the cleaned data is transformed into a dynamic dashboard, enabling users to observe local fishing trends effectively. The entire process is designed for reusability, facilitating the retrieval and analysis of future fish data.
Dashboard Webinar
In April 2023 I had the privilege of hosting a webinar on dashboard design. The idea took shape after I pitched the concept to the CEO of Practical Data Solutions. The webinar covered ways to visually improve dashboards, along with the dashboard design process, and drew more than 30 attendees.
Competitive Cost Analysis

I recently completed a personal analytics project on affordable food. The question I addressed was: how can a consumer reduce grocery costs? To answer it, I conducted a competitive price analysis of multiple local grocery stores. I developed a Python web scraper, built with Pandas, Selenium, and Beautiful Soup, that collects publicly available grocery data. The scraper pulls data from multiple stores on Instacart and compares the prices of goods carried by all of the stores. I then modeled the data in Microsoft Power BI. (Please note that, for ethical reasons, some information in the dashboard has been slightly modified.)
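The store-comparison step can be sketched with a pandas pivot (the stores, items, and prices below are invented placeholders, not scraped data):

```python
import pandas as pd

# Illustrative scraped prices for the same items across three stores.
prices = pd.DataFrame({
    "store": ["A", "A", "B", "B", "C", "C"],
    "item":  ["milk", "eggs", "milk", "eggs", "milk", "eggs"],
    "price": [3.49, 2.99, 3.29, 3.49, 3.99, 2.79],
})

# Pivot so each row is an item and each column a store, then find the cheapest.
wide = prices.pivot(index="item", columns="store", values="price")
cheapest = wide.idxmin(axis=1)
```

A wide item-by-store table like this is also a convenient shape to load into Power BI for side-by-side comparison.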
Dashboard Generator

I am currently developing a Python program that automates the creation of Excel dashboards. Enclosed is a snapshot illustrating sample output from this ongoing project. The program prompts the user for size dimensions and the metrics to display, then uses Openpyxl to construct the framework of an Excel dashboard accordingly. The primary aim is to streamline and automate the creative process involved in dashboard design. This project has been particularly enjoyable for me, as I'm deeply intrigued by the convergence of design and technology!
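A minimal skeleton of that idea with Openpyxl might look like this (the metric names and the simple grid-placement rule are assumptions for the example, not the program's actual logic):

```python
from openpyxl import Workbook
from openpyxl.styles import Font

def build_dashboard(metrics, width=4):
    """Create a workbook with a title and one bold header cell per metric,
    laid out left to right on a simple grid (illustrative sketch)."""
    wb = Workbook()
    ws = wb.active
    ws.title = "Dashboard"
    ws["A1"] = "Dashboard"
    ws["A1"].font = Font(bold=True, size=14)
    for i, metric in enumerate(metrics):
        # Place each metric header two columns apart, wrapping after `width`.
        cell = ws.cell(row=3 + (i // width) * 2,
                       column=1 + (i % width) * 2,
                       value=metric)
        cell.font = Font(bold=True)
    return wb

wb = build_dashboard(["Revenue", "Orders", "Refunds"])
```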
Contact Me
GitHub | LinkedIn
villamanac@gmail.com
Phone: (203) 491-8989