In this post, I’ll dive deeper into the parallels between this energy project and business operations, and how the various roles—such as product management[1], business intelligence, and data engineering[2]—would collaborate to deliver a successful product. This project was not just about finding the right energy plan but about building a data-driven solution that mimicked real-world business decision-making processes.
I’ll also discuss how we (yes, I’ll be referring to myself as “we” from now on) identified the problem, created a roadmap, and approached data collection, especially when reliable historical data wasn’t readily available. Plus, I’ll share how this project’s framework mirrors what a product manager would do when addressing a business problem.
The Challenge: No Historical Usage Data in a New Neighborhood
Like many new homeowners, I was told I had just a few days to choose an electricity provider. With over 60 options, each with its own contract terms, fees, and pricing structures, the decision felt overwhelming. On top of that, there was no historical energy usage data for my home because it was part of a new construction neighborhood. I couldn’t simply rely on neighborhood patterns to predict my household’s consumption.
Here’s where product management thinking came into play. I needed to assess the business problem: how can we choose the best energy plan with minimal historical data and a tight deadline? The challenge was clear: develop a strategy that would not only gather the necessary data but also ensure that data enabled a well-informed, future-proof decision.
Trial Period: Data Collection with a No-Contract Plan
We decided on a no-contract energy plan from TXU Energy[3], which gave us a two-month trial period to collect real usage data. Although this plan came with a higher cost, it provided the flexibility we needed to gather sufficient data before committing to a long-term contract.
From a business perspective, this is akin to conducting an exploratory phase in a product lifecycle. Instead of rushing to implement a solution based on incomplete information, we took a measured approach by investing in short-term flexibility to collect the data that would drive better long-term decisions.
This also established the roadmap for the project:
- Data Collection Phase: Use the no-contract plan to gather energy consumption data.
- Data Analysis Phase: Analyze the collected data and create a predictive model.
- Decision Phase: Choose the best long-term energy plan based on our findings.
Product Management in Action: Identifying Data Dependencies
At this stage, I needed to determine what factors would influence our home’s energy consumption. This part of the project aligned closely with the role of a product manager in a business setting, where identifying key dependencies is crucial for building a solid product.
We categorized our home’s energy usage into two types:
- Fixed Energy Usage: Appliances such as dishwashers, washing machines, and dryers that have consistent energy consumption patterns.
- Dependent Energy Usage: Usage driven by external factors, specifically weather (temperature), which influences the HVAC system.
I realized that separating the fixed and weather-dependent energy consumption was critical for building a predictive model. This segmentation was necessary because the supervised learning model we planned to use required accurate data that reflected true patterns of HVAC usage influenced by weather, not by appliance use.
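One simple way to perform this segmentation is to treat a low percentile of hourly usage as the fixed (appliance) baseline and attribute everything above it to the HVAC system. This is a minimal sketch under that assumption—the readings and the 10th-percentile choice are illustrative, not values from the actual project:

```python
# Sketch: split hourly usage into a fixed baseline and a weather-dependent
# (HVAC) component. Assumes the mildest hours reflect appliance-only load,
# so a low percentile of usage approximates the fixed baseline.

def split_usage(hourly_kwh, baseline_quantile=0.1):
    ordered = sorted(hourly_kwh)
    idx = int(len(ordered) * baseline_quantile)
    baseline = ordered[idx]  # fixed (appliance) load estimate
    # Anything above the baseline is attributed to HVAC, floored at zero.
    hvac = [max(kwh - baseline, 0.0) for kwh in hourly_kwh]
    return baseline, hvac

# Illustrative hourly readings (kWh), not real trial data.
baseline, hvac_load = split_usage([0.4, 0.5, 1.8, 2.2, 0.6, 1.1])
```

A percentile baseline is crude—it ignores appliances that cycle irregularly—but it is enough to keep appliance noise out of the weather-driven signal the model needs.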
Data Science Consultation: Choosing the Right Model
As part of the consulting phase, it became clear that this was a perfect case for a supervised learning model that would predict continuous variables—in this case, the kilowatt-hour (kWh) usage for any given hour. This model would rely on historical weather data to forecast energy usage.
We consulted with the data science team4 to confirm that the best approach would be to build a regression model, trained on real energy usage and weather data. However, the challenge wasn’t just to build a model that worked for the two-month trial period—we needed to forecast energy usage for an entire year. This meant gathering weather data for the rest of the year, beyond the trial period.
To accomplish this, we planned to scrape weather data from a public weather website[5]. The website provided hourly weather data, which we would scrape systematically, collecting temperature and other key factors over several months. This data would become the training set[6] for the predictive model, with a held-out partition of the same data serving as the test set, so we could verify the model forecast energy usage accurately for the remaining months.
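The core of this approach can be sketched in a few lines: fit a regression of hourly kWh on outdoor temperature using a chronological split, so the model trains on earlier hours and is tested on later ones. The numbers below are made up for illustration, and a real model would use more features than temperature alone:

```python
# Sketch: simple linear regression of hourly kWh on temperature with a
# chronological train/test split. Closed-form least squares, no libraries.

def fit_linear(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx  # (slope, intercept)

temps = [68, 72, 80, 85, 90, 95, 99, 102]          # °F, illustrative
usage = [0.5, 0.6, 1.0, 1.3, 1.6, 1.9, 2.1, 2.3]   # kWh, illustrative

split = int(len(temps) * 0.75)   # first 75% trains, remainder tests
slope, intercept = fit_linear(temps[:split], usage[:split])

# Forecast the held-out hours from temperature alone.
predicted = [slope * t + intercept for t in temps[split:]]
```

A chronological split matters here: shuffling hourly data would leak near-duplicate neighboring hours between train and test and overstate accuracy.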
Trial and Error: Collecting Data for the Predictive Model
The data collection process required both real-time usage data and historical weather data. Thanks to TXU Energy’s dashboard, we were able to export hourly energy usage data (albeit limited in detail) for the trial period. Combined with the scraped weather data, we had enough information to start building our model.
But this was just the beginning. As a product manager would, I had to make sure the data collection process covered all the necessary bases. The ETL pipelines[7] had to be designed to continuously pull in and process the weather data for every hour of every day—especially as the project moved forward. I collaborated closely with the data engineering team to keep the data collection and transformation processes running efficiently.
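Combining the two sources comes down to joining hourly usage with hourly weather on the timestamp. A minimal sketch of that join, where the field names (`ts`, `kwh`, `temp_f`) are assumptions about the export format rather than TXU’s actual schema:

```python
# Sketch: inner-join exported hourly usage with scraped weather readings on
# the timestamp. Field names are illustrative assumptions, not a real schema.

usage_rows = [
    {"ts": "2024-07-01T13:00", "kwh": 1.8},
    {"ts": "2024-07-01T14:00", "kwh": 2.1},
]
weather_rows = [
    {"ts": "2024-07-01T13:00", "temp_f": 96},
    {"ts": "2024-07-01T14:00", "temp_f": 98},
]

# Index weather by timestamp, then keep only hours present in both sources.
weather_by_ts = {row["ts"]: row["temp_f"] for row in weather_rows}
training_rows = [
    {**row, "temp_f": weather_by_ts[row["ts"]]}
    for row in usage_rows
    if row["ts"] in weather_by_ts
]
```

An inner join is the safe default here: an hour missing from either source would otherwise enter the training set with a gap in its features.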
Roadmap Execution: Data Engineering and ETL Pipeline Design
The project roadmap made it clear that the next step was building the ETL pipelines for data collection. Once I gathered the real data from the no-contract trial, we would need to develop a pipeline that would automate the process of continuously collecting weather and energy usage data. The weather data would need to cover an entire year, so the pipeline had to be robust and flexible.
At this stage, I worked closely with the data engineering team to ensure the pipelines were built in a way that would allow the model to scale. This phase was about communication: monitoring the development process and making sure all teams understood the dependencies and goals.
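The shape of such a pipeline can be sketched as three small stages. The scrape itself is stubbed out here—`extract` returns canned rows—so the focus is on the structure; the row format, field names, and keying scheme are all assumptions for illustration:

```python
# Skeleton of an hourly weather ETL pipeline: extract raw readings,
# transform them into clean records, load them into a store. The real
# extract step would fetch and parse an hourly weather page.

from datetime import datetime

def extract():
    # Stub standing in for the scrape; returns raw rows as scraped text.
    return [{"time": "1:00 PM", "temp": "96 F"}]

def transform(raw_rows, day):
    records = []
    for row in raw_rows:
        hour = datetime.strptime(row["time"], "%I:%M %p").hour
        temp_f = float(row["temp"].split()[0])
        records.append({"ts": f"{day}T{hour:02d}:00", "temp_f": temp_f})
    return records

def load(records, store):
    for rec in records:
        store[rec["ts"]] = rec["temp_f"]  # idempotent upsert keyed by hour

store = {}
load(transform(extract(), "2024-07-01"), store)
```

Keying the load step by timestamp makes reruns idempotent, which matters for a scraper that will inevitably retry failed hours.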
Conclusion: Lessons in Product Management and Data Science
This project is a prime example of how product management principles, data science, and data engineering come together to solve a complex business problem. From identifying the initial issue to building a roadmap and executing on it, the project required cross-functional collaboration and constant iteration to get things right.
By focusing on the right data and making sure the teams were aligned, we were able to build a predictive model that would make future energy decisions easier and more cost-effective. Whether it’s forecasting energy usage for a new home or building a scalable data solution for a business, the core principles of roadmapping, data dependencies, and collaboration remain the same.
In my next post, I’ll cover how we monitored the ETL pipeline and ensured the model was deployed successfully. Stay tuned!
1. https://www.atlassian.com/agile/product-management/product-manager
2. https://www.ibm.com/think/topics/data-engineering
3. https://residential.txu.com/Handlers/PDFGenerator.ashx?comProdId=CPXFLXFW0000BL&lang=en&formType=EnergyFactsLabel&custClass=3&tdsp=CENTERP
4. https://www.datasciencecentral.com/what-is-data-science-2/
5. https://www.wunderground.com/history/daily/us/tx/houston/KIAH/date
6. https://machinelearningmastery.com/difference-test-validation-datasets/
7. https://www.ibm.com/topics/etl