This begins a proof-of-concept (POC) analysis of US electric energy generation data, a topic tied to both environmental and economic-infrastructure concerns. The study is internally funded and uses public-domain data. Source code and results will be published under the GNU GPL.
I have a long-standing interest in the area, dating back to my time as a chemical engineering student interested in solar-power generation. As a professional consultant, I have enjoyed working with economists who model electric power generation under different policy scenarios (e.g. carbon taxes). (If you are interested, I’ll be happy to refer you to experts for additional details on energy policy modeling.)
Scope of the proof-of-concept study:
- Data sources. The key data sources are the excellent data sets on electric power generation and consumption maintained by the US Energy Information Administration (EIA).
- These data sets are not large by modern standards, but the data are heterogeneous and largely categorical in nature, so analysis is not entirely straightforward.
- Energy generation model. My model captures only the essential concept of minimizing energy production cost, subject to generation constraints. It is a greatly simplified version of the National Energy Modeling System (NEMS), which is also maintained by the EIA.
- I am deliberately simplifying this aspect, because this proof-of-concept focuses on data analysis and reduction.
- Software tools. I am using SQL Server 2014/2016 as a database platform, and R as an analysis platform.
- Microsoft Excel, version 2010 or later. The EIA data sets are supplied in Microsoft Excel, and imported (nearly) as-is into Microsoft SQL Server.
- SQL Server 2012 or later. The free SQL Server Express is sufficient if you would like to load the data. Of course, there are other relational database platforms to choose from – the data can also be imported into MySQL.
- R version 3.0.2 or later. The RODBC package is used exclusively for database access; I’ll call out the analysis packages later.
- Microsoft Windows 7, 2008 R2, or later.
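To make the "minimize production cost, subject to generation constraints" idea from the scope list concrete, here is a minimal sketch in Python (used here purely for illustration; the study itself uses R and SQL Server). It is not the model used in this study and it is not NEMS. It assumes the simplest possible case: each generator has a fixed capacity and a constant marginal cost, and demand is a single number. Under those assumptions, filling demand from the cheapest generator upward (often called merit-order dispatch) gives the least-cost solution. The `Generator` class, the `dispatch` function, and all of the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Generator:
    name: str
    capacity_mw: float   # maximum output: the generation constraint
    cost_per_mwh: float  # constant marginal production cost (assumed)


def dispatch(generators, demand_mw):
    """Meet demand at least cost by loading the cheapest generators first."""
    total_capacity = sum(g.capacity_mw for g in generators)
    if demand_mw > total_capacity:
        raise ValueError("demand exceeds total generating capacity")

    remaining = demand_mw
    schedule = {}
    total_cost = 0.0
    # Walk the fleet in order of increasing marginal cost.
    for g in sorted(generators, key=lambda g: g.cost_per_mwh):
        output = min(g.capacity_mw, remaining)
        schedule[g.name] = output
        total_cost += output * g.cost_per_mwh
        remaining -= output
    return schedule, total_cost


# Hypothetical three-generator fleet; the figures are made up for illustration.
fleet = [
    Generator("coal", capacity_mw=500.0, cost_per_mwh=30.0),
    Generator("gas", capacity_mw=300.0, cost_per_mwh=45.0),
    Generator("wind", capacity_mw=200.0, cost_per_mwh=0.0),
]

schedule, hourly_cost = dispatch(fleet, demand_mw=800.0)
print(schedule)
print(hourly_cost)
```

With 800 MW of demand, the sketch runs wind and coal at full capacity and uses gas only for the last 100 MW. A realistic model adds many more constraints (transmission, ramp rates, fuel supply, emissions policy), which is where the heterogeneous generator data come in.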
Goals for the proof-of-concept study:
- Better understand how electric power is currently generated. This has implications for economic infrastructure, as well as environmental concerns.
- For economic policy models:
- Reduce the size and complexity of electric power-generation data, if this can be done without affecting model predictions.
- Provide an indication of the data variables (singly or in combination) that drive model outputs.
- Offer additional interpretations for model inputs (particularly generators) and outputs.
- Document the process and time spent for the case study to show where effort is expended and where challenges arise in a practical analysis example.
- Explore different options for certain analytics problems such as mixed-effects modeling.
Usage and distribution: I’ll publish source code and results under a GNU GPL license. Essentially: you’re welcome to the results if you find them useful, you acknowledge the origins of source code and data, and don’t turn your work into a commercial enterprise.
Acknowledgement: my friend and colleague Vadim Koganov is providing database, design, and software expertise as part of this effort.