Miloš Švaňa

ai, python, software engineering, decision-making and more

Data analysis

Here you’ll find Jupyter Notebooks analyzing data I found interesting.

University rankings and GDP

I first answer some basic questions, such as which countries and continents have the best universities. Then I create a simple regression model that predicts GDP per capita from the average university ranking and the innovation index.

I use a version of this model to identify “heavy hitters” and countries with “untapped potential”. Heavy hitters are countries whose GDP per capita is higher than predicted from their university ranking and innovation index. Countries with untapped potential are those whose GDP per capita is lower than their university ranking and innovation index would suggest.
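A minimal sketch of the heavy-hitter classification, assuming an ordinary least-squares linear regression with an intercept (the notebook's actual model and preprocessing may differ). The function name `fit_and_label` and its inputs are hypothetical: one average ranking, one innovation index value and one GDP per capita value per country. A positive residual (actual minus predicted GDP per capita) marks a heavy hitter, and a non-positive one marks untapped potential.

```python
import numpy as np


def fit_and_label(names, avg_ranking, innovation_index, gdp_per_capita):
    """Fit GDP per capita ~ ranking + innovation with OLS and label countries by residual sign.

    Hypothetical helper: the inputs are parallel sequences, one entry per country.
    """
    # Design matrix: intercept column plus the two predictors.
    X = np.column_stack([np.ones(len(names)), avg_ranking, innovation_index])
    y = np.asarray(gdp_per_capita, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef  # actual minus predicted GDP per capita
    labels = {
        name: "heavy hitter" if r > 0 else "untapped potential"
        for name, r in zip(names, residuals)
    }
    return coef, residuals, labels
```

Working with residuals rather than raw GDP is what makes the comparison fair: each country is judged against what its universities and innovation index would predict, not against richer or poorer neighbors.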

Among the countries in my region, Austria, Czechia, and Slovakia are all heavy hitters, while Poland and Hungary have untapped potential.

The notebook also demonstrates how you can combine multiple datasets to uncover interesting insights.
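One common way to combine such datasets is an inner join on the country name. The sketch below is hypothetical (the notebook may well use pandas or different keys): each dataset is assumed to be a dict mapping a country name to a dict of columns, and `ALIASES` is an illustrative mapping for names that are spelled differently across sources.

```python
# Hypothetical name fixes for sources that spell countries differently.
ALIASES = {"Czech Republic": "Czechia", "Slovak Republic": "Slovakia"}


def normalize(name):
    """Strip whitespace and map known alternative spellings to one canonical name."""
    name = name.strip()
    return ALIASES.get(name, name)


def merge_by_country(*datasets):
    """Inner-join {country: {column: value}} dicts on the normalized country name."""
    normalized = [{normalize(k): v for k, v in d.items()} for d in datasets]
    # Keep only countries that appear in every dataset.
    common = set(normalized[0]).intersection(*normalized[1:])
    merged = {}
    for country in sorted(common):
        row = {}
        for d in normalized:
            row.update(d[country])  # later datasets win on duplicate column names
        merged[country] = row
    return merged
```

Only countries present in every dataset survive the join, so checking which names drop out is a quick way to spot spelling mismatches.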

Crossroad traffic in Ostrava

In this notebook, I explore traffic at crossroads in Ostrava. The most interesting findings include the discovery that Friday is actually the least busy day of the week. I also found that, out of the four analyzed crossroads, traffic at the Novinářská x 28. října crossroad is the hardest to predict.
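The weekday comparison boils down to grouping daily totals by day of the week and averaging them. A minimal sketch, assuming the data has already been summed into one car count per calendar day for a single crossroad (the `{date: count}` input format is an assumption, not the notebook's actual structure):

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Fixed English names so the output does not depend on the system locale.
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")


def mean_by_weekday(daily_counts: dict[date, int]) -> dict[str, float]:
    """Average daily car count for each weekday, from a {date: count} mapping."""
    buckets = defaultdict(list)
    for day, count in daily_counts.items():
        buckets[WEEKDAYS[day.weekday()]].append(count)
    return {name: mean(counts) for name, counts in buckets.items()}


def least_busy_day(daily_counts: dict[date, int]) -> str:
    """Weekday with the lowest average daily count."""
    averages = mean_by_weekday(daily_counts)
    return min(averages, key=averages.get)
```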

I also used a random forest regression model to predict the next day's traffic. The median prediction error was about 460 cars, or 15%.
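The error figure can be read as a median absolute error in cars plus a median absolute percentage error. That reading is an assumption (the 15% might instead be 460 cars relative to typical daily volume), and the sketch below only shows the two metrics, not the random forest itself:

```python
from statistics import median


def median_errors(actual, predicted):
    """Return (median absolute error, median absolute percentage error in %).

    Assumes all actual counts are positive, so the percentage is defined.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equally long")
    abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]
    pct_errors = [100 * abs(a - p) / a for a, p in zip(actual, predicted)]
    return median(abs_errors), median(pct_errors)
```

Medians are less sensitive than means to a few badly predicted days, for example holidays or road closures.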

I experimented with using news from the ostrava.cz website as an additional source of predictors. I turned the news texts into embeddings, used GPT to determine the time intervals each news article relates to, and then calculated an embedding for each date as a weighted average of the individual news embeddings. I then used PCA to extract the top 10 components, which served as additional features. Unfortunately, this extension didn't improve the model at all.
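A minimal sketch of the date-level aggregation and PCA step, assuming each article already has an embedding vector and a first and last date (the part the text delegates to GPT). The weighting here, `1 / interval length` so that an article spanning many days counts less on each of them, is a hypothetical choice; the text does not say how the weights were set. Days without any news get a zero vector, which is also an assumption. PCA is computed with NumPy's SVD on the centered matrix.

```python
from datetime import date, timedelta

import numpy as np


def daily_news_embeddings(articles, start: date, end: date):
    """Weighted average of article embeddings for every date from start to end (inclusive).

    articles: list of (embedding, first_day, last_day) tuples, with datetime.date days.
    """
    n_days = (end - start).days + 1
    dim = len(articles[0][0])
    sums = np.zeros((n_days, dim))
    weights = np.zeros(n_days)
    for embedding, first_day, last_day in articles:
        vector = np.asarray(embedding, dtype=float)
        w = 1.0 / ((last_day - first_day).days + 1)  # hypothetical: shorter intervals weigh more
        lo = max((first_day - start).days, 0)
        hi = min((last_day - start).days, n_days - 1)
        for i in range(lo, hi + 1):
            sums[i] += w * vector
            weights[i] += w
    daily = np.zeros_like(sums)  # days with no news stay all-zero
    covered = weights > 0
    daily[covered] = sums[covered] / weights[covered][:, None]
    dates = [start + timedelta(days=i) for i in range(n_days)]
    return dates, daily


def top_components(matrix, k=10):
    """Project rows onto the first k principal components (PCA via SVD)."""
    centered = matrix - matrix.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T
```

With real data, `top_components(daily, k=10)` would yield one row of 10 extra feature columns per date.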