William Suzuki

Data and Software Engineering | Data Science | Machine Learning



Build a base image
for all your Dockerfiles

In my tests, I have to use different Dockerfiles for different types of tests, and most of those Dockerfiles share the same beginning, usually installing something with apt-get install ...

Homicides and
Random Forest

The objective of this article is to explore and model the dataset of homicides in the state of Santa Catarina. For each month, we use a random forest to make predictions over a data frame of hexagonal polygons. Check out the maps inside.

Markov Chain Monte Carlo
from Scratch

In this Jupyter notebook we build Gibbs and Metropolis-Hastings algorithms using only mathematical packages. It is challenging to build MCMC algorithms even for simple models. Check out the chain graphs inside.
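To give a flavor of the idea, here is a minimal sketch of a random-walk Metropolis-Hastings sampler that uses only Python's standard `math` and `random` modules. It is not the code from the notebook: the standard normal target, the function names `log_target` and `metropolis_hastings`, and the step size are all assumptions chosen for illustration.

```python
import math
import random


def log_target(x):
    """Log-density of a standard normal, up to an additive constant (assumed target)."""
    return -0.5 * x * x


def metropolis_hastings(log_density, n_samples, step=1.0, x0=0.0, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    x = x0
    chain = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        # The proposal is symmetric, so the acceptance ratio
        # reduces to the ratio of target densities.
        log_alpha = log_density(proposal) - log_density(x)
        if log_alpha >= 0 or rng.random() < math.exp(log_alpha):
            x = proposal
        # A rejected proposal repeats the current state in the chain.
        chain.append(x)
    return chain


chain = metropolis_hastings(log_target, n_samples=5000)
print(sum(chain) / len(chain))  # sample mean of the chain
```

Plotting `chain` against the iteration index gives the kind of chain graph mentioned above. Working with log-densities avoids numerical underflow when the target takes very small values.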

Real Estate and
Gradient Boosting

In this article we apply boosting and other models to predict real estate prices in the metropolitan region of São Paulo. We make maps showing the distribution of prices, and how housing characteristics impact price.

Black Box Model
Explanation

In this article we propose a method to summarize the predictions made by random forest and gradient boosting models. We apply it to two applications from previous articles.

Random Forest
Exploration I

We simulate some data generating processes (DGPs), then fit random forests to samples drawn from them. We explore a classification model.

Bootstrapping
Exploration I

Here we explore how bootstrap results change as we vary the parameters: the number of bootstrap samples B, the size m of each bootstrap sample, and the dispersion of the original sample (the variance σ² of the DGP).
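As a rough illustration of the role of B and m, the sketch below resamples a simulated sample and prints the spread of the bootstrap means for several values of m. It is not taken from the article: the normal DGP, the choice of the mean as the statistic, and the helper name `bootstrap_means` are assumptions made for this example.

```python
import random
import statistics


def bootstrap_means(sample, B, m, rng):
    """Return B bootstrap means, each computed from m draws with replacement."""
    return [statistics.fmean(rng.choices(sample, k=m)) for _ in range(B)]


rng = random.Random(0)
sigma = 2.0  # standard deviation of the assumed normal DGP
original = [rng.gauss(0.0, sigma) for _ in range(200)]

# Vary the bootstrap sample size m while holding B fixed.
for m in (10, 50, 200):
    means = bootstrap_means(original, B=1000, m=m, rng=rng)
    print(m, round(statistics.stdev(means), 3))
```

The spread of the bootstrap means should shrink as m grows and widen as `sigma` grows, while increasing B mainly makes the estimate of that spread more stable.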

CEP to lat-long:
Python and Google Maps

In this Jupyter notebook we build an algorithm that takes a list of the first five digits of CEPs (Brazilian postal codes) and finds the geographic coordinates of each. We use Python with Selenium and Beautiful Soup.

SPSAS Learning from
Data Report

The São Paulo School of Advanced Sciences hosted the SPSAS Learning from Data in August 2019. This is my report on the presentations. The presenters included Abu-Mostafa, Ulisses Neto, Ling Liu, and Željko Ivezić.

Master's in Economics
Thesis

This is my master's thesis. It examines the relationship between political institutions and economic development in the municipalities of Brazil. I use a local spatial method called Geographically Weighted Regression to explore the heterogeneity of the process.