Disclaimer: The posts below are outdated and have been archived for backward compatibility, so please use them with care! These posts do not reflect the author's current point of view and might deviate from current best practices.

📦 [archived] Managing dependencies in packages

Managing the usual dependencies of a package is clearly covered in R Packages by Hadley Wickham. Typically, that would be the end of a tutorial or a post. However, while recently teaching how to develop a package, I encountered a couple of super interesting, non-trivial questions that do not have a conventional solution. This post seems like a perfect place to share my thoughts on the matter, as well as a nice excuse to restart blogging.

Read More

🖊 [archived] R Coding Style Guide

Language is a tool that allows human beings to interact and communicate with each other. The clearer we express ourselves, the better an idea is transferred from our mind to someone else's. The same applies to programming languages: concise, clear, and consistent code is easier to read and edit. This is especially important if you have collaborators who depend on your code. However, even if you don't, keep in mind that at some point you might come back to your code, for example, to fix an error. And if you did not follow your coding style consistently, reviewing your code can take much longer than expected. In this context, taking care of your audience means making your code as readable as possible.

Read More

📁 [archived] Project-oriented workflow

Be honest with yourself: how many times have you wanted to restart an ongoing project from scratch, throwing away the current folder? Or how many times have you had to rename files and adjust the folder structure to make your project simple and clear? Not to mention all those thousands of versions of your scripts dangling around in your mailbox. Tired of this? Then get on board and read my comments on how to make your project reproducible, portable, and self-contained.

Read More

🌱 [archived] Setting a seed in R, when using parallel simulation

Generally speaking, if the code runs any simulations, it is good practice to set a seed to make the code reproducible. Setting a seed ensures that the same (pseudo-)random numbers will be generated each time the script is executed. Surprisingly, I found very few posts dedicated to conventions, best practices, or routines for setting a seed in R. Further, when using multiple cores (parallelisation) for simulations, things can get slightly more complicated.
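To illustrate why parallel runs need extra care, here is a minimal sketch in Python (an assumption: the post itself is about R) using only the standard library. Each simulated task gets its own generator seeded from a hypothetical master seed plus the task id, so the combined result does not depend on how many workers run the tasks or in which order they finish. Threads are used only to keep the example small. Deriving seeds from a string like this is a simplification that does not guarantee statistically independent streams; in R, the parallel package provides dedicated streams via `RNGkind("L'Ecuyer-CMRG")` and `clusterSetRNGStream()`.

```python
import random
from concurrent.futures import ThreadPoolExecutor

MASTER_SEED = 2018  # hypothetical master seed, chosen for illustration


def simulate(task_id, n=1000):
    """One Monte Carlo task: the mean of n Uniform(0, 1) draws."""
    # Private generator per task: its seed depends only on the master seed
    # and the task id, never on which worker runs the task or when.
    rng = random.Random(f"{MASTER_SEED}-{task_id}")
    return sum(rng.random() for _ in range(n)) / n


def run(n_tasks=8, workers=4):
    """Run all tasks in parallel and return the results in task order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, whatever the completion order
        return list(pool.map(simulate, range(n_tasks)))


if __name__ == "__main__":
    # Same master seed, different worker counts: identical results.
    print(run(8, 4) == run(8, 1))  # prints True
```

The key design choice is that no task touches a shared global generator: with a single global seed, the numbers each task receives would depend on scheduling, and reproducibility would be lost as soon as the number of cores changed.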

Read More

💾 [archived] Installing MySQL on MacOS (and using it with R)

A couple of days ago I was asked to install MySQL on MacOS 10.13, and I was surprised that it was not a one-click installation, as in the case of R. Unfortunately, even for me the documentation was a bit confusing, so I think a guide to the installation process might be useful.

Read More

📈 [archived] Simulating Poisson process (part 1)

A couple of weeks ago a colleague of mine asked me for help estimating the Gerber–Shiu function by Monte Carlo methods. The function is used in ruin theory for risk processes. One can think of this function as an analogue of a moment generating function: if the function is known, it is easy to derive certain measures of interest, for instance, a ruin probability. My colleague wants to estimate this function for an extension of the Cramér–Lundberg model that includes positive jumps (capital injections). At first glance it seems a trivial task, but when I started approaching it, the problem turned out to be not so easy to solve.

Read More

📊 [archived] Multinomial regression in R

In my current project on long-term care, at some point we needed a regression model with multinomial responses. I was very surprised that, in contrast to the well-covered binomial GLM for the binary response case, the multinomial case is poorly described. Surely, there are half a dozen packages overlapping each other; however, there is no sound tutorial or vignette. Hopefully, my post will improve the current state of affairs.

Read More

🔬 [archived] Dortmund real estate market analysis: neural networks

In almost every non-technical post about AI for a broader audience, the author deems it their duty to mention deep learning as a panacea for all woes. Well, it's not. Deep learning is just one of various models, which might or might not perform better than other techniques. At the end of the day, in a nutshell, it is just a regular neural network with multiple hidden layers between the input and output layers (well, that's rather an oversimplification, but you get the idea). In this post I am curious whether a neural network approach can beat our best model so far (a GAM with an inverse Gaussian response distribution).

Read More

🌳 [archived] Dortmund real estate market analysis: tree-based methods

In previous posts, traditional regression models were fitted to real estate data. In this post, tree-based models, namely random forests and gradient boosting, are trained to predict rental prices. These methods typically outperform traditional regression models, yielding smaller errors. Furthermore, tree-based methods are much more robust to overfitting, which makes them superior in terms of prediction. However, their main disadvantage (and the reason why they are not loved in the insurance industry) is their limited interpretability.

Read More

📐 [archived] Deming versus simple linear regression

All courses that somehow covered regression models started in almost the same way: given a bunch of $(x, y)$ points, one needs to predict the value of $y$ for a certain $x$. Sounds quite easy. Without using any statistical assumptions, we can just find the line that is, in some sense, closest to those points (the best-fit line).
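For reference, here is how "closest" is usually made precise, in standard textbook notation that the post itself may not use. Simple linear regression (ordinary least squares) minimizes the squared vertical distances from the points to the line:

$$
(\hat\beta_0, \hat\beta_1) = \arg\min_{\beta_0, \beta_1} \sum_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_i\right)^2
$$

Deming regression also allows for measurement error in $x$: it estimates the unobserved true values $x_i^*$ alongside the line, where $\sigma_\varepsilon^2$ and $\sigma_\eta^2$ denote the error variances of $y$ and $x$, respectively:

$$
\min_{\beta_0, \beta_1, x_1^*, \dots, x_n^*} \sum_{i=1}^{n} \left[ \left(y_i - \beta_0 - \beta_1 x_i^*\right)^2 + \delta \left(x_i - x_i^*\right)^2 \right], \qquad \delta = \frac{\sigma_\varepsilon^2}{\sigma_\eta^2}
$$

With $\delta = 1$, this reduces to orthogonal regression, i.e. minimizing the perpendicular distances to the line.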

Read More

📣 [archived] Notes from R in insurance 2017

This year the fifth R in Insurance conference was held at ENSAE, Paris. My first impression was: "Wow, that's a lot of people. Many more than last year", and I hope my estimate is not biased. Thanks to the organizers, it was indeed a true pleasure to be on both sides, as a member of the audience as well as a speaker. I really love that unique atmosphere and mixed audience: not many conferences offer feedback from both the academic and the industry perspective at the same time.

Read More

🎓 [archived] Insights from students data

Recently (well, a month ago) I had a discussion with a friend of mine about modern tools and approaches in education. He is currently involved in an edX platform startup, and given that I am an assistant at the university, we had several points to discuss.

Read More

💻 [archived] Bringing together R and Shell

I believe that in our era of RStudio and interactive data analysis, R scripts rarely need to be run from the shell. The same applies the other way around: executing shell commands from R is quite uncommon. However, there are some cases where this is necessary.

Read More