Years of the project: 2015 - 2016
This work is my master's thesis in Applied Mathematics, which I successfully defended on June 17, 2016.

Nocturnal hypoglycemia is one of the problems addressed by the AMMODIT project. My colleagues and I built mathematical models to solve it, using several approaches (neural networks and decision trees) and comparing the results of the models. CGM data from one of the DirecNet studies was used as the reference data for modelling. Most of the modelling work was done in R (the major part of the data cleaning and filtering, and the predictive models) and Python (a minor part of the data cleaning and filtering).

What I did during this project:
- Cleaned the DirecNet data of noise and outliers with an ad-hoc cleaning procedure implemented specifically to filter out erroneous blood glucose measurements (see the first sketch after this list);
- Selected the most significant measurements of each day: the last reading before bed, plus the pre-meal and peak post-meal readings for breakfast, dinner and supper (see the second sketch after this list);
- Combined the significant blood glucose measurements with each patient's physiological data (weight, height, age, gender, etc.);
- Tried several input data formats, compared the predictive power of the resulting models, and selected the format that produced the models with the best predictive power (predictive power was measured and compared with the Matthews correlation coefficient, MCC, whose computation is sketched after the results table);
- Implemented model generation with several decision tree approaches: C4.5, CART, bagging with CART, boosting with CART, and random forests with CART (see the last sketch after this list).
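
A minimal sketch of the kind of ad-hoc cleaning step described above. The column names (patient_id, timestamp, glucose_mgdl) and the thresholds are illustrative assumptions; the exact rules used in the thesis differ.

```r
library(dplyr)

# Hypothetical data frame `cgm` with columns patient_id, timestamp (POSIXct), glucose_mgdl.
# Readings outside a plausible sensor range and isolated spikes are treated as noise.
clean_cgm <- cgm %>%
  filter(glucose_mgdl >= 40, glucose_mgdl <= 400) %>%        # assumed plausible range
  group_by(patient_id) %>%
  arrange(timestamp, .by_group = TRUE) %>%
  mutate(jump_prev = abs(glucose_mgdl - lag(glucose_mgdl)),
         jump_next = abs(glucose_mgdl - lead(glucose_mgdl)),
         spike     = !is.na(jump_prev) & !is.na(jump_next) &
                     jump_prev > 100 & jump_next > 100) %>%   # assumed spike threshold
  filter(!spike) %>%
  select(-jump_prev, -jump_next, -spike) %>%
  ungroup()
```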
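
A sketch of selecting one of the "significant" daily readings (the last one before bed), again with hypothetical column names and an assumed bedtime cutoff; the pre-meal and peak post-meal readings were selected with analogous time-window rules.

```r
library(dplyr)

bedtime_hour <- 22  # assumed fixed cutoff; the thesis used its own bedtime rule

last_before_bed <- clean_cgm %>%
  mutate(day  = as.Date(timestamp),
         hour = as.integer(format(timestamp, "%H"))) %>%
  filter(hour < bedtime_hour) %>%
  group_by(patient_id, day) %>%
  slice_max(timestamp, n = 1) %>%   # latest reading of the day before the cutoff
  ungroup()
# Pre-meal and peak post-meal readings for breakfast, dinner and supper
# can be picked with similar time windows around each meal.
```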
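
A sketch of fitting this family of decision tree models in R. The package choices (rpart for CART, randomForest, adabag for bagging and AdaBoost, RWeka::J48 for C4.5), the data frame names train/test and the binary target column hypo (yes/no) are illustrative assumptions, not necessarily the exact tools used in the thesis.

```r
library(rpart)          # CART
library(randomForest)   # random forests of CART-style trees
library(adabag)         # bagging and AdaBoost.M1 over rpart trees
# C4.5 can be fitted with RWeka::J48(hypo ~ ., data = train)

cart_fit <- rpart(hypo ~ ., data = train, method = "class")
rf_fit   <- randomForest(hypo ~ ., data = train)
bag_fit  <- bagging(hypo ~ ., data = train, mfinal = 50)
ada_fit  <- boosting(hypo ~ ., data = train, mfinal = 50)

cart_pred <- predict(cart_fit, test, type = "class")
rf_pred   <- predict(rf_fit, test)
bag_pred  <- factor(predict(bag_fit, test)$class, levels = levels(test$hypo))
ada_pred  <- factor(predict(ada_fit, test)$class, levels = levels(test$hypo))
```

Each candidate input data format and model was then compared by MCC, as described above, and the best-scoring combination was kept.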

Results of the work:
The best decision-tree-based model (AdaBoost) reaches an MCC of 0.43 on the DirecNet data, roughly 48% higher than the MCC of the best existing model, DIAppvisor_risk (0.29). In other words, the predictions of the best decision tree model I built agree considerably better with the actual nocturnal hypoglycemia events than those of DIAppvisor_risk.

Values of the metrics can be found in the table below (larger is better; TPR = true positive rate, TNR = true negative rate, PPV = positive predictive value, NPV = negative predictive value, ACC = accuracy, MCC = Matthews correlation coefficient):

| Method          | TPR  | TNR  | PPV  | NPV  | ACC  | F1   | MCC  |
|-----------------|------|------|------|------|------|------|------|
| DIAppvisor_risk | 0.44 | 0.85 | 0.46 | 0.83 | 0.75 | 0.45 | 0.29 |
| DIAppvisor      | 0.74 | 0.55 | 0.33 | 0.87 | 0.59 | 0.45 | 0.24 |
| CART            | 0.57 | 0.81 | 0.44 | 0.87 | 0.76 | 0.50 | 0.34 |
| C4.5            | 0.30 | 0.93 | 0.54 | 0.83 | 0.80 | 0.39 | 0.30 |
| Random Forest   | 0.30 | 0.96 | 0.67 | 0.84 | 0.82 | 0.42 | 0.36 |
| AdaBoost        | 0.39 | 0.95 | 0.69 | 0.85 | 0.83 | 0.50 | 0.43 |
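
For reference, a sketch of how the metrics in the table can be computed from binary predictions; the factor level "yes" for a nocturnal hypoglycemia night is an assumed label for illustration.

```r
classification_metrics <- function(pred, truth, positive = "yes") {
  tp <- as.numeric(sum(pred == positive & truth == positive))
  tn <- as.numeric(sum(pred != positive & truth != positive))
  fp <- as.numeric(sum(pred == positive & truth != positive))
  fn <- as.numeric(sum(pred != positive & truth == positive))

  tpr <- tp / (tp + fn)                 # sensitivity / recall
  tnr <- tn / (tn + fp)                 # specificity
  ppv <- tp / (tp + fp)                 # precision
  npv <- tn / (tn + fn)
  acc <- (tp + tn) / (tp + tn + fp + fn)
  f1  <- 2 * ppv * tpr / (ppv + tpr)
  mcc <- (tp * tn - fp * fn) /
    sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

  c(TPR = tpr, TNR = tnr, PPV = ppv, NPV = npv, ACC = acc, F1 = f1, MCC = mcc)
}

# Example: classification_metrics(ada_pred, test$hypo)
```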

These results were achieved by using additional physiological data, which DIAppvisor_risk ignores, and by replacing the simple predictors used in DIAppvisor_risk with machine learning techniques.

During this project I improved my knowledge of machine learning techniques, especially decision-tree-based ones.