
Ruben Verborgh

Predicting train occupancies based on query logs and external data sources

by Gilles Vandewiele, Pieter Colpaert, Olivier Janssens, Joachim Van Herwegen, Ruben Verborgh, Erik Mannens, Femke Ongenae, and Filip De Turck

On dense railway networks, such as the one in Belgium, train travelers are frequently confronted with overcrowded trains, especially during peak hours. Crowdedness on trains degrades the quality of service and has a negative impact on passenger well-being. To encourage travelers to consider less crowded trains, the iRail project aims to show an occupancy indicator in its route planning applications by means of predictive modelling. As no official occupancy data is available, training data is gathered through crowdsourcing, using the Web app iRail.be and the Railer application for iPhone. Users can indicate their departure and arrival stations and the time at which they took a train, and classify the occupancy of that train as low, medium, or high. While preliminary results on a limited data set indicate that the models do not yet perform sufficiently well, we are convinced that with further research and more data, our predictive model will be able to achieve higher predictive performance. For that purpose, all datasets used in this research are made publicly available under an open license on the iRail website and in the form of a Kaggle competition. Moreover, an infrastructure has been set up that automatically processes new logs submitted by users, so that our model can learn continuously. Occupancy predictions for future trains are made available through an API.
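To make the shape of the learning problem concrete (a three-class label, low, medium, or high, predicted from a crowdsourced report of stations and time), here is a minimal sketch. It is not the authors' model: it is a naive majority-class baseline that predicts the most frequently reported class per departure station and hour of day. The `OccupancyLog` record, its field names, the `train_majority_baseline` and `predict` helpers, the "medium" fallback, and the sample reports are all hypothetical and chosen for illustration only.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime

# The three occupancy classes users can report.
CLASSES = ("low", "medium", "high")


@dataclass(frozen=True)
class OccupancyLog:
    """One crowdsourced report (hypothetical field names)."""
    departure: str
    arrival: str
    timestamp: datetime
    occupancy: str  # one of CLASSES


def train_majority_baseline(logs):
    """Count reported classes per (departure station, hour of day) and overall."""
    counts = defaultdict(Counter)
    overall = Counter()
    for log in logs:
        if log.occupancy not in CLASSES:
            raise ValueError(f"unknown occupancy class: {log.occupancy}")
        counts[(log.departure, log.timestamp.hour)][log.occupancy] += 1
        overall[log.occupancy] += 1
    return counts, overall


def predict(model, departure, when):
    """Most frequent class for this station and hour; else the global majority.

    Falls back to "medium" (a hypothetical default) when no reports exist at all.
    """
    counts, overall = model
    bucket = counts.get((departure, when.hour))  # .get() does not create new keys
    source = bucket if bucket else overall
    return source.most_common(1)[0][0] if source else "medium"


# Usage with made-up reports.
logs = [
    OccupancyLog("Gent-Sint-Pieters", "Brussel-Zuid", datetime(2016, 9, 5, 8, 10), "high"),
    OccupancyLog("Gent-Sint-Pieters", "Brussel-Zuid", datetime(2016, 9, 6, 8, 40), "high"),
    OccupancyLog("Gent-Sint-Pieters", "Antwerpen-Centraal", datetime(2016, 9, 6, 8, 5), "medium"),
    OccupancyLog("Gent-Sint-Pieters", "Brussel-Zuid", datetime(2016, 9, 6, 14, 0), "low"),
]
model = train_majority_baseline(logs)
print(predict(model, "Gent-Sint-Pieters", datetime(2016, 9, 7, 8, 30)))  # high
```

A baseline like this is mainly useful as a reference point: any learned model trained on the same logs (and on external data sources) should beat it before its predictions are worth showing to travelers.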


Published in 2017 in Proceedings of the 7th International Workshop on Location and the Web.

Keywords: Web, research
