Task and Datasets

Task description

The goal of the Internet Mathematics 2010 contest is to predict traffic congestion from previous observations. Participants are given a graph of Moscow streets and observation data: the speed of traffic flow on street segments over one month. The task is to predict the speed of traffic flow on the last day of that month.

Data sets

The data sets provided for the contest consist of two components: a street graph and traffic flow data.

Street graph

Moscow’s street intersections are represented by vertices, and street sections by edges (a two-way street is represented by two edges, one in each direction). This graph differs from a 'natural' street graph in that some of its vertices and edges have been duplicated to account for traffic regulations, prohibited turns and closed roads. As a result, any route through this 'modified' graph is a permitted one. Three files describe the graph: vertices.txt (vertex descriptions), edges.txt (edges) and edge_data.txt (properties of edges, i.e. streets).

The vertices.txt file lists all vertex IDs (first column) together with their group IDs (second column). For example:

0 0
1 1
2 2
3 3
40 42
41 42
42 42

In this example, vertices 0, 1, 2, 3 are 'proper' vertices, while vertices 40, 41, 42 all belong to group 42, which means they represent one and the same road intersection. In total, this file contains 146,625 vertex descriptors corresponding to 40,420 vertex groups (real-life road intersections).
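The grouping described above can be sketched in a few lines of Python. This is only an illustration: the function name `parse_vertices` is made up here, and the sketch assumes whitespace-separated columns, as in the sample.

```python
from collections import defaultdict


def parse_vertices(lines):
    """Map each group ID to the list of vertex IDs that belong to it."""
    groups = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        vertex_id, group_id = int(parts[0]), int(parts[1])
        groups[group_id].append(vertex_id)
    return dict(groups)


# The sample rows from vertices.txt shown above.
sample_vertices = """0 0
1 1
2 2
3 3
40 42
41 42
42 42""".splitlines()

vertex_groups = parse_vertices(sample_vertices)
print(vertex_groups[42])   # vertex IDs that make up intersection 42
print(len(vertex_groups))  # number of real-life intersections in the sample
```

For the real file, the same function could be fed an open file object instead of the sample list.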

Information in edges.txt is presented as:

317744 317744 42 44
317745 317744 41 44
317746 317746 46 40
317747 317746 45 40
317800 317800 135 136
317856 317856 224 226
317857 317856 222 226
317859 317859 229 221
317860 317859 227 221

The first column contains edge IDs, the second column provides edge group IDs (edges belonging to the same group correspond to one and the same 'real' street), the third has the starting vertex IDs, and the fourth has the ending vertex IDs. All in all, the graph has 206,289 edges corresponding to 86,228 real-life edges.
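As a rough sketch of how these four columns might be loaded, the snippet below reads edge rows into records and builds an adjacency list keyed by starting vertex. The names `Edge`, `parse_edges` and `build_adjacency` are illustrative, not part of the contest materials.

```python
from collections import defaultdict, namedtuple

# One row of edges.txt: edge ID, edge group ID, start vertex, end vertex.
Edge = namedtuple("Edge", "edge_id group_id start end")


def parse_edges(lines):
    """Parse whitespace-separated rows of edges.txt into Edge records."""
    edges = []
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        edges.append(Edge(*map(int, parts)))
    return edges


def build_adjacency(edges):
    """Map each start vertex to a list of (end vertex, edge group ID)."""
    adjacency = defaultdict(list)
    for edge in edges:
        adjacency[edge.start].append((edge.end, edge.group_id))
    return dict(adjacency)


# The first rows of the edges.txt sample shown above.
sample_edges = """317744 317744 42 44
317745 317744 41 44
317746 317746 46 40
317747 317746 45 40""".splitlines()

edges = parse_edges(sample_edges)
adjacency = build_adjacency(edges)
print(adjacency[42])  # outgoing edges of vertex 42
```

Keeping the group ID next to each neighbour makes it easy to look up the street's properties and observed speeds, which are keyed by group ID.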

The edge_data.txt file contains edge (street) properties. For example:

317744 39.93 30.0 
317746 39.93 30.0 
317800 14.41 20.0 
317856 170.42 30.0 
317859 170.42 30.0

The first column contains edge group IDs (all edges in a group have the same properties, as all of them represent the same street). The second column is the length of the street section in meters. The third is the 'usual' speed of traffic flow (km/h) on this street, or 'traffic flow capacity'.
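A minimal sketch of reading these properties follows. The helper `free_flow_seconds` is an added illustration: it treats the 'usual' speed as a free-flow speed to derive a traversal time, which is an assumption and not something the contest description prescribes.

```python
def parse_edge_data(lines):
    """Map edge group ID -> (length in meters, usual speed in km/h)."""
    data = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        data[int(parts[0])] = (float(parts[1]), float(parts[2]))
    return data


def free_flow_seconds(length_m, speed_kmh):
    """Time in seconds to traverse length_m meters at speed_kmh km/h."""
    if speed_kmh <= 0:
        raise ValueError("speed must be positive")
    return length_m / (speed_kmh / 3.6)  # km/h -> m/s is a division by 3.6


# Sample rows from edge_data.txt shown above.
sample_edge_data = """317744 39.93 30.0
317746 39.93 30.0
317800 14.41 20.0
317856 170.42 30.0
317859 170.42 30.0""".splitlines()

edge_data = parse_edge_data(sample_edge_data)
length_m, usual_speed = edge_data[317800]
print(round(free_flow_seconds(length_m, usual_speed), 2))
```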

Traffic jams data sets

The jams.txt file contains observation data for 31 days. For the first 30 days, it gives the speed of traffic flow from 16:00 to 22:00 at four-minute intervals; for the 31st day, it covers only the period from 16:00 to 18:00. File format:

317744 11 16:26 62
317744 11 16:30 62
317744 11 16:34 62
317744 11 16:40 63

The first column contains edge group IDs (all edges in a group have the same speed, as they represent the same street). The second column contains time stamps in the form 'day hour:minute' (days are numbered from 11 through 41). The third column contains the speed of traffic flow (km/h), where zero means traffic is at a standstill.
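One possible way to parse such a line is sketched below. Note that splitting on whitespace yields four fields, because the 'day hour:minute' time stamp contains a space. The record type `Observation` and the minutes-since-midnight representation are choices made for this illustration only.

```python
from collections import namedtuple

# group_id: edge group ID; day: 11..41; minute: minutes since midnight;
# speed: observed speed in km/h (0 means traffic is standing still).
Observation = namedtuple("Observation", "group_id day minute speed")


def parse_observation(line):
    """Parse one jams.txt row such as '317744 11 16:26 62'."""
    group_id, day, clock, speed = line.split()
    hours, minutes = map(int, clock.split(":"))
    return Observation(int(group_id), int(day), hours * 60 + minutes, float(speed))


# Sample rows from jams.txt shown above.
sample_jams = """317744 11 16:26 62
317744 11 16:30 62
317744 11 16:34 62
317744 11 16:40 63""".splitlines()

observations = [parse_observation(line) for line in sample_jams]
print(observations[0])
```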

The task is to replace '??' in task.txt with speed predictions for traffic flow on specific edges (streets) at specific times:

317744 41 18:22 ??
317744 41 18:26 ??
317744 41 18:30 ??
317744 41 18:34 ??

This file contains, in all, 691,641 lines, which means that participants have to make 691,641 speed predictions.
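To make the expected output concrete, here is a deliberately naive baseline that replaces each '??' with the mean observed speed of the same edge group. It is only a sketch of the file handling, not a recommended prediction method; the function names, the fallback speed of 30 km/h and the two-decimal output format are all assumptions made for illustration.

```python
from collections import defaultdict


def mean_speed_by_group(jam_lines):
    """Average all observed speeds per edge group ID."""
    totals = defaultdict(lambda: [0.0, 0])  # group ID -> [sum, count]
    for line in jam_lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        acc = totals[int(parts[0])]
        acc[0] += float(parts[3])
        acc[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


def fill_task(task_lines, means, default_speed=30.0):
    """Replace '??' with the group's mean speed (or an assumed default)."""
    filled = []
    for line in task_lines:
        parts = line.split()
        if len(parts) != 4 or parts[3] != "??":
            filled.append(line)  # leave unexpected lines untouched
            continue
        speed = means.get(int(parts[0]), default_speed)
        filled.append(f"{parts[0]} {parts[1]} {parts[2]} {speed:.2f}")
    return filled


sample_jams = ["317744 11 16:26 62", "317744 11 16:30 62",
               "317744 11 16:34 62", "317744 11 16:40 63"]
sample_task = ["317744 41 18:22 ??", "317744 41 18:26 ??"]

predictions = fill_task(sample_task, mean_speed_by_group(sample_jams))
print("\n".join(predictions))
```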

Evaluation

The evaluation metric takes two factors into account: the length of the street for which a prediction is made and how far ahead the prediction is made. Predictions for longer streets and for later times carry more weight in the final evaluation.

Quality of results is evaluated according to this formula:

[Formula image not preserved], where

n – total number of predictions,
kl – 'length factor': length of a street relative to the average street length (120 m),
kt – 'time factor': 1 + 0.1*sequential number of a four-minute interval starting from 18:00 (for example, for 18:56 kt = 1 + 0.1*14 = 2.4),
v* – observed speed,
v – predicted speed.
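The two weighting factors defined above can be computed as in the following sketch. It covers only kl and kt, not the overall quality score. Flooring times that do not fall on the four-minute grid is an assumption of this sketch, as are the function names.

```python
AVERAGE_STREET_LENGTH_M = 120.0  # average street length given above


def length_factor(length_m):
    """kl: street length relative to the average street length."""
    return length_m / AVERAGE_STREET_LENGTH_M


def time_factor(clock):
    """kt: 1 + 0.1 * number of four-minute intervals since 18:00."""
    hours, minutes = map(int, clock.split(":"))
    minutes_after_18 = (hours - 18) * 60 + minutes
    if minutes_after_18 < 0:
        raise ValueError("time must not be earlier than 18:00")
    # Assumption: times off the four-minute grid are rounded down.
    interval_number = minutes_after_18 // 4
    return 1 + 0.1 * interval_number


print(round(time_factor("18:56"), 2))  # the worked example: interval 14
print(round(length_factor(170.42), 3))
```

Because kt grows by 0.1 per interval, a prediction for 18:56 is weighted 2.4 times as heavily as one for 18:00, all else being equal.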

Submitted results are evaluated in two stages: public evaluation based on 62,377 predictions and final evaluation based on the remaining 629,264 predictions. Public evaluation results are used to calculate the current rating. Final evaluation results are used to finalize the contest and choose the winner. The final rating may differ from the rating based on public evaluation of results.

Download data

Data is available only for personal use and exclusively for participation in the Internet Mathematics 2010 contest.

Download archive .zip (100 MB)