HuMob Challenge 2023

Final Ranking

HuMob Challenge Workshop (Nov 13 @ ACM SIGSPATIAL)

08:00-08:10	Opening Remarks, Takahiro Yabe, MIT
08:10-08:40	Description of HuMob Data, Toru Shimizu, Yahoo Japan Corporation
08:40-09:25	Session 1
08:40-08:55	Human Mobility Prediction Challenge: Next Location Prediction using Spatiotemporal BERT (Team #26, uclab2023) Haru Terashima, Nagoya University; Naoki Tamura, Nagoya University; Kazuyuki Shoji, Nagoya University; Shin Katayama, Nagoya University; Kenta Urano, Nagoya University; Takuro Yonezawa, Nagoya University; Nobuo Kawaguchi, Nagoya University
08:55-09:10	Modeling and generating human mobility trajectories using transformer with day encoding (Team #32, KDDIdbcg) Akihiro Kobayashi, KDDI Research, Japan; Naoto Takeda, KDDI Research, Japan; Yudai Yamazaki, KDDI Research, Japan; Daisuke Kamisaka, KDDI Research, Japan
09:10-09:25	GeoFormer: Predicting Human Mobility using Generative Pre-trained Transformer (GPT) (Team #51, GeoFormer) Aivin Solatorio, The World Bank
09:25-09:35	Short Break
09:35-10:35	Session 2
09:35-09:50	Large-Scale Human Mobility Prediction Based on Periodic Attenuation and Local Feature Match (Team #09, 3S-CrMap) Xiaogang Guo, Wuhan University; Guangyue Li, Wuhan University; Zhixing Chen, Wuhan University; Huazu Zhang, Wuhan University; Yulin Ding, Wuhan University; Jinghan Wang, Wuhan University; Zilong Zhao, Wuhan University; Luliang Tang, Wuhan University
09:50-10:05	Personalized human mobility prediction for HuMob challenge (Team #29, mukumuku) Masahiro Suzuki, Sophia University, Japan; Shomu Furuta, Sophia University, Japan; Yusuke Fukazawa, Sophia University, Japan
10:05-10:20	Estimating future human trajectories from sparse time series data (Team #33, MOBB) Ryo Koyama, Independent, Japan; Meisaku Suzuki, Independent, Japan; Yusuke Nakamura, Independent, Japan; Tomohiro Mimura, Independent, Japan; Shin Ishiguro, University of Tokyo, Japan
10:20-10:35	Multi-perspective Spatiotemporal Context-aware Neural Networks for Human Mobility Prediction (Team #87, GIS4Fun) Chenglong Wang, Peking University; Zhicheng Deng, Peking University
10:35-11:00	Coffee Break
11:00-12:00	Session 3
11:00-11:15	Cell-Level Trajectory Prediction Using Time-embedded Encoder-Decoder Network (Team #62, AISTDPRT) Taehoon Kim, National Institute of Advanced Industrial Science and Technology, Japan; Kyoung-Sook Kim, National Institute of Advanced Industrial Science and Technology, Japan; Akiyoshi Matono, National Institute of Advanced Industrial Science and Technology, Japan
11:15-11:30	Forecasting Urban Mobility using Sparse Data: A Gradient Boosted Fusion Tree Approach (Team #56, Haoyu) Haoyu He, Northeastern University; Xinhua Wu, Northeastern University; Qi Wang, Northeastern University
11:30-11:45	Batch and negative sampling design for human mobility graph neural network training (Team #54, Jiaxin) Jiaxin Du, Texas A&M University; Xinyue Ye, Texas A&M University
11:45-12:00	Wrap up discussion – Future of Human Mobility Data
12:00-12:10	Awards Ceremony, Kota Tsubouchi, Yahoo Japan Corporation

Why do the Challenge?

Understanding, modeling, and predicting human mobility trajectories in urban areas is an essential task for various domains and applications, including transportation modeling, disaster risk management, and urban planning. Traditionally, travel surveys and census data have been utilized as the main source of data to understand such macroscopic urban dynamics. The recent availability of large-scale human movement and behavior data collected from (often millions of) mobile devices and social media platforms has enabled the development and testing of complex human mobility models, resulting in a plethora of methods published in computer science venues such as ACM SIGSPATIAL (see [1] for a review).

However, because privacy concerns have limited the availability of open-source, large-scale human mobility datasets, prediction methods are trained and tested on different datasets, making it difficult to compare their performance fairly. This lack of large-scale open-source datasets has been one of the key barriers hindering the progress of human mobility model development. In this workshop, we will host a data challenge using a synthetic but realistic human mobility dataset of 100K individuals’ trajectories across 90 days in a metropolitan area provided by Yahoo Japan Corporation. Participants will develop and test methods to predict human mobility trajectories using the provided open-source dataset.

The Challenge

The challenge takes place in a mid-sized and highly populated metropolitan area, somewhere in Japan. The area is divided into 500-meter x 500-meter cells, which span a 200 x 200 grid, as shown in Figure 1. The human mobility datasets (‘task1_dataset.csv.gz’ and ‘task2_dataset.csv.gz’) contain the movement of a total of 100,000 individuals across a 90-day period, discretized into 30-minute intervals and 500-meter grid cells. The first dataset contains movement during a 75-day business-as-usual period, while the second dataset contains movement during a 75-day period that includes an emergency with unusual behavior.

There are 2 tasks in the Human Mobility Prediction Challenge, as shown in Figure 2. In Task 1, participants are provided with the full time series data (75 days) for 80,000 individuals, and partial (only 60 days) time series movement data for the remaining 20,000 individuals (‘task1_dataset.csv.gz’). Given the provided data, Task 1 of the challenge is to predict the movement patterns of those 20,000 individuals during days 60-74. Task 2 is a similar task but uses a smaller dataset of 25,000 individuals in total, 2,500 of whom have their locations during days 60-74 masked; these masked locations need to be predicted (‘task2_dataset.csv.gz’).

While the name or location of the city is not disclosed, the participants are provided with points-of-interest (POIs; e.g., restaurants, parks) data for each grid cell (85-dimensional vector) as supplementary information, which is optional for use in the challenge (‘cell_POIcat.csv.gz’). The predicted human movement trajectories will be evaluated against the actual trajectories using the GEO-BLEU metric [2] as well as Dynamic Time Warping (DTW) [3]. Python implementations of the evaluation metrics will be provided on Yahoo Japan’s GitHub page (https://github.com/yahoojapan/geobleu).

Figure 1: Spatial layout of the grid cells of the target area

Figure 2. Schematic of the 2 tasks in the Data Challenge

Dataset download

The data challenge participants will be provided with 3 datasets – HuMob datasets #1 and #2 (which are derived from the original human mobility dataset), and the POI dataset which may be used to supplement the prediction of human mobility.

The data may be downloaded from https://zenodo.org/record/8111993. To be granted access to the data, teams should submit a request via the Zenodo website, providing the name and email address of the lead investigator and the following information in the ‘Justification’ box:

full list of members (name, institution, email address)
team name (letters and numbers only; keep it under 10 characters)

Upon approval by the organizing team, the data will be available for download. If you do not receive approval within 24 hours (on business days), please contact humob2023@gmail.com with your information.

Details about the datasets can be found in: Yabe, T., Tsubouchi, K., Shimizu, T., Sekimoto, Y., Sezaki, K., Moro, E., Pentland, A. (2023). Metropolitan Scale and Longitudinal Dataset of Anonymized Human Mobility Trajectories. arXiv preprint arXiv:2307.03401. http://arxiv.org/abs/2307.03401

Please cite this document in your submissions and future work where the HuMob data is used.

Participants shall not carry out activities that involve unethical usage of the data, including attempts at re-identifying data subjects, harming individuals, or damaging companies. Participants will be allowed to submit full versions of their works to venues of their choice, upon approval by the organizers.

Provided Datasets

HuMob dataset #1 (task1_dataset.csv.gz): Contains the movement of 100,000 individuals in total during a business-as-usual scenario. 80,000 individuals’ movements are provided completely (for 75 days), and the remaining 20,000 individuals’ movements for days 61 to 75 are masked as ‘999’. The challenge is to use the provided data to predict the masked cell coordinates (i.e., replace the ‘999’s).
HuMob dataset #2 (task2_dataset.csv.gz): Contains the movement of 25,000 individuals in total during business-as-usual (60 days) and emergency (15 days) scenarios. 22,500 individuals’ movements are provided completely (for 75 days), and the remaining 2,500 individuals’ movements for days 61 to 75 are masked as ‘999’. Similar to task 1, the challenge is to use the provided data to predict the masked movement coordinates (i.e., replace the ‘999’s).
POI dataset (cell_POIcat.csv.gz): To aid the prediction task, we have prepared an auxiliary dataset that provides the count of different points-of-interest categories in each grid cell (e.g., restaurants, cafes, schools). However, to maintain anonymity of the location, we are not able to provide the actual category name that corresponds to each dimension. Therefore, each cell has an 85-dimensional vector.
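
As a minimal sketch of how the masked records described above could be separated from the observed ones, the snippet below splits rows on the ‘999’ mask value. It assumes the dataset files use the column names uid, d, t, x, y (the header accepted by the validator); the helper name split_masked and the sample rows are made up for illustration.

```python
import csv
import io

MASK_VALUE = 999  # masked x/y value, per the dataset description


def split_masked(rows):
    """Split an iterable of row dicts into (observed, masked) lists."""
    observed, masked = [], []
    for row in rows:
        rec = {key: int(value) for key, value in row.items()}
        if MASK_VALUE in (rec["x"], rec["y"]):
            masked.append(rec)
        else:
            observed.append(rec)
    return observed, masked


# Tiny in-memory stand-in for task1_dataset.csv.gz (values are invented).
sample = io.StringIO(
    "uid,d,t,x,y\n"
    "0,0,12,80,95\n"
    "0,0,13,81,95\n"
    "1,59,20,10,12\n"
    "1,60,20,999,999\n"
)
observed, masked = split_masked(csv.DictReader(sample))

# Users with at least one masked record are the ones to predict.
target_users = sorted({rec["uid"] for rec in masked})
print(len(observed), len(masked), target_users)
```

For the real files, the same function could be fed with csv.DictReader(gzip.open("task1_dataset.csv.gz", "rt", newline="")), assuming the file starts with that header line.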

This data contains movement information generated from user location data obtained from Yahoo Japan Corporation smartphone applications. It does not reveal the actual timestamp, latitude, longitude, etc., and does not identify individuals. This data can only be used for the purpose of participating in the HuMob Challenge 2023. Questions concerning this data should be sent to yjresearch-data@mail.yahoo.co.jp

Evaluation Metrics

GEO-BLEU [2], a metric with a stronger focus on local features, as in similarity measures for natural language sentences. Python implementation for the GEO-BLEU metric can be found at https://github.com/yahoojapan/geobleu.
- Details of the GEO-BLEU metric can be found in the paper: ACM SIGSPATIAL version | arXiv version
- Trajectories will be evaluated using GEO-BLEU day-by-day. Examples are shown on the GEO-BLEU github website (link above)
Dynamic Time Warping (DTW) [3] for evaluating the similarity of trajectories as a whole, with step-by-step alignment.
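
To illustrate what “step-by-step alignment” means for DTW, here is a minimal textbook sketch that aligns two sequences of (x, y) cell coordinates using Euclidean distance between cells. This is not the organizers' implementation; the official code at https://github.com/yahoojapan/geobleu may differ in distance units and other details, and the function name dtw_distance is hypothetical.

```python
import math


def dtw_distance(a, b):
    """Textbook dynamic time warping cost between two (x, y) sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


reference = [(0, 0), (1, 0), (2, 0)]
predicted = [(0, 0), (2, 0)]
print(dtw_distance(reference, reference))  # 0.0
print(dtw_distance(reference, predicted))  # 1.0
```

In the second call the middle reference point (1, 0) is matched to one of its neighbours at distance 1, which is the only cost incurred.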

Submissions will be ranked for each metric, and the top 10 teams will be decided based on the two rankings. We recommend the teams try to optimize for both metrics.

[added Sep 7th] We will award the 1st prizes for 4 categories: GEOBLEU-task1, GEOBLEU-task2, DTW-task1, DTW-task2, and a Grand Prize for the team with the lowest total ranking (e.g., if team A is 1st, 3rd, 2nd, 1st in the 4 categories, the team’s total ranking is 1+3+2+1=7)
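
The total ranking arithmetic above can be sketched as follows. Team A's ranks are taken from the example in the text; team B and the helper name total_ranking are hypothetical, added only to show how the lowest total would be picked.

```python
CATEGORIES = ("GEOBLEU-task1", "GEOBLEU-task2", "DTW-task1", "DTW-task2")


def total_ranking(ranks):
    """Sum a team's rank across the four prize categories."""
    return sum(ranks[category] for category in CATEGORIES)


teams = {
    # Team A: ranks from the example in the text (1 + 3 + 2 + 1 = 7).
    "A": {"GEOBLEU-task1": 1, "GEOBLEU-task2": 3, "DTW-task1": 2, "DTW-task2": 1},
    # Team B: invented ranks for comparison (2 + 1 + 1 + 4 = 8).
    "B": {"GEOBLEU-task1": 2, "GEOBLEU-task2": 1, "DTW-task1": 1, "DTW-task2": 4},
}

totals = {name: total_ranking(ranks) for name, ranks in teams.items()}
grand_prize = min(totals, key=totals.get)  # lowest total ranking wins
print(totals, grand_prize)
```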

Submission Procedure and Rules

~~Prediction results for Tasks 1 and 2 should be uploaded to online storage (e.g., Dropbox, Box, Google Drive, etc.), and the download links should be sent to humob2023@gmail.com~~ (deleted Sep 8th)
[Updated Sep 8th] Prediction results should be uploaded to the Box links sent to the Team Leaders. If you haven’t received a link, please let us know at humob2023@gmail.com
- You do not need to send submission emails or anything else — just upload your predictions!
The uploaded files should be named {teamname}_{task1,task2}_humob.csv.gz. For example, team ‘dreamteam’ submitting a solution for Task 1 should name their prediction file dreamteam_task1_humob.csv.gz [updated July 31]
Only 1 submission per team will be evaluated. The last submission before the deadline (September 15th 23:59 AoE) will be considered the final submission.
The format of the submission should include the same 5 columns as the original dataset (user ID, day, timeslot, x, y). Separate the columns using commas (,) and include no redundant spaces, and save the file using the csv.gz format.
Only send the data for the predicted users: for Task 1, only users #80000 to #99999, and for Task 2, only users #22500 to #24999.
[Added Sep 8th] Please use the Validator tool (below) to make sure your submission formats are correct!
Cite the data descriptor document in your submissions and future work where the HuMob data is used:
- Yabe, T., Tsubouchi, K., Shimizu, T., Sekimoto, Y., Sezaki, K., Moro, E., Pentland, A. (2023). Metropolitan Scale and Longitudinal Dataset of Anonymized Human Mobility Trajectories. arXiv preprint arXiv:2307.03401. http://arxiv.org/abs/2307.03401
- Bibtex: @misc{yabe2023metropolitan, title={Metropolitan Scale and Longitudinal Dataset of Anonymized Human Mobility Trajectories}, author={Takahiro Yabe and Kota Tsubouchi and Toru Shimizu and Yoshihide Sekimoto and Kaoru Sezaki and Esteban Moro and Alex Pentland}, year={2023}, eprint={2307.03401}, archivePrefix={arXiv}, primaryClass={cs.SI}}
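
As a rough sketch of the file naming and format rules above, the snippet below writes a comma-separated, gzip-compressed file with the five columns and no extra spaces. The helper names and the two prediction rows are invented for illustration; the header line is optional according to the validator description, and the output should still be checked with the official validator.

```python
import csv
import gzip
import os
import tempfile

HEADER = ["uid", "d", "t", "x", "y"]


def submission_filename(team_name, task):
    """Build a name of the form {teamname}_task{N}_humob.csv.gz."""
    return f"{team_name}_task{task}_humob.csv.gz"


def write_submission(path, rows):
    """Write (uid, d, t, x, y) rows as gzip-compressed CSV."""
    with gzip.open(path, "wt", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(HEADER)
        writer.writerows(rows)


# Two made-up predictions for one Task 1 user.
predictions = [(80000, 60, 0, 100, 100), (80000, 60, 1, 100, 101)]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, submission_filename("dreamteam", 1))
    write_submission(path, predictions)
    with gzip.open(path, "rt", newline="") as f:
        read_back = f.read()

print(read_back)
```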

Submission File Validation Tool [updated Aug 23]

To avoid mistakes in submission format, we implemented a submission file data format validation code at https://github.com/yahoojapan/geobleu.
You can check whether your submission files match the requirements with a standalone Python program, validator.py. It takes the task id and submission file path as arguments and emits errors if it finds anything wrong regarding the number of columns, uid, and value ranges of d, t, x, and y. A submission file may begin with the header line “uid,d,t,x,y”, while omitting it is also acceptable.
For example, assuming your submission file for task 1 before compression is at foo/bar_task1_humob.csv, the command will be: python3 validator.py 1 foo/bar_task1_humob.csv
The line number in error messages is 0-indexed. If the tool doesn’t find anything, it will just say “Validation finished without errors!”. Please only submit files that have passed this validation step!

Midterm Evaluation [updated July 31, Aug 23]

To provide some feedback on how well your team is doing, we will have a midterm evaluation period. This is a good opportunity to check whether your submission format is correct and whether your algorithm is performing well compared to your peers, etc.
Predictions for only Task 1 will be evaluated for the Midterms, and similar to the final submission, we will only accept 1 submission (final one before deadline below) for the midterm evaluation.
The top 10 teams will be posted on the website. We will send you your scores individually as well via email.
You do not need to participate in the midterm evaluation to be considered for the final challenge. This is an optional activity to help you achieve higher scores.
- Midterm submissions are due on August 29th (Tuesday) 23:59 AoE
- Leaderboard will be posted online by September 5th (Tuesday)
Submission links (a link to Box folder) will be sent to the team leaders within a few days. Please upload your solutions to the provided Box Folder.
Your submission file should be named “%teamname%_task1midterm_humob.csv.gz”, for example if my team name is “MIT”, the file should be named “MIT_task1midterm_humob.csv.gz”.
If you have not submitted your team name, use the first name of the team lead as the team name.

Midterm Evaluation Results [updated Sep 5, Sep 8]

Here are the top 10 teams for the Midterm Evaluation! Teams who submitted results which passed the validation check code but still had some issues were contacted via email individually. Please check your email! Rankings are created based on GEOBLEU scores.

Ranking	Team name	GEOBLEU (higher better)	DTW score (lower better)
1	MOBB	0.3212	40.74
2	VIPA1897	0.3109	43.36
3	GeoFormer	0.3037	29.07
4	NishioMob	0.2743	28.37
5	Haoyu	0.2513	62.13
6	Jiaxin	0.2470	54.87
7	mukumuku	0.2357	31.56
8	Y2JTX2GO	0.1996	71.80
9	ResCity	0.1911	57.42
10	chew	0.1128	76.08

Final Evaluation Results [updated Sep 22]

Here are the top teams (in alphabetical order) that were selected to attend the workshop in Hamburg, Germany on November 13th. Congratulations!! Please check your email! The grand winners and prize winners will be announced at the workshop. We plan to convene leading researchers working on human mobility prediction at SIGSPATIAL to discuss various methods, approaches, and algorithms that worked / did not work. We are looking forward to the workshop!!!

12j0

3S-CrMap

AISTDPRT

GeoFormer

GIS4Fun

Haoyu

Jiaxin

KDDIdbcg

MOBB

mukumuku

osushineko

uclab2023

Y2JTX2GO

Please follow the instructions listed in the email. Thank you for your participation!

Workshop at ACM SIGSPATIAL @ Hamburg, Germany

The top ~10 teams with the best predictions will be invited to submit a final report with details of the methods and to present their work at the HuMob 2023 Workshop held in conjunction with ACM SIGSPATIAL 2023 in Hamburg, Germany on November 13th, 2023.
We have prizes for the top 3 participants!

Important dates: [updated July 31]

June 15, 2023: data challenge announcement
July 10, 2023: data open
August 20, 2023: registration deadline
August 29, 2023: submission deadline for midterm evaluation (optional; 23:59 AoE)
September 5, 2023: top midterm scores will be posted on website
September 15, 2023: submission deadline for final predictions (23:59 AoE)
September 22, 2023: notification of top contestants
October 14, 2023: submission deadline of workshop papers for top 10 teams
October 20, 2023: camera-ready submission
November 13, 2023: presentation in the workshop

FAQ [updated July 31, Aug 23]

Q. The datasets are not complete (i.e., location observations are not provided for all time steps for all individuals), is this an error?
- A. No, mobile phone location data is not complete — that makes human mobility prediction difficult! Please try to predict the locations of the individuals labeled ‘999’ in each dataset.
Q. The GEOBLEU code is computationally slow. How can we speed it up?
- A. Note that you need to compute the GEOBLEU scores for each day per user, not for the entire 15-day trajectory for each individual. If the computation is slow even in this case, please try multi-thread processing. We will post the example code on the GitHub page soon. Please watch out for updates at (https://github.com/yahoojapan/geobleu)
Q. We forgot to send our team names!
- A. If you forgot to do so, please send your team names to humob2023@gmail.com. Otherwise, we will use the first name of the team leader as the team name.
Q. What is our team number? We apparently need it to name our submission files.
- A. Please submit your predictions by naming your dataset ‘teamname_tasknumber_csv.gz’. The original instruction was to submit using the team number, but please use teamname instead. If your team name has special letters or spaces, please make it a simple one word so that it is easy for the machine to read.
Q. Is the DTW score normalized by taking the average score for all steps for each user, and then averaged across all users?
- A. DTW in the repo is not using normalization. This time we decided to calculate it in a more straightforward way. Essentially, we are treating each prediction step (rather than each user) equally in this competition.
Q. What is the typical range of values of GEOBLEU and DTW metrics? I just want to know whether the score is in the same ballpark.
- A. The results obtained by a simple baseline method were around 0.04 for GEOBLEU and around 60 for DTW, as tested by the organizers. Details of the simple baseline method and results are outlined in the “Baseline method and results” section of the GitHub README, here: https://github.com/yahoojapan/geobleu
Q. Is it okay if we only submit our results for Task 1?
- A. Yes, that is fine. We will evaluate the winner for each task (1 and 2), so in that case you will be evaluated for only Task 1.
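
Regarding the question on GEOBLEU speed above, the day-by-day evaluation can be sketched by grouping records per (user, day) before scoring. The scorer below is a simple exact-match stand-in, not GEO-BLEU, and all function names and sample rows are hypothetical; the real metric lives at https://github.com/yahoojapan/geobleu.

```python
from collections import defaultdict


def group_by_user_day(rows):
    """Group (uid, d, t, x, y) rows into {(uid, d): [(t, x, y), ...]}."""
    groups = defaultdict(list)
    for uid, d, t, x, y in rows:
        groups[(uid, d)].append((t, x, y))
    return groups


def exact_match_rate(pred, ref):
    """Stand-in scorer: share of reference steps predicted exactly."""
    ref_at = {t: (x, y) for t, x, y in ref}
    hits = sum(1 for t, x, y in pred if ref_at.get(t) == (x, y))
    return hits / len(ref) if ref else 0.0


def score_per_user_day(pred_rows, ref_rows, scorer=exact_match_rate):
    """Score each (user, day) trajectory separately."""
    pred = group_by_user_day(pred_rows)
    ref = group_by_user_day(ref_rows)
    return {key: scorer(pred.get(key, []), ref[key]) for key in ref}


ref_rows = [(80000, 60, 10, 5, 5), (80000, 60, 11, 5, 6), (80000, 61, 10, 7, 7)]
pred_rows = [(80000, 60, 10, 5, 5), (80000, 60, 11, 9, 9), (80000, 61, 10, 7, 7)]
scores = score_per_user_day(pred_rows, ref_rows)
print(scores)
```

Because each (user, day) group is scored independently, the groups can be handed to separate worker processes or threads without changing the results.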

Organizers

Dr. Takahiro Yabe, MIT
Dr. Kota Tsubouchi, Yahoo Japan Corporation
Toru Shimizu, Yahoo Japan Corporation
Professor Yoshihide Sekimoto, University of Tokyo
Professor Kaoru Sezaki, University of Tokyo
Professor Esteban Moro, MIT
Professor Alex ‘Sandy’ Pentland, MIT

For general questions about the challenge: humob2023@gmail.com

————————————————————————————

References

[1] Luca, M., Barlacchi, G., Lepri, B., & Pappalardo, L. (2021). A survey on deep learning for human mobility. ACM Computing Surveys (CSUR), 55(1), 1-44.

[2] Shimizu, T., Tsubouchi, K., & Yabe, T. (2022). GEO-BLEU: similarity measure for geospatial sequences. In Proceedings of the 30th International Conference on Advances in Geographic Information Systems (pp. 1-4).

[3] Senin, P. (2008). Dynamic time warping algorithm review. Information and Computer Science Department, University of Hawaii at Manoa, Honolulu, USA.