Section IV. Data Collection

Section IV.A: Parallel corpus dataset

To study SMT-based migration models, we collected a parallel corpus of 34,209 pairs of methods, each written in both Java and C#. These methods were written manually by developers and are used in 9 open-source systems that were originally developed in Java and later ported to C#. Download
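In a parallel corpus, each Java method is paired with its C# counterpart. The sketch below shows one way to build those pairs. It assumes a hypothetical line-aligned layout in which entry i on the Java side corresponds to entry i on the C# side, with one method per line. The download's actual file format is not described here, and the function name `pair_methods` is illustrative only.

```python
from typing import Iterable, List, Tuple


def pair_methods(java_lines: Iterable[str],
                 csharp_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair the i-th Java method with the i-th C# method.

    Assumption (hypothetical layout): both sides are line-aligned,
    one method per line, in the same order.
    """
    # Drop trailing newlines so each pair holds just the method text.
    java = [line.rstrip("\n") for line in java_lines]
    csharp = [line.rstrip("\n") for line in csharp_lines]
    # A line-aligned corpus must have the same number of entries on both sides.
    if len(java) != len(csharp):
        raise ValueError(
            f"misaligned corpus: {len(java)} Java vs {len(csharp)} C# entries"
        )
    return list(zip(java, csharp))


if __name__ == "__main__":
    pairs = pair_methods(["int f() { return 1; }\n"], ["int F() { return 1; }\n"])
    print(len(pairs))
```

With real files, the same function can be given open file handles, e.g. `with open(java_path) as j, open(cs_path) as c: pair_methods(j, c)`. Both paths are placeholders.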

Section IV.C: Ground truth data

We conducted a human study on the translation results of all our SMT-based migration models, collecting a total of 2,250 manually assigned semantic scores. Download
Samples of translated results: mppSMT Sample 1, lpSMT Sample 1, mppSMT Sample 2, lpSMT Sample 2, GNMT, p-mppSMT
Each score index corresponds to a line in the translated results.
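Because each score index corresponds to a line in the translated results, the scores can be joined to the translations by position. The sketch below assumes the scores and translated lines have already been read into two lists of equal length, with zero-based indexing. The helper names `align_scores` and `mean_score` are hypothetical, and no particular score scale is assumed.

```python
from statistics import mean
from typing import List, Tuple


def align_scores(translated: List[str],
                 scores: List[float]) -> List[Tuple[int, str, float]]:
    """Attach score i to translated line i (hypothetical zero-based indexing)."""
    # One score per translated line; a length mismatch means the files are out of sync.
    if len(translated) != len(scores):
        raise ValueError(
            f"{len(translated)} translated lines but {len(scores)} scores"
        )
    return [(i, line, score) for i, (line, score) in enumerate(zip(translated, scores))]


def mean_score(aligned: List[Tuple[int, str, float]]) -> float:
    """Average semantic score over all aligned lines."""
    return mean(score for _, _, score in aligned)
```

Joining by position keeps the mapping explicit. A length check catches a dropped or extra line, which would otherwise silently shift every later score onto the wrong translation.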