Participants
- Senior Members: Marine Carpuat, Hal Daumé III, Alex Fraser and Chris Quirk
- Graduate Students: Fabienne Braune, Ann Clifton, Ann Irvine, Jagadeesh Jagarlamudi, John Morgan, Majid Razmara and Aleš Tamchyna
- Undergraduate Students: Katharine Henry and Rachel Rudinger
Talks
- Domain Adaptation in Statistical Machine Translation, Final presentation at JHU summer workshop (Aug 2012) [Video]
- Domain Adaptation in Machine Translation: Findings from the 2012 Johns Hopkins University Summer Workshop, Invited Keynote talk at AMTA 2012 (Nov 2012)
Papers
- Domain Adaptation in Machine Translation: Final Report, 2012. Marine Carpuat, Hal Daumé III, Alexander Fraser, Chris Quirk, Fabienne Braune, Ann Clifton, Ann Irvine, Jagadeesh Jagarlamudi, John Morgan, Majid Razmara, Aleš Tamchyna, Katharine Henry and Rachel Rudinger. Technical report.
- SenseSpotting: Never let your parallel data tie you to an old domain, 2013. Marine Carpuat, Hal Daumé III, Katharine Henry, Ann Irvine, Jagadeesh Jagarlamudi and Rachel Rudinger. Proceedings of the Association for Computational Linguistics (ACL).
- Monolingual Marginal Matching for Translation Model Adaptation, 2013. Ann Irvine, Chris Quirk and Hal Daumé III. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).
- Measuring Machine Translation Errors in New Domains, 2013. Ann Irvine, John Morgan, Marine Carpuat, Hal Daumé III and Dragos Munteanu. Transactions of the Association for Computational Linguistics (TACL).
Data and Code
- Preprocessed data for newswire, medical texts, movie subtitles and Hansards (old domain data), version released 13 Nov 2012; 954M, README, SHA256 (Note: this excludes the Science data because we have not worked out all copyright issues on that data. However, you get the "NRC" subset of that data, both raw and preprocessed, below.)
- Raw and preprocessed data for the NRC subset of the science data, version released 13 Nov 2012; 121M, README, SHA256
- WADE code, example input and README
- WADE outputs on our datasets (from which Table 6 in the 2013 TACL paper was generated; see the README)
- SenseSpotting code/data is hosted on GitHub.
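Each data release above is listed with a SHA256 checksum, so a downloaded archive can be checked before use. The following is a minimal sketch of one way to do that in Python, not part of the released code. The function names are hypothetical, and the layout of the published checksum file is an assumption: the sketch simply takes the first 64-character hexadecimal string it finds in that file.

```python
import hashlib
import re
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large archive is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_published_digest(checksum_path):
    """Extract the first 64-hex-character token from a checksum file.

    Assumption: the file contains the digest as plain hex text, in either
    the "digest  filename" or the "SHA256 (filename) = digest" layout.
    """
    text = Path(checksum_path).read_text()
    match = re.search(r"\b[0-9a-fA-F]{64}\b", text)
    if match is None:
        raise ValueError(f"no SHA-256 digest found in {checksum_path}")
    return match.group(0).lower()


def verify_download(archive_path, checksum_path):
    """Return True if the archive's digest matches the published one."""
    return sha256_of_file(archive_path) == read_published_digest(checksum_path)
```

Calling `verify_download` with the local paths of a downloaded archive and its saved checksum file returns `True` when the two digests agree. A `False` result suggests an incomplete or corrupted download.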
Acknowledgments
Many thanks to the entire DAMT team, plus George Foster (NRC) for his expertise, Dragos Munteanu (Language Weaver) for initial brainstorming, the whole JHU team (especially Sanjeev Khudanpur) for making the workshop happen, and the various funders who contributed to this work (including Google, ODNI, NSF, DARPA).
If you make use of any of this data, we only ask that you acknowledge us by citing:
@incollection{JHU-SummerWorkshop2012,
  author = {Marine Carpuat and Hal Daum\'e III and Alexander Fraser and Chris Quirk and Fabienne Braune and Ann Clifton and Ann Irvine and Jagadeesh Jagarlamudi and John Morgan and Majid Razmara and Ale\v{s} Tamchyna and Katharine Henry and Rachel Rudinger},
  title = {Domain Adaptation in Machine Translation: Final Report},
  booktitle = {2012 Johns Hopkins Summer Workshop Final Report},
  year = {2012},
  url = {http://hal3.name/damt/}
}
You can read the final report if you like!