ACL 2017 Report
Writers: Yuta Kikuchi, Sosuke Kobayashi
Preferred Networks (PFN) attended the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017) in Vancouver, Canada. ACL is one of the largest conferences in the Natural Language Processing (NLP) field.
As in other machine learning research fields, the use of deep learning in NLP is increasing. The most popular topic in deep learning for NLP is sequence-to-sequence learning. A sequence-to-sequence model receives a sequence of discrete symbols (words) and learns to output a correct sequence conditioned on the input.
At ACL 2017, 302 papers were accepted out of 1,318 submissions (an acceptance rate of about 23%). Detailed and interesting statistics, such as hot topics and a procrastination graph, are available on the ACL 2017 PC chairs' blog. In addition, there is an interesting visualization of paper titles made with Scattertext, which was presented as an ACL 2017 demo paper: Scattertext: a Browser-Based Tool for Visualizing how Corpora Differ
Some topics that were popular in ACL 2017:
Neural Machine Translation
There were many studies of neural machine translation (NMT). Although machine translation has long been one of the most popular research topics, this new paradigm has had a major impact on the field.
Almost all NMT studies are based on the neural network model called the encoder-decoder with attention mechanism. It consists of an encoder that converts the input sentence in the source language into a hidden vector representation and a decoder that converts the hidden vector into an output sentence in the target language. At each time step, while decoding words sequentially, the attention mechanism decides which part of the source sentence the decoder should focus on. Many NMT studies show significant improvements over traditional machine translation paradigms, such as example-based or phrase-based machine translation.
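The attention step described above can be sketched in a few lines. This is a minimal illustration, not the exact model used in any of the papers: it assumes dot-product scoring between the decoder state and each encoder state, and the names `attend`, `encoder_states`, and `decoder_state` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Return (weights, context) for one decoding step.

    Dot-product scoring is an assumption made for brevity; published
    models often use a small learned scoring network instead.
    """
    # One relevance score per source position.
    scores = [sum(d * h for d, h in zip(decoder_state, enc))
              for enc in encoder_states]
    # Normalize the scores into attention weights that sum to 1.
    weights = softmax(scores)
    # The context vector is the weighted average of the encoder states.
    dim = len(encoder_states[0])
    context = [sum(w * enc[i] for w, enc in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context


# Toy example: three source positions, 2-dimensional hidden vectors.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.0, 2.0]
weights, context = attend(decoder_state, encoder_states)
print(weights, context)
```

In this toy example the second and third source positions receive equal weight, and the first receives much less, because only they overlap with the decoder state.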
The task NMT attempts to solve can be regarded as an instance of a more general problem, sequence-to-sequence (seq2seq) learning, which can be applied to many other tasks such as syntactic parsing, image captioning, or dialogue response generation. Hence, progress in NMT can be ported to other seq2seq tasks, and vice versa.
In this post, we select a few topics on NMT from ACL 2017.
Incorporating sentence structure in sequence-to-sequence learning
Many papers this year incorporated sentence structure in sequence-to-sequence learning. Paper titles often had keywords such as syntax, tree, dependency, or chunk, which indicate some structural information of a sentence:
- Modeling Source Syntax for Neural Machine Translation
- Sequence-to-Dependency Neural Machine Translation
- Learning to Parse and Translate Improves Neural Machine Translation
- Towards String-To-Tree Neural Machine Translation
- Chunk-based Decoder for Neural Machine Translation
- Improved Neural Machine Translation with a Syntax-Aware Encoder and Decoder
- Chunk-Based Bi-Scale Decoder for Neural Machine Translation
Although various types of structure have been explored, the benefits of each are often unclear, so it is difficult to determine which structure to use. These techniques can be transferred to other sequence-to-sequence tasks, as mentioned above. It would also be interesting to investigate, across tasks, which kinds of structural information help with each task.
We look forward to reports that compare the benefits of these structures, and to further discussion.
First NMT Workshop
ACL 2017 hosted the first workshop on NMT, with keynote talks from leading researchers and many interesting papers. The workshop's first best paper award was given to Stronger Baselines for Trustable Results in Neural Machine Translation by Michael Denkowski and Graham Neubig. They conducted many comparisons among several NMT techniques reported in recent years. The results suggest that a new NMT method should be compared against a simple but strong baseline: Adam with multiple restarts, subword translation via byte pair encoding, and ensemble decoding. Many other interesting studies were presented and are available on the workshop page.
Although NMT has shown great success on language pairs that have a large amount of training data (pairs of input and output sentences), one big problem is how to deal with low-resource language pairs or domains.
Six Challenges for Neural Machine Translation, which was published at the first NMT workshop, showed that the size of the training data has a great impact on the performance of NMT. In other words, without enough training data, we cannot achieve good performance with NMT systems.
Studies that attempt to overcome this problem have been published steadily since long before the introduction of NMT. ACL 2017 also had some studies that addressed this problem:
- A Teacher-Student Framework for Zero-Resource Neural Machine Translation
- Data Augmentation for Low-Resource Neural Machine Translation
- Model Transfer for Tagging Low-resource Languages using a Bilingual Dictionary
- Bayesian Modeling of Lexical Resources for Low-Resource Settings
- Learning bilingual word embeddings with (almost) no bilingual data
ACL held the first workshop on Language Grounding for Robotics this year. Using natural language as a means of human-robot interaction is important because it allows people who cannot program well to control robots. The workshop had several interesting keynote talks from various fields including NLP, robotics, and machine learning.
There was also a poster session and a lively panel discussion. Human-robot communication and collaboration are challenging and interesting applications of NLP, so we hope this topic will become more active in the future.
Some of the keynote slides are available on the workshop page.
Evaluating Natural Language Generation
Although natural language generation systems based on neural networks have advanced rapidly, one big remaining problem is the lack of reliable automatic evaluation metrics.
The dominant evaluation metrics are based on the degree of word overlap with human-generated sentences (e.g., BLEU, ROUGE). However, such metrics cannot handle cases where the surface form of the words changes, as in paraphrasing or sentence generation.
A few weeks ago, the organizers of the Workshop on Asian Translation (WAT 2017) published the results of their translation shared task.
The evaluation results showed that even models that obtained very high BLEU scores could fail to obtain high scores from human evaluators.
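To make the weakness of word-overlap metrics concrete, here is a small sketch of clipped n-gram precision, the core quantity inside BLEU. It is deliberately simplified (a single reference, no brevity penalty, no averaging over n-gram orders), and the example sentences are our own invention, not taken from WAT or any paper.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also occur in the reference.

    Each n-gram is credited at most as often as it appears in the
    reference (the "clipping" used in BLEU).
    """
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


reference = "the meeting was postponed until next week".split()
# Same meaning as the reference, but different wording.
paraphrase = "they delayed the session by seven days".split()
# Different meaning (month vs. week), but nearly identical wording.
near_copy = "the meeting was postponed until next month".split()

para_p1 = clipped_precision(paraphrase, reference, 1)  # 1 of 7 unigrams match
copy_p1 = clipped_precision(near_copy, reference, 1)   # 6 of 7 unigrams match
para_p2 = clipped_precision(paraphrase, reference, 2)  # no bigrams match
copy_p2 = clipped_precision(near_copy, reference, 2)   # 5 of 6 bigrams match
print(para_p1, copy_p1, para_p2, copy_p2)
```

The faithful paraphrase scores far below the near copy that changes the meaning, which is the behavior the paragraph above warns about.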
The problem is more serious in the dialogue response generation task because the possible output (response) is more flexible than other natural language generation (NLG) tasks. Liu et al. showed that existing automatic evaluation metrics correlate very poorly with human evaluation of response quality (Liu et al., 2016).
At ACL 2017, the authors of Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses presented an attempt to overcome this problem.
They trained an LSTM-based model which receives an input context, a human response, and a system-generated response, and outputs a scalar value that represents the quality of the system’s response given the context and the human response.
The results show that their method correlates significantly better with human evaluation scores than word-overlap-based metrics such as BLEU.
Although there are some concerns about learning-based evaluators in terms of reliability, reproducibility, and transferability to other domains, this is a very important direction for future NLG research.
Combining reinforcement learning with deep learning has recently attracted a lot of attention from the NLP community. At ACL 2017, several papers used reinforcement learning in their studies.
Two papers, Towards End-to-End Reinforcement Learning of Dialogue Agents for Information Access, and Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning, focused on dialogue modeling, where handling the long-term strategy of a conversation is a challenge. Moreover, (task-oriented) dialogues often include non-differentiable actions such as querying a database, sorting candidates, or updating database entities. Reinforcement learning is a natural fit here because it does not require these actions to be differentiable.
Coarse-to-Fine Question Answering for Long Documents combined models for document summarization (selecting important sentences) and answer generation to deal with longer documents in the question answering (QA) task. Since QA datasets have no annotation of important sentences, they trained a sentence extraction module with reinforcement learning, where the reward is the log probability of the correct answer given the selected sentence.
From Language to Programs: Bridging Reinforcement Learning and Maximum Marginal Likelihood tackled a kind of semantic parsing task, where a model selects a sequence of actions (a program) to affect the environment. They addressed the problem of spurious programs, which reach the correct final state (output) through an incorrect sequence of actions. To mitigate this problem, they proposed an algorithm called RANDOMER, which has two components: ε-greedy randomized beam search to reduce sensitivity to initialization, and a β-meritocratic parameter update rule that smooths the update weight of each action sequence.
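As a rough illustration of the second component, the sketch below follows our reading of the β-meritocratic rule: each candidate program that reaches the correct output receives an update weight proportional to its model probability raised to the power β. Under this reading, β = 1 weights programs by their probability, and β = 0 weights all correct programs equally. The function name and the toy numbers are hypothetical, and the paper should be consulted for the exact formulation.

```python
def meritocratic_weights(probs, rewards, beta):
    """Update weight for each candidate program (hypothetical helper).

    probs:   model probability of each candidate program
    rewards: 1.0 if the program reaches the correct final state, else 0.0
    beta:    smoothing exponent between 0 and 1 (assumed range)
    """
    # Programs with zero reward get zero weight. The explicit check is
    # needed because 0.0 ** 0 evaluates to 1.0 in Python.
    powered = [(p * r) ** beta if r > 0 else 0.0
               for p, r in zip(probs, rewards)]
    total = sum(powered)
    if total == 0:
        return [0.0] * len(probs)
    return [v / total for v in powered]


# Three candidate programs; only the first two reach the correct output.
probs = [0.6, 0.3, 0.1]
rewards = [1.0, 1.0, 0.0]

proportional = meritocratic_weights(probs, rewards, beta=1.0)
uniform = meritocratic_weights(probs, rewards, beta=0.0)
smoothed = meritocratic_weights(probs, rewards, beta=0.5)
print(proportional, uniform, smoothed)
```

Lower values of β shift weight toward correct programs that the model currently considers unlikely, so a single high-probability spurious program dominates the update less.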
Reinforcement learning has gained a lot of attention in NLP. This coming September, the Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), which is also one of the flagship conferences in NLP, will be held in Copenhagen, Denmark. According to the list of accepted papers at EMNLP and their arXiv preprints, the number of studies that use reinforcement learning has increased further. Keep an eye on EMNLP as well if you are interested in reinforcement learning.
Adversarial training has come under the spotlight in machine learning. The most popular topic in this area is generative adversarial networks (GANs). Although challenging studies that apply GANs to natural language generation have also appeared, ACL 2017 (and the co-located CoNLL 2017) presented many papers using adversarial training for domain adaptation, multi-task learning, or multilingual learning in various NLP tasks:
- Adversarial Connective-exploiting Networks for Implicit Discourse Relation Classification
- Adversarial Multi-task Learning for Text Classification
- Adversarial Training for Unsupervised Bilingual Lexicon Induction
- Adversarial Adaptation of Synthetic or Stale Data
- Adversarial Multi-Criteria Learning for Chinese Word Segmentation
- Adversarial Training for Cross-Domain Universal Dependency Parsing
- Cross-language Learning with Adversarial Neural Networks
In this context, adversarial training is used to bring the learned feature distributions of multiple domains closer together, so that knowledge from each domain becomes transferable. Almost all of the above studies are based on two earlier works: Unsupervised Domain Adaptation by Backpropagation from ICML 2015, and Domain Separation Networks from NIPS 2016.
These two papers proposed many of the key techniques for this topic: the shared-private model, the (adversarial) domain classifier, the gradient reversal layer, orthogonality constraints, and reconstruction. We recommend reading those two papers in detail before reading the related papers from ACL 2017.
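Of these components, the gradient reversal layer is perhaps the easiest to illustrate in isolation. It acts as the identity in the forward pass and multiplies the gradient by a negative constant in the backward pass, so the feature extractor below it is pushed to confuse the domain classifier above it. The sketch below writes the two passes by hand in plain Python; the class name and the scaling parameter `lam` are our own choices, not taken from either paper's code.

```python
class GradientReversal:
    """Identity in the forward pass, sign-flipped gradient in the backward pass."""

    def __init__(self, lam=1.0):
        # lam scales the strength of the reversed gradient (assumed name).
        self.lam = lam

    def forward(self, features):
        # Features pass through unchanged to the domain classifier.
        return list(features)

    def backward(self, grad_output):
        # The gradient from the domain classifier is negated and scaled
        # before it reaches the feature extractor.
        return [-self.lam * g for g in grad_output]


layer = GradientReversal(lam=0.5)
features = [0.5, -1.0]
passed = layer.forward(features)

# Hypothetical gradient of the domain classification loss w.r.t. the features.
grad_from_domain_classifier = [0.2, -0.4]
grad_to_feature_extractor = layer.backward(grad_from_domain_classifier)
print(passed, grad_to_feature_extractor)
```

In a deep learning framework the same idea would normally be written as a custom differentiable function, with the framework calling the backward pass automatically.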
One paper stands out among the adversarial papers at ACL 2017: Adversarial Training for Unsupervised Bilingual Lexicon Induction.
The motivation of this paper is to project words in two languages into the same vector space. Their method learns a linear transformation matrix G (the generator) that maps the word-vector space of one language to that of another without any supervision (i.e., without a manually created dictionary). They prepare a discriminator that classifies whether a given vector is a real word vector from the target language or a word vector mapped from the source language by G. The results show that their method is competitive with related work that uses seed word translation pairs as supervision.
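The interplay between G and the discriminator can be sketched with toy numbers. The code below is only an illustration under our own assumptions: it uses a logistic-regression discriminator and the standard GAN cross-entropy losses, which may differ from the paper's exact architecture and objective, and all names and values are hypothetical.

```python
import math


def matvec(G, x):
    """Apply the linear map G (a list of rows) to the vector x."""
    return [sum(g * xi for g, xi in zip(row, x)) for row in G]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def discriminator(w, b, v):
    """Probability that v is a real target-language word vector."""
    return sigmoid(sum(wi * vi for wi, vi in zip(w, v)) + b)


def adversarial_losses(G, w, b, source_vec, target_vec):
    """Return (discriminator loss, generator loss) for one pair of vectors."""
    mapped = matvec(G, source_vec)
    p_real = discriminator(w, b, target_vec)
    p_mapped = discriminator(w, b, mapped)
    # The discriminator wants p_real near 1 and p_mapped near 0.
    d_loss = -math.log(p_real) - math.log(1.0 - p_mapped)
    # The generator wants mapped vectors to be taken for real ones.
    g_loss = -math.log(p_mapped)
    return d_loss, g_loss


# Toy 2-dimensional "embeddings" in which the two languages use swapped axes.
source_vec = [1.0, 0.0]
target_vec = [0.0, 1.0]
w, b = [-2.0, 2.0], 0.0  # a fixed discriminator that prefers the second axis

identity = [[1.0, 0.0], [0.0, 1.0]]  # G that leaves source vectors unchanged
swap = [[0.0, 1.0], [1.0, 0.0]]      # G that aligns the two spaces

d_identity, g_identity = adversarial_losses(identity, w, b, source_vec, target_vec)
d_swap, g_swap = adversarial_losses(swap, w, b, source_vec, target_vec)
print(g_identity, g_swap)
```

With the aligning map, the generator loss is lower and the discriminator loss is higher, since mapped source vectors become indistinguishable from target vectors.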