PLDI 2020
Mon 15 - Fri 19 June 2020

In this paper, we propose a multi-modal synthesis technique for automatically constructing regular expressions (regexes) from a combination of examples and natural language. Using multiple modalities is useful in this context because natural language alone is often highly ambiguous, whereas examples in isolation are often not sufficient for conveying user intent. Our proposed technique first parses the English description into a so-called hierarchical sketch that guides our programming-by-example (PBE) engine. Since the hierarchical sketch captures crucial hints, the PBE engine can leverage this information both to prioritize the search and to make useful deductions for pruning the search space.
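To make the idea of hint-guided search concrete, here is a minimal toy sketch in Python. It is not Regel's algorithm: the flat list of hints stands in for the hierarchical sketch, the hints are written by hand rather than parsed from English, and the function names (`candidates`, `consistent`, `synthesize`) are hypothetical. It only illustrates how restricting enumeration to hinted components narrows the space that the example-based check has to cover; it does not implement the prioritization or the pruning deductions described above.

```python
import itertools
import re


def candidates(hints, max_parts=3):
    """Yield regex strings built only from the hinted components.

    Toy stand-in for completing a sketch: each hint may be used as-is
    or repeated one or more times, and parts are concatenated.
    """
    atoms = list(hints) + [f"(?:{h})+" for h in hints]
    for n in range(1, max_parts + 1):
        for parts in itertools.product(atoms, repeat=n):
            yield "".join(parts)


def consistent(regex, positives, negatives):
    """Check a candidate against the examples (the PBE part)."""
    pattern = re.compile(regex)
    accepts_all = all(pattern.fullmatch(s) for s in positives)
    rejects_all = not any(pattern.fullmatch(s) for s in negatives)
    return accepts_all and rejects_all


def synthesize(hints, positives, negatives, max_parts=3):
    """Return the first candidate consistent with all examples, or None."""
    for regex in candidates(hints, max_parts):
        if consistent(regex, positives, negatives):
            return regex
    return None


# Assumed hints for a description such as "letters followed by digits".
hints = ["[a-z]", "[0-9]"]
result = synthesize(
    hints,
    positives=["ab12", "x7"],
    negatives=["12ab", "abc", "7"],
)
print(result)
```

With only two hinted components the enumeration stays small. Without the hints, the same search would have to range over every character class and operator, which is the ambiguity that the examples alone cannot resolve.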

We have implemented the proposed technique in a tool called Regel and evaluated it on over three hundred regexes. Our evaluation shows that Regel achieves 80% accuracy, whereas the NLP-only and PBE-only baselines achieve 43% and 26%, respectively. We also compare our proposed PBE engine against an adaptation of AlphaRegex, a state-of-the-art regex synthesis tool, and show that our proposed PBE engine is an order of magnitude faster, even if we adapt the search algorithm of AlphaRegex to leverage the sketch. Finally, we conduct a user study involving 20 participants and show that users are twice as likely to come up with the desired regex with Regel as without it.

Thu 18 Jun
Times are displayed in time zone: (GMT-07:00) Pacific Time (US & Canada)

16:00 - 16:20 (pldi-2020-papers)
Qiaochu Chen (University of Texas at Austin, USA), Xinyu Wang (University of Michigan at Ann Arbor, USA), Xi Ye (University of Texas at Austin, USA), Greg Durrett (University of Texas at Austin, USA), Isil Dillig (University of Texas at Austin, USA)
16:20 - 16:40 (pldi-2020-papers)
DongKwon Lee (Seoul National University, South Korea), Woosuk Lee (Hanyang University, South Korea), Hakjoo Oh (Korea University, South Korea), Kwangkeun Yi (Seoul National University, South Korea)
16:40 - 17:00 (pldi-2020-papers)
Pepe Vila (IMDEA Software Institute, Spain), Pierre Ganty (IMDEA Software Institute, Spain), Marco Guarnieri (IMDEA Software Institute, Spain), Boris Köpf (Microsoft Research, n.n.)