Awesome Natural Language Processing Generation – Massive Collection of Resources
Contents
- Datasets
- Dialog
- Evaluation
- Grammar
- Libraries
- Narrative Generation
- Neural Natural Language Generation
- Papers and Articles
- Products
- Realizers
- Templating Languages
- Videos
Datasets
- Alex Context NLG Dataset – A dataset for NLG in dialogue systems in the public transport information domain.
- Box-score data – This dataset consists of (human-written) NBA basketball game summaries aligned with their corresponding box- and line-scores.
- E2E – This shared task focuses on recent end-to-end (E2E), data-driven NLG methods, which jointly learn sentence planning and surface realisation from non-aligned data.
- Neural-Wikipedian – The repository contains the code along with the required corpora that were used in order to build a system that “learns” how to generate English biographies for Semantic Web triples.
- WeatherGov – Computer-generated weather forecasts from weather.gov (US public forecast), along with corresponding weather data.
- WebNLG – The enriched version of the WebNLG – a resource for evaluating common NLG tasks, including Discourse Ordering, Lexicalization and Referring Expression Generation.
- WikiBio – Wikipedia biography dataset – This dataset gathers 728,321 biographies from Wikipedia. It aims at evaluating text generation algorithms. For each article, we provide the first paragraph and the infobox (both tokenized).
- The Schema-Guided Dialogue Dataset – The Schema-Guided Dialogue (SGD) dataset consists of over 20k annotated multi-domain, task-oriented conversations between a human and a virtual assistant.
- The Wikipedia company corpus – Company descriptions collected from Wikipedia. The dataset contains semantic representations, short, and long descriptions for 51K companies in English.
- YelpNLG – YelpNLG provides resources for natural language generation of restaurant reviews.
Dialog
- Chatito – Generate datasets for AI chatbots, NLP tasks, named entity recognition or text classification models using a simple DSL!
- NNDial – NNDial is an open source toolkit for building end-to-end trainable task-oriented dialogue models.
- Plato – This is the Plato Research Dialogue System, a flexible platform for developing conversational AI agents.
- RNNLG – RNNLG is an open source benchmark toolkit for Natural Language Generation (NLG) in spoken dialogue system application domains.
- TGen – Statistical NLG for spoken dialogue systems.
Evaluation
- compare-mt – A tool for holistic analysis of language generation systems.
- NLG-eval – Evaluation code for various unsupervised automated metrics for Natural Language Generation.
- VizSeq – A Visual Analysis Toolkit for Text Generation Tasks.
Grammar
- OpenCCG – OpenCCG library for parsing and realization with CCG.
- GrammaticalFramework – A programming language for multilingual grammar applications.
- EasyCCG – A CCG parser.
- CCG Lab – All combinators, common grammar format, parsing to logical form, parameter estimation for probabilistic CCG.
- CCGweb – A Web platform for parsing and annotation.
Libraries
- Cron Expression Descriptor – A .NET library that converts cron expressions into human readable descriptions.
- Number Words – Convert a number to an approximated text expression: from ‘0.23’ to ‘less than a quarter’.
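The idea behind the Number Words entry, turning a number into an approximate phrase, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not that library's API: the function name `approximate_fraction`, the anchor phrases, and the tolerance are all hypothetical choices made for this example.

```python
# Hypothetical anchor values and phrases; a real library would cover far more cases.
ANCHORS = [
    (1 / 4, "a quarter"),
    (1 / 3, "a third"),
    (1 / 2, "a half"),
    (2 / 3, "two thirds"),
    (3 / 4, "three quarters"),
]


def approximate_fraction(value: float, tolerance: float = 0.005) -> str:
    """Describe a value in [0, 1] relative to the nearest anchor fraction."""
    if not 0 <= value <= 1:
        raise ValueError("value must be between 0 and 1")
    # Pick the anchor closest to the input value.
    anchor, phrase = min(ANCHORS, key=lambda pair: abs(pair[0] - value))
    diff = value - anchor
    if abs(diff) <= tolerance:
        return f"about {phrase}"
    return f"less than {phrase}" if diff < 0 else f"more than {phrase}"


print(approximate_fraction(0.23))
```

With these assumed anchors, 0.23 is closest to one quarter and falls below it, so the sketch produces the same kind of phrase as the example in the entry above.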
Narrative Generation
- Random Story Generator – Using Natural Language Generation (NLG) to create a random short story.
- Tracery – A story-grammar generation library for JavaScript.
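Story grammars of the kind Tracery supports expand a start symbol by repeatedly replacing symbols with one of their rules. The sketch below shows that idea in Python; it borrows the `#symbol#` notation, but it is not Tracery's API (Tracery is a JavaScript library), and the `expand` function and the sample grammar are hypothetical.

```python
import random
import re


def expand(grammar, text, rng):
    """Replace each #symbol# in text with a randomly chosen, recursively expanded rule."""
    def substitute(match):
        options = grammar[match.group(1)]
        return expand(grammar, rng.choice(options), rng)

    return re.sub(r"#(\w+)#", substitute, text)


# Hypothetical grammar: each symbol maps to a list of alternative rules.
grammar = {
    "origin": ["#hero# set out to find #goal#."],
    "hero": ["The knight", "A young baker"],
    "goal": ["the lost crown", "a quiet village"],
}

story = expand(grammar, "#origin#", random.Random(0))
print(story)
```

Passing a seeded `random.Random` keeps a run reproducible, which helps when testing generated text.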
Neural Natural Language Generation
- aitextgen – A robust Python tool for text-based AI training and generation using GPT-2.
- graph-2-text – Graph to sequence implemented in PyTorch combining graph convolutional networks and OpenNMT-py.
- Image Caption Generator – A neural network based generative model for captioning images using TensorFlow.
- PaperRobot: Incremental Draft Generation of Scientific Ideas – We present a PaperRobot who performs as an automatic research assistant.
- PPLM – Plug and Play Language Model implementation. Allows steering the topic and attributes of GPT-2 models.
- Question Generation using Hugging Face Transformers – Question generation is the task of automatically generating questions from a text paragraph.
- Texar – Texar is a toolkit aiming to support a broad set of machine learning, especially natural language processing and text generation tasks.
- textgenrnn – Easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code.
- This Word Does Not Exist – This project allows people to train a variant of GPT-2 that makes up words, definitions and examples from scratch.
- Transformers – State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch.
- Summary Generation From Structured Data – For converting information present in the form of structured data into natural language text.
Papers and Articles
- 2020: A Gold Standard Methodology for Evaluating Accuracy in Data-To-Text Systems
- 2020: Evaluating the state-of-the-art of End-to-End Natural Language Generation: The E2E NLG challenge
- 2020: How to generate text: using different decoding methods for language generation with Transformers
- 2020: Natural language generation: The commercial state of the art in 2020
- 2020: Turing-NLG: A 17-billion-parameter language model by Microsoft
- 2019: A Closer Look at Recent Results of Verb Selection for Data-to-Text NLG
- 2019: A Personalized Data-to-Text Support Tool for Cancer Patients
- 2019: Controlling Contents in Data-to-Document Generation with Human-Designed Topic Labels
- 2019: Generated Texts Must Be Accurate!
- 2019: Hotel Scribe: Generating High Variation Hotel Descriptions
- 2019: Revisiting Challenges in Data-to-Text Generation with Fact Grounding
- 2017: Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
- 2016: Natural Language Generation enhances human decision-making with uncertain information
Products
- Accelerated Text – Automatically generate multiple natural language descriptions of your data varying in wording and structure.
- RosaeNLG – An open-source library for node.js or client side (browser) execution, based on the Pug template engine, to generate texts in English, French, German and Italian.
- Twine – An open-source tool for telling interactive, nonlinear stories.
Realizers
- GenI – Surface realiser (part of a Natural Language Generation system) using Tree Adjoining Grammar.
- JSrealB – A JavaScript bilingual text realizer for web development.
- SimpleNLG – Java API for Natural Language Generation.
- SimpleNLG DE – German version of SimpleNLG 4.
- SimpleNLG-EnFr – SimpleNLG-EnFr 1.1 is a bilingual English/French adaptation of SimpleNLG v4.2.
Templating Languages
- calyx – A Ruby library for generating text with recursive template grammars.
- nalgene – Natural language generation language.
- StringTemplate – Java template engine (with ports for C#, Objective-C, JavaScript, Scala) for generating source code, web pages, emails, or any other formatted text output.
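Template-based generation, the approach the tools above share, fills named slots in fixed text with values from a data record. As a rough illustration only, here is the same idea using `string.Template` from Python's standard library rather than any of the listed engines; the template wording and the weather record are made up for the example.

```python
from string import Template

# Hypothetical template: $-prefixed names are slots filled from a data record.
forecast = Template("$city: $sky skies with a high of $high degrees.")

# Made-up record standing in for structured input data.
record = {"city": "Aberdeen", "sky": "clear", "high": 18}

text = forecast.substitute(record)
print(text)
```

`substitute` raises `KeyError` when a slot has no value, which surfaces incomplete records early; `safe_substitute` leaves unknown slots in place instead.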
Videos
- Data-To-Text: Generating Textual Summaries of Complex Data – Ehud Reiter
- Imitation Learning and its Application to Natural Language Generation
- Natural Language Generation (Introduction)
- Strata Data Conference | The future of natural language generation: 2017-2027
- The Quest for Automated Story Generation – Mark Riedl