Summary: Researchers in China have developed a new neural network that generates high-quality bird images from textual descriptions. It uses common sense knowledge to enhance the generated image at three different resolution levels and achieves scores competitive with those of other neural network methods. The network is a generative adversarial network trained on a dataset of bird images and text descriptions, and the researchers aim to promote the development of text-image synthesis.
Source: Intelligent Computing
In an effort to generate high-quality images based on textual descriptions, a group of researchers in China built a generative adversarial network that integrates data representing common sense knowledge.
Their method uses common sense to clarify the starting point of image generation and to enhance specific characteristics of the generated image at three resolution levels. The network was trained on a dataset of bird images and textual descriptions.
The generated bird images scored competitively with those produced using other neural network methods.
The group’s research was published on February 20 in Intelligent Computing, a Science Partner Journal.
Given that “a picture is worth a thousand words,” the shortcomings of currently available text-to-image synthesis frameworks are hardly surprising. If you want to generate an image of a bird, the description you give the computer might include its size, body color, and beak shape. To produce an image, the computer still has to decide many details about how to display the bird, such as which direction it is facing, what should be in the background, and whether its beak is open or closed.
If the computer had what we consider commonsense knowledge, it could make better decisions about how to depict unspecified details. For example, a bird can stand on one leg or on two legs, but not on three.
When measured quantitatively against its predecessors, the authors’ image generation network achieved competitive scores on metrics that measure image fidelity and distance from real images. Qualitatively, the authors characterize the generated images as generally coherent, natural, sharp, and vivid.
“We strongly believe that the introduction of common sense can greatly promote the development of text-image synthesis,” concludes the research paper.
The authors’ neural network for generating images from text consists of three modules. The first improves the textual description that will be used to generate the image. ConceptNet, a data source that represents general knowledge for language processing as a graph of related nodes, was used to retrieve common sense knowledge items to add to the textual description.
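The retrieval step can be pictured as a graph lookup: each word of the description is treated as a node, and its outgoing edges become knowledge items. The Python sketch below is a minimal illustration only, assuming a hypothetical in-memory `TOY_GRAPH` in place of the real ConceptNet data and a naive whitespace tokenizer; the function name `retrieve_knowledge` is illustrative, not taken from the authors’ code.

```python
from typing import Dict, List, Tuple

# (relation, target) pairs leaving a node in the knowledge graph.
Edge = Tuple[str, str]

# Hypothetical miniature stand-in for ConceptNet. The relation names
# (HasA, CapableOf, AtLocation, PartOf, UsedFor, IsA) are real ConceptNet
# relations, but these particular edges are illustrative, not real data.
TOY_GRAPH: Dict[str, List[Edge]] = {
    "bird": [("HasA", "feathers"), ("HasA", "two legs"),
             ("CapableOf", "fly"), ("AtLocation", "tree branch")],
    "beak": [("PartOf", "bird"), ("UsedFor", "eating")],
    "red": [("IsA", "color")],
}


def retrieve_knowledge(description: str, graph: Dict[str, List[Edge]]) -> List[str]:
    """Return 'word relation target' facts for each description word that is a graph node."""
    facts: List[str] = []
    # Naive tokenization: lowercase and replace basic punctuation with spaces.
    words = description.lower().replace(",", " ").replace(".", " ").split()
    for word in words:
        for relation, target in graph.get(word, []):
            facts.append(f"{word} {relation} {target}")
    return facts


if __name__ == "__main__":
    for fact in retrieve_knowledge("A small red bird with a short beak.", TOY_GRAPH):
        print(fact)
```

Run on “A small red bird with a short beak,” the sketch returns seven facts, starting with `red IsA color`. A real system would query ConceptNet itself and would likely also need to match multi-word concepts.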
The authors added a filter to reject unnecessary knowledge and select the most relevant items. To introduce variety into the generated images, they added statistical noise. The input to the image generator therefore consists of the original text description, encoded both as a sentence and as separate words, plus selected bits of commonsense knowledge from ConceptNet, plus noise.
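One way such a filter could work is to rank candidate facts by their similarity to the sentence and keep only the best few, then concatenate everything with noise. The sketch below assumes hypothetical precomputed embedding vectors, cosine-similarity ranking, and simple averaging of the kept facts. The article does not say how CD-GAN scores or fuses knowledge, so the names `filter_knowledge` and `build_generator_input` and the fusion step are illustrative assumptions.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filter_knowledge(sentence_vec: Vector, facts: Dict[str, Vector], k: int) -> List[str]:
    """Keep the k facts whose embeddings are most similar to the sentence embedding."""
    ranked = sorted(facts, key=lambda f: cosine(sentence_vec, facts[f]), reverse=True)
    return ranked[:k]


def build_generator_input(
    sentence_vec: Vector,
    word_vecs: List[Vector],
    facts: Dict[str, Vector],
    k: int,
    noise_dim: int,
    rng: random.Random,
) -> Tuple[Vector, List[Vector], List[str]]:
    """Concatenate sentence features, averaged kept-knowledge features and Gaussian noise.

    Word-level features are passed through separately, mirroring the
    sentence/word split described in the article. Averaging the kept facts
    is a simplifying assumption, not the paper's fusion method.
    """
    kept = filter_knowledge(sentence_vec, facts, k)
    dim = len(sentence_vec)
    if kept:
        knowledge_vec = [sum(facts[f][i] for f in kept) / len(kept) for i in range(dim)]
    else:
        knowledge_vec = [0.0] * dim  # no knowledge survived the filter
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return sentence_vec + knowledge_vec + noise, word_vecs, kept


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = [1.0, 0.0, 0.5]  # hypothetical 3-value sentence embedding
    candidate_facts = {
        "bird HasA feathers": [0.9, 0.1, 0.4],
        "bird AtLocation tree branch": [0.2, 0.9, 0.1],
        "red IsA color": [0.8, 0.0, 0.7],
    }
    vec, words, kept = build_generator_input(
        sentence, [[0.1, 0.2, 0.3]], candidate_facts, k=2, noise_dim=4, rng=rng
    )
    print(kept)
    print(len(vec))
```

With these toy vectors, the tree-branch fact scores lowest against the sentence and is dropped, so the generator input holds the 3 sentence values, the 3 averaged knowledge values, and 4 noise values.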
The second module generates images in several stages. Each stage corresponds to an image size, starting with a small 64 x 64 pixel image and increasing to 128 x 128 and then 256 x 256. The module builds on the authors’ “adaptive entity refinement” unit, which incorporates the commonsense knowledge needed for the level of detail at each image size.
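The coarse-to-fine schedule can be sketched as a loop that upsamples and then refines. In this minimal Python illustration, images are plain lists of grayscale values and `upsample2x` is ordinary nearest-neighbour upsampling. The function `refine` is a hypothetical placeholder for the learned, knowledge-guided refinement unit, which the article does not describe in enough detail to reproduce.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values


def upsample2x(img: Image) -> Image:
    """Nearest-neighbour 2x upsampling: every pixel becomes a 2x2 block."""
    out: Image = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))  # copy so the two rows are independent lists
    return out


def refine(img: Image, strength: float) -> Image:
    """Hypothetical placeholder for one refinement stage.

    It only nudges every pixel toward 0.5 by `strength`; in CD-GAN this step
    is a learned network conditioned on text and commonsense features.
    """
    return [[v + strength * (0.5 - v) for v in row] for row in img]


def generate_hierarchically(base: Image, strengths: List[float]) -> List[Image]:
    """Refine the base image, then repeatedly upsample 2x and refine again."""
    outputs = [refine(base, strengths[0])]
    for s in strengths[1:]:
        outputs.append(refine(upsample2x(outputs[-1]), s))
    return outputs


if __name__ == "__main__":
    # Stand-in for the first-stage 64 x 64 output of the generator.
    base = [[0.0] * 64 for _ in range(64)]
    for img in generate_hierarchically(base, [0.0, 0.5, 0.5]):
        print(f"{len(img)} x {len(img[0])}")
```

The example prints the three stage sizes: 64 x 64, 128 x 128, and 256 x 256. Refining at every scale, rather than producing the 256 x 256 image in one step, lets each stage concentrate on the detail appropriate to its resolution.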
The third module examines the generated images and rejects those that do not match the original description. The system is a “generative adversarial network” because this third part checks the generator’s work. Because the authors’ network is driven by common sense, they call it CD-GAN.
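In a typical text-conditioned GAN, the discriminator scores image-text pairs during training, and its feedback is what pushes the generator toward images that match the description. The sketch below uses the “matching-aware” loss from earlier text-to-image GAN work, in which a real image paired with the wrong description also counts as a negative example. Whether CD-GAN uses exactly this loss is an assumption, and the dot-product `match_prob` head is a hypothetical stand-in for a learned network.

```python
import math
from typing import Sequence

EPS = 1e-12  # keeps log() finite when a probability reaches 0 or 1


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def match_prob(image_feat: Sequence[float], text_feat: Sequence[float]) -> float:
    """Hypothetical discriminator head: probability that the image is real AND matches the text."""
    return sigmoid(sum(i * t for i, t in zip(image_feat, text_feat)))


def discriminator_loss(
    real_img: Sequence[float],
    fake_img: Sequence[float],
    text: Sequence[float],
    wrong_text: Sequence[float],
) -> float:
    """Reward real matching pairs; penalize generated images and real images with the wrong text."""
    return -(
        math.log(match_prob(real_img, text) + EPS)
        + math.log(1.0 - match_prob(fake_img, text) + EPS)
        + math.log(1.0 - match_prob(real_img, wrong_text) + EPS)
    )


def generator_loss(fake_img: Sequence[float], text: Sequence[float]) -> float:
    """Non-saturating generator loss: the generator wants its images judged as matching."""
    return -math.log(match_prob(fake_img, text) + EPS)


if __name__ == "__main__":
    text, wrong_text = [1.0, 1.0], [-1.0, -1.0]    # toy text features
    real_img, fake_img = [2.0, 2.0], [-2.0, -2.0]  # toy image features
    print(round(discriminator_loss(real_img, fake_img, text, wrong_text), 3))
    print(round(generator_loss(fake_img, text), 3))
```

With these toy features, the discriminator loss is small because it already separates the pairs, while the generator loss for the fake image is large; that large loss is the signal that would drive the generator to improve.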
The CD-GAN was trained using the Caltech-UCSD Birds-200-2011 dataset, which contains 11,788 annotated images covering 200 bird species.
Guokai Zhang of Tianjin University performed the experiments and wrote the manuscript. Ning Xu of Tianjin University contributed to the study design. Chenggang Yan of Hangzhou Dianzi University performed the data analyses. Bolun Zheng of Hangzhou Dianzi University and Yulong Duan of the CETC 30th Research Institute contributed significantly to the analysis and preparation of the manuscript. Bo Lv of the CETC 30th Research Institute and An-An Liu of Tianjin University contributed to the analysis through constructive discussions.
About this artificial intelligence research news
Author: Lucy Day Werts
Source: Intelligent Computing
Contact: Lucy Day Werts – Intelligent Computing
Picture: Image is in public domain
Original research: Free access.
“CD-GAN: Common Sense-Based Generative Adversarial Network with Hierarchical Refinement for Text-Image Synthesis” by Ning Xu et al. Intelligent Computing
CD-GAN: Common-sense-based generative adversarial network with hierarchical refinement for text-image synthesis
The synthesis of vivid images from descriptive texts is gradually emerging as a cross-domain generation task. However, a single sentence is clearly insufficient for accurately generating a high-quality image because of the information asymmetry between the modalities, which calls for external knowledge to balance the process.
Moreover, the limited description of entities in the sentence cannot guarantee semantic consistency between the text and the generated image, leading to a lack of detail in both the foreground and the background.
Here, we propose a commonsense-driven generative adversarial network to generate photorealistic images based on entity-related commonsense knowledge.
The Common Sense-Based Generative Adversarial Network contains two key commonsense-based modules: (a) Entity Semantic Augmentation, which enhances entity semantics with common sense to reduce the information asymmetry, and (b) adaptive feature refinement, which generates the high-resolution image in several stages, guided by various commonsense knowledge, to maintain text-image consistency.
We demonstrate numerous synthesis examples on the widely used CUB-birds (Caltech-UCSD Birds-200-2011) dataset, where our model performs competitively with other state-of-the-art models.