Abstract
This paper addresses the problem of learning from unbalanced datasets, on which traditional algorithms may perform poorly. Classification algorithms tend to favor the larger, less important classes in such problems. To handle the unbalanced data problem, we synthesize data using variational autoencoders (VAE) trained on raw training samples and then use different input sources (raw data alone, or a combination of raw and synthetic data) to train different models. We evaluate our method using multiple criteria on the SVHN dataset, which consists of complex images, and perform a comprehensive comparative analysis of popular CNN architectures on balanced and unbalanced data to determine which performs best under class imbalance. We found that data synthesis via VAE is reliable and robust, and that it can help classify real data with higher accuracy than training on the traditional (unbalanced) data. Our results demonstrate the strength of using VAE to address the class imbalance problem.
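The rebalancing idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how many synthetic samples are generated per class, so topping every class up to the size of the largest class is an assumption here. The names `synthetic_counts`, `rebalance`, and `generate` are hypothetical. `generate` stands in for sampling from a VAE trained on the raw samples of one class and is not the authors' implementation.

```python
from collections import Counter


def synthetic_counts(labels):
    """Return {class: number of synthetic samples needed to match the largest class}.

    Assumption: classes are balanced up to the majority-class count.
    """
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


def rebalance(samples, labels, generate):
    """Return copies of samples/labels with synthetic samples appended.

    `generate(cls, k)` is a hypothetical callable that returns k synthetic
    samples for class `cls`, e.g. decoded draws from a per-class VAE.
    """
    out_samples, out_labels = list(samples), list(labels)
    for cls, k in synthetic_counts(labels).items():
        if k > 0:  # only under-represented classes receive synthetic data
            out_samples.extend(generate(cls, k))
            out_labels.extend([cls] * k)
    return out_samples, out_labels
```

The resulting "raw plus synthetic" set can then be used to train a classifier alongside a baseline trained on the raw data alone, which mirrors the comparison of input sources the abstract describes.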
| Original language | English |
|---|---|
| Title of host publication | Analysis of Images, Social Networks and Texts - 8th International Conference, AIST 2019, Revised Selected Papers |
| Editors | Wil M.P. van der Aalst, Vladimir Batagelj, Dmitry I. Ignatov, Valentina Kuskova, Sergei O. Kuznetsov, Irina A. Lomazova, Michael Khachay, Andrey Kutuzov, Natalia Loukachevitch, Amedeo Napoli, Panos M. Pardalos, Marcello Pelillo, Andrey V. Savchenko, Elena Tutubalina |
| Publisher | Springer |
| Pages | 270-281 |
| Number of pages | 12 |
| ISBN (Print) | 9783030395742 |
| DOIs | |
| State | Published - 2020 |
| Externally published | Yes |
| Event | 8th International Conference on Analysis of Images, Social Networks and Texts, AIST 2019 - Kazan, Russian Federation. Duration: 17 Jul 2019 → 19 Jul 2019 |
Publication series
| Name | Communications in Computer and Information Science |
|---|---|
| Volume | 1086 CCIS |
| ISSN (Print) | 1865-0929 |
| ISSN (Electronic) | 1865-0937 |
Conference
| Conference | 8th International Conference on Analysis of Images, Social Networks and Texts, AIST 2019 |
|---|---|
| Country/Territory | Russian Federation |
| City | Kazan |
| Period | 17/07/19 → 19/07/19 |
Bibliographical note
Publisher Copyright: © Springer Nature Switzerland AG 2020.
Keywords
- Convolutional Neural Network (CNN)
- Imbalanced data
- Variational autoencoder (VAE)
ASJC Scopus subject areas
- General Computer Science
- General Mathematics