Abstract
Artificial intelligence has shown great potential in a variety of applications, from natural language models to audio-visual recognition, classification, and manipulation. AI researchers must work with massive amounts of collected data for machine learning, which raises challenges in managing and using that data effectively during training to develop and iterate on more accurate and more general models. In this paper, we review parallel and distributed machine learning methods and their challenges. We also propose a distributed and scalable deep learning model architecture that can span multiple processing nodes. We test the model on the MIT Indoor dataset to evaluate its performance and scalability across multiple hardware nodes, and we show its scaling characteristics at different model sizes. We find that distributed training is 80% faster with 2 GPUs than with 1 GPU. We also find that the model retains the benefits of distributed training, such as speed and accuracy, regardless of model size or training batch size.
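The "80% faster" figure can be related to the usual scaling metrics with a little arithmetic. The sketch below assumes that "80% faster" means 80% more training throughput (a 1.8x speedup); the abstract does not state which reading is intended, and the function names are illustrative rather than taken from the paper.

```python
def speedup_from_percent_faster(percent_faster: float) -> float:
    """Throughput ratio implied by 'X% faster'.

    Assumption: 'X% faster' means X% more work per unit time,
    so 80% faster corresponds to a 1.8x speedup.
    """
    return 1.0 + percent_faster / 100.0


def scaling_efficiency(speedup: float, num_devices: int) -> float:
    """Fraction of ideal linear scaling achieved on num_devices devices."""
    return speedup / num_devices


# Figures from the abstract: 2 GPUs compared against 1 GPU.
speedup = speedup_from_percent_faster(80.0)
efficiency = scaling_efficiency(speedup, num_devices=2)

print(f"speedup: {speedup:.2f}x, scaling efficiency: {efficiency:.0%}")
```

Under this reading, two GPUs reach about 90% of ideal linear scaling. If "80% faster" instead meant an 80% reduction in training time, the implied speedup would be 5x, which exceeds linear scaling on two devices, so the throughput reading is the more plausible one.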
Original language | English |
---|---|
Title of host publication | ICFNDS 2023 - 2023 The 7th International Conference on Future Networks and Distributed Systems |
Publisher | Association for Computing Machinery |
Pages | 283-291 |
Number of pages | 9 |
ISBN (Electronic) | 9798400709036 |
State | Published - 21 Dec 2023 |
Event | 7th International Conference on Future Networks and Distributed Systems, ICFNDS 2023 - Dubai, United Arab Emirates (Duration: 21 Dec 2023 → 22 Dec 2023) |
Publication series
Name | ACM International Conference Proceeding Series |
---|---|
Conference
Conference | 7th International Conference on Future Networks and Distributed Systems, ICFNDS 2023 |
---|---|
Country/Territory | United Arab Emirates |
City | Dubai |
Period | 21/12/23 → 22/12/23 |
Bibliographical note
Publisher Copyright: © 2023 ACM.
Keywords
- Deep Learning
- Distributed Deep Learning
- Distributed Systems
- Image Classification
- Machine Learning
ASJC Scopus subject areas
- Human-Computer Interaction
- Computer Networks and Communications
- Computer Vision and Pattern Recognition
- Software