
Towards a scalable HDFS architecture

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

41 Scopus citations

Abstract

Cloud computing infrastructures allow corporations to reduce costs by outsourcing computations on demand. One area in which cloud computing is increasingly used is large-scale data processing. Apache Hadoop is one such large-scale data processing project, supporting data-intensive distributed applications. Hadoop applications store data in a distributed file system called the Hadoop Distributed File System (HDFS). The HDFS architecture, by design, has only a single master node, called the NameNode, which manages and maintains the metadata of the storage nodes, called DataNodes, in its RAM. Hence, the amount of DataNode metadata is limited by the RAM capacity of the NameNode, which is also a single point of failure. This paper proposes a fault-tolerant, highly available, and widely scalable HDFS architecture. The proposed architecture provides a distributed NameNode space, eliminating the drawbacks of the current HDFS architecture. This is achieved by integrating the Chord protocol into the HDFS architecture.
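
The abstract does not describe how Chord is wired into HDFS, so the following is only a minimal sketch of the general idea behind a Chord-style distributed NameNode space: NameNodes and file paths are hashed onto the same identifier ring, and each path's metadata belongs to the first NameNode at or after the path's identifier (its successor). The class `NameNodeRing`, the node names, the 16-bit ring size, and the use of SHA-1 over the path string are all assumptions made for illustration, not details taken from the paper. The sketch also resolves the successor from a global sorted view of the ring; real Chord reaches the same answer with per-node finger tables in O(log N) hops.

```python
import hashlib
from bisect import bisect_left

M = 16  # identifier bits; an assumed small value to keep the ring readable


def chord_id(key: str, m: int = M) -> int:
    """Hash a string onto the identifier ring [0, 2**m)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


class NameNodeRing:
    """Hypothetical ring of NameNodes, each owning one arc of the identifier space."""

    def __init__(self, node_names, m: int = M):
        self.m = m
        # Sort nodes by their position on the ring.
        self.ring = sorted((chord_id(name, m), name) for name in node_names)
        self.ids = [ident for ident, _ in self.ring]

    def successor(self, ident: int) -> str:
        """Return the first node whose identifier is >= ident, wrapping around."""
        idx = bisect_left(self.ids, ident)
        return self.ring[idx % len(self.ring)][1]

    def lookup(self, path: str) -> str:
        """Return the NameNode responsible for the metadata of `path`."""
        return self.successor(chord_id(path, self.m))


if __name__ == "__main__":
    ring = NameNodeRing(["nn-a", "nn-b", "nn-c"])
    for path in ["/user/alice/data.csv", "/logs/2013/01.log", "/tmp/job-42"]:
        print(path, "->", ring.lookup(path))
```

Under this scheme, adding a NameNode moves only the paths that fall in the new node's arc; every other path keeps its owner. That property is what lets the metadata space grow beyond the RAM of a single machine.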

Original language: English
Title of host publication: Proceedings of the 2013 International Conference on Collaboration Technologies and Systems, CTS 2013
Pages: 155-161
Number of pages: 7
State: Published - 2013

Publication series

Name: Proceedings of the 2013 International Conference on Collaboration Technologies and Systems, CTS 2013

Keywords

  • Chord
  • Cloud Computing Platform
  • Distributed NameNode
  • HDFS
  • Hadoop

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Control and Systems Engineering
