TY - GEN
T1 - Towards a scalable HDFS architecture
AU - Azzedin, Farag
PY - 2013
Y1 - 2013
N2 - Cloud computing infrastructures allow corporations to reduce costs by outsourcing computations on demand. One area in which cloud computing is increasingly used is large-scale data processing. Apache Hadoop is one such large-scale data processing project; it supports data-intensive distributed applications. Hadoop applications store their data in a distributed file system called the Hadoop Distributed File System (HDFS). By design, the HDFS architecture has only a single master node, called the NameNode, which manages and maintains the metadata of the storage nodes, called DataNodes, in its RAM. Hence, the amount of DataNode metadata HDFS can manage is limited by the RAM capacity of the NameNode, which is also a single point of failure. This paper proposes a fault-tolerant, highly available, and widely scalable HDFS architecture. The proposed architecture provides a distributed NameNode space, eliminating these drawbacks of the current HDFS architecture. This is achieved by integrating the Chord protocol into the HDFS architecture.
AB - Cloud computing infrastructures allow corporations to reduce costs by outsourcing computations on demand. One area in which cloud computing is increasingly used is large-scale data processing. Apache Hadoop is one such large-scale data processing project; it supports data-intensive distributed applications. Hadoop applications store their data in a distributed file system called the Hadoop Distributed File System (HDFS). By design, the HDFS architecture has only a single master node, called the NameNode, which manages and maintains the metadata of the storage nodes, called DataNodes, in its RAM. Hence, the amount of DataNode metadata HDFS can manage is limited by the RAM capacity of the NameNode, which is also a single point of failure. This paper proposes a fault-tolerant, highly available, and widely scalable HDFS architecture. The proposed architecture provides a distributed NameNode space, eliminating these drawbacks of the current HDFS architecture. This is achieved by integrating the Chord protocol into the HDFS architecture.
KW - Chord
KW - Cloud Computing Platform
KW - Distributed NameNode
KW - HDFS
KW - Hadoop
UR - https://www.scopus.com/pages/publications/84883275740
U2 - 10.1109/CTS.2013.6567222
DO - 10.1109/CTS.2013.6567222
M3 - Conference contribution
AN - SCOPUS:84883275740
SN - 9781467364027
T3 - Proceedings of the 2013 International Conference on Collaboration Technologies and Systems, CTS 2013
SP - 155
EP - 161
BT - Proceedings of the 2013 International Conference on Collaboration Technologies and Systems, CTS 2013
ER -