Hadoop is an open-source software framework for distributed storage and processing of large datasets across clusters of commodity hardware. It was created in 2005 and is designed to reliably handle large volumes of data and complex computations in a distributed fashion. The core of Hadoop consists of Hadoop Distributed File System (HDFS) for storage and Hadoop MapReduce for processing data in parallel across large clusters of computers. It is widely adopted by companies handling big data like Yahoo, Facebook, Amazon and Netflix.