Apache Hadoop 2.0: Operations Management with the Hortonworks Data Platform
This four-day Apache Hadoop 2.0 training course is designed for administrators who deploy and manage Apache Hadoop 2.0 clusters. Through a combination of lectures and hands-on exercises, you will learn how to install, configure, maintain and scale your Hadoop 2.0 environment. By the end of this course, you will have a solid understanding of how Hadoop works with Big Data and, through the hands-on exercises, will have completed the Hadoop deployment lifecycle for a multi-node cluster.
This course was formerly known as 'Administering Apache Hadoop'.
The Hadoop Administration course spans four days and provides a solid foundation for managing your Hadoop clusters. A full outline is below.
In this course you will learn the best practices for Apache Hadoop 2.0 administration as experienced by the developers and architects of core Apache Hadoop.
- How to size and deploy a cluster
- How to deploy a cluster for the first time
- How to configure Hadoop and the supporting frameworks
- How to perform ongoing maintenance to nodes in the cluster
- How to balance and performance tune a cluster
- How to move and manage data within a cluster
- How to integrate status and health checks into your existing monitoring tools (single pane of glass)
- How to add and remove DataNodes
- How to implement a highly available solution
- Best practices for deploying Hadoop clusters
This course utilizes a Linux environment. Attendees should know how to navigate and modify files within a Linux environment. Existing knowledge of Hadoop is not required.
This course is designed for IT administrators and operators responsible for installing, configuring and supporting an Apache Hadoop 2.0 deployment in a Linux environment.
Day 1: Foundation, Planning and Installation
- Introduction to Hortonworks Data Platform & Hadoop 2.0
- Hadoop Storage: HDFS Architecture
- Installation Prerequisites
- HDP Management: Ambari
- Ambari and the Command Line
- Hadoop Operating System (YARN) & MapReduce
Day 2: Configuration / Data Management
- Configuring Services
- Configuring HDFS
- Configuring Hadoop Operating System (YARN) & MapReduce
- Configuring HBase
- Configuring ZooKeeper
- Configuring Schedulers
- Data Integrity
- Extract-Load-Transform (ELT) Data Movement
- Copying Data Between Clusters
Day 3: Data Management / Hortonworks Data Platform (HDP) 2.0 Operations
- HDFS Web Services
- Apache Hive Data Warehouse
- Transferring data with Sqoop
- Moving Log Data with Flume
- Setting up the HDFS NFS Gateway
- Workflow Management: Oozie
- Data Lifecycle Management with Falcon
- Monitoring HDP 2.0 Services
- Commissioning and Decommissioning Nodes and Services
Day 4: Hortonworks Data Platform (HDP) 2.0 Operations
- Rack Awareness and Topology
- NameNode Federation Architecture
- NameNode High-Availability (HA) Architecture
- Backup & Recovery
- Tuning & Benchmarking
All equipment and infrastructure required for the lab exercises are provided.
Unlimited teas, coffees & soft drinks provided.
Cancellation & Reschedule Policy
If you cannot attend this class, you must provide written notice to Big Data Partnership at least 2 weeks prior to the start of the class. Big Data Partnership will transfer your registration to a future class of equal or lesser value.
Students who do not cancel at least 2 weeks before the start of the class, and/or who do not attend the class, will not receive a refund and will be charged the full amount.
Big Data Partnership can cancel or reschedule at any time at our discretion. In the event that the class is cancelled or rescheduled, we will work with you to apply your registration to another date or refund your fee in full. Big Data Partnership is not responsible for non-refundable travel or other expenses incurred by the student.
If you have any questions concerning this class, please do not hesitate to contact firstname.lastname@example.org.
Big Data Partnership
Big Data Partnership is the leading European-based big data service provider.
Our team has deep expertise across a wide range of big data technologies and data science techniques.
Our recent projects have involved:
- the Apache Hadoop ecosystem
- Apache Spark
- Apache Cassandra and a range of other NoSQL databases and search technologies
Big Data Partnership helps organisations across all industries become more data-driven by reducing costs and grasping new big-data opportunities, rapidly and at low risk.
We help you Discover why and how to become data-driven; we work with you to Develop and prove the value of this approach; we Deliver cost-effective solutions that exploit faster and more scalable technology. We reduce risk by Training your staff in the necessary new skills and by providing Support.
For more information, visit http://www.bigdatapartnership.com.