TABLE OF CONTENTS
1. Introduction to Amazon EMR
2. Advantages of EMR
3. Application Environment
4. Job Run
5. Workers
6. Getting Started
8. Conclusion
9. About CloudThat
10. FAQs
Introduction to Amazon EMR
Amazon EMR (Elastic MapReduce) Serverless is the latest deployment option of Amazon EMR. It offers a serverless environment that makes it easier to run analytics applications built on recent open-source frameworks such as Apache Spark and Apache Hive. With EMR Serverless, you do not need to set up, secure, configure, or operate clusters in order to run these frameworks.
EMR Serverless offers open-source compatibility, concurrency and optimized runtime performance for popular frameworks.
Advantages of EMR Serverless
It helps you avoid over- or under-provisioning resources for your data processing jobs.
It automatically determines the resources your job needs and releases them when the job finishes.
You can pre-initialize resources for cases where an application must respond within seconds, such as interactive data analysis.
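To make the pre-initialization point concrete, here is a minimal sketch that builds the `initialCapacity` mapping accepted by the `create_application` call of the boto3 `emr-serverless` client. The helper name `build_initial_capacity`, the worker counts, and the worker sizes are illustrative assumptions, not values taken from this article.

```python
def build_initial_capacity(driver_count, executor_count,
                           cpu="2vCPU", memory="4GB"):
    """Return an initialCapacity mapping for a Spark application.

    Hypothetical helper: the counts and sizes are placeholders to tune.
    """
    if driver_count < 1 or executor_count < 1:
        raise ValueError("pre-initialized worker counts must be at least 1")
    worker_config = {"cpu": cpu, "memory": memory}
    return {
        "DRIVER": {
            "workerCount": driver_count,
            "workerConfiguration": dict(worker_config),
        },
        "EXECUTOR": {
            "workerCount": executor_count,
            "workerConfiguration": dict(worker_config),
        },
    }


# Example values only; size these for your own workload.
capacity = build_initial_capacity(driver_count=1, executor_count=4)
```

The resulting mapping would be passed as `initialCapacity=capacity` when creating the application, so that those workers are kept ready before the first job arrives.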
Application Environment
With EMR Serverless, you create applications that use open-source analytics frameworks. Creating an application requires a few attributes:
You must specify the Amazon EMR release version that corresponds to the open-source framework version you wish to use.
You must specify the runtime the application should use, such as Apache Spark or Apache Hive.
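The two attributes above map onto the `releaseLabel` and `type` parameters of the boto3 `emr-serverless` client's `create_application` call. The sketch below is one way this might look; the helper names, the application name, and the release label `emr-6.6.0` are assumptions chosen for illustration. The AWS call is kept in its own function because it needs boto3 and valid credentials.

```python
SUPPORTED_RUNTIMES = {"SPARK", "HIVE"}


def build_create_application_request(name, release_label, runtime):
    """Assemble the keyword arguments for create_application."""
    runtime = runtime.upper()
    if runtime not in SUPPORTED_RUNTIMES:
        raise ValueError(f"unsupported runtime: {runtime}")
    return {"name": name, "releaseLabel": release_label, "type": runtime}


def create_application(request):
    """Send the request to AWS and return the new application ID."""
    import boto3  # imported lazily so the builder works without boto3

    client = boto3.client("emr-serverless")
    return client.create_application(**request)["applicationId"]


request = build_create_application_request(
    name="demo-spark-app",      # hypothetical application name
    release_label="emr-6.6.0",  # example release label; choose your own
    runtime="spark",
)
```

Calling `create_application(request)` with configured credentials would create the application; the builder alone can be checked offline.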
Each EMR Serverless application runs in a secure Amazon VPC that is kept separate from other applications. You can define AWS IAM policies so that only certain IAM users and roles can access the application. You can also set limits to control and track the usage costs the application incurs.
EMR Serverless, a regional service, simplifies the way workloads are distributed across multiple Availability Zones within a Region.
Job Run
Job runs are central to EMR Serverless. A job run is a request submitted to an EMR Serverless application, which executes it asynchronously and tracks it through completion. When you submit a job, you must specify an IAM role that the job uses to access AWS resources. You can submit multiple job run requests to an application, and each job run can use a different role to access AWS resources. An EMR Serverless application starts executing jobs as soon as it receives them, and it can run multiple job requests concurrently.
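Because a job run is asynchronous, a client typically submits it and then polls for its state. The following is a minimal sketch of that pattern using the boto3 `emr-serverless` client's `start_job_run` and `get_job_run` calls. The application ID, role ARN, S3 path, helper names, and polling intervals are all hypothetical placeholders.

```python
import time

TERMINAL_STATES = {"SUCCESS", "FAILED", "CANCELLED"}


def build_job_run_request(application_id, role_arn, entry_point, args=()):
    """Assemble the keyword arguments for start_job_run (Spark job)."""
    return {
        "applicationId": application_id,
        "executionRoleArn": role_arn,  # role the job uses to reach AWS resources
        "jobDriver": {
            "sparkSubmit": {
                "entryPoint": entry_point,
                "entryPointArguments": list(args),
            }
        },
    }


def wait_for_job(get_state, poll_seconds=30, max_polls=120):
    """Poll get_state() until the job run reaches a terminal state."""
    for _ in range(max_polls):
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError("job run did not finish within the polling budget")


def submit_and_wait(request):
    """Submit the job run to AWS and block until it finishes."""
    import boto3  # needs boto3 and valid AWS credentials

    client = boto3.client("emr-serverless")
    job_run_id = client.start_job_run(**request)["jobRunId"]

    def get_state():
        response = client.get_job_run(
            applicationId=request["applicationId"], jobRunId=job_run_id
        )
        return response["jobRun"]["state"]

    return wait_for_job(get_state)


request = build_job_run_request(
    application_id="00example",  # hypothetical application ID
    role_arn="arn:aws:iam::123456789012:role/demo-job-role",  # hypothetical
    entry_point="s3://demo-bucket/scripts/wordcount.py",      # hypothetical
)
```

Passing the state lookup in as a callable keeps the polling loop independent of AWS, so it can be exercised with a stub before running against a real application.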
Workers
An EMR Serverless application uses workers internally to execute workloads. The default sizes of these workers depend on the application type and the Amazon EMR release. You can override these sizes when you schedule a job run.
When you submit a job, EMR Serverless computes the resources the application needs for the job and schedules workers. It then breaks the workload down into tasks, downloads images, and provisions and sets up workers. Finally, it decommissions the workers once the job is done.
EMR Serverless automatically scales workers up or down based on the workload and the parallelism required at each stage of the job. This removes the need to estimate the number of workers the application requires to run your workloads.
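For a Spark application, one way to override worker sizes for a single job run is through standard Spark properties such as `spark.executor.cores` and `spark.executor.memory`, passed in the `sparkSubmitParameters` string of the job request. The sketch below only builds that string; the helper name and the sizes shown are assumptions for illustration.

```python
def build_spark_submit_parameters(executor_cores, executor_memory_gb,
                                  driver_cores=None, driver_memory_gb=None):
    """Return a sparkSubmitParameters string that sets worker sizes."""
    conf = {
        "spark.executor.cores": str(executor_cores),
        "spark.executor.memory": f"{executor_memory_gb}g",
    }
    # Driver settings are optional; omit them to keep the defaults.
    if driver_cores is not None:
        conf["spark.driver.cores"] = str(driver_cores)
    if driver_memory_gb is not None:
        conf["spark.driver.memory"] = f"{driver_memory_gb}g"
    return " ".join(f"--conf {key}={value}" for key, value in conf.items())


params = build_spark_submit_parameters(executor_cores=4, executor_memory_gb=8)
print(params)
# prints: --conf spark.executor.cores=4 --conf spark.executor.memory=8g
```

The worker count itself is left out on purpose, since EMR Serverless scales it automatically.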
Getting Started
Amazon EMR Serverless: Simple Spark Application Demo
EMR Serverless Navigation
Log in to the AWS Management Console and open the EMR console.
To navigate to the EMR Serverless homepage, click the EMR Serverless link in the left navigation pane.
Click the Get Started button on the homepage.
EMR Studio Creation:
Next, you need to create an EMR Serverless application. EMR Studio is required to create and manage EMR Serverless applications.
If you choose the EMR Studio option,
