Talend Real-Time Big Data
Overview of Real-time Big Data
Pre-requisites to run
Setup & Talend License
About this cookbook
What is the Talend Cookbook?
Using the Talend Real-Time Big Data Platform, this Cookbook provides step-by-step instructions to build and run an end-to-end integration scenario. The demo is built on a real-world use case in the retail industry and demonstrates how Talend, Spark, NoSQL and real-time messaging can be used together to provide real-time offers as part of an online shopping experience. Whether your integration is batch, streaming or real-time, see how Talend can be used to address your big data challenges and move you into and beyond the sandbox stage.
About Talend
What does Talend offer?
At Talend, it's our mission to connect the data-driven enterprise, so our customers can operate in real time with new insight about their customers, markets and business. Talend helps companies with big data challenges through its big data integration platform, used by businesses to deliver timely and easy access to all their data. Talend provides the industry's first data integration platform with native support for Apache Spark, Spark Streaming and Hadoop. Talend delivers unmatched data processing speed and enables any company to convert streaming big data or IoT sensor information into immediately actionable insights.
Talend Big Data: 1st Data Integration Platform on Apache Spark
Visually develop jobs that run 100% on Spark:
   5X faster in independent benchmarks
   10X developer productivity gained over hand-coding Spark
   100X faster with in-memory processing
Over 100 new drag-and-drop Spark components: HDFS, RDBMS, NoSQL, cloud storage, transformation, messaging, in-memory analytics and machine learning recommendations, and much more
In-memory data caching and windowed computations
Click to enable Spark Streaming for real-time data processing
Convert Talend MapReduce jobs to Spark with the click of a button, future-proofing your investment
What is the Big Data Sandbox?
The Talend Real-Time Big Data Sandbox is a virtual environment that combines the Talend Real-Time Big Data Platform with sample scenarios that are pre-built and ready to run. See how Talend can turn data into real-time decisions through sandbox examples that integrate Apache Kafka, Spark, Spark Streaming, Hadoop and NoSQL.
What pre-requisites are required to run?
Talend Platform for Big Data includes a graphical IDE (Talend Studio), teamwork management, data quality, and advanced big data features. For a full list of features, visit Talend's website: http://www.talend.com/products/platform-for-big-data
You will need a virtual machine player such as VMware Player, which can be downloaded from the VMware Player site. Follow the VM player install instructions from the provider.
Recommended host machine:
   Memory: 8GB
   Disk space: 20GB (10GB of which is for the image download)
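The disk-space recommendation above can be checked on the host before starting the download. The following is a minimal sketch using only the Python standard library; the function name and the idea of scripting this check are illustrative assumptions and are not part of the Talend setup.

```python
import shutil

def has_free_disk(path, required_gb):
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024 ** 3

if __name__ == "__main__":
    # 20 is the recommended disk space in GB quoted above; "." is a placeholder
    # for the folder where the VM image and disk will be stored.
    print("Enough disk space:", has_free_disk(".", 20))
```

Point the path at the drive where you intend to store the virtual machine, since that is the filesystem that matters.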
How do I set up and configure?
Download the Virtual Machine file at www.talend.com/talend-big-data-sandbox. You will receive an email with a license key attachment and a second email with a list of support resources and videos.
Follow the steps below to install and configure your Big Data Sandbox:
1. Open the VMware Player.
2. Click on Open a Virtual Machine.
3. Find the .ova file that you downloaded. Select it and click Open.
4. Select where you would like the disk to be stored on your local host machine, e.g. C:/vmware/sandbox
5. Click on Import.
Note: The username (also the sudo username) is talend and the password is talend.
Having trouble with configuration settings? Refer to the troubleshooting guide.
How do I set up and configure? (cont.)
6. Edit the settings if needed:
   a) Right-click the NAT icon in the upper-right corner and select Settings. Check that the memory and processor settings are not too high for your host machine.
   b) It is recommended to allocate 8GB or more to the VM; it runs very well with 10GB if your host machine can afford the memory.
7. The NAT network adapter should already be configured for your VM. If it is not, add it as follows:
   a) Click Add.
   b) Select Network Adapter: NAT and select Next.
   c) Once finished, select Finish to return to the main Player home page.
8. Start the VM.
Follow the steps below to install and configure your Big Data (cont.):
1. Click on Play Virtual Machine.
2. The virtual machine starts loading.
Follow the steps below to install and configure your Big Data (cont.):
1. Once the virtual machine has finished loading, you are brought to the login screen. Enter the password talend to continue.
How do I set up the license on the virtual machine?
You should have been provided a license file by your Talend representative or by an automatic email from the Talend Real-time Big Data program. If you did not receive a license key, you can obtain one here: https://info.talend.com/prodevaltpbdrealtimesandboxdrive.html
This license file is required to open Talend Studio and must reside within the VM. To get the license file onto the VM:
1. Click the Download button of the license key document and click Save As to save it on your laptop in a place where you will be able to find it.
2. In the Virtual Player, click Files.
3. Double-click the Documents folder.
4. Locate the license key document and drag and drop it into the Documents folder on the Virtual Player.
Important note: For VirtualBox users, there is a known issue with drag-and-drop functionality. The easiest way to get the Talend license file onto the VM is to save it to a cloud storage site such as Dropbox.com, or send it to a web-based email account that you have access to (such as Gmail, Yahoo, Hotmail, etc.), and then navigate to that location from the web browser inside the virtual machine to download the file.
Real-time Recommendation
In this example you will see a simple version of making your website an Intelligent Application. You will experience:
   Building a Spark recommendation model
   Setting up a new Kafka topic to help simulate live web traffic coming from live web users browsing a retail web store
Most importantly, you will see first-hand how, with Talend, you can take streaming data and turn it into real-time recommendations to help improve shopping cart sales.
[Diagram: customers and channels (email, website, store) and internal systems (POS, clickstream) stream into a Spark recommendation engine with streaming window updates and a NoSQL store, which returns recommendations to the shopping cart.]
The following will help you see the value that Talend can bring to your big data projects: the Real-time Recommendation example is designed to illustrate the simplicity and flexibility Talend brings to using Spark in your big data architecture.
In this example, you will see how you can:
   Create a Kafka topic: create a Kafka topic to produce and consume real-time streaming data.
   Create a recommendation model: create a Spark recommendation model based on specific user actions.
   Stream a live recommendations pipeline: stream live recommendations to a Cassandra NoSQL database for fast data access from a WebUI.
If you are familiar with the ALS model, you can update the ALS parameters to enhance the model, or just leave the default values.
REQUIRED: Running a shell script
1. From the Desktop, double-click on the Start_Kafka icon. If prompted for a password, enter talend.
2. You can stop Kafka at any time by double-clicking on Stop_Kafka. If prompted for a password, enter talend.
REQUIRED: Starting Talend Studio
The first time you start Talend Studio, you have to browse for the license.
1. To begin, click on Talend-Studio.
2. Click My product license is on the local file system, then click Browse.
3. Navigate to the Documents folder and click on the license file you downloaded.
4. Click OK, then click Next.
5. The Talend Real-Time Big Data Platform window pops up. Let it load and, when complete, click Finish.
To execute the Real-time Recommendation:
First, a Kafka topic must be created. This task can be completed by executing the following job:
1. Navigate to the job designs folder.
2. Click on Standard Jobs > Realtime_Recommendation_
3. Double-click on OneTime_Create_Clickstream_Kafka_Topic 0.1. This opens the job in the designer window.
4. From the Run tab, click on Run to execute.
Now you can generate the recommendation model by loading the product ratings data into the Alternating Least Squares (ALS) algorithm. Rather than coding a complex algorithm in Scala, you use a single Spark component available in Talend Studio, which simplifies the model creation process. The resulting model can be stored in HDFS or, in this case, locally.
If you are familiar with the ALS model, you can update the ALS parameters to enhance the model, or just leave the default values.
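For readers unfamiliar with ALS, the idea can be sketched in a few lines of plain Python. This is a deliberately simplified, rank-1 version written for illustration only: it is not the Talend component or the Spark implementation, and the function names, the toy ratings and the parameter names (iterations, reg) are assumptions made for this sketch. Each pass holds one set of factors fixed and solves a small regularized least-squares problem for the other, then alternates.

```python
def als_rank1(ratings, iterations=20, reg=0.1):
    """Rank-1 ALS sketch. `ratings` maps (user, product) -> rating."""
    users = sorted({u for u, _ in ratings})
    items = sorted({i for _, i in ratings})
    user_f = {u: 1.0 for u in users}   # one latent factor per user
    item_f = {i: 1.0 for i in items}   # one latent factor per product
    for _ in range(iterations):
        # Hold product factors fixed; closed-form least-squares update per user.
        for u in users:
            num = sum(r * item_f[i] for (uu, i), r in ratings.items() if uu == u)
            den = reg + sum(item_f[i] ** 2 for (uu, i) in ratings if uu == u)
            user_f[u] = num / den
        # Hold user factors fixed; closed-form least-squares update per product.
        for i in items:
            num = sum(r * user_f[u] for (u, ii), r in ratings.items() if ii == i)
            den = reg + sum(user_f[u] ** 2 for (u, ii) in ratings if ii == i)
            item_f[i] = num / den
    return user_f, item_f

def predict(user_f, item_f, user, item):
    """Predicted rating is the product of the two latent factors."""
    return user_f[user] * item_f[item]

if __name__ == "__main__":
    # Toy data: user "b" has not rated product "y".
    toy = {("a", "x"): 4.0, ("a", "y"): 2.0, ("b", "x"): 2.0}
    uf, itf = als_rank1(toy, iterations=50, reg=0.01)
    print("Predicted rating for (b, y):", round(predict(uf, itf, "b", "y"), 2))
```

Here reg and iterations loosely correspond to the kind of ALS parameters you can tune; a production model would use more than one latent factor and far more data.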
Run the job to generate the recommendation model:
1. Navigate to the job designs folder.
2. Click on Big Data Batch > Realtime_Recommendations_
3. Double-click on Build_Recommendation_Model_with_Spark. This opens the job in the designer window.
4. From the Run tab, click on Run to execute.
With the recommendation model created, your lookup tables populated and your Kafka topic ready to consume data, you can now stream your clickstream data into your recommendation model and put the results into your Cassandra tables for reference from a WebUI.
1. Navigate to the job designs folder.
2. Click on Standard Jobs > Realtime_Recommendations_
3. Double-click on Push_Clickstream_To_Kafka 0.1. This opens the job in the designer window.
First, let's look quickly at the Push_Clickstream_To_Kafka job. This job is set up to simulate real-time streaming of web traffic and clickstream data into a Kafka topic, which will then be consumed by the recommendation engine to produce recommendations.
You are only reviewing this job for now. It will be executed in the next few steps.
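To make the idea of simulated clickstream traffic concrete, here is a minimal standard-library sketch. It is not the Push_Clickstream_To_Kafka job itself: the field names (user_id, product_id, action, timestamp) and both function names are hypothetical, and the `send` argument stands in for whatever publishes a message to a Kafka topic, so no Kafka client is assumed.

```python
import json
import random
import time

def simulate_clickstream(user_ids, product_ids, count, seed=None):
    """Yield `count` fake click events with hypothetical field names."""
    rng = random.Random(seed)
    for _ in range(count):
        yield {
            "user_id": rng.choice(user_ids),
            "product_id": rng.choice(product_ids),
            "action": rng.choice(["view", "add_to_cart"]),
            "timestamp": time.time(),
        }

def push_events(events, send):
    """Serialize each event to JSON bytes and hand it to `send`.

    `send` is any callable taking bytes; in a real setup it would wrap a
    Kafka producer, which is outside the scope of this sketch.
    """
    sent = 0
    for event in events:
        send(json.dumps(event).encode("utf-8"))
        sent += 1
    return sent

if __name__ == "__main__":
    outbox = []  # stands in for the Kafka topic
    total = push_events(simulate_clickstream([1, 2, 3], [101, 102], 5, seed=42), outbox.append)
    print(total, "events queued")
```

Keeping the publishing step behind a plain callable makes the generator easy to test without a running broker.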
1. Navigate to the job designs folder.
2. Click on Big Data Streaming > Realtime_Recommendation_
3. Double-click on Realtime_Recommendation_Engine_Pipeline 0.1. This opens the job in the designer window.
Next, take a look at the Realtime_Recommendation_Engine_Pipeline job.
A. In this job, the input is your Kafka consumer of clickstream data. The data is fed into your recommendation engine to produce real-time offers based on the current user's activity. Using the tWindow component, you can control how often you send recommendations. Your recommendations are sent to three output streams: the execution window for viewing purposes, a flat file for later processing in your big data analytics environment, and a Cassandra table for use in your fast data layer by your WebUI.
B. Click on Run to start the recommendation engine.
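The effect of a window on a stream can be illustrated with a small sketch. It assumes a fixed, non-overlapping (tumbling) window keyed on event time, which is only one possible configuration; the window component in the job may be set up differently, and the function name and event shape are hypothetical.

```python
from collections import defaultdict

def tumbling_windows(events, window_seconds):
    """Group (timestamp, payload) pairs into fixed, non-overlapping windows.

    Returns a list of (window_start, [payloads]) sorted by window start.
    """
    windows = defaultdict(list)
    for timestamp, payload in events:
        # Every event falls into exactly one window, identified by its start time.
        window_start = int(timestamp // window_seconds) * window_seconds
        windows[window_start].append(payload)
    return [(start, windows[start]) for start in sorted(windows)]

if __name__ == "__main__":
    clicks = [(0.5, "click-1"), (1.2, "click-2"), (5.1, "click-3")]
    for start, batch in tumbling_windows(clicks, 5):
        print("window starting at", start, "->", batch)
```

A longer window means fewer, larger batches of recommendations; a shorter one means more frequent updates.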
With your recommendation engine running, you can start sending data to your Kafka topic.
1. Navigate back to the Push_Clickstream_To_Kafka job.
2. Click Run on the Run tab to execute.
3. Once this job starts, switch back to the recommendation engine job.
4. Watch the execution output window. You will now see your real-time data coming through with recommended products based on your recommendation model.
5. Once you have seen the results, you can kill the recommendation engine to stop the streaming recommendations.
Your recommendations are also written to a Cassandra database so they can be referenced by a WebUI to offer, for instance, last-minute product suggestions when a customer is about to check out.
Conclusion
Product recommendations have evolved:
   ETL: it would take weeks to gather and process the required data.
   MapReduce: you can process even more data than before, in hours rather than days and weeks.
   Spark: you can now process even more, in minutes and even seconds.
The good news is that with Talend, it now takes just a few clicks to make this type of transformation a reality.
What are your next steps?
Now that you understand how you can address your big data opportunities using Talend, let's take one final look at how Talend will help you. The next step is to discuss your specific requirements with your Talend sales representative and see how Talend can help jumpstart your big data project into production.
How will Talend help you?
Talend vastly simplifies big data integration
First, Talend vastly simplifies big data integration, allowing you to leverage in-house resources to use Talend's rich graphical tools that generate big data code (Spark, MapReduce, Pig, Java) for you. Talend is based on standards such as Eclipse, Java, and SQL, and is backed by a large collaborative community, so you can upskill existing resources instead of finding new ones.
Talend is built for batch and real-time big data
Second, Talend is built for batch and real-time big data. Unlike other solutions that map to big data or support a few components, Talend is the first data integration platform built on Spark, with over 100 Spark components. Whether integrating batch (MapReduce, Spark), streaming (Spark), NoSQL, or in real time, Talend provides a single tool for all your integration needs. Talend's native Hadoop data quality solution delivers clean and consistent data at infinite scale.
Talend lowers operations costs
And third, Talend lowers operations costs. Talend's zero-footprint solution takes the complexity out of integration deployment, management and maintenance. A usage-based subscription model provides a fast return on investment without large upfront costs.