Friday, May 18, 2018

Download CCA175 Dumps Practice Exam Questions 2018

Latest updated CCA175 exam questions from the Exact2pass CCA175 PDF dumps. The newest Exact2pass CCA175 VCE dumps are available for download.



Problem Scenario 71 :

Write a Spark script in Python that reads the file "Content.txt" (on HDFS), whose content is listed below.
Split each row into a (key, value) pair, where the key is the first word of the line and the value is the entire line.
Filter out the empty lines.
Save the key-value pairs in "problem86" as a sequence file (on HDFS).

Part 2 : Save the data as a sequence file where the key is null and the entire line is the value. Read back the stored sequence files.

Hello this is
This is
Apache Spark Training
This is Spark Learning Session
Spark is faster than MapReduce

Answer: See the explanation for Step by Step Solution and configuration.

Solution :

Step 1 :

# Import SparkContext and SparkConf
from pyspark import SparkContext, SparkConf

Step 2:
#load data from hdfs
contentRDD = sc.textFile("Content.txt")

Step 3:
#keep only the non-empty lines
nonempty_lines = contentRDD.filter(lambda x: len(x) > 0)

Step 4:
#Split each line on the first space (Remember : It is mandatory to convert it to a tuple)
words = nonempty_lines.map(lambda x: tuple(x.split(' ', 1)))
words.saveAsSequenceFile("problem86")

Step 5: Check contents in directory problem86
hdfs dfs -cat problem86/part*

Step 6 : Create key, value pair (where key is null)
nonempty_lines.map(lambda line: (None, line)).saveAsSequenceFile("problem86_1")
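
The split in Step 4 can be tried out without a cluster. The sketch below is plain Python rather than Spark; `to_pair` is a hypothetical helper name, and the empty string in the sample list is added here only to exercise the filter. It suggests that splitting on the first space gives the first word as the key and the remainder of the line as the value, and that a one-word line would yield a single-element tuple, which is not a usable key/value pair.

```python
# Plain-Python illustration of the filter and split steps; no Spark required.
sample_lines = [
    "Hello this is",
    "This is",
    "",  # assumption: an empty line added to show the filter at work
    "Apache Spark Training",
    "This is Spark Learning Session",
    "Spark is faster than MapReduce",
]


def to_pair(line):
    """Hypothetical helper: split on the first space only."""
    return tuple(line.split(" ", 1))


# Same predicate as the filter step: keep lines with at least one character.
nonempty = [line for line in sample_lines if len(line) > 0]

# Key = first word, value = rest of the line.
pairs = [to_pair(line) for line in nonempty]

for key, value in pairs:
    print(key, "->", value)

# Variant that keeps the whole line as the value, as the problem statement words it.
full_line_pairs = [(line.split(" ", 1)[0], line) for line in nonempty]
```

If the entire line is needed as the value, the `full_line_pairs` variant is closer to the wording of the problem; in Spark the same lambda body could be passed to `map`.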

Step 7 : Reading back the sequence file data using spark.
seqRDD = sc.sequenceFile("problem86_1")

Step 8 : Print the content to validate the same.
for line in seqRDD.collect():
    print(line)

Problem Scenario 72 : You have been given a table named "employee2" with the following details.

first_name string
last_name string

Write a Spark script in Python that reads this table and prints all the rows and the individual column values.

Answer: See the explanation for Step by Step Solution and configuration.
Solution :

Step 1 : Import statements for HiveContext
from pyspark.sql import HiveContext

Step 2 : Create sqlContext
sqlContext = HiveContext(sc)

Step 3 : Query hive
employee2 = sqlContext.sql("select * from employee2")

Step 4 : Now print the data
for row in employee2.collect():
    print(row)

Step 5 : Print a specific column
for row in employee2.collect():
    print(row.first_name)
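
The two loops above differ only in how each collected row is accessed. The sketch below uses `collections.namedtuple` as a stand-in for the row objects Spark returns (an assumption made so the example runs without Spark), and the sample values are illustrative, not the actual contents of employee2. It shows whole-row, attribute, and positional access side by side.

```python
from collections import namedtuple

# Stand-in for a Spark row (assumption: used only so this runs without Spark).
Employee = namedtuple("Employee", ["first_name", "last_name"])

# Illustrative values; the real table contents are not given in the problem.
rows = [Employee("Ankit", "Jain"), Employee("Amir", "Khan")]

for row in rows:
    print(row)             # the whole row
    print(row.first_name)  # one column, by attribute name
    print(row[1])          # one column, by position (last_name)
```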

Problem Scenario 73 : You have been given data in JSON format as below.

{"first_name":"Ankit", "last_name":"Jain"}
{"first_name":"Amir", "last_name":"Khan"}
{"first_name":"Rajesh", "last_name":"Khanna"}
{"first_name":"Priynka", "last_name":"Chopra"}
{"first_name":"Kareena", "last_name":"Kapoor"}
{"first_name":"Lokesh", "last_name":"Yadav"}

Do the following activities:

1. Create the employee.json file locally.
2. Load this file on HDFS.
3. Register this data as a temp table in Spark using Python.
4. Write a select query and print this data.
5. Save the selected data back in JSON format.

Answer: See the explanation for Step by Step Solution and configuration.
Solution :

Step 1 : Create the employee.json file locally.
vi employee.json (press insert), then paste the content.
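
As an alternative to typing the file by hand, it could also be generated with a few lines of Python. The sketch below is one possible approach, not part of the original solution: it writes one JSON object per line, which is the layout Spark's JSON reader expects by default. The temporary directory and the `out_path` name are assumptions made so the example does not overwrite anything; in the exercise the file would simply be written to the working directory.

```python
import json
import os
import tempfile

# Records taken from the problem statement.
employees = [
    {"first_name": "Ankit", "last_name": "Jain"},
    {"first_name": "Amir", "last_name": "Khan"},
    {"first_name": "Rajesh", "last_name": "Khanna"},
    {"first_name": "Priynka", "last_name": "Chopra"},
    {"first_name": "Kareena", "last_name": "Kapoor"},
    {"first_name": "Lokesh", "last_name": "Yadav"},
]

# Assumption: a temporary directory is used here to keep the sketch side-effect free.
out_path = os.path.join(tempfile.mkdtemp(), "employee.json")

# One JSON object per line, no enclosing array.
with open(out_path, "w") as f:
    for emp in employees:
        f.write(json.dumps(emp) + "\n")

# Read the file back and confirm that every line parses on its own.
with open(out_path) as f:
    parsed = [json.loads(line) for line in f if line.strip()]

print(len(parsed), "records written to", out_path)
```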

Step 2 : Upload this file to HDFS, default location
hadoop fs -put employee.json

Step 3 : Write spark script

#Import SQLContext
from pyspark.sql import SQLContext

#Create instance of SQLContext
sqlContext = SQLContext(sc)

#Load json file (jsonFile is deprecated since Spark 1.4; sqlContext.read.json is its replacement)
employee = sqlContext.jsonFile("employee.json")

#Register the loaded data as a temp table
employee.registerTempTable("EmployeeTab")

#Select data from Employee table
employeeInfo = sqlContext.sql("select * from EmployeeTab")

#Iterate data and print
for row in employeeInfo.collect():
    print(row)

Step 4 : Write data as a text file
employeeInfo.toJSON().saveAsTextFile("employeeJson1")

Step 5: Check whether data has been created or not
hadoop fs -cat employeeJson1/part*

