Spark is a cluster computing system. It is generally faster than comparable systems such as Hadoop MapReduce, largely because it processes data in memory. It provides high-level APIs in Python, Scala, and Java, and parallel jobs are easy to write in Spark. Spark’s primary abstraction is a distributed collection of items called a Dataset. Datasets can be created from Hadoop InputFormats (such as HDFS files) or by transforming other Datasets.
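As a minimal sketch of both ways of obtaining a Dataset, the PySpark snippet below creates one from a text file and then derives a new one by transformation. It assumes a local Spark installation; the app name and the file name README.md are placeholders, not part of the original text.

```python
from pyspark.sql import SparkSession

# Entry point to Spark; creates (or reuses) a session on the cluster.
spark = SparkSession.builder.appName("DatasetExample").getOrCreate()

# Create a Dataset from an external file (each line becomes a row).
lines = spark.read.text("README.md")
print(lines.count())  # number of lines in the file

# Create a new Dataset by transforming an existing one:
# keep only the lines that mention "Spark".
spark_lines = lines.filter(lines.value.contains("Spark"))
print(spark_lines.count())

spark.stop()
```

In the Python API these Datasets are exposed as DataFrames (Datasets of rows); the filter call does not modify lines but returns a new distributed collection, which is what "transforming other Datasets" refers to.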