Top "Distributed-computing" questions

Using more than one computer, connected to each other by a communication link, to accomplish a common task.

Change File Split size in Hadoop

I have a bunch of small files in an HDFS directory. Although the volume of the files is relatively small, …

java hadoop mapreduce distributed-computing
How does the Spark aggregate function aggregateByKey work?

Say I have a distributed system on 3 nodes and my data is distributed among those nodes. For example, I have …

apache-spark distributed-computing
What is the ZooKeeper port and its usage?

I am quite new to the ZooKeeper port, which I have been coming across for the past few days. I was introduced to …

java neo4j distributed-computing apache-zookeeper voltdb
Exploding nested Struct in Spark dataframe

I'm working through a Databricks example. The schema for the dataframe looks like: > parquetDF.printSchema root |-- department: struct (…

scala apache-spark apache-spark-sql distributed-computing databricks
Concatenating datasets of different RDDs in Apache Spark using Scala

Is there a way to concatenate datasets of two different RDDs in Spark? The requirement is: I create two intermediate …

scala apache-spark apache-spark-sql distributed-computing rdd
Convert a simple one line string to RDD in Spark

I have a simple line: line = "Hello, world" I would like to convert it to an RDD with only one …

python apache-spark pyspark distributed-computing rdd
What is spark.driver.maxResultSize?

The ref says: Limit of total size of serialized results of all partitions for each Spark action (e.g. collect). …

apache-spark configuration driver communication distributed-computing
How to write to CSV in Spark

I'm trying to find an effective way of saving the result of my Spark job as a CSV file. I'm …

file csv hadoop apache-spark distributed-computing
Best distributed filesystem for commodity Linux storage farm

I have a lot of spare Intel Linux servers lying around (hundreds) and want to use them for a distributed …

linux filesystems distributed-computing distributed-system
Apache Spark vs Akka

Could you please tell me the difference between Apache Spark and Akka? I know that both frameworks are meant to program …

apache-spark parallel-processing akka distributed-computing