Popular "distributed-computing" questions | Page 2

I have a bunch of small files in an HDFS directory. Although the volume of the files is relatively small, …

java hadoop mapreduce distributed-computing

Say I have a distributed system on 3 nodes and my data is distributed among those nodes. For example, I have …

apache-spark distributed-computing

I am quite new to the ZooKeeper port, which I have been coming across for the past few days. I was introduced to …

java neo4j distributed-computing apache-zookeeper voltdb

I'm working through a Databricks example. The schema for the dataframe looks like: > parquetDF.printSchema root |-- department: struct (…

scala apache-spark apache-spark-sql distributed-computing databricks

Is there a way to concatenate the datasets of two different RDDs in Spark? The requirement is: I create two intermediate …

scala apache-spark apache-spark-sql distributed-computing rdd

I have a simple line: line = "Hello, world" I would like to convert it to an RDD with only one …

python apache-spark pyspark distributed-computing rdd

The ref says: Limit of total size of serialized results of all partitions for each Spark action (e.g. collect). …

apache-spark configuration driver communication distributed-computing

I'm trying to find an effective way of saving the result of my Spark Job as a csv file. I'm …

file csv hadoop apache-spark distributed-computing

I have a lot of spare Intel Linux servers lying around (hundreds) and want to use them for a distributed …

linux filesystems distributed-computing distributed-system

Could you please tell me the difference between Apache Spark and Akka? I know that both frameworks are meant to program …

apache-spark parallel-processing akka distributed-computing
