Big data is a concept that deals with data sets of extreme volumes.
As we often hear about Apache Zeppelin, a few questions come to mind: What is Apache Zeppelin? What …
apache-spark bigdata apache-zeppelin
I am working on a use case where I have to transfer data from RDBMS to HDFS. We have done …
hadoop apache-spark-sql sqoop bigdata
I'm trying to play with the Reddit data on BigQuery and I want to see comments and replies in one …
sql subquery google-bigquery reddit bigdata
I can't understand reduceByKey(_ + _) in the first example of Spark with Scala: object WordCount { def main(args: Array[String]): Unit = { …
scala apache-spark word-count bigdata
Let's say we have a table with 6 million records. There are 16 integer columns and a few text columns. It is read-only …
arrays performance postgresql join bigdata
I have some experience with Apache Spark and Spark SQL. Recently I've found the Apache Drill project. Could you describe to me what …
hadoop apache-spark bigdata apache-drill
I am using RStudio 0.97.320 (R 2.15.3) on Amazon EC2. My data frame has 200k rows and 12 columns. I am trying to …
performance r bigdata
I have a set of n (~1000000) strings (DNA sequences) stored in a list trans. I have to find the minimum …
python algorithm bigdata hamming-distance