Top "Bigdata" questions

Big data is a concept that deals with data sets of extreme volumes.

HBase: quickly count number of rows

Right now I implement the row count over a ResultScanner like this: for (Result rs = scanner.next(); rs != null; rs = scanner.next()) { …

hadoop hbase bigdata
Hive ParseException - cannot recognize input near 'end' 'string'

I am getting the following error when trying to create a Hive table from an existing DynamoDB table: NoViableAltException(88@[]) at …

hadoop mapreduce hive bigdata amazon-dynamodb
How to copy data from one HDFS to another HDFS?

I have two HDFS setups and want to copy (not migrate or move) some tables from HDFS1 to HDFS2. How …

hadoop hdfs bigdata sqoop
How to create a large pandas dataframe from an sql query without running out of memory?

I have trouble querying a table of more than 5 million records from an MS SQL Server database. I want to select all …

python sql pandas bigdata
"Container killed by YARN for exceeding memory limits. 10.4 GB of 10.4 GB physical memory used" on an EMR cluster with 75GB of memory

I'm running a 5-node Spark cluster on AWS EMR, each node sized m3.xlarge (1 master, 4 slaves). I successfully ran through a 146…

apache-spark emr amazon-emr bigdata
Best way to delete millions of rows by ID

I need to delete about 2 million rows from my PG database. I have a list of IDs that I need …

sql postgresql bigdata sql-delete postgresql-performance
Spark Parquet partitioning: large number of files

I am trying to leverage Spark partitioning. I was trying to do something like data.write.partitionBy("key").parquet("/location") …

apache-spark spark-dataframe rdd apache-spark-2.0 bigdata
What is the actual difference between Data Warehouse & Big Data?

I know what a Data Warehouse is and what Big Data is, but I am confused about Data Warehouse vs. Big …

database bigdata data-warehouse
Fastest way to compare row and previous row in pandas dataframe with millions of rows

I'm looking for solutions to speed up a function I have written to loop through a pandas dataframe and compare …

python performance pandas bigdata cython
DELETE records which do not have a match in another table

There are two tables linked by an id: item_tbl (id) and link_tbl (item_id). There are some records in …

sql postgresql exists bigdata sql-delete
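
For the last question above, one common pattern is an anti-join written as DELETE ... WHERE NOT EXISTS. The excerpt is cut off before it says which table holds the unmatched records, so the sketch below assumes the orphans are in link_tbl; swap the table names for the other direction. It uses Python's built-in sqlite3 module as a stand-in so it can run anywhere. The question itself is tagged postgresql, where the same SQL statement should apply, but the helper name and sample data here are hypothetical.

```python
import sqlite3


def delete_unmatched_links(conn):
    """Delete link_tbl rows whose item_id has no matching item_tbl.id.

    Assumption: the orphaned rows live in link_tbl (the excerpt does not say).
    Returns the number of rows deleted.
    """
    cur = conn.execute(
        """
        DELETE FROM link_tbl
        WHERE NOT EXISTS (
            SELECT 1 FROM item_tbl WHERE item_tbl.id = link_tbl.item_id
        )
        """
    )
    conn.commit()
    return cur.rowcount


# Hypothetical sample data: items 1 and 2 exist, links 3 and 4 are orphans.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item_tbl (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE link_tbl (item_id INTEGER)")
conn.executemany("INSERT INTO item_tbl (id) VALUES (?)", [(1,), (2,)])
conn.executemany(
    "INSERT INTO link_tbl (item_id) VALUES (?)", [(1,), (2,), (3,), (4,)]
)

deleted = delete_unmatched_links(conn)
remaining = [
    row[0] for row in conn.execute("SELECT item_id FROM link_tbl ORDER BY item_id")
]
print(deleted, remaining)
```

NOT EXISTS is used here rather than NOT IN because NOT IN matches nothing if the subquery returns a NULL. On large tables, an index on the column used in the subquery is usually what decides how fast this runs.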