Top "Data-cleaning" questions

Data cleaning is the process of removing or repairing errors in, and normalizing, the data used in computer programs.

Python pandas groupby aggregate on multiple columns, then pivot

In Python, I have a pandas DataFrame similar to the following:

Item  | shop1 | shop2 | shop3 | Category
------------------------------------
Shoes | 45    | 50    | 53    | Clothes
TV    | 200   | 300   | 250   | …

python pandas dataframe pivot data-cleaning
Python pandas: replace zeros with NaN in multiple columns

A list of person attributes is loaded into a pandas DataFrame, df2. For cleanup, I want to replace the value zero (0 or '0…

python pandas dataframe data-cleaning
Find all columns of dataframe in Pandas whose type is float, or a particular type?

I have a dataframe, df, that has some columns of type float64, while the others are of type object. Due to …

python pandas dataframe data-cleaning
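
One common way to approach the question above is `DataFrame.select_dtypes`. The following is only a minimal sketch: the frame and its column names (`price`, `name`, `weight`) are made up for illustration and do not come from the question.

```python
import pandas as pd

# Hypothetical frame: two float64 columns and one string column.
df = pd.DataFrame(
    {"price": [1.5, 2.0], "name": ["a", "b"], "weight": [0.3, 0.7]}
)

# select_dtypes keeps only the columns whose dtype matches `include`;
# the column order of the original frame is preserved.
float_cols = df.select_dtypes(include="float64").columns.tolist()
print(float_cols)
```

Passing `include="number"` instead would likely be the better fit if integer columns should be picked up as well.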
pandas.to_numeric - find out which string it was unable to parse

Applying pandas.to_numeric to a dataframe column which contains strings that represent numbers (and possibly other unparsable strings) results …

python pandas data-science data-cleaning
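
For the question above, one possible approach is to coerce unparsable values to NaN and then look up which original values ended up missing. This is a sketch under assumptions: the sample Series is invented, and it assumes the input holds strings only.

```python
import pandas as pd

# Hypothetical input: numeric strings mixed with unparsable ones.
s = pd.Series(["1", "2.5", "abc", "4", "n/a"])

# errors="coerce" turns anything that cannot be parsed into NaN
# instead of raising an exception.
parsed = pd.to_numeric(s, errors="coerce")

# Values that are NaN after parsing but were not missing before
# are the ones to_numeric could not handle.
bad = s[parsed.isna() & s.notna()]
print(bad.tolist())
```

The index of `bad` gives the row positions of the offending strings, which may help when tracing them back to the source data.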
Removing non-English words from text using Python

I am doing a data cleaning exercise in Python and the text that I am cleaning contains Italian words which …

python data-science data-cleaning
Avoiding type conflicts with dplyr::case_when

I am trying to use dplyr::case_when within dplyr::mutate to create a new variable where I set some …

r dplyr data-cleaning
How do I delete observations with no data in Stata?

I have data with IDs which may or may not have all values present. I want to delete ONLY the …

stata data-cleaning
How to remove carriage return in a dataframe

I have a dataframe that contains columns named id, country_name, location and total_deaths. While doing data cleaning …

python pandas replace carriage-return data-cleaning
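
A possible way to handle the question above is a regex replacement on the affected string column. This is a rough sketch: the sample values are invented, only the `id` and `location` column names are taken from the question, and replacing line breaks with a single space (rather than deleting them) is an assumption about the desired result.

```python
import pandas as pd

# Hypothetical rows; the values are made up for illustration.
df = pd.DataFrame({"id": [1, 2], "location": ["New\r\nYork", "Paris\r"]})

# Replace any run of carriage returns / newlines with one space,
# then trim whitespace left at the ends of each value.
df["location"] = (
    df["location"]
    .str.replace(r"[\r\n]+", " ", regex=True)
    .str.strip()
)
print(df["location"].tolist())
```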
How to match a string and white space in R

I have a dataframe with columns having values like: "Average 18.24" "Error 23.34". My objective is to replace the text and following …

regex r data-cleaning
Cleaning data with dropna in PySpark

I'm still relatively new to PySpark. I use version 2.1.0. I'm trying to clean some data on a much larger data …

pyspark data-cleaning