A simple way to extract duplicate entries from an Excel file using R (or RStudio).
Creating A Dataset
(If you already have a dataset at hand, you may skip this section.)
This time, I will use an Excel file.
For instance, when I write posts about English words I have newly learned from a book, I try not to repeat the same words across posts. To keep track, I created an Excel file like the one in the following image:
If you want to use a CSV file instead, choose CSV as the format when you save the data (for example, with Save As). Simply renaming an .xlsx file to .csv does not convert it.
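If you would like to follow along without preparing a spreadsheet, here is a minimal sketch that builds a small table in R with the same columns as my file and saves it as CSV. The rows are copied from the sample output later in this post; the file name `vocab_sample.csv` is hypothetical, and the file is written to R's temporary folder.

```r
# A minimal sketch: build a small vocabulary table and save it as CSV.
# Column names mirror the post's output; "vocab_sample.csv" is a hypothetical name.
vocab_sample <- data.frame(
  `Ch#` = c(1, 4, 5, 99, 100, 100),
  n = c("8", "14", "5", "1", "14", "5"),
  Word = c("dimpled cheeks", "impeccable (adj)", "initiate (n)",
           "dimpled cheeks", "impeccable (adj)", "initiate (n)"),
  PoS = c("n", "adj", "n", "n", "adj", "n"),
  Syn.Rephrase = c("cheeks have natural dents; dimples", "flawless",
                   "a person who has been initiated",
                   "cheeks have natural dents; dimples", "flawless",
                   "a person who has been initiated"),
  check.names = FALSE  # keep the "Ch#" column name exactly as written
)

csv_path <- file.path(tempdir(), "vocab_sample.csv")
write.csv(vocab_sample, csv_path, row.names = FALSE)  # no extra row-number column
```

The last three rows deliberately repeat the first three words, so the duplicate check has something to find.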
The R Script
The following script extracts every row whose word appears more than once.
library("openxlsx")
vocab_file <- read.xlsx("/Users/Emma/filename.xlsx") # storing the dataset into "vocab_file"
head(vocab_file) # checking the content if needed
library("dplyr")
duplicate <- vocab_file %>% # inside the "vocab_file,"
  group_by(Word) %>% # group the rows by the column named "Word,"
  filter(n() > 1) # keep only the rows whose Word occurs more than once
duplicate <- duplicate[order(duplicate$Word),] # sorting the result in alphabetical order
duplicate # displaying the results
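If you prefer not to install dplyr, the same idea can be sketched in base R. This swaps `group_by()` plus `filter(n() > 1)` for `duplicated()` run in both directions; the small `vocab_file` below is a stand-in for the real spreadsheet, not my actual data.

```r
# Stand-in for the real "vocab_file" (only two columns, for brevity)
vocab_file <- data.frame(
  Word = c("impeccable (adj)", "initiate (n)", "dimpled cheeks",
           "impeccable (adj)", "dimpled cheeks"),
  PoS  = c("adj", "n", "n", "adj", "n")
)

# duplicated() flags the 2nd, 3rd, ... copies of a value; running it again
# with fromLast = TRUE flags the first copies too, so OR-ing the two keeps
# every member of each duplicated group.
is_dup <- duplicated(vocab_file$Word) | duplicated(vocab_file$Word, fromLast = TRUE)
duplicate <- vocab_file[is_dup, ]
duplicate <- duplicate[order(duplicate$Word), ]  # alphabetical, so pairs sit together
duplicate
```

Here "initiate (n)" appears only once, so it is dropped, and the two remaining pairs come out next to each other.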
Lines 1 and 2: for a CSV file, it is easier to use the built-in read.csv() instead of read.xlsx(). In that case, you can skip the first line, which loads the "openxlsx" package.
Also, change the file path to match where your file is located.
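Here is a minimal sketch of the CSV variant. A temporary file stands in for something like "/Users/Emma/filename.csv" (a hypothetical path). One detail worth knowing: by default read.csv() rewrites column names that are not valid R names, so "Ch#" would become "Ch.", and `check.names = FALSE` keeps it as-is.

```r
# Write a tiny CSV to a temporary file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Ch#,n,Word",
             "1,8,dimpled cheeks",
             "99,1,dimpled cheeks",
             "4,14,impeccable (adj)"), csv_path)

# read.csv() is part of base R, so no library() call is needed.
# check.names = FALSE keeps "Ch#" instead of converting it to "Ch."
vocab_file <- read.csv(csv_path, check.names = FALSE)
head(vocab_file)
```

The rest of the script (the dplyr part) works on this `vocab_file` unchanged.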
Sorting: near the end of the script, I sorted the result with duplicate <- duplicate[order(duplicate$Word),]
so that R reorders the rows alphabetically by English word.
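To see what the sorting line does, here is a small sketch on toy data (a stand-in for `duplicate`, not real output). order() returns the row positions that would put Word in alphabetical order, and the trailing comma inside the brackets applies those positions to the rows while keeping every column.

```r
# Toy stand-in for "duplicate"
duplicate <- data.frame(
  Word = c("impeccable (adj)", "dimpled cheeks", "impeccable (adj)", "dimpled cheeks"),
  Ch   = c(4, 1, 100, 99)
)

ord <- order(duplicate$Word)  # 2 4 1 3: row positions in alphabetical order of Word
sorted <- duplicate[ord, ]    # the comma means "these rows, all columns"
sorted
```

Without the comma, `duplicate[ord]` would treat `ord` as column positions; with only two columns, that fails with an "undefined columns selected" error. If you are already using dplyr, `arrange(Word)` at the end of the pipe is an alternative.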
The result will be like the following:
If There Are Duplicates
I intentionally inserted three duplicate rows into the file, so R extracted those three along with their three originals, as pairs.
> duplicate
# A tibble: 6 x 5
# Groups: Word [3]
`Ch#` n Word PoS Syn.Rephrase
<dbl> <chr> <chr> <chr> <chr>
1 1 8 dimpled cheeks n cheeks have natural dents; dimples
2 99 1 dimpled cheeks n cheeks have natural dents; dimples
3 4 14 impeccable (adj) adj flawless
4 100 14 impeccable (adj) adj flawless
5 5 5 initiate (n) n a person who has been initiated
6 100 5 initiate (n) n a person who has been initiated
Without the alphabetical sort, the two rows of a pair may end up far apart from each other.
If There Are No Duplicates
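If you only want to know which words are duplicated and how often, a frequency table is a compact alternative to reading the pairs. This sketch uses base R's table() on the Word values from the output above (typed in by hand as a stand-in for `duplicate$Word`).

```r
# Stand-in for duplicate$Word, taken from the output above
words <- c("dimpled cheeks", "dimpled cheeks", "impeccable (adj)",
           "impeccable (adj)", "initiate (n)", "initiate (n)")

counts <- table(words)            # occurrences per word, in alphabetical order
dup_counts <- counts[counts > 1]  # keep only words that occur more than once
dup_counts
```

With dplyr loaded, `count(duplicate, Word)` gives the same information as a tibble.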
> duplicate
# A tibble: 0 x 5
# Groups: Word [0]
# … with 6 variables: `Ch#` <dbl>, n <chr>, Word <chr>, PoS <chr>, Syn.Rephrase <chr>
If there are no duplicates in the file, R prints an empty tibble like the one above.
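If you run the check often, a plain message can be easier to read than an empty tibble. Here is a minimal sketch with a hypothetical helper, `report_duplicates()`, that takes the `duplicate` result and reports whether anything was found.

```r
# Hypothetical helper: summarise the result of the duplicate check
report_duplicates <- function(duplicate) {
  if (nrow(duplicate) == 0) {
    message("No duplicates found.")
  } else {
    message(length(unique(duplicate$Word)), " word(s) appear more than once.")
  }
  invisible(nrow(duplicate))  # return the row count without printing it
}

report_duplicates(data.frame(Word = character(0)))
report_duplicates(data.frame(Word = c("flawless", "flawless")))
```

It works the same on a grouped tibble from the dplyr script, since nrow() and `$Word` behave as they do on a data frame.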
Learning The Basics of R
R for Data Science, written by Dr. Hadley Wickham (currently Chief Scientist at RStudio), is a great resource for learning the basics of R. You can read it online for free, and there are also paperback and Kindle editions if you prefer them to online material. Highly recommended for data science beginners and anyone teaching themselves R.