R Tips
Hide the legend in a bar chart:
geom_col(show.legend = FALSE)
Insert a new code section in RStudio:
Ctrl+Shift+R
Identify NAs and incomplete rows:
df %>% filter(!complete.cases(.)) %>% View()  # shows every row with at least one NA (empty strings do not count as NA)
filter(age < 10 | age > 80)
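A minimal base-R sketch of the two checks above, using a made-up data frame (the column names and values are assumptions for illustration only). It also shows that an empty string is not treated as missing.

```r
# Toy data; `id`, `name` and `age` are hypothetical columns.
df <- data.frame(
  id   = 1:4,
  name = c("", "b", "c", "d"),   # row 1 has an empty string, not an NA
  age  = c(5, NA, 42, 91)
)

# Rows with at least one NA (row 1 is NOT flagged: "" is a valid value)
incomplete <- df[!complete.cases(df), ]

# Rows with an implausible age; which() drops the NA comparison
out_of_range <- df[which(df$age < 10 | df$age > 80), ]

print(incomplete)
print(out_of_range)
```

If blanks should count as missing too, convert them to NA first.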
Write to Excel:
library(readr)  # write_excel_csv() lives in readr; writexl::write_xlsx() writes a true .xlsx file instead
write_excel_csv(estate, "~/Desktop/2020-07-20-12-33 estate.csv", na = "NA")
Setting the axis and formats:
scale_y_continuous(labels = scales::dollar_format()) +
scale_x_continuous()+
expand_limits(y=0)+
Rename values within a column by recoding factor levels (the syntax is "new" = "old"):
mutate(street_address = fct_recode(street_address, "1040 Delaware Avenue" = "1040 Delaware Ave"))
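A small sketch of what the recode does, on a made-up factor (the "22 Elm St" value is an assumption added for contrast). When the new name already exists as a level, the two levels are merged.

```r
library(forcats)

# Hypothetical addresses with two spellings of the same place
street_address <- factor(c("1040 Delaware Ave",
                           "1040 Delaware Avenue",
                           "22 Elm St"))

# new name on the left, old name on the right
recoded <- fct_recode(street_address,
                      "1040 Delaware Avenue" = "1040 Delaware Ave")

print(levels(recoded))
print(table(recoded))
```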
See the built-in datasets:
data()
For long text in the x-axis labels, rotate it vertically:
theme(axis.text.x = element_text(angle = 90, hjust = 1))
Clean the column names:
tips %>%
janitor::clean_names()  # works on the whole data frame, so no map() is needed
Get rid of commas in numbers:
x <- c("23,460", "12,340", "3,451", "57,999")  # numbers with commas arrive as strings
x <- readr::parse_number(x)
# this will give 23460 12340 3451 57999
?What does ‘bind_rows’ do?
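A short sketch that may help answer the question: as far as I understand it, bind_rows() stacks data frames on top of each other, matching columns by name and filling columns that are missing in one input with NA. The two tibbles here are made up for illustration.

```r
library(dplyr)

# Two hypothetical tables that only share the `id` column
a <- tibble(id = 1:2, x = c("a", "b"))
b <- tibble(id = 3L, y = TRUE)

# Stack them; unmatched columns are filled with NA
stacked <- bind_rows(a, b)
print(stacked)

# Naming the inputs and passing .id records where each row came from
tagged <- bind_rows(first = a, second = b, .id = "source")
print(tagged)
```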
?Weight
count(address, wt=n, sort = TRUE)
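A sketch of what wt changes, on invented data: without wt, count() counts rows per group; with wt = n it sums the n column per group instead. The `visits` table and its values are assumptions.

```r
library(dplyr)

# Hypothetical pre-aggregated data: `n` already holds a count per row
visits <- tibble(
  address = c("A", "A", "B"),
  n       = c(3, 2, 10)
)

# Plain count: number of rows per address
rows_per_address <- visits %>% count(address, name = "rows", sort = TRUE)

# Weighted count: sum of `n` per address
weighted <- visits %>% count(address, wt = n, sort = TRUE)

print(rows_per_address)
print(weighted)
```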
To reorder by the sum of a numerical variable:
mutate(region = fct_reorder(region, n, sum))
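A toy illustration of the reordering, assuming a made-up `sales` table: the levels of region end up sorted by the total of n within each region, smallest first.

```r
library(dplyr)
library(forcats)

# Hypothetical data; East appears twice so the sum matters
sales <- tibble(
  region = c("East", "West", "East", "North"),
  n      = c(1, 5, 2, 10)
)

# The third argument is the summary function applied to n within each level
reordered <- sales %>%
  mutate(region = fct_reorder(region, n, sum))

# Totals are East = 3, West = 5, North = 10
print(levels(reordered$region))
```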
No more coord_flip()
Horizontal bar plots can be really useful, especially for categorical data whose levels have long names that overlap if placed on the x-axis. Previously, making horizontal bar plots required mapping the variable to be plotted to the x aesthetic and then applying a coord_flip() layer to flip the axes, e.g.
ggplot(penguins, aes(x = species)) +
geom_bar() +
coord_flip()
![][penguins-species-bar-coord-flip-1]
geom_bar() now works in both directions, so the categorical variable can be mapped directly to the y aesthetic to get the horizontal bar plot.
ggplot(penguins, aes(y = species)) +
geom_bar()
![][penguins-species-bar-coord-flip-1-1]
Put the fct_reorder in ggplot:
penguins %>%
count(species) %>%
mutate(prop = n / sum(n)) %>%
ggplot(aes(x = prop, y = fct_reorder(species, prop))) +
geom_col()
![][penguins-species-props-bar-reorder-1]
No more gather / spread
Previously you might have approached this with the gather()/spread() functions. But there is a new pair of much more intuitive functions in town (i.e. in the tidyr package): pivot_wider() for going from longer to wider data and pivot_longer() for going from wider to longer data. The following animated visualisation by Mara Averick does a fantastic job of visually explaining what we mean by pivoting longer (or wider).
![][tidyr-longer-wider]
pivot_longer() needs to know three things. Which columns to pivot: any column that starts with the character string "body_mass". Where the names of the pivoted columns should go: names_to = "measurement". Where the values of the pivoted columns should go: values_to = "body_mass". The optional names_prefix = "body_mass_" strips that prefix from the names before they are stored.
penguins_madeup_long <- penguins_madeup_wide %>%
pivot_longer(
cols = starts_with("body_mass"),
names_to = "measurement",
names_prefix = "body_mass_",
values_to = "body_mass"
)
penguins_madeup_long
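Since penguins_madeup_wide is not defined in these notes, here is a self-contained sketch on a stand-in table (`toy_wide` and its values are invented). It pivots longer with the same arguments as above and then back with pivot_wider() to show the round trip.

```r
library(tidyr)

# Hypothetical stand-in for the wide penguin data
toy_wide <- tibble::tibble(
  id          = 1:2,
  body_mass_1 = c(3000, 4000),
  body_mass_2 = c(3100, 4200)
)

# Wide to long: one row per id and measurement
toy_long <- pivot_longer(
  toy_wide,
  cols         = starts_with("body_mass"),
  names_to     = "measurement",
  names_prefix = "body_mass_",
  values_to    = "body_mass"
)
print(toy_long)

# Long back to wide, restoring the prefix on the new column names
toy_wide_again <- pivot_wider(
  toy_long,
  names_from   = measurement,
  values_from  = body_mass,
  names_prefix = "body_mass_"
)
print(toy_wide_again)
```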
Using ‘summarise across’
penguins_madeup_wide %>%
summarise(across(
starts_with("body_mass"),
list(sample_mean = mean, sample_sd = sd)
))
mtcars %>%
group_by(cyl) %>%
summarise(across(c("mpg", "hp"), list(mean=mean, median=median, sd=sd)))
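A quick sketch of how across() names its output when given a named list of functions: by default each new column is called column_function. This uses the built-in mtcars data.

```r
library(dplyr)

by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(across(c(mpg, hp), list(mean = mean, sd = sd)))

# One row per cyl value, columns named <column>_<function>
print(names(by_cyl))
print(by_cyl)
```

The .names argument of across() can change that pattern, e.g. .names = "{.fn}_{.col}".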
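Turn blanks into NA:
df %>%
mutate(across(where(is.character), ~ na_if(.x, "")))  # na_if() is designed for vectors, so apply it to each character column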
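A small sketch of blank-to-NA conversion on a made-up table (mtcars has no text columns, so `survey` and its values are assumptions). na_if() is applied column by column to the character columns only.

```r
library(dplyr)

# Hypothetical survey data with blank strings standing in for missing values
survey <- tibble(
  name = c("Ann", "", "Cy"),
  city = c("", "Buffalo", "Erie"),
  age  = c(30, 41, 25)
)

cleaned <- survey %>%
  mutate(across(where(is.character), ~ na_if(.x, "")))

print(cleaned)
```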
Using summarise_all (here: the share of non-missing values in each column):
summarise_all(~mean(!is.na(.)))
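The same idea written with across(), which newer dplyr versions recommend over summarise_all(). The toy table is invented; the result is the fraction of non-missing values per column.

```r
library(dplyr)

# Hypothetical data with some gaps
d <- tibble(
  a = c(1, NA, 3, 4),
  b = c("x", NA, NA, "y")
)

# mean() of a logical vector is the proportion of TRUE values
share_present <- d %>%
  summarise(across(everything(), ~ mean(!is.na(.x))))

print(share_present)
```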
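Nice shortcut (assignment pipe, from magrittr):
library(magrittr)  # %<>% is not loaded by dplyr or the tidyverse
df <- df %>%
mutate(across(where(is.character), ~ na_if(.x, "")))
# is the same as
df %<>%
mutate(across(where(is.character), ~ na_if(.x, "")))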
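A tiny sketch of the assignment pipe on a plain vector, to show that the left-hand side is overwritten with the result. The vector is made up.

```r
library(magrittr)

x <- c(4, 9, 16)

# Pipe x into sqrt() and assign the result back to x
x %<>% sqrt()

print(x)
```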
Use skimr (ref: YouTube, watch?v=uG3igAGX7UE):
multiple_choice_responses %>%
select_if(is.numeric) %>%
skimr::skim()
Find number of distinct values by col:
![][Image]
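The original snippet here was an image, so this is only a guess at one way to do it: count the distinct values in every column with n_distinct() inside across(), shown on the built-in mtcars data.

```r
library(dplyr)

# One row, one column per variable, each holding its number of distinct values
distinct_counts <- mtcars %>%
  summarise(across(everything(), n_distinct))

print(distinct_counts)
```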
Separate a column and rows based on what you choose to separate it by
![][Image-1]
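The original snippet here was also an image. A possible sketch using tidyr, on invented data: separate() splits one column into several columns, and separate_rows() splits one cell into several rows.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: a full name and a comma-separated list per person
people <- tibble(
  id    = 1:2,
  name  = c("Ann Lee", "Bo Cruz"),
  langs = c("R, Python", "SQL")
)

tidy_people <- people %>%
  separate(name, into = c("first", "last"), sep = " ") %>%  # one column -> two columns
  separate_rows(langs, sep = ", ")                           # one row -> one row per language

print(tidy_people)
```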
Order a graph correctly
Add percent scales:
scale_y_continuous(labels = scales::percent)
For knitr output that shows up in scientific notation, e.g. the inline code
The total number of question attempts was `r sum(expert$attempts)`.
yields 5.12312414 × 10^5.
To correct this, put the following in the setup chunk at the beginning of the knitr document:
knitr::opts_chunk$set(echo = TRUE)
options(scipen = 999)  # strongly prefer fixed over scientific notation
Change the x-axis breaks
scale_x_continuous(breaks = 1:9) # if there were nine seasons
Rotate text 90 degrees on the x-axis:
theme(axis.text.x = element_text(angle = 90, hjust = 1))
No Legend:
theme(legend.position = "none")
Ignore words in a blacklist in the df$word column:
blacklist_words <- c("dog", "cat", "sheep")
df %>%
filter(!word %in% blacklist_words)
In ggplot, order the bars correctly:
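mutate(word = reorder_within(word, tf_idf, character)) %>%  # reorder_within() is from tidytext; orders word within each character
# ...then, in the ggplot chain (typically with facets per character):
scale_x_reordered() +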
Use a gradient of colors in ggplot:
scale_color_gradient2(low="blue", high="red", midpoint = 0.5, labels = scales::percent_format())
Change years into decades:
mutate(decade = 10 * (year %/% 10))  # integer division drops the last digit, e.g. 1987 becomes 1980
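A worked example of the arithmetic in base R, on a few made-up years: %/% is integer division, so dividing by 10 and multiplying back rounds each year down to the start of its decade.

```r
year <- c(1969, 1970, 1985, 1999, 2003)

# 1969 %/% 10 is 196, and 196 * 10 is 1960
decade <- 10 * (year %/% 10)

print(decade)
```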
Get the top 10 and bottom 10
slice(c(1:10, seq(n() - 9, n())))  # first 10 and last 10 rows of the already sorted data
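A sketch checking the row positions on a made-up 100-row table: seq(n() - 9, n()) selects exactly the last ten rows. slice_head() and slice_tail() are an alternative spelling of the same selection.

```r
library(dplyr)

# Hypothetical ranking: 100 rows sorted from highest to lowest
ranked <- tibble(x = 1:100) %>%
  arrange(desc(x))

# Top 10 and bottom 10 in one slice() call
ends <- ranked %>%
  slice(c(1:10, seq(n() - 9, n())))

# Same rows using the dedicated helpers
ends_alt <- bind_rows(
  slice_head(ranked, n = 10),
  slice_tail(ranked, n = 10)
)

print(ends$x)
```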
Limit the number of digits in output of RMD documents:
knitr::opts_chunk$set(echo = TRUE)
options(scipen = 999, digits = 1)  # digits sets the number of significant digits printed
If you want to put additional material at the end of the document, as a sort of ‘notes’ that will be saved but not printed to the document or blog, use this at the end:
knitr::knit_exit()
#then more code or markdown after it won't be incorporated.
How to read a csv file from an online repository (e.g., GitHub) and glimpse it
#Find the url of the raw csv file
library(tidyverse)
df <- readr::read_csv("https://blah_blahEtc.com/blah")
df %>% glimpse()