R Tips


Hide the legend in a bar chart:

geom_col(show.legend = FALSE)    

Insert a new code section in RStudio:

Ctrl-Shift-R

Identify NAs and incomplete rows:

    df %>% filter(!complete.cases(.)) %>% View()   # shows every row with at least one NA (empty strings do not count as NA)
    df %>% filter(age < 10 | age > 80)             # shows rows whose age falls outside the 10 to 80 range

Write a CSV file that Excel opens cleanly:

    library(readr)   # write_excel_csv() comes from readr, not writexl
    write_excel_csv(estate, "~/Desktop/2020-07-20-12-33 estate.csv", na = "NA", append = FALSE, delim = ",")   # quotes are escaped by doubling by default

Setting the axis and formats:

  scale_y_continuous(labels = scales::dollar_format()) +   # dollar_format() comes from the scales package
  scale_x_continuous() +
  expand_limits(y = 0)

Rename values within a factor column (new name = old name):

    mutate(street_address = fct_recode(street_address, "1040 Delaware Avenue" = "1040 Delaware Ave"))

See the built-in datasets:

data()

For long text in the x-axis labels, just rotate it vertically:

theme(axis.text.x = element_text(angle = 90, hjust = 1))

Clean the column names:

tips %>%
  janitor::clean_names()   # clean_names() takes the whole data frame, so no map() is needed

Get rid of commas in numbers:

x <- c("23,460", "12,340", "3,451", "57,999")   # numbers stored as text with thousands separators
x <- readr::parse_number(x)
# this will give 23460 12340 3451 57999

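What does ‘bind_rows’ do? It stacks data frames on top of each other, matching columns by name and filling columns missing from one of them with NA.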
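
A small sketch of bind_rows() in action may help. The two data frames here (a and b) are made up for illustration and assume dplyr is installed:

```r
library(dplyr)

# Two hypothetical data frames that only share the id column
a <- tibble(id = 1:2, score = c(10, 20))
b <- tibble(id = 3L, group = "x")

# Rows are stacked; columns are matched by name and gaps become NA
stacked <- bind_rows(a, b)
stacked
```

The result has three rows and the columns id, score and group, with NA wherever a source data frame lacked that column.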

Weighted counts (wt sums a column instead of counting rows):

count(address, wt=n, sort = TRUE)
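
To see what the wt argument changes, here is a minimal sketch on a made-up, pre-aggregated table (the sales data frame and its values are assumptions, not from the original notes):

```r
library(dplyr)

# Hypothetical pre-aggregated data: n is a tally already stored in each row
sales <- tibble(
  address = c("1 Main St", "1 Main St", "2 Oak Ave"),
  n       = c(3, 4, 5)
)

# Without wt, count() counts rows per address
row_counts <- sales %>% count(address, sort = TRUE)

# With wt = n, count() sums the n column per address
weighted <- sales %>% count(address, wt = n, sort = TRUE)
```

Here row_counts reports 2 and 1 rows, while weighted reports totals of 7 and 5.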

To reorder by the sum of a numerical variable:

    mutate(region = fct_reorder(region, n, sum))

No more coord_flip():

Horizontal bar plots can be really useful, especially for categorical data whose levels have long names that would overlap on the x-axis. Previously, making a horizontal bar plot required mapping the variable to the x aesthetic and then adding a coord_flip() layer to flip the axes, e.g.

ggplot(penguins, aes(x = species)) +
  geom_bar() +
  coord_flip()

![][penguins-species-bar-coord-flip-1]

As of ggplot2 3.3.0, geom_bar() works in both directions, so the categorical variable can be mapped directly to the y aesthetic to get the horizontal bar plot.

ggplot(penguins, aes(y = species)) +
  geom_bar()

![][penguins-species-bar-coord-flip-1-1]

Put the fct_reorder in ggplot:

penguins %>%
  count(species) %>%
  mutate(prop = n / sum(n)) %>%
  ggplot(aes(x = prop, y = fct_reorder(species, prop))) +
  geom_col()
![][penguins-species-props-bar-reorder-1]

No more gather / spread:

To reshape data between wide and long formats, you might previously have used the gather()/spread() functions. The tidyr package now has a pair of much more intuitive functions: pivot_wider() for going from longer to wider data and pivot_longer() for going from wider to longer data. The following animated visualisation by Mara Averick does a fantastic job of showing what pivoting longer (or wider) means.

![][tidyr-longer-wider]

pivot_longer() needs to know three things: which columns to pivot (here, any column whose name starts with "body_mass"); the name of the new column that will hold the names of the pivoted columns (names_to = "measurement"); and the name of the new column that will hold their values (values_to = "body_mass"). names_prefix strips the shared "body_mass_" prefix from the names before they are stored.

penguins_madeup_long <- penguins_madeup_wide %>%
  pivot_longer(
    cols = starts_with("body_mass"),
    names_to = "measurement",
    names_prefix = "body_mass_",
    values_to = "body_mass"
  )

penguins_madeup_long
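
Since penguins_madeup_wide is not defined in these notes, here is a tiny stand-in to show the shape of the result. The wide table and its values are invented for illustration:

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for penguins_madeup_wide: one row per penguin,
# one column per body mass measurement
wide <- tibble(
  id          = 1:2,
  body_mass_1 = c(3750, 3800),
  body_mass_2 = c(3700, 3900)
)

long <- wide %>%
  pivot_longer(
    cols = starts_with("body_mass"),
    names_to = "measurement",
    names_prefix = "body_mass_",
    values_to = "body_mass"
  )
long
```

Each penguin now takes up two rows, one per measurement, and the measurement column holds "1" or "2" because the prefix was stripped.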

Using ‘summarise across’

penguins_madeup_wide %>%
  summarise(across(
    starts_with("body_mass"),
    list(sample_mean = mean, sample_sd = sd)
  ))

mtcars %>%
  group_by(cyl) %>%
  summarise(across(c(mpg, hp), list(mean = mean, median = median, sd = sd)))

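Turn blank strings into NA:

  mtcars %>%
    mutate(across(where(is.character), ~ na_if(.x, "")))   # na_if() works on vectors, so apply it to each character column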

Using summarise_all to get the share of non-missing values in every column:

summarise_all(~mean(!is.na(.)))
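
A quick sketch of what that one-liner returns, using a made-up data frame with a few missing values (the data are an assumption for illustration):

```r
library(dplyr)

# Hypothetical data with missing values in both columns
df <- tibble(
  a = c(1, NA, 3, 4),
  b = c("x", NA, NA, "y")
)

# !is.na(.) is TRUE for non-missing values, so the mean is the share present
share_present <- df %>% summarise_all(~ mean(!is.na(.)))
share_present
```

Column a comes out as 0.75 and column b as 0.5.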

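Nice shortcut (assignment pipe, from magrittr):

library(magrittr)   # provides %<>%

mtcars <- mtcars %>%
    mutate(across(where(is.character), ~ na_if(.x, "")))

# Is the same as
mtcars %<>%
    mutate(across(where(is.character), ~ na_if(.x, "")))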

Use skimr: ref youtube watch?v=uG3igAGX7UE

    multiple_choice_responses %>%
      select_if(is.numeric) %>%
      skimr::skim()

Find number of distinct values by col:

![][Image]
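
The original snippet here was an image that did not survive. One common way to do this, offered as an assumption about what it showed, is n_distinct() applied across every column; the df below is made up:

```r
library(dplyr)

# Hypothetical data frame
df <- tibble(
  colour = c("red", "blue", "red"),
  size   = c(1, 2, 3)
)

# One row, with the number of distinct values in each column
distinct_counts <- df %>% summarise(across(everything(), n_distinct))
distinct_counts
```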

Separate a column and rows based on what you choose to separate it by

![][Image-1]
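
The image for this tip is missing too. A plausible reading, stated here as an assumption, is tidyr's separate() for splitting one column into several and separate_rows() for splitting one row into several; the data are invented:

```r
library(dplyr)
library(tidyr)

# Hypothetical data: a full name, and a comma-separated list of languages
df <- tibble(
  name  = c("Ann Lee", "Bob Ray"),
  langs = c("R,Python", "SQL")
)

# Split one column into two, at the space
split_cols <- df %>% separate(name, into = c("first", "last"), sep = " ")

# Split one row into several, at the comma
split_rows <- df %>% separate_rows(langs, sep = ",")
```

split_cols gains first and last columns in place of name, and split_rows has three rows, one per language.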

Order a graph correctly:

Add percent scales:

    scale_y_continuous(labels = scales::percent)

For knitr output that shows up in scientific notation (e.g. "The total number of question attempts was `r sum(expert$attempts)`." yields 5.12312414 × 10^5).

To correct this, include the following lines in the setup chunk at the beginning of the knitr document:

knitr::opts_chunk$set(echo = TRUE)
options(scipen = 999)   # a global R option, so set it separately from the chunk options

Change the x-axis breaks

scale_x_continuous(breaks = 1:9) # if there were nine seasons

Rotate text 90 degrees on the x-axis:

    theme(axis.text.x = element_text(angle = 90, hjust = 1))

No Legend:

    theme(legend.position = "none")

Ignore words in a blacklist in the df$word column:

    blacklist_words <- c("dog", "cat", "sheep")
    df %>%
       filter(!word %in% blacklist_words)

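In ggplot, order the bars correctly within each facet (reorder_within() and scale_x_reordered() come from tidytext):

mutate(word = reorder_within(word, tf_idf, character))   # goes in the dplyr pipeline
scale_x_reordered()                                      # goes in the ggplot chain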

Use a gradient of colors in ggplot:

scale_color_gradient2(low="blue", high="red", midpoint = 0.5, labels = scales::percent_format())

Change years into decades:

mutate(decade = 10 * (year %/% 10))
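
The arithmetic can be checked in base R on a few sample years (the values are made up): integer division by 10 drops the last digit, and multiplying by 10 puts a zero back.

```r
# Hypothetical sample years
year <- c(1987, 1990, 2004, 2019)

# %/% is integer division, so 1987 %/% 10 is 198, and 10 * 198 is 1980
decade <- 10 * (year %/% 10)
decade
# 1980 1990 2000 2010
```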

Get the top 10 and bottom 10

slice(c(1:10, seq(n() - 9, n())))   # n() - 9 to n() is exactly the last 10 rows
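
A short check of the row arithmetic on a made-up 25-row table (the data frame is an assumption for illustration). It also shows slice_head() and slice_tail() as an alternative that avoids the index arithmetic:

```r
library(dplyr)

# Hypothetical data: 25 rows, already sorted by rank
df <- tibble(rank = 1:25)

# First 10 and last 10 rows by position
ends <- df %>% slice(c(1:10, seq(n() - 9, n())))

# Same result without index arithmetic
ends_alt <- bind_rows(slice_head(df, n = 10), slice_tail(df, n = 10))
```

Both give 20 rows: ranks 1 to 10 and 16 to 25. Note that the two ranges overlap if the table has fewer than 20 rows.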

Limit the number of digits in the output of R Markdown documents:

knitr::opts_chunk$set(echo = TRUE)
options(scipen = 999, digits = 1)   # global R options, set separately from the chunk options

If you want to put additional material at the end of the document, as a sort of ‘notes’ that will be saved but not printed to the document or blog, use this at the end:

knitr::knit_exit()

Any code or markdown after it won’t be incorporated.


Charles Snyder
