Creating a waffle plot together with facets in ggplot2Round vector of numerics to integer while preserving their sumRotating and spacing axis labels in ggplot2Plot two graphs in same plot in Rggplot2 position='dodge' producing bars that are too wideggplot2 stacked bar plot order broken after updatePlot vector as barplotR qplot - plotting two facets with sums on the y-axis instead of countsPrevent factor level being dropped in ggplot facetted histogramsDimension of the multi barplot using ggplot2 facet_wrap() in RHow to plot faceted histogram (not bar charts) with percents relative to each facet?ggplot2 to plot mean and sd of a variable side by side
Did the Allies reverse the threads on secret microfilm-hiding buttons to thwart the Germans?
What could a technologically advanced but outnumbered alien race do to destroy humanity?
How does kinetic energy work in braking a vehicle?
Right Ascension for epoch 2000 - physical location?
Why do some web servers still provide information on vendor and version in the HTTP response headers
Simple code that checks if you're old enough to drive
Why do the new Star Trek series have so few episodes in each season?
Does Dim Light created by an effect override Bright Light in an area?
What do you call this when cats hunch their backs and their fur stands on end?
Why do some switching regulator require tantalum or electrolytic capacitors instead of ceramic?
Why does Bane's stock exchange robbery actually work to bankrupt Bruce Wayne?
Substantivization of "continuum"
Make a haystack (with a needle)
I am particularly fascinated by the Chinese character that is pronounced SHIN & means faith or belief
How can I filter an EntityClass by _not_ having a property?
Is there any algorithm that runs faster in Mathematica than in C or Fortran?
What's the most profitable use for an elemental transmuter?
why we need self-synchronization?
Is dark matter inside galaxies different from dark matter in intergalactic space?
NFL football incentives
Why is Eastern Switzerland called Suisse orientale in French?
Is the ''yoi'' meaning ''ready'' when doing karate the same as the ''yoi'' which means nice/good?
Why is my plastic credit card and activation code sent separately?
Protecting Seals from Forgery
Creating a waffle plot together with facets in ggplot2
Round vector of numerics to integer while preserving their sumRotating and spacing axis labels in ggplot2Plot two graphs in same plot in Rggplot2 position='dodge' producing bars that are too wideggplot2 stacked bar plot order broken after updatePlot vector as barplotR qplot - plotting two facets with sums on the y-axis instead of countsPrevent factor level being dropped in ggplot facetted histogramsDimension of the multi barplot using ggplot2 facet_wrap() in RHow to plot faceted histogram (not bar charts) with percents relative to each facet?ggplot2 to plot mean and sd of a variable side by side
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty
margin-bottom:0;
Is there any easy way to create waffle plots in combination with facets in ggplot2, or combing with the waffle package?
For example, replacing every bar in the below with 100 squares to represent 1 percent.
ggplot(mtcars, aes(x = factor(vs), y = hp, fill = as.factor(carb))) +
geom_bar(stat = 'identity', position = 'fill') +
facet_wrap('gear')
r ggplot2 waffle-chart
add a comment
|
Is there any easy way to create waffle plots in combination with facets in ggplot2, or combing with the waffle package?
For example, replacing every bar in the below with 100 squares to represent 1 percent.
ggplot(mtcars, aes(x = factor(vs), y = hp, fill = as.factor(carb))) +
geom_bar(stat = 'identity', position = 'fill') +
facet_wrap('gear')
r ggplot2 waffle-chart
add a comment
|
Is there any easy way to create waffle plots in combination with facets in ggplot2, or combing with the waffle package?
For example, replacing every bar in the below with 100 squares to represent 1 percent.
ggplot(mtcars, aes(x = factor(vs), y = hp, fill = as.factor(carb))) +
geom_bar(stat = 'identity', position = 'fill') +
facet_wrap('gear')
r ggplot2 waffle-chart
Is there any easy way to create waffle plots in combination with facets in ggplot2, or combing with the waffle package?
For example, replacing every bar in the below with 100 squares to represent 1 percent.
ggplot(mtcars, aes(x = factor(vs), y = hp, fill = as.factor(carb))) +
geom_bar(stat = 'identity', position = 'fill') +
facet_wrap('gear')
r ggplot2 waffle-chart
r ggplot2 waffle-chart
edited Mar 14 at 20:25
Marcus Campbell
2,1553 gold badges13 silver badges30 bronze badges
2,1553 gold badges13 silver badges30 bronze badges
asked Oct 10 '18 at 13:43
MLENMLEN
8061 gold badge6 silver badges20 bronze badges
8061 gold badge6 silver badges20 bronze badges
add a comment
|
add a comment
|
3 Answers
3
active
oldest
votes
I'm not sure I'll ever get around to stat_waffle()
/ geom_waffle()
but you can just use the logic in the package to do the same thing the long way round:
library(hrbrthemes)
library(tidyverse)
We need to figure out the percents then get each group to sum to 100 even, so we need a helper function that's been around on SO for a while:
smart_round <- function(x, digits = 0) # somewhere on SO
up <- 10 ^ digits
x <- x * up
y <- floor(x)
indices <- tail(order(x-y), round(sum(x)) - sum(y))
y[indices] <- y[indices] + 1
y / up
There are 2 bits of "magic" in the waffle
package. One bit is the part of the algorithm that just replicates the factor components the right number of times. We'll apply the following function row-wise to the data frame we'll make:
waffleize <- function(xdf)
data_frame(
gear_vs = rep(xdf$gear_vs, xdf$pct),
carb = rep(xdf$carb, xdf$pct)
)
Now we need to:
- do the same math ggplot2 did for your example
- calculate the percents for the groups
- turn the % into parts of 100 and make each group even total to 100
- since we can't have two different
geom_tile()
s in one panel we need to hack a facet name that'll do the same thing - group by said hack
- replicate each row
pct
times - make sure things are ordered right
- join a 10x10
x
/y
grid for as many facets as we'll have (in this case, 6); this is the other bit of the waffle package's "magic" - draw the tiles
^^ translates to 👇 (this pipe chain is a wee bit long for my comfort level, but "it works"):
count(mtcars, gear, vs, carb, wt=hp) %>%
group_by(gear, vs) %>%
mutate(pct = n/sum(n)) %>%
mutate(pct = (smart_round(pct, 1) * 100L) %>% as.integer()) %>%
select(-n) %>%
ungroup() %>%
mutate(carb = as.character(carb)) %>%
mutate(gear_vs = sprintf("%s-%s", gear, vs)) %>%
select(gear_vs, carb, pct, -gear, -vs) %>%
rowwise() %>%
do(waffleize(.)) %>%
ungroup() %>%
arrange(gear_vs, carb) %>%
bind_cols(
map_df(seq_len(length(unique(.$gear_vs))), ~expand.grid(y = 1:10, x = 1:10))
) %>%
ggplot(aes(x, y)) +
geom_tile(aes(fill=carb), color="white", size=0.5) +
ggthemes::scale_fill_tableau() +
facet_wrap(~gear_vs) +
coord_equal() +
labs(x=NULL, y = NULL) +
hrbrthemes::theme_ipsum_rc(grid="") +
theme(axis.text=element_blank())
Are the percentages are rounded wrong for 3-0, 4-1 and 5-0?
– Mikko
Oct 26 '18 at 7:21
add a comment
|
A little addition to @hrbrmstr excellent answer (thanks for making the package and sharing the code with us!). I also have struggled to make such plots, since I think this is a better way of visualizing data when doing comparisons than pie charts. My philosophy with waffle charts differs from the existing solutions (waffle and ggwaffle) in three ways:
- The waffles should always be presented in a 10x10% grid with aspect ratio 1, as in required OP's question and in @hrbrmstr answer. This makes reading the percentages easier.
- The waffles should be filled from left to right (reading direction) and from bottom upwards (like water in a glass, unlike in the existing solutions). This makes reading the percentages more natural to me.
- The 1% cells could be divided to accommodate fractional percentages. Groups that contribute <1% disappear when using full percentages, yet in the applications I have used these plots, missing groups have a different meaning than (0,1) groups.
Because it took me quite some time to solve the issues mentioned above, I post my solution under. "The magic" (i.e. the approach) behind the code has been scavenged from the existing packages. I hope this helps someone and pushes the waffle chart development forward. I really think these charts have a potential in data visualization. It would be nice to implement the functions as ggplot2
proto
objects, but after trying I had to give up. I do not understand that proto
system well enough to write code for it. I just copy the code I intend to include in my R package once I have added all the features I plan to add (the plan is to make waffle charts work similarly to scatterpie
). Note that a lot of the code below is to make the waffles pretty. The part that actually bakes the waffles is not that complicated and is well explained in @hrbrmstr answer.
Functions required to run the example:
library(dplyr)
library(ggplot2)
#' @title Convert line sizes measured as points to ggplot line sizes
#' @description Converts line sizes measured as points (as given by most programs such as Adobe Illustrator etc.) to ggplot font sizes
#' @param x numeric vector giving the lines sizes in points
#' @return Returns a numeric vector of lenght codex of ggplot line sizes
#' @keywords internal
#' @export
#'
LS <- function(x) x/2.13
#' @title Round values preserving total sums
#' @description The function rounds values preserving total sums
#' @param x numeric vector of values to be rounded
#' @param digits integer indicating the number of decimal places. See codelink[base]round.
#' @return Returns a numeric vector.
#' @author The function is written as a communal effort. Main authors are hrefhttps://stackoverflow.com/questions/32544646/round-vector-of-numerics-to-integer-while-preserving-their-sumjosliber and hrefhttps://www.r-bloggers.com/round-values-while-preserve-their-rounded-sum-in-r/BioStatMatt.
#' @keywords internal
#' @family waffle
#' @export
round_preserve_sum <- function(x, digits = 0)
up <- 10 ^ digits
x <- x * up
y <- floor(x)
indices <- tail(order(x-y), round(sum(x)) - sum(y))
y[indices] <- y[indices] + 1
y / up
# from https://www.r-bloggers.com/round-values-while-preserve-their-rounded-sum-in-r/
#' @title Prepare data for waffle plots
#' @description Manipulates a data frame ready for plotting with the codelinkwaffle_chart function.
#' @param dt data frame containing the data which should be transformed
#' @param fill character specifying the column name which should be used as fill for the waffle plot.
#' @param value character specifying the column name which contains values of the codefill variable.
#' @param composition logical indicating whether a compositional waffle (i.e. fill adds up to 100%) should be created. If codeFALSE, waffle cells will be scaled to codemax_value and missing cells filled with an "empty" category.
#' @param max_value numerical giving the value to which waffle cells should be scaled to, if codecomposition = FALSE.
#' @param digits integer indicating the number of decimal places to be used in rounding of the waffle cells.
#' @return returns a link[tibble]tibble data frame containing the minimum and maximum extent of each codefill level.
#' @author Mikko Vihtakari
#' @keywords internal
#' @family waffle
#' @import dplyr
#' @export
waffleize <- function(dt, fill, value, composition = TRUE, max_value = NULL, digits = 3)
x <- dt[c(fill, value)]
names(x) <- c("variable", "value")
if(composition)
x$value <- round_preserve_sum(10^digits*x$value/sum(x$value))
else
if(is.null(max_value)) stop("max_value has to be given, if composition = FALSE")
if(max_value < sum(x$value)) stop("max_value has to be larger than the sum of 'value' column. Use composition = TRUE, if you want a compositional waffle chart")
x <- rbind(x, data.frame(variable = "empty", value = max_value - sum(x$value)))
x$value <- round_preserve_sum(10^digits*x$value/max_value)
if(!is.factor(x$variable)) x$variable <- factor(x$variable, levels = c(sort(unique(x$variable)[unique(x$variable) != "empty"]), "empty"))
x <- x[order(x$variable),]
#tmp <- data.frame(X = 1:100, ymin = rep(c(0, (1:9)*10), each = 100), ymax = rep((1:10)*10, each = 100), variable = rep(dt[[fill]], dt[[value]]))
tmp <- data.frame(X = 1:10^(digits-1), ymin = rep(c(0, (1:9)*10^(digits-2)), each = 10^(digits-1)), ymax = rep((1:10)*10^(digits-2), each = 10^(digits-1)), variable = rep(x$variable, x$value))
out <- tmp %>% group_by(variable, ymin, ymax) %>% summarise(xmin = min(X)-1, xmax = max(X))
## Remove the empty category
out[out$variable != "empty",]
#' @title Plot waffle charts
#' @description The function uses link[ggplot2]ggplot2 to create waffle charts from data.
#' @param data data frame to be plotted
#' @param fill character specifying the column name which should be used as fill for the waffle plot.
#' @param value character specifying the column name which contains values of the codefill variable. Will be used to fill the waffle cells.
#' @param facet character specifying the column name which should be used to codelink[ggplot2]facet_wrap waffle charts.
#' @param ncol number of columns to be used in facetting. See codelink[ggplot2]facet_wrap.
#' @param composition logical indicating whether a compositional waffle (i.e. fill adds up to 100%) should be created. If codeFALSE, waffle cells will be scaled to codemax_value and missing cells filled with an "empty" category.
#' @param max_value numerical giving the value to which waffle cells should be scaled to, if codecomposition = FALSE.
#' @param digits integer indicating the number of decimal places to be used in rounding of the waffle cells. The value 3 indicates percentages, while 4 permilles.
#' @param fill_colors named character vector giving the colors for codefill levels. See codelink[ggplot2]scale_fill_manual.
#' @param fill_title character giving the title for the color legend.
#' @param base_size numeric giving the base size for the plot. See codelink[ggplot2]theme_void.
#' @param legend.position character specifying the position of the legend. See codelink[ggplot2]ggtheme.
#' @details The waffle charts are read from left to right (like text) and from bottom upwards (like water glass). The cells indicate 1% of the maximum value (100% if codecomposition = TRUE else codemax_value). The cells are divided vertically to fractions specifies by the codedigits argument.
#' @return Returns a link[ggplot2]ggplot2 waffle plot
#' @import ggplot2 dplyr
#' @family waffle
#' @author Mikko Vihtakari with code ideas from hrefhttps://github.com/hrbrmstr/wafflehrbrmstr and hrefhttps://github.com/liamgilbey/ggwaffleLiam Gilbey
#' @export
# data = dt; fill = "variable"; value = "value"; facet = NULL; composition = TRUE; max_value = NULL; digits = 3; fill_colors = NULL; fill_title = NULL; ncol = NULL; base_size = 12; legend.position = "bottom"
waffle_chart <- function(data, fill, value = "value", facet = NULL, composition = TRUE, max_value = NULL, digits = 3, fill_colors = NULL, fill_title = NULL, ncol = NULL, base_size = 12, legend.position = "bottom")
## White 1% grid
grid_data <- data.frame(xmin = c(0,(1:9)*10^(digits-2)), xmax = (1:10)*10^(digits-2), ymin = rep(c(0,(1:9)*10^(digits-2)), each = 10^(digits-2)), ymax = rep((1:10)*10^(digits-2), each = 10^(digits-2)))
if(is.null(facet)) ## No facetting
if(any(duplicated(data[[fill]]))) stop("data contains duplicated entries in fill column. Use the facet argument or summarize data before plotting.")
waffle_data <- waffleize(dt = data, fill = fill, value = value, composition = composition, max_value = max_value, digits = digits)
## Plot ####
p <- ggplot() +
geom_rect(data = waffle_data, aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = variable)) +
geom_rect(data = grid_data, aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax), fill = NA, color = "white") +
coord_equal(expand = FALSE) +
theme_void()
## ####
else ## Facetting
waffle_data <- data %>% group_by_(facet) %>% do(waffleize(dt = ., fill = fill, value = value, composition = composition, max_value = max_value, digits = digits))
## Plot ####
p <- ggplot() +
geom_rect(data = waffle_data, aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = variable)) +
geom_rect(data = grid_data, aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax), fill = NA, color = "white") +
facet_wrap(facet, ncol = ncol) +
coord_equal(expand = FALSE) +
theme_void()
## ####
## Fill colors ####
if(is.null(fill_title)) fill_title <- "Variable"
if(!is.null(fill_colors))
p <- p + scale_fill_manual(name = fill_title, values = fill_colors)
else
p <- p + scale_fill_viridis_d(name = fill_title)
## Final theme manipulation
p <- p + theme(
legend.position = legend.position,
aspect.ratio = 1,
panel.border = element_rect(color = "black", size = LS(1), fill = NA),
strip.background = element_rect(fill = alpha("white", 0.4), color = NA),
strip.text.x = element_text(size = base_size*0.8, margin = margin(t = 2, r = 0, b = 3, l = 0, unit = "pt")),
plot.title = element_text(size = base_size, hjust = 0.5, face = 2),
legend.background = element_blank(),
legend.box.background = element_blank(),
legend.title = element_text(size = base_size),
legend.text = element_text(size = base_size),
plot.background = element_blank(),
panel.spacing = unit(0.2, units = "line"),
legend.box.margin = margin(t = 0, r = 0, b = 3, l = 0, unit = "pt"),
plot.margin = unit(c(0.2, 0.5, 0.1, 0.1), units = "line")
)
## Return the plot
p
Finally, we can make the charts:
# Manipulate the dataset first to make sure that there are no replicate
# entries of factors used for the waffles
data("mtcars")
mtcars$gear_vs <- paste(mtcars$gear, mtcars$vs, sep = "-")
mtcars$carb <- factor(mtcars$carb)
x <- mtcars %>% group_by(gear_vs, carb) %>% summarise(value = sum(hp))
waffle_chart(x, fill = "carb", facet = "gear_vs", value = "value")
## You can also scale the waffles to a maximum hp in gear_vs
y <- x %>% group_by(gear_vs) %>% summarise(value = sum(value))
waffle_chart(x, fill = "carb", facet = "gear_vs", value = "value", composition = FALSE, max_value = max(y$value))
add a comment
|
Here is an alternative approach only using tidyverse
(i.e. dplyr
, tidyr
and ggplot2
) to create the waffle plot or square pie chart. It is based on hrbrmstr's answer, but I've tried to make it slightly more general; any frequency table works as input and it is easy to adjust the dimensions of the waffles (e.g. rectangle instead of square).
library(tidyverse)
freq_table = mtcars %>%
count(gear, vs, carb, wt = hp) %>%
group_by(gear, vs) %>%
mutate(pct = n / sum(n)) %>%
select(gear, vs, carb, pct)
Second step creates the coordinates. Uses tidyr::expand()
instead of waffleize()
. Still using smart_round()
.
waffle.n = 100 # Number of blocks
waffle.cols = ceiling(sqrt(waffle.n)) # For square. Otherwise pick integer.
coordinates = freq_table %>%
group_by(gear, vs) %>%
mutate(waffle.num = smart_round(pct,1) * waffle.n) %>%
group_by(carb, gear, vs) %>%
expand(count = seq(1:waffle.num)) %>%
select(-count) %>%
group_by(gear, vs) %>%
arrange(gear, vs) %>%
mutate(
waffle.x = rep_len(1:waffle.cols, waffle.n),
waffle.y = floor((row_number() - 1) / waffle.cols)
)
I group by two variables (gear
and vs
) and therefore use facet_grid()
. If you group by a single variable then use facet_wrap()
. You will need to tweak the options slightly to get the optimal result (such as size of the device, or size and stroke of the points).
fig = coordinates %>%
ggplot(aes(x = waffle.x, y = waffle.y, fill = as.factor(carb))) +
geom_point(size = 7, shape = 22, color = "white", stroke = 0.8) +
#geom_raster() + # Alternative to geom_point() without gap between blocks.
facet_grid(rows = vars(gear), cols = vars(vs)) +
theme_void() +
theme(legend.position = "bottom", plot.margin = margin(5.5, 5.5, 5.5, 5.5, "pt"), panel.spacing = unit(15, "pt"))
fig
#ggsave("fig.pdf", width = 13, height = 17.5, units = "cm", dpi = 150)
A more interesting example with an uneven number of blocks and where the shares are not divisible by 10.
Dividing blocks/cells to accommodate fractional percentages, as Mikko suggests can be archived by increasingwaffle.n
and using something likegeom_raster(hjust=0, vjust=0) + geom_vline(xintercept=seq(0,waffle.cols,by=2), color="white") + geom_hline(yintercept=seq(0,waffle.cols-1,by=1), color="white")
instead ofgeom_point()
– bonna
Mar 30 at 11:51
add a comment
|
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/4.0/"u003ecc by-sa 4.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f52741666%2fcreating-a-waffle-plot-together-with-facets-in-ggplot2%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
I'm not sure I'll ever get around to stat_waffle()
/ geom_waffle()
but you can just use the logic in the package to do the same thing the long way round:
library(hrbrthemes)
library(tidyverse)
We need to figure out the percents then get each group to sum to 100 even, so we need a helper function that's been around on SO for a while:
smart_round <- function(x, digits = 0) # somewhere on SO
up <- 10 ^ digits
x <- x * up
y <- floor(x)
indices <- tail(order(x-y), round(sum(x)) - sum(y))
y[indices] <- y[indices] + 1
y / up
There are 2 bits of "magic" in the waffle
package. One bit is the part of the algorithm that just replicates the factor components the right number of times. We'll apply the following function row-wise to the data frame we'll make:
waffleize <- function(xdf)
data_frame(
gear_vs = rep(xdf$gear_vs, xdf$pct),
carb = rep(xdf$carb, xdf$pct)
)
Now we need to:
- do the same math ggplot2 did for your example
- calculate the percents for the groups
- turn the % into parts of 100 and make each group even total to 100
- since we can't have two different
geom_tile()
s in one panel we need to hack a facet name that'll do the same thing - group by said hack
- replicate each row
pct
times - make sure things are ordered right
- join a 10x10
x
/y
grid for as many facets as we'll have (in this case, 6); this is the other bit of the waffle package's "magic" - draw the tiles
^^ translates to 👇 (this pipe chain is a wee bit long for my comfort level, but "it works"):
count(mtcars, gear, vs, carb, wt=hp) %>%
group_by(gear, vs) %>%
mutate(pct = n/sum(n)) %>%
mutate(pct = (smart_round(pct, 1) * 100L) %>% as.integer()) %>%
select(-n) %>%
ungroup() %>%
mutate(carb = as.character(carb)) %>%
mutate(gear_vs = sprintf("%s-%s", gear, vs)) %>%
select(gear_vs, carb, pct, -gear, -vs) %>%
rowwise() %>%
do(waffleize(.)) %>%
ungroup() %>%
arrange(gear_vs, carb) %>%
bind_cols(
map_df(seq_len(length(unique(.$gear_vs))), ~expand.grid(y = 1:10, x = 1:10))
) %>%
ggplot(aes(x, y)) +
geom_tile(aes(fill=carb), color="white", size=0.5) +
ggthemes::scale_fill_tableau() +
facet_wrap(~gear_vs) +
coord_equal() +
labs(x=NULL, y = NULL) +
hrbrthemes::theme_ipsum_rc(grid="") +
theme(axis.text=element_blank())
Are the percentages are rounded wrong for 3-0, 4-1 and 5-0?
– Mikko
Oct 26 '18 at 7:21
add a comment
|
I'm not sure I'll ever get around to stat_waffle()
/ geom_waffle()
but you can just use the logic in the package to do the same thing the long way round:
library(hrbrthemes)
library(tidyverse)
We need to figure out the percents then get each group to sum to 100 even, so we need a helper function that's been around on SO for a while:
smart_round <- function(x, digits = 0) # somewhere on SO
up <- 10 ^ digits
x <- x * up
y <- floor(x)
indices <- tail(order(x-y), round(sum(x)) - sum(y))
y[indices] <- y[indices] + 1
y / up
There are 2 bits of "magic" in the waffle
package. One bit is the part of the algorithm that just replicates the factor components the right number of times. We'll apply the following function row-wise to the data frame we'll make:
waffleize <- function(xdf)
data_frame(
gear_vs = rep(xdf$gear_vs, xdf$pct),
carb = rep(xdf$carb, xdf$pct)
)
Now we need to:
- do the same math ggplot2 did for your example
- calculate the percents for the groups
- turn the % into parts of 100 and make each group even total to 100
- since we can't have two different
geom_tile()
s in one panel we need to hack a facet name that'll do the same thing - group by said hack
- replicate each row
pct
times - make sure things are ordered right
- join a 10x10
x
/y
grid for as many facets as we'll have (in this case, 6); this is the other bit of the waffle package's "magic" - draw the tiles
^^ translates to 👇 (this pipe chain is a wee bit long for my comfort level, but "it works"):
count(mtcars, gear, vs, carb, wt=hp) %>%
group_by(gear, vs) %>%
mutate(pct = n/sum(n)) %>%
mutate(pct = (smart_round(pct, 1) * 100L) %>% as.integer()) %>%
select(-n) %>%
ungroup() %>%
mutate(carb = as.character(carb)) %>%
mutate(gear_vs = sprintf("%s-%s", gear, vs)) %>%
select(gear_vs, carb, pct, -gear, -vs) %>%
rowwise() %>%
do(waffleize(.)) %>%
ungroup() %>%
arrange(gear_vs, carb) %>%
bind_cols(
map_df(seq_len(length(unique(.$gear_vs))), ~expand.grid(y = 1:10, x = 1:10))
) %>%
ggplot(aes(x, y)) +
geom_tile(aes(fill=carb), color="white", size=0.5) +
ggthemes::scale_fill_tableau() +
facet_wrap(~gear_vs) +
coord_equal() +
labs(x=NULL, y = NULL) +
hrbrthemes::theme_ipsum_rc(grid="") +
theme(axis.text=element_blank())
Are the percentages are rounded wrong for 3-0, 4-1 and 5-0?
– Mikko
Oct 26 '18 at 7:21
add a comment
|
I'm not sure I'll ever get around to stat_waffle()
/ geom_waffle()
but you can just use the logic in the package to do the same thing the long way round:
library(hrbrthemes)
library(tidyverse)
We need to figure out the percents then get each group to sum to 100 even, so we need a helper function that's been around on SO for a while:
smart_round <- function(x, digits = 0) # somewhere on SO
up <- 10 ^ digits
x <- x * up
y <- floor(x)
indices <- tail(order(x-y), round(sum(x)) - sum(y))
y[indices] <- y[indices] + 1
y / up
There are 2 bits of "magic" in the waffle
package. One bit is the part of the algorithm that just replicates the factor components the right number of times. We'll apply the following function row-wise to the data frame we'll make:
waffleize <- function(xdf)
data_frame(
gear_vs = rep(xdf$gear_vs, xdf$pct),
carb = rep(xdf$carb, xdf$pct)
)
Now we need to:
- do the same math ggplot2 did for your example
- calculate the percents for the groups
- turn the % into parts of 100 and make each group even total to 100
- since we can't have two different
geom_tile()
s in one panel we need to hack a facet name that'll do the same thing - group by said hack
- replicate each row
pct
times - make sure things are ordered right
- join a 10x10
x
/y
grid for as many facets as we'll have (in this case, 6); this is the other bit of the waffle package's "magic" - draw the tiles
^^ translates to 👇 (this pipe chain is a wee bit long for my comfort level, but "it works"):
count(mtcars, gear, vs, carb, wt=hp) %>%
group_by(gear, vs) %>%
mutate(pct = n/sum(n)) %>%
mutate(pct = (smart_round(pct, 1) * 100L) %>% as.integer()) %>%
select(-n) %>%
ungroup() %>%
mutate(carb = as.character(carb)) %>%
mutate(gear_vs = sprintf("%s-%s", gear, vs)) %>%
select(gear_vs, carb, pct, -gear, -vs) %>%
rowwise() %>%
do(waffleize(.)) %>%
ungroup() %>%
arrange(gear_vs, carb) %>%
bind_cols(
map_df(seq_len(length(unique(.$gear_vs))), ~expand.grid(y = 1:10, x = 1:10))
) %>%
ggplot(aes(x, y)) +
geom_tile(aes(fill=carb), color="white", size=0.5) +
ggthemes::scale_fill_tableau() +
facet_wrap(~gear_vs) +
coord_equal() +
labs(x=NULL, y = NULL) +
hrbrthemes::theme_ipsum_rc(grid="") +
theme(axis.text=element_blank())
I'm not sure I'll ever get around to stat_waffle()
/ geom_waffle()
but you can just use the logic in the package to do the same thing the long way round:
library(hrbrthemes)
library(tidyverse)
We need to figure out the percents then get each group to sum to 100 even, so we need a helper function that's been around on SO for a while:
smart_round <- function(x, digits = 0) # somewhere on SO
up <- 10 ^ digits
x <- x * up
y <- floor(x)
indices <- tail(order(x-y), round(sum(x)) - sum(y))
y[indices] <- y[indices] + 1
y / up
There are 2 bits of "magic" in the waffle
package. One bit is the part of the algorithm that just replicates the factor components the right number of times. We'll apply the following function row-wise to the data frame we'll make:
waffleize <- function(xdf)
data_frame(
gear_vs = rep(xdf$gear_vs, xdf$pct),
carb = rep(xdf$carb, xdf$pct)
)
Now we need to:
- do the same math ggplot2 did for your example
- calculate the percents for the groups
- turn the % into parts of 100 and make each group even total to 100
- since we can't have two different
geom_tile()
s in one panel we need to hack a facet name that'll do the same thing - group by said hack
- replicate each row
pct
times - make sure things are ordered right
- join a 10x10
x
/y
grid for as many facets as we'll have (in this case, 6); this is the other bit of the waffle package's "magic" - draw the tiles
^^ translates to 👇 (this pipe chain is a wee bit long for my comfort level, but "it works"):
count(mtcars, gear, vs, carb, wt=hp) %>%
group_by(gear, vs) %>%
mutate(pct = n/sum(n)) %>%
mutate(pct = (smart_round(pct, 1) * 100L) %>% as.integer()) %>%
select(-n) %>%
ungroup() %>%
mutate(carb = as.character(carb)) %>%
mutate(gear_vs = sprintf("%s-%s", gear, vs)) %>%
select(gear_vs, carb, pct, -gear, -vs) %>%
rowwise() %>%
do(waffleize(.)) %>%
ungroup() %>%
arrange(gear_vs, carb) %>%
bind_cols(
map_df(seq_len(length(unique(.$gear_vs))), ~expand.grid(y = 1:10, x = 1:10))
) %>%
ggplot(aes(x, y)) +
geom_tile(aes(fill=carb), color="white", size=0.5) +
ggthemes::scale_fill_tableau() +
facet_wrap(~gear_vs) +
coord_equal() +
labs(x=NULL, y = NULL) +
hrbrthemes::theme_ipsum_rc(grid="") +
theme(axis.text=element_blank())
edited Oct 12 '18 at 11:52
answered Oct 10 '18 at 14:47
hrbrmstrhrbrmstr
64.7k8 gold badges99 silver badges160 bronze badges
64.7k8 gold badges99 silver badges160 bronze badges
Are the percentages are rounded wrong for 3-0, 4-1 and 5-0?
– Mikko
Oct 26 '18 at 7:21
add a comment
|
Are the percentages are rounded wrong for 3-0, 4-1 and 5-0?
– Mikko
Oct 26 '18 at 7:21
Are the percentages are rounded wrong for 3-0, 4-1 and 5-0?
– Mikko
Oct 26 '18 at 7:21
Are the percentages are rounded wrong for 3-0, 4-1 and 5-0?
– Mikko
Oct 26 '18 at 7:21
add a comment
|
A little addition to @hrbrmstr excellent answer (thanks for making the package and sharing the code with us!). I also have struggled to make such plots, since I think this is a better way of visualizing data when doing comparisons than pie charts. My philosophy with waffle charts differs from the existing solutions (waffle and ggwaffle) in three ways:
- The waffles should always be presented in a 10x10% grid with aspect ratio 1, as in required OP's question and in @hrbrmstr answer. This makes reading the percentages easier.
- The waffles should be filled from left to right (reading direction) and from bottom upwards (like water in a glass, unlike in the existing solutions). This makes reading the percentages more natural to me.
- The 1% cells could be divided to accommodate fractional percentages. Groups that contribute <1% disappear when using full percentages, yet in the applications I have used these plots, missing groups have a different meaning than (0,1) groups.
Because it took me quite some time to solve the issues mentioned above, I post my solution under. "The magic" (i.e. the approach) behind the code has been scavenged from the existing packages. I hope this helps someone and pushes the waffle chart development forward. I really think these charts have a potential in data visualization. It would be nice to implement the functions as ggplot2
proto
objects, but after trying I had to give up. I do not understand that proto
system well enough to write code for it. I just copy the code I intend to include in my R package once I have added all the features I plan to add (the plan is to make waffle charts work similarly to scatterpie
). Note that a lot of the code below is to make the waffles pretty. The part that actually bakes the waffles is not that complicated and is well explained in @hrbrmstr answer.
Functions required to run the example:
library(dplyr)
library(ggplot2)
#' @title Convert line sizes measured as points to ggplot line sizes
#' @description Converts line sizes measured as points (as given by most programs such as Adobe Illustrator etc.) to ggplot font sizes
#' @param x numeric vector giving the lines sizes in points
#' @return Returns a numeric vector of lenght codex of ggplot line sizes
#' @keywords internal
#' @export
#'
LS <- function(x) x/2.13
#' @title Round values preserving total sums
#' @description The function rounds values preserving total sums
#' @param x numeric vector of values to be rounded
#' @param digits integer indicating the number of decimal places. See codelink[base]round.
#' @return Returns a numeric vector.
#' @author The function is written as a communal effort. Main authors are hrefhttps://stackoverflow.com/questions/32544646/round-vector-of-numerics-to-integer-while-preserving-their-sumjosliber and hrefhttps://www.r-bloggers.com/round-values-while-preserve-their-rounded-sum-in-r/BioStatMatt.
#' @keywords internal
#' @family waffle
#' @export
round_preserve_sum <- function(x, digits = 0)
up <- 10 ^ digits
x <- x * up
y <- floor(x)
indices <- tail(order(x-y), round(sum(x)) - sum(y))
y[indices] <- y[indices] + 1
y / up
# from https://www.r-bloggers.com/round-values-while-preserve-their-rounded-sum-in-r/
#' @title Prepare data for waffle plots
#' @description Manipulates a data frame ready for plotting with the codelinkwaffle_chart function.
#' @param dt data frame containing the data which should be transformed
#' @param fill character specifying the column name which should be used as fill for the waffle plot.
#' @param value character specifying the column name which contains values of the codefill variable.
#' @param composition logical indicating whether a compositional waffle (i.e. fill adds up to 100%) should be created. If codeFALSE, waffle cells will be scaled to codemax_value and missing cells filled with an "empty" category.
#' @param max_value numerical giving the value to which waffle cells should be scaled to, if codecomposition = FALSE.
#' @param digits integer indicating the number of decimal places to be used in rounding of the waffle cells.
#' @return returns a link[tibble]tibble data frame containing the minimum and maximum extent of each codefill level.
#' @author Mikko Vihtakari
#' @keywords internal
#' @family waffle
#' @import dplyr
#' @export
waffleize <- function(dt, fill, value, composition = TRUE, max_value = NULL, digits = 3)
x <- dt[c(fill, value)]
names(x) <- c("variable", "value")
if(composition)
x$value <- round_preserve_sum(10^digits*x$value/sum(x$value))
else
if(is.null(max_value)) stop("max_value has to be given, if composition = FALSE")
if(max_value < sum(x$value)) stop("max_value has to be larger than the sum of 'value' column. Use composition = TRUE, if you want a compositional waffle chart")
x <- rbind(x, data.frame(variable = "empty", value = max_value - sum(x$value)))
x$value <- round_preserve_sum(10^digits*x$value/max_value)
if(!is.factor(x$variable)) x$variable <- factor(x$variable, levels = c(sort(unique(x$variable)[unique(x$variable) != "empty"]), "empty"))
x <- x[order(x$variable),]
#tmp <- data.frame(X = 1:100, ymin = rep(c(0, (1:9)*10), each = 100), ymax = rep((1:10)*10, each = 100), variable = rep(dt[[fill]], dt[[value]]))
tmp <- data.frame(X = 1:10^(digits-1), ymin = rep(c(0, (1:9)*10^(digits-2)), each = 10^(digits-1)), ymax = rep((1:10)*10^(digits-2), each = 10^(digits-1)), variable = rep(x$variable, x$value))
out <- tmp %>% group_by(variable, ymin, ymax) %>% summarise(xmin = min(X)-1, xmax = max(X))
## Remove the empty category
out[out$variable != "empty",]
#' @title Plot waffle charts
#' @description The function uses link[ggplot2]ggplot2 to create waffle charts from data.
#' @param data data frame to be plotted
#' @param fill character specifying the column name which should be used as fill for the waffle plot.
#' @param value character specifying the column name which contains values of the codefill variable. Will be used to fill the waffle cells.
#' @param facet character specifying the column name which should be used to codelink[ggplot2]facet_wrap waffle charts.
#' @param ncol number of columns to be used in facetting. See codelink[ggplot2]facet_wrap.
#' @param composition logical indicating whether a compositional waffle (i.e. fill adds up to 100%) should be created. If codeFALSE, waffle cells will be scaled to codemax_value and missing cells filled with an "empty" category.
#' @param max_value numerical giving the value to which waffle cells should be scaled to, if codecomposition = FALSE.
#' @param digits integer indicating the number of decimal places to be used in rounding of the waffle cells. The value 3 indicates percentages, while 4 permilles.
#' @param fill_colors named character vector giving the colors for codefill levels. See codelink[ggplot2]scale_fill_manual.
#' @param fill_title character giving the title for the color legend.
#' @param base_size numeric giving the base size for the plot. See codelink[ggplot2]theme_void.
#' @param legend.position character specifying the position of the legend. See codelink[ggplot2]ggtheme.
#' @details The waffle charts are read from left to right (like text) and from bottom upwards (like water glass). The cells indicate 1% of the maximum value (100% if codecomposition = TRUE else codemax_value). The cells are divided vertically to fractions specifies by the codedigits argument.
#' @return Returns a link[ggplot2]ggplot2 waffle plot
#' @import ggplot2 dplyr
#' @family waffle
#' @author Mikko Vihtakari with code ideas from hrefhttps://github.com/hrbrmstr/wafflehrbrmstr and hrefhttps://github.com/liamgilbey/ggwaffleLiam Gilbey
#' @export
# data = dt; fill = "variable"; value = "value"; facet = NULL; composition = TRUE; max_value = NULL; digits = 3; fill_colors = NULL; fill_title = NULL; ncol = NULL; base_size = 12; legend.position = "bottom"
waffle_chart <- function(data, fill, value = "value", facet = NULL, composition = TRUE, max_value = NULL, digits = 3, fill_colors = NULL, fill_title = NULL, ncol = NULL, base_size = 12, legend.position = "bottom")
## White 1% grid
grid_data <- data.frame(xmin = c(0,(1:9)*10^(digits-2)), xmax = (1:10)*10^(digits-2), ymin = rep(c(0,(1:9)*10^(digits-2)), each = 10^(digits-2)), ymax = rep((1:10)*10^(digits-2), each = 10^(digits-2)))
if(is.null(facet)) ## No facetting
if(any(duplicated(data[[fill]]))) stop("data contains duplicated entries in fill column. Use the facet argument or summarize data before plotting.")
waffle_data <- waffleize(dt = data, fill = fill, value = value, composition = composition, max_value = max_value, digits = digits)
## Plot ####
p <- ggplot() +
geom_rect(data = waffle_data, aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = variable)) +
geom_rect(data = grid_data, aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax), fill = NA, color = "white") +
coord_equal(expand = FALSE) +
theme_void()
## ####
else ## Facetting
waffle_data <- data %>% group_by_(facet) %>% do(waffleize(dt = ., fill = fill, value = value, composition = composition, max_value = max_value, digits = digits))
## Plot ####
p <- ggplot() +
geom_rect(data = waffle_data, aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = variable)) +
geom_rect(data = grid_data, aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax), fill = NA, color = "white") +
facet_wrap(facet, ncol = ncol) +
coord_equal(expand = FALSE) +
theme_void()
## ####
## Fill colors ####
if(is.null(fill_title)) fill_title <- "Variable"
if(!is.null(fill_colors))
p <- p + scale_fill_manual(name = fill_title, values = fill_colors)
else
p <- p + scale_fill_viridis_d(name = fill_title)
## Final theme manipulation
p <- p + theme(
legend.position = legend.position,
aspect.ratio = 1,
panel.border = element_rect(color = "black", size = LS(1), fill = NA),
strip.background = element_rect(fill = alpha("white", 0.4), color = NA),
strip.text.x = element_text(size = base_size*0.8, margin = margin(t = 2, r = 0, b = 3, l = 0, unit = "pt")),
plot.title = element_text(size = base_size, hjust = 0.5, face = 2),
legend.background = element_blank(),
legend.box.background = element_blank(),
legend.title = element_text(size = base_size),
legend.text = element_text(size = base_size),
plot.background = element_blank(),
panel.spacing = unit(0.2, units = "line"),
legend.box.margin = margin(t = 0, r = 0, b = 3, l = 0, unit = "pt"),
plot.margin = unit(c(0.2, 0.5, 0.1, 0.1), units = "line")
)
## Return the plot
p
Finally, we can make the charts:
# Manipulate the dataset first to make sure that there are no replicate
# entries of factors used for the waffles
data("mtcars")
mtcars$gear_vs <- paste(mtcars$gear, mtcars$vs, sep = "-")
mtcars$carb <- factor(mtcars$carb)
x <- mtcars %>% group_by(gear_vs, carb) %>% summarise(value = sum(hp))
waffle_chart(x, fill = "carb", facet = "gear_vs", value = "value")
## You can also scale the waffles to a maximum hp in gear_vs
y <- x %>% group_by(gear_vs) %>% summarise(value = sum(value))
waffle_chart(x, fill = "carb", facet = "gear_vs", value = "value", composition = FALSE, max_value = max(y$value))
add a comment
|
A little addition to @hrbrmstr excellent answer (thanks for making the package and sharing the code with us!). I also have struggled to make such plots, since I think this is a better way of visualizing data when doing comparisons than pie charts. My philosophy with waffle charts differs from the existing solutions (waffle and ggwaffle) in three ways:
- The waffles should always be presented in a 10x10% grid with aspect ratio 1, as in required OP's question and in @hrbrmstr answer. This makes reading the percentages easier.
- The waffles should be filled from left to right (reading direction) and from bottom upwards (like water in a glass, unlike in the existing solutions). This makes reading the percentages more natural to me.
- The 1% cells could be divided to accommodate fractional percentages. Groups that contribute <1% disappear when using full percentages, yet in the applications I have used these plots, missing groups have a different meaning than (0,1) groups.
Because it took me quite some time to solve the issues mentioned above, I post my solution under. "The magic" (i.e. the approach) behind the code has been scavenged from the existing packages. I hope this helps someone and pushes the waffle chart development forward. I really think these charts have a potential in data visualization. It would be nice to implement the functions as ggplot2
proto
objects, but after trying I had to give up. I do not understand that proto
system well enough to write code for it. I just copy the code I intend to include in my R package once I have added all the features I plan to add (the plan is to make waffle charts work similarly to scatterpie
). Note that a lot of the code below is to make the waffles pretty. The part that actually bakes the waffles is not that complicated and is well explained in @hrbrmstr answer.
Functions required to run the example:
library(dplyr)
library(ggplot2)
#' @title Convert line sizes measured as points to ggplot line sizes
#' @description Converts line sizes measured as points (as given by most programs such as Adobe Illustrator etc.) to ggplot font sizes
#' @param x numeric vector giving the lines sizes in points
#' @return Returns a numeric vector of lenght codex of ggplot line sizes
#' @keywords internal
#' @export
#'
LS <- function(x) x/2.13
#' @title Round values preserving total sums
#' @description The function rounds values preserving total sums
#' @param x numeric vector of values to be rounded
#' @param digits integer indicating the number of decimal places. See codelink[base]round.
#' @return Returns a numeric vector.
#' @author The function is written as a communal effort. Main authors are hrefhttps://stackoverflow.com/questions/32544646/round-vector-of-numerics-to-integer-while-preserving-their-sumjosliber and hrefhttps://www.r-bloggers.com/round-values-while-preserve-their-rounded-sum-in-r/BioStatMatt.
#' @keywords internal
#' @family waffle
#' @export
round_preserve_sum <- function(x, digits = 0)
up <- 10 ^ digits
x <- x * up
y <- floor(x)
indices <- tail(order(x-y), round(sum(x)) - sum(y))
y[indices] <- y[indices] + 1
y / up
# from https://www.r-bloggers.com/round-values-while-preserve-their-rounded-sum-in-r/
#' @title Prepare data for waffle plots
#' @description Manipulates a data frame ready for plotting with the codelinkwaffle_chart function.
#' @param dt data frame containing the data which should be transformed
#' @param fill character specifying the column name which should be used as fill for the waffle plot.
#' @param value character specifying the column name which contains values of the codefill variable.
#' @param composition logical indicating whether a compositional waffle (i.e. fill adds up to 100%) should be created. If codeFALSE, waffle cells will be scaled to codemax_value and missing cells filled with an "empty" category.
#' @param max_value numerical giving the value to which waffle cells should be scaled to, if codecomposition = FALSE.
#' @param digits integer indicating the number of decimal places to be used in rounding of the waffle cells.
#' @return returns a link[tibble]tibble data frame containing the minimum and maximum extent of each codefill level.
#' @author Mikko Vihtakari
#' @keywords internal
#' @family waffle
#' @import dplyr
#' @export
waffleize <- function(dt, fill, value, composition = TRUE, max_value = NULL, digits = 3)
x <- dt[c(fill, value)]
names(x) <- c("variable", "value")
if(composition)
x$value <- round_preserve_sum(10^digits*x$value/sum(x$value))
else
if(is.null(max_value)) stop("max_value has to be given, if composition = FALSE")
if(max_value < sum(x$value)) stop("max_value has to be larger than the sum of 'value' column. Use composition = TRUE, if you want a compositional waffle chart")
x <- rbind(x, data.frame(variable = "empty", value = max_value - sum(x$value)))
x$value <- round_preserve_sum(10^digits*x$value/max_value)
if(!is.factor(x$variable)) x$variable <- factor(x$variable, levels = c(sort(unique(x$variable)[unique(x$variable) != "empty"]), "empty"))
x <- x[order(x$variable),]
#tmp <- data.frame(X = 1:100, ymin = rep(c(0, (1:9)*10), each = 100), ymax = rep((1:10)*10, each = 100), variable = rep(dt[[fill]], dt[[value]]))
tmp <- data.frame(X = 1:10^(digits-1), ymin = rep(c(0, (1:9)*10^(digits-2)), each = 10^(digits-1)), ymax = rep((1:10)*10^(digits-2), each = 10^(digits-1)), variable = rep(x$variable, x$value))
out <- tmp %>% group_by(variable, ymin, ymax) %>% summarise(xmin = min(X)-1, xmax = max(X))
## Remove the empty category
out[out$variable != "empty",]
#' @title Plot waffle charts
#' @description The function uses link[ggplot2]ggplot2 to create waffle charts from data.
#' @param data data frame to be plotted
#' @param fill character specifying the column name which should be used as fill for the waffle plot.
#' @param value character specifying the column name which contains values of the codefill variable. Will be used to fill the waffle cells.
#' @param facet character specifying the column name which should be used to codelink[ggplot2]facet_wrap waffle charts.
#' @param ncol number of columns to be used in facetting. See codelink[ggplot2]facet_wrap.
#' @param composition logical indicating whether a compositional waffle (i.e. fill adds up to 100%) should be created. If codeFALSE, waffle cells will be scaled to codemax_value and missing cells filled with an "empty" category.
#' @param max_value numerical giving the value to which waffle cells should be scaled to, if codecomposition = FALSE.
#' @param digits integer indicating the number of decimal places to be used in rounding of the waffle cells. The value 3 indicates percentages, while 4 permilles.
#' @param fill_colors named character vector giving the colors for codefill levels. See codelink[ggplot2]scale_fill_manual.
#' @param fill_title character giving the title for the color legend.
#' @param base_size numeric giving the base size for the plot. See codelink[ggplot2]theme_void.
#' @param legend.position character specifying the position of the legend. See codelink[ggplot2]ggtheme.
#' @details The waffle charts are read from left to right (like text) and from bottom upwards (like water glass). The cells indicate 1% of the maximum value (100% if codecomposition = TRUE else codemax_value). The cells are divided vertically to fractions specifies by the codedigits argument.
#' @return Returns a link[ggplot2]ggplot2 waffle plot
#' @import ggplot2 dplyr
#' @family waffle
#' @author Mikko Vihtakari with code ideas from hrefhttps://github.com/hrbrmstr/wafflehrbrmstr and hrefhttps://github.com/liamgilbey/ggwaffleLiam Gilbey
#' @export
# data = dt; fill = "variable"; value = "value"; facet = NULL; composition = TRUE; max_value = NULL; digits = 3; fill_colors = NULL; fill_title = NULL; ncol = NULL; base_size = 12; legend.position = "bottom"
waffle_chart <- function(data, fill, value = "value", facet = NULL, composition = TRUE, max_value = NULL, digits = 3, fill_colors = NULL, fill_title = NULL, ncol = NULL, base_size = 12, legend.position = "bottom")
## White 1% grid
grid_data <- data.frame(xmin = c(0,(1:9)*10^(digits-2)), xmax = (1:10)*10^(digits-2), ymin = rep(c(0,(1:9)*10^(digits-2)), each = 10^(digits-2)), ymax = rep((1:10)*10^(digits-2), each = 10^(digits-2)))
if(is.null(facet)) ## No facetting
if(any(duplicated(data[[fill]]))) stop("data contains duplicated entries in fill column. Use the facet argument or summarize data before plotting.")
waffle_data <- waffleize(dt = data, fill = fill, value = value, composition = composition, max_value = max_value, digits = digits)
## Plot ####
p <- ggplot() +
geom_rect(data = waffle_data, aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = variable)) +
geom_rect(data = grid_data, aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax), fill = NA, color = "white") +
coord_equal(expand = FALSE) +
theme_void()
## ####
else ## Facetting
waffle_data <- data %>% group_by_(facet) %>% do(waffleize(dt = ., fill = fill, value = value, composition = composition, max_value = max_value, digits = digits))
## Plot ####
p <- ggplot() +
geom_rect(data = waffle_data, aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = variable)) +
geom_rect(data = grid_data, aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax), fill = NA, color = "white") +
facet_wrap(facet, ncol = ncol) +
coord_equal(expand = FALSE) +
theme_void()
## ####
## Fill colors ####
if(is.null(fill_title)) fill_title <- "Variable"
if(!is.null(fill_colors))
p <- p + scale_fill_manual(name = fill_title, values = fill_colors)
else
p <- p + scale_fill_viridis_d(name = fill_title)
## Final theme manipulation
p <- p + theme(
legend.position = legend.position,
aspect.ratio = 1,
panel.border = element_rect(color = "black", size = LS(1), fill = NA),
strip.background = element_rect(fill = alpha("white", 0.4), color = NA),
strip.text.x = element_text(size = base_size*0.8, margin = margin(t = 2, r = 0, b = 3, l = 0, unit = "pt")),
plot.title = element_text(size = base_size, hjust = 0.5, face = 2),
legend.background = element_blank(),
legend.box.background = element_blank(),
legend.title = element_text(size = base_size),
legend.text = element_text(size = base_size),
plot.background = element_blank(),
panel.spacing = unit(0.2, units = "line"),
legend.box.margin = margin(t = 0, r = 0, b = 3, l = 0, unit = "pt"),
plot.margin = unit(c(0.2, 0.5, 0.1, 0.1), units = "line")
)
## Return the plot
p
Finally, we can make the charts:
# Manipulate the dataset first to make sure that there are no replicate
# entries of factors used for the waffles
data("mtcars")
mtcars$gear_vs <- paste(mtcars$gear, mtcars$vs, sep = "-")
mtcars$carb <- factor(mtcars$carb)
x <- mtcars %>% group_by(gear_vs, carb) %>% summarise(value = sum(hp))
waffle_chart(x, fill = "carb", facet = "gear_vs", value = "value")
## You can also scale the waffles to a maximum hp in gear_vs
y <- x %>% group_by(gear_vs) %>% summarise(value = sum(value))
waffle_chart(x, fill = "carb", facet = "gear_vs", value = "value", composition = FALSE, max_value = max(y$value))
add a comment
|
A little addition to @hrbrmstr excellent answer (thanks for making the package and sharing the code with us!). I also have struggled to make such plots, since I think this is a better way of visualizing data when doing comparisons than pie charts. My philosophy with waffle charts differs from the existing solutions (waffle and ggwaffle) in three ways:
- The waffles should always be presented in a 10x10% grid with aspect ratio 1, as in required OP's question and in @hrbrmstr answer. This makes reading the percentages easier.
- The waffles should be filled from left to right (reading direction) and from bottom upwards (like water in a glass, unlike in the existing solutions). This makes reading the percentages more natural to me.
- The 1% cells could be divided to accommodate fractional percentages. Groups that contribute <1% disappear when using full percentages, yet in the applications I have used these plots, missing groups have a different meaning than (0,1) groups.
Because it took me quite some time to solve the issues mentioned above, I post my solution under. "The magic" (i.e. the approach) behind the code has been scavenged from the existing packages. I hope this helps someone and pushes the waffle chart development forward. I really think these charts have a potential in data visualization. It would be nice to implement the functions as ggplot2
proto
objects, but after trying I had to give up. I do not understand that proto
system well enough to write code for it. I just copy the code I intend to include in my R package once I have added all the features I plan to add (the plan is to make waffle charts work similarly to scatterpie
). Note that a lot of the code below is to make the waffles pretty. The part that actually bakes the waffles is not that complicated and is well explained in @hrbrmstr answer.
Functions required to run the example:
library(dplyr)
library(ggplot2)
#' @title Convert line sizes measured as points to ggplot line sizes
#' @description Converts line sizes measured as points (as given by most programs such as Adobe Illustrator etc.) to ggplot font sizes
#' @param x numeric vector giving the lines sizes in points
#' @return Returns a numeric vector of lenght codex of ggplot line sizes
#' @keywords internal
#' @export
#'
LS <- function(x) x/2.13
#' @title Round values preserving total sums
#' @description The function rounds values preserving total sums
#' @param x numeric vector of values to be rounded
#' @param digits integer indicating the number of decimal places. See codelink[base]round.
#' @return Returns a numeric vector.
#' @author The function is written as a communal effort. Main authors are hrefhttps://stackoverflow.com/questions/32544646/round-vector-of-numerics-to-integer-while-preserving-their-sumjosliber and hrefhttps://www.r-bloggers.com/round-values-while-preserve-their-rounded-sum-in-r/BioStatMatt.
#' @keywords internal
#' @family waffle
#' @export
round_preserve_sum <- function(x, digits = 0)
up <- 10 ^ digits
x <- x * up
y <- floor(x)
indices <- tail(order(x-y), round(sum(x)) - sum(y))
y[indices] <- y[indices] + 1
y / up
# from https://www.r-bloggers.com/round-values-while-preserve-their-rounded-sum-in-r/
#' @title Prepare data for waffle plots
#' @description Manipulates a data frame ready for plotting with the codelinkwaffle_chart function.
#' @param dt data frame containing the data which should be transformed
#' @param fill character specifying the column name which should be used as fill for the waffle plot.
#' @param value character specifying the column name which contains values of the codefill variable.
#' @param composition logical indicating whether a compositional waffle (i.e. fill adds up to 100%) should be created. If codeFALSE, waffle cells will be scaled to codemax_value and missing cells filled with an "empty" category.
#' @param max_value numerical giving the value to which waffle cells should be scaled to, if codecomposition = FALSE.
#' @param digits integer indicating the number of decimal places to be used in rounding of the waffle cells.
#' @return returns a link[tibble]tibble data frame containing the minimum and maximum extent of each codefill level.
#' @author Mikko Vihtakari
#' @keywords internal
#' @family waffle
#' @import dplyr
#' @export
waffleize <- function(dt, fill, value, composition = TRUE, max_value = NULL, digits = 3)
x <- dt[c(fill, value)]
names(x) <- c("variable", "value")
if(composition)
x$value <- round_preserve_sum(10^digits*x$value/sum(x$value))
else
if(is.null(max_value)) stop("max_value has to be given, if composition = FALSE")
if(max_value < sum(x$value)) stop("max_value has to be larger than the sum of 'value' column. Use composition = TRUE, if you want a compositional waffle chart")
x <- rbind(x, data.frame(variable = "empty", value = max_value - sum(x$value)))
x$value <- round_preserve_sum(10^digits*x$value/max_value)
if(!is.factor(x$variable)) x$variable <- factor(x$variable, levels = c(sort(unique(x$variable)[unique(x$variable) != "empty"]), "empty"))
x <- x[order(x$variable),]
#tmp <- data.frame(X = 1:100, ymin = rep(c(0, (1:9)*10), each = 100), ymax = rep((1:10)*10, each = 100), variable = rep(dt[[fill]], dt[[value]]))
tmp <- data.frame(X = 1:10^(digits-1), ymin = rep(c(0, (1:9)*10^(digits-2)), each = 10^(digits-1)), ymax = rep((1:10)*10^(digits-2), each = 10^(digits-1)), variable = rep(x$variable, x$value))
out <- tmp %>% group_by(variable, ymin, ymax) %>% summarise(xmin = min(X)-1, xmax = max(X))
## Remove the empty category
out[out$variable != "empty",]
#' @title Plot waffle charts
#' @description The function uses link[ggplot2]ggplot2 to create waffle charts from data.
#' @param data data frame to be plotted
#' @param fill character specifying the column name which should be used as fill for the waffle plot.
#' @param value character specifying the column name which contains values of the codefill variable. Will be used to fill the waffle cells.
#' @param facet character specifying the column name which should be used to codelink[ggplot2]facet_wrap waffle charts.
#' @param ncol number of columns to be used in facetting. See codelink[ggplot2]facet_wrap.
#' @param composition logical indicating whether a compositional waffle (i.e. fill adds up to 100%) should be created. If codeFALSE, waffle cells will be scaled to codemax_value and missing cells filled with an "empty" category.
#' @param max_value numerical giving the value to which waffle cells should be scaled to, if codecomposition = FALSE.
#' @param digits integer indicating the number of decimal places to be used in rounding of the waffle cells. The value 3 indicates percentages, while 4 permilles.
#' @param fill_colors named character vector giving the colors for codefill levels. See codelink[ggplot2]scale_fill_manual.
#' @param fill_title character giving the title for the color legend.
#' @param base_size numeric giving the base size for the plot. See codelink[ggplot2]theme_void.
#' @param legend.position character specifying the position of the legend. See codelink[ggplot2]ggtheme.
#' @details The waffle charts are read from left to right (like text) and from bottom upwards (like water glass). The cells indicate 1% of the maximum value (100% if codecomposition = TRUE else codemax_value). The cells are divided vertically to fractions specifies by the codedigits argument.
#' @return Returns a link[ggplot2]ggplot2 waffle plot
#' @import ggplot2 dplyr
#' @family waffle
#' @author Mikko Vihtakari with code ideas from hrefhttps://github.com/hrbrmstr/wafflehrbrmstr and hrefhttps://github.com/liamgilbey/ggwaffleLiam Gilbey
#' @export
# data = dt; fill = "variable"; value = "value"; facet = NULL; composition = TRUE; max_value = NULL; digits = 3; fill_colors = NULL; fill_title = NULL; ncol = NULL; base_size = 12; legend.position = "bottom"
waffle_chart <- function(data, fill, value = "value", facet = NULL, composition = TRUE, max_value = NULL, digits = 3, fill_colors = NULL, fill_title = NULL, ncol = NULL, base_size = 12, legend.position = "bottom")
## White 1% grid
grid_data <- data.frame(xmin = c(0,(1:9)*10^(digits-2)), xmax = (1:10)*10^(digits-2), ymin = rep(c(0,(1:9)*10^(digits-2)), each = 10^(digits-2)), ymax = rep((1:10)*10^(digits-2), each = 10^(digits-2)))
if(is.null(facet)) ## No facetting
if(any(duplicated(data[[fill]]))) stop("data contains duplicated entries in fill column. Use the facet argument or summarize data before plotting.")
waffle_data <- waffleize(dt = data, fill = fill, value = value, composition = composition, max_value = max_value, digits = digits)
## Plot ####
p <- ggplot() +
geom_rect(data = waffle_data, aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = variable)) +
geom_rect(data = grid_data, aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax), fill = NA, color = "white") +
coord_equal(expand = FALSE) +
theme_void()
## ####
else ## Facetting
waffle_data <- data %>% group_by_(facet) %>% do(waffleize(dt = ., fill = fill, value = value, composition = composition, max_value = max_value, digits = digits))
## Plot ####
p <- ggplot() +
geom_rect(data = waffle_data, aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = variable)) +
geom_rect(data = grid_data, aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax), fill = NA, color = "white") +
facet_wrap(facet, ncol = ncol) +
coord_equal(expand = FALSE) +
theme_void()
## ####
## Fill colors ####
if(is.null(fill_title)) fill_title <- "Variable"
if(!is.null(fill_colors))
p <- p + scale_fill_manual(name = fill_title, values = fill_colors)
else
p <- p + scale_fill_viridis_d(name = fill_title)
## Final theme manipulation
p <- p + theme(
legend.position = legend.position,
aspect.ratio = 1,
panel.border = element_rect(color = "black", size = LS(1), fill = NA),
strip.background = element_rect(fill = alpha("white", 0.4), color = NA),
strip.text.x = element_text(size = base_size*0.8, margin = margin(t = 2, r = 0, b = 3, l = 0, unit = "pt")),
plot.title = element_text(size = base_size, hjust = 0.5, face = 2),
legend.background = element_blank(),
legend.box.background = element_blank(),
legend.title = element_text(size = base_size),
legend.text = element_text(size = base_size),
plot.background = element_blank(),
panel.spacing = unit(0.2, units = "line"),
legend.box.margin = margin(t = 0, r = 0, b = 3, l = 0, unit = "pt"),
plot.margin = unit(c(0.2, 0.5, 0.1, 0.1), units = "line")
)
## Return the plot
p
Finally, we can make the charts:
# Manipulate the dataset first to make sure that there are no replicate
# entries of factors used for the waffles
data("mtcars")
mtcars$gear_vs <- paste(mtcars$gear, mtcars$vs, sep = "-")
mtcars$carb <- factor(mtcars$carb)
x <- mtcars %>% group_by(gear_vs, carb) %>% summarise(value = sum(hp))
waffle_chart(x, fill = "carb", facet = "gear_vs", value = "value")
## You can also scale the waffles to a maximum hp in gear_vs
y <- x %>% group_by(gear_vs) %>% summarise(value = sum(value))
waffle_chart(x, fill = "carb", facet = "gear_vs", value = "value", composition = FALSE, max_value = max(y$value))
A little addition to @hrbrmstr excellent answer (thanks for making the package and sharing the code with us!). I also have struggled to make such plots, since I think this is a better way of visualizing data when doing comparisons than pie charts. My philosophy with waffle charts differs from the existing solutions (waffle and ggwaffle) in three ways:
- The waffles should always be presented in a 10x10% grid with aspect ratio 1, as in required OP's question and in @hrbrmstr answer. This makes reading the percentages easier.
- The waffles should be filled from left to right (reading direction) and from bottom upwards (like water in a glass, unlike in the existing solutions). This makes reading the percentages more natural to me.
- The 1% cells could be divided to accommodate fractional percentages. Groups that contribute <1% disappear when using full percentages, yet in the applications I have used these plots, missing groups have a different meaning than (0,1) groups.
Because it took me quite some time to solve the issues mentioned above, I post my solution under. "The magic" (i.e. the approach) behind the code has been scavenged from the existing packages. I hope this helps someone and pushes the waffle chart development forward. I really think these charts have a potential in data visualization. It would be nice to implement the functions as ggplot2
proto
objects, but after trying I had to give up. I do not understand that proto
system well enough to write code for it. I just copy the code I intend to include in my R package once I have added all the features I plan to add (the plan is to make waffle charts work similarly to scatterpie
). Note that a lot of the code below is to make the waffles pretty. The part that actually bakes the waffles is not that complicated and is well explained in @hrbrmstr answer.
Functions required to run the example:
library(dplyr)
library(ggplot2)
#' @title Convert line sizes measured as points to ggplot line sizes
#' @description Converts line sizes measured as points (as given by most programs such as Adobe Illustrator etc.) to ggplot font sizes
#' @param x numeric vector giving the lines sizes in points
#' @return Returns a numeric vector of lenght codex of ggplot line sizes
#' @keywords internal
#' @export
#'
LS <- function(x) x/2.13
#' @title Round values preserving total sums
#' @description The function rounds values preserving total sums
#' @param x numeric vector of values to be rounded
#' @param digits integer indicating the number of decimal places. See codelink[base]round.
#' @return Returns a numeric vector.
#' @author The function is written as a communal effort. Main authors are hrefhttps://stackoverflow.com/questions/32544646/round-vector-of-numerics-to-integer-while-preserving-their-sumjosliber and hrefhttps://www.r-bloggers.com/round-values-while-preserve-their-rounded-sum-in-r/BioStatMatt.
#' @keywords internal
#' @family waffle
#' @export
round_preserve_sum <- function(x, digits = 0)
up <- 10 ^ digits
x <- x * up
y <- floor(x)
indices <- tail(order(x-y), round(sum(x)) - sum(y))
y[indices] <- y[indices] + 1
y / up
# from https://www.r-bloggers.com/round-values-while-preserve-their-rounded-sum-in-r/
#' @title Prepare data for waffle plots
#' @description Manipulates a data frame ready for plotting with the codelinkwaffle_chart function.
#' @param dt data frame containing the data which should be transformed
#' @param fill character specifying the column name which should be used as fill for the waffle plot.
#' @param value character specifying the column name which contains values of the codefill variable.
#' @param composition logical indicating whether a compositional waffle (i.e. fill adds up to 100%) should be created. If codeFALSE, waffle cells will be scaled to codemax_value and missing cells filled with an "empty" category.
#' @param max_value numerical giving the value to which waffle cells should be scaled to, if codecomposition = FALSE.
#' @param digits integer indicating the number of decimal places to be used in rounding of the waffle cells.
#' @return returns a link[tibble]tibble data frame containing the minimum and maximum extent of each codefill level.
#' @author Mikko Vihtakari
#' @keywords internal
#' @family waffle
#' @import dplyr
#' @export
waffleize <- function(dt, fill, value, composition = TRUE, max_value = NULL, digits = 3)
x <- dt[c(fill, value)]
names(x) <- c("variable", "value")
if(composition)
x$value <- round_preserve_sum(10^digits*x$value/sum(x$value))
else
if(is.null(max_value)) stop("max_value has to be given, if composition = FALSE")
if(max_value < sum(x$value)) stop("max_value has to be larger than the sum of 'value' column. Use composition = TRUE, if you want a compositional waffle chart")
x <- rbind(x, data.frame(variable = "empty", value = max_value - sum(x$value)))
x$value <- round_preserve_sum(10^digits*x$value/max_value)
if(!is.factor(x$variable)) x$variable <- factor(x$variable, levels = c(sort(unique(x$variable)[unique(x$variable) != "empty"]), "empty"))
x <- x[order(x$variable),]
#tmp <- data.frame(X = 1:100, ymin = rep(c(0, (1:9)*10), each = 100), ymax = rep((1:10)*10, each = 100), variable = rep(dt[[fill]], dt[[value]]))
tmp <- data.frame(X = 1:10^(digits-1), ymin = rep(c(0, (1:9)*10^(digits-2)), each = 10^(digits-1)), ymax = rep((1:10)*10^(digits-2), each = 10^(digits-1)), variable = rep(x$variable, x$value))
out <- tmp %>% group_by(variable, ymin, ymax) %>% summarise(xmin = min(X)-1, xmax = max(X))
## Remove the empty category
out[out$variable != "empty",]
#' @title Plot waffle charts
#' @description The function uses link[ggplot2]ggplot2 to create waffle charts from data.
#' @param data data frame to be plotted
#' @param fill character specifying the column name which should be used as fill for the waffle plot.
#' @param value character specifying the column name which contains values of the codefill variable. Will be used to fill the waffle cells.
#' @param facet character specifying the column name which should be used to codelink[ggplot2]facet_wrap waffle charts.
#' @param ncol number of columns to be used in facetting. See codelink[ggplot2]facet_wrap.
#' @param composition logical indicating whether a compositional waffle (i.e. fill adds up to 100%) should be created. If codeFALSE, waffle cells will be scaled to codemax_value and missing cells filled with an "empty" category.
#' @param max_value numerical giving the value to which waffle cells should be scaled to, if codecomposition = FALSE.
#' @param digits integer indicating the number of decimal places to be used in rounding of the waffle cells. The value 3 indicates percentages, while 4 permilles.
#' @param fill_colors named character vector giving the colors for codefill levels. See codelink[ggplot2]scale_fill_manual.
#' @param fill_title character giving the title for the color legend.
#' @param base_size numeric giving the base size for the plot. See codelink[ggplot2]theme_void.
#' @param legend.position character specifying the position of the legend. See codelink[ggplot2]ggtheme.
#' @details The waffle charts are read from left to right (like text) and from bottom upwards (like water glass). The cells indicate 1% of the maximum value (100% if codecomposition = TRUE else codemax_value). The cells are divided vertically to fractions specifies by the codedigits argument.
#' @return Returns a link[ggplot2]ggplot2 waffle plot
#' @import ggplot2 dplyr
#' @family waffle
#' @author Mikko Vihtakari with code ideas from hrefhttps://github.com/hrbrmstr/wafflehrbrmstr and hrefhttps://github.com/liamgilbey/ggwaffleLiam Gilbey
#' @export
# data = dt; fill = "variable"; value = "value"; facet = NULL; composition = TRUE; max_value = NULL; digits = 3; fill_colors = NULL; fill_title = NULL; ncol = NULL; base_size = 12; legend.position = "bottom"
waffle_chart <- function(data, fill, value = "value", facet = NULL, composition = TRUE, max_value = NULL, digits = 3, fill_colors = NULL, fill_title = NULL, ncol = NULL, base_size = 12, legend.position = "bottom")
## White 1% grid
grid_data <- data.frame(xmin = c(0,(1:9)*10^(digits-2)), xmax = (1:10)*10^(digits-2), ymin = rep(c(0,(1:9)*10^(digits-2)), each = 10^(digits-2)), ymax = rep((1:10)*10^(digits-2), each = 10^(digits-2)))
if(is.null(facet)) ## No facetting
if(any(duplicated(data[[fill]]))) stop("data contains duplicated entries in fill column. Use the facet argument or summarize data before plotting.")
waffle_data <- waffleize(dt = data, fill = fill, value = value, composition = composition, max_value = max_value, digits = digits)
## Plot ####
p <- ggplot() +
geom_rect(data = waffle_data, aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = variable)) +
geom_rect(data = grid_data, aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax), fill = NA, color = "white") +
coord_equal(expand = FALSE) +
theme_void()
## ####
else ## Facetting
waffle_data <- data %>% group_by_(facet) %>% do(waffleize(dt = ., fill = fill, value = value, composition = composition, max_value = max_value, digits = digits))
## Plot ####
p <- ggplot() +
geom_rect(data = waffle_data, aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = variable)) +
geom_rect(data = grid_data, aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax), fill = NA, color = "white") +
facet_wrap(facet, ncol = ncol) +
coord_equal(expand = FALSE) +
theme_void()
## ####
## Fill colors ####
if(is.null(fill_title)) fill_title <- "Variable"
if(!is.null(fill_colors))
p <- p + scale_fill_manual(name = fill_title, values = fill_colors)
else
p <- p + scale_fill_viridis_d(name = fill_title)
## Final theme manipulation
p <- p + theme(
legend.position = legend.position,
aspect.ratio = 1,
panel.border = element_rect(color = "black", size = LS(1), fill = NA),
strip.background = element_rect(fill = alpha("white", 0.4), color = NA),
strip.text.x = element_text(size = base_size*0.8, margin = margin(t = 2, r = 0, b = 3, l = 0, unit = "pt")),
plot.title = element_text(size = base_size, hjust = 0.5, face = 2),
legend.background = element_blank(),
legend.box.background = element_blank(),
legend.title = element_text(size = base_size),
legend.text = element_text(size = base_size),
plot.background = element_blank(),
panel.spacing = unit(0.2, units = "line"),
legend.box.margin = margin(t = 0, r = 0, b = 3, l = 0, unit = "pt"),
plot.margin = unit(c(0.2, 0.5, 0.1, 0.1), units = "line")
)
## Return the plot
p
Finally, we can make the charts:
# Manipulate the dataset first to make sure that there are no replicate
# entries of factors used for the waffles
data("mtcars")
mtcars$gear_vs <- paste(mtcars$gear, mtcars$vs, sep = "-")
mtcars$carb <- factor(mtcars$carb)
x <- mtcars %>% group_by(gear_vs, carb) %>% summarise(value = sum(hp))
waffle_chart(x, fill = "carb", facet = "gear_vs", value = "value")
## You can also scale the waffles to a maximum hp in gear_vs
y <- x %>% group_by(gear_vs) %>% summarise(value = sum(value))
waffle_chart(x, fill = "carb", facet = "gear_vs", value = "value", composition = FALSE, max_value = max(y$value))
edited Oct 26 '18 at 6:56
answered Oct 25 '18 at 15:56
MikkoMikko
4,1474 gold badges34 silver badges73 bronze badges
4,1474 gold badges34 silver badges73 bronze badges
add a comment
|
add a comment
|
Here is an alternative approach only using tidyverse
(i.e. dplyr
, tidyr
and ggplot2
) to create the waffle plot or square pie chart. It is based on hrbrmstr's answer, but I've tried to make it slightly more general; any frequency table works as input and it is easy to adjust the dimensions of the waffles (e.g. rectangle instead of square).
library(tidyverse)
freq_table = mtcars %>%
count(gear, vs, carb, wt = hp) %>%
group_by(gear, vs) %>%
mutate(pct = n / sum(n)) %>%
select(gear, vs, carb, pct)
Second step creates the coordinates. Uses tidyr::expand()
instead of waffleize()
. Still using smart_round()
.
waffle.n = 100 # Number of blocks
waffle.cols = ceiling(sqrt(waffle.n)) # For square. Otherwise pick integer.
coordinates = freq_table %>%
group_by(gear, vs) %>%
mutate(waffle.num = smart_round(pct,1) * waffle.n) %>%
group_by(carb, gear, vs) %>%
expand(count = seq(1:waffle.num)) %>%
select(-count) %>%
group_by(gear, vs) %>%
arrange(gear, vs) %>%
mutate(
waffle.x = rep_len(1:waffle.cols, waffle.n),
waffle.y = floor((row_number() - 1) / waffle.cols)
)
I group by two variables (gear
and vs
) and therefore use facet_grid()
. If you group by a single variable then use facet_wrap()
. You will need to tweak the options slightly to get the optimal result (such as size of the device, or size and stroke of the points).
fig = coordinates %>%
ggplot(aes(x = waffle.x, y = waffle.y, fill = as.factor(carb))) +
geom_point(size = 7, shape = 22, color = "white", stroke = 0.8) +
#geom_raster() + # Alternative to geom_point() without gap between blocks.
facet_grid(rows = vars(gear), cols = vars(vs)) +
theme_void() +
theme(legend.position = "bottom", plot.margin = margin(5.5, 5.5, 5.5, 5.5, "pt"), panel.spacing = unit(15, "pt"))
fig
#ggsave("fig.pdf", width = 13, height = 17.5, units = "cm", dpi = 150)
A more interesting example with an uneven number of blocks and where the shares are not divisible by 10.
Dividing blocks/cells to accommodate fractional percentages, as Mikko suggests can be archived by increasingwaffle.n
and using something likegeom_raster(hjust=0, vjust=0) + geom_vline(xintercept=seq(0,waffle.cols,by=2), color="white") + geom_hline(yintercept=seq(0,waffle.cols-1,by=1), color="white")
instead ofgeom_point()
– bonna
Mar 30 at 11:51
add a comment
|
Here is an alternative approach only using tidyverse
(i.e. dplyr
, tidyr
and ggplot2
) to create the waffle plot or square pie chart. It is based on hrbrmstr's answer, but I've tried to make it slightly more general; any frequency table works as input and it is easy to adjust the dimensions of the waffles (e.g. rectangle instead of square).
library(tidyverse)
freq_table = mtcars %>%
count(gear, vs, carb, wt = hp) %>%
group_by(gear, vs) %>%
mutate(pct = n / sum(n)) %>%
select(gear, vs, carb, pct)
Second step creates the coordinates. Uses tidyr::expand()
instead of waffleize()
. Still using smart_round()
.
waffle.n = 100 # Number of blocks
waffle.cols = ceiling(sqrt(waffle.n)) # For square. Otherwise pick integer.
coordinates = freq_table %>%
group_by(gear, vs) %>%
mutate(waffle.num = smart_round(pct,1) * waffle.n) %>%
group_by(carb, gear, vs) %>%
expand(count = seq(1:waffle.num)) %>%
select(-count) %>%
group_by(gear, vs) %>%
arrange(gear, vs) %>%
mutate(
waffle.x = rep_len(1:waffle.cols, waffle.n),
waffle.y = floor((row_number() - 1) / waffle.cols)
)
I group by two variables (gear
and vs
) and therefore use facet_grid()
. If you group by a single variable then use facet_wrap()
. You will need to tweak the options slightly to get the optimal result (such as size of the device, or size and stroke of the points).
fig = coordinates %>%
ggplot(aes(x = waffle.x, y = waffle.y, fill = as.factor(carb))) +
geom_point(size = 7, shape = 22, color = "white", stroke = 0.8) +
#geom_raster() + # Alternative to geom_point() without gap between blocks.
facet_grid(rows = vars(gear), cols = vars(vs)) +
theme_void() +
theme(legend.position = "bottom", plot.margin = margin(5.5, 5.5, 5.5, 5.5, "pt"), panel.spacing = unit(15, "pt"))
fig
#ggsave("fig.pdf", width = 13, height = 17.5, units = "cm", dpi = 150)
A more interesting example with an uneven number of blocks and where the shares are not divisible by 10.
Dividing blocks/cells to accommodate fractional percentages, as Mikko suggests can be archived by increasingwaffle.n
and using something likegeom_raster(hjust=0, vjust=0) + geom_vline(xintercept=seq(0,waffle.cols,by=2), color="white") + geom_hline(yintercept=seq(0,waffle.cols-1,by=1), color="white")
instead ofgeom_point()
– bonna
Mar 30 at 11:51
add a comment
|
Here is an alternative approach only using tidyverse
(i.e. dplyr
, tidyr
and ggplot2
) to create the waffle plot or square pie chart. It is based on hrbrmstr's answer, but I've tried to make it slightly more general; any frequency table works as input and it is easy to adjust the dimensions of the waffles (e.g. rectangle instead of square).
library(tidyverse)
freq_table = mtcars %>%
count(gear, vs, carb, wt = hp) %>%
group_by(gear, vs) %>%
mutate(pct = n / sum(n)) %>%
select(gear, vs, carb, pct)
Second step creates the coordinates. Uses tidyr::expand()
instead of waffleize()
. Still using smart_round()
.
waffle.n = 100 # Number of blocks
waffle.cols = ceiling(sqrt(waffle.n)) # For square. Otherwise pick integer.
coordinates = freq_table %>%
group_by(gear, vs) %>%
mutate(waffle.num = smart_round(pct,1) * waffle.n) %>%
group_by(carb, gear, vs) %>%
expand(count = seq(1:waffle.num)) %>%
select(-count) %>%
group_by(gear, vs) %>%
arrange(gear, vs) %>%
mutate(
waffle.x = rep_len(1:waffle.cols, waffle.n),
waffle.y = floor((row_number() - 1) / waffle.cols)
)
I group by two variables (gear
and vs
) and therefore use facet_grid()
. If you group by a single variable then use facet_wrap()
. You will need to tweak the options slightly to get the optimal result (such as size of the device, or size and stroke of the points).
fig = coordinates %>%
ggplot(aes(x = waffle.x, y = waffle.y, fill = as.factor(carb))) +
geom_point(size = 7, shape = 22, color = "white", stroke = 0.8) +
#geom_raster() + # Alternative to geom_point() without gap between blocks.
facet_grid(rows = vars(gear), cols = vars(vs)) +
theme_void() +
theme(legend.position = "bottom", plot.margin = margin(5.5, 5.5, 5.5, 5.5, "pt"), panel.spacing = unit(15, "pt"))
fig
#ggsave("fig.pdf", width = 13, height = 17.5, units = "cm", dpi = 150)
A more interesting example with an uneven number of blocks and where the shares are not divisible by 10.
Here is an alternative approach only using tidyverse
(i.e. dplyr
, tidyr
and ggplot2
) to create the waffle plot or square pie chart. It is based on hrbrmstr's answer, but I've tried to make it slightly more general; any frequency table works as input and it is easy to adjust the dimensions of the waffles (e.g. rectangle instead of square).
library(tidyverse)
freq_table = mtcars %>%
count(gear, vs, carb, wt = hp) %>%
group_by(gear, vs) %>%
mutate(pct = n / sum(n)) %>%
select(gear, vs, carb, pct)
Second step creates the coordinates. Uses tidyr::expand()
instead of waffleize()
. Still using smart_round()
.
waffle.n = 100 # Number of blocks
waffle.cols = ceiling(sqrt(waffle.n)) # For square. Otherwise pick integer.
coordinates = freq_table %>%
group_by(gear, vs) %>%
mutate(waffle.num = smart_round(pct,1) * waffle.n) %>%
group_by(carb, gear, vs) %>%
expand(count = seq(1:waffle.num)) %>%
select(-count) %>%
group_by(gear, vs) %>%
arrange(gear, vs) %>%
mutate(
waffle.x = rep_len(1:waffle.cols, waffle.n),
waffle.y = floor((row_number() - 1) / waffle.cols)
)
I group by two variables (gear
and vs
) and therefore use facet_grid()
. If you group by a single variable then use facet_wrap()
. You will need to tweak the options slightly to get the optimal result (such as size of the device, or size and stroke of the points).
fig = coordinates %>%
ggplot(aes(x = waffle.x, y = waffle.y, fill = as.factor(carb))) +
geom_point(size = 7, shape = 22, color = "white", stroke = 0.8) +
#geom_raster() + # Alternative to geom_point() without gap between blocks.
facet_grid(rows = vars(gear), cols = vars(vs)) +
theme_void() +
theme(legend.position = "bottom", plot.margin = margin(5.5, 5.5, 5.5, 5.5, "pt"), panel.spacing = unit(15, "pt"))
fig
#ggsave("fig.pdf", width = 13, height = 17.5, units = "cm", dpi = 150)
A more interesting example with an uneven number of blocks and where the shares are not divisible by 10.
answered Mar 28 at 21:38
bonnabonna
6151 gold badge9 silver badges27 bronze badges
6151 gold badge9 silver badges27 bronze badges
Dividing blocks/cells to accommodate fractional percentages, as Mikko suggests can be archived by increasingwaffle.n
and using something likegeom_raster(hjust=0, vjust=0) + geom_vline(xintercept=seq(0,waffle.cols,by=2), color="white") + geom_hline(yintercept=seq(0,waffle.cols-1,by=1), color="white")
instead ofgeom_point()
– bonna
Mar 30 at 11:51
add a comment
|
Dividing blocks/cells to accommodate fractional percentages, as Mikko suggests can be archived by increasingwaffle.n
and using something likegeom_raster(hjust=0, vjust=0) + geom_vline(xintercept=seq(0,waffle.cols,by=2), color="white") + geom_hline(yintercept=seq(0,waffle.cols-1,by=1), color="white")
instead ofgeom_point()
– bonna
Mar 30 at 11:51
Dividing blocks/cells to accommodate fractional percentages, as Mikko suggests can be archived by increasing
waffle.n
and using something like geom_raster(hjust=0, vjust=0) + geom_vline(xintercept=seq(0,waffle.cols,by=2), color="white") + geom_hline(yintercept=seq(0,waffle.cols-1,by=1), color="white")
instead of geom_point()
– bonna
Mar 30 at 11:51
Dividing blocks/cells to accommodate fractional percentages, as Mikko suggests can be archived by increasing
waffle.n
and using something like geom_raster(hjust=0, vjust=0) + geom_vline(xintercept=seq(0,waffle.cols,by=2), color="white") + geom_hline(yintercept=seq(0,waffle.cols-1,by=1), color="white")
instead of geom_point()
– bonna
Mar 30 at 11:51
add a comment
|
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f52741666%2fcreating-a-waffle-plot-together-with-facets-in-ggplot2%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown