The Growth of Woodland Area within the United Kingdom (1998-2023)

Data Source

This project is focussed on visualising the increase in woodland area within the United Kingdom, per country, from 1998 to 2023. The raw data and logo for this project were retrieved from ‘Forest Research’, a branch of the Forestry Commission (FC), which compiles its figures by totalling extracts from multiple forest services operating within the United Kingdom. It principally gathers tree-related statistics, including longitudinal, provisional estimates of woodland area in the UK.

The “Woodland area, UK, 1998 to 2023” ODS file contained 5 sheets of data, supplemented by cover, contents and notes pages. The data sheets were also annotated with notes clarifying the origin of the data and the different commissions involved in gathering it. They split woodland area by type (coniferous or deciduous) and by ownership (private or public sector), and reported the total estimated woodland area for each country and for the United Kingdom as a whole.

## NECESSARY PACKAGES:

library(here)
library(readODS)
library(tidyverse)
library(cowplot)
library(magick)

## Should packages need to be installed, remove '#' and run these two lines first:
#libraries<-c("tidyverse", "cowplot", "magick", "readODS", "here")
#install.packages(libraries, repos="http://cran.rstudio.com")

## DATA PREPARATION:
# Extraction of raw data from ODS file

sheets<-c("England","Wales","Scotland","Northern Ireland")
pathway<-paste0(here("raw_data", "area-timeseries-15jun23.ods"))
countries_list<-list()
for (i in seq_along(sheets)){
  countries<-sheets[i]
  countries_list[[countries]]<-read_ods(pathway, sheet = i+3)
} # this loop extracts all of the raw data from the country-specific sheets
raw_extracted_data<-data.frame(countries_list) # which is then put into a dataframe

rm(list=setdiff(ls(), "raw_extracted_data")) # removes unnecessary variables to clean environment

# Glimpse of the raw data

head(raw_extracted_data)

The Research Question

The main rationale for exploring this data was to establish whether woodland area within the United Kingdom grew between 1998 and 2023, and to visualise how the countries compare in both their numeric increase and their rate of development (percentage increase). Calculating the percentage increase is important because it expresses each country's growth relative to its starting woodland area, rather than interpreting development solely through the raw increase in hectares of woodland.

Data Preparation

Based on this research question, the main variables of interest were the Year and the total Woodland Area (in thousand hectares) of each country, per year. Before continuing, lines of descriptive text needed to be removed in order to make the data more amenable to analysis.
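The idea behind this clean-up can be sketched on a toy data frame: drop the leading note rows, then promote the row holding the column titles to the header. The values, the `header_row` position and the object names below are made up for illustration and are not taken from the Forest Research file.

```r
# Toy stand-in for a sheet as read in: two note rows, a header row, then data.
# All values are illustrative only.
raw <- data.frame(
  V1 = c("Table note", "Source note", "Year ending 31 March", "1998", "1999"),
  V2 = c(NA, NA, "Total (thousand ha)", "10", "12"),
  stringsAsFactors = FALSE
)

header_row <- 3                                        # row that holds the column titles
clean <- raw[-seq_len(header_row), , drop = FALSE]     # drop the notes and the header row itself
names(clean) <- as.character(unlist(raw[header_row, ]))  # promote the titles to column names
rownames(clean) <- NULL                                # reset row numbering

clean
```

Every column is still text at this point, because the note rows forced the whole sheet to be read as character data.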

# Processing and cleaning of data

processed_extracted_data<-raw_extracted_data[-c(1:3),] # removes unnecessary text
names(processed_extracted_data)<-as.matrix(processed_extracted_data[1,]) # labels by original titles
processed_extracted_data<-processed_extracted_data[-1,] # removes text names from the data

# Glimpse of the processed data

head(processed_extracted_data)

From the now processed data-frame, all of the data on the amount of woodland area needed to be tabulated to give a glimpse of how woodland has changed - as well as to facilitate the calculation of percentage increase in total woodland from 1998 to 2023 for each country.
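As a rough illustration of this tabulation step, the sketch below picks out columns by their shared suffix, converts them from text to numbers, and checks for missing values. It uses base R on a toy data frame; the column names mimic the "<country> total (thousand ha)" pattern, but the values and object names are assumptions for illustration only.

```r
# Toy columns stored as text, with names following the "<country> total (thousand ha)" pattern.
toy <- data.frame(
  `Year ending 31 March` = c("1998", "1999", "2000"),
  `England total (thousand ha)` = c("10", "11", "12"),
  `Wales total (thousand ha)` = c("3", "3", "4"),
  check.names = FALSE, stringsAsFactors = FALSE
)

suffix <- " total \\(thousand ha\\)$"                      # regular expression for the shared suffix
total_cols <- grep(suffix, names(toy), value = TRUE)       # names of the columns that end in it
totals <- as.data.frame(lapply(toy[total_cols], as.numeric))  # text -> numeric, column by column
names(totals) <- sub(suffix, "", total_cols)               # keep only the country name

anyNA(totals)  # TRUE would flag entries that failed to convert
```

Building the data frame in one step like this avoids creating a separate variable per country, which also sidesteps naming problems with countries whose names contain a space.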

## DATA EXTRACTION: 
# Extraction of necessary variables from processed dataframe

year_ending_March_31st<-as.numeric(processed_extracted_data$`Year ending 31 March`) # The Year
woodland_area<-select(processed_extracted_data, ends_with(" total (thousand ha)")) # Amount of Woodland Area
country_names<-c("England", "Wales", "Scotland", "Northern Ireland") # Country Names

for (country in country_names){
  assign(country, as.numeric(woodland_area[[paste0(country, " total (thousand ha)")]]))
} # creates numeric variables for the woodland area of each country, per year
woodland_area<-data.frame(England,Wales,Scotland,`Northern Ireland`) # puts back into dataframe
table(is.na(woodland_area)) # checks for any missing data - none!
## 
## FALSE 
##   104
woodland_area<-woodland_area%>%
  rename(Northern_Ireland=`Northern.Ireland`)
country_names<-c("England", "Wales", "Scotland", "Northern_Ireland") # corrects for interaction issues caused by spacing of `Northern Ireland`

# Calculation of proportional change in woodland area per country

percentage_results<-data.frame(country=character(),percentage_increase=numeric(),stringsAsFactors=FALSE) 
# creates an empty dataframe

for(country in country_names){
  min_v<-min(woodland_area[[country]])
  max_v<-max(woodland_area[[country]])
  percentage_increase<-(((max_v-min_v)/min_v)*100)
  percentage_results<-rbind(percentage_results, data.frame(country=country,percentage_increase=percentage_increase))
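} # this loop calculates the percentage increase from each country's minimum to its maximum woodland area, which equals the 1998-2023 increase provided the 1998 value is the lowest and the 2023 value the highest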
percentage_results[1:4,2]<-round(percentage_results[1:4,2],2) # then rounds the data to 2 decimal places

## PRELIMINARY DATA CHECK: 

woodland_area
percentage_results

As shown above, woodland area grew in every country of the United Kingdom from 1998 to 2023. Calculating the percentage increase also shows that, although Northern Ireland had the lowest baseline (81 thousand hectares) and final (118 thousand hectares) woodland estimates, its proportional increase (45.68%) was notable in comparison to the other countries.
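The arithmetic behind the Northern Ireland figure can be checked by hand. The sketch below defines a small helper, `pct_increase` (a name introduced here for illustration), and also shows on a made-up series why a min-to-max calculation only matches the first-to-last change when the series starts at its lowest value and ends at its highest.

```r
# Percentage increase from a baseline value to a final value.
pct_increase <- function(first, last) (last - first) / first * 100

# Northern Ireland figures quoted above: 81 -> 118 thousand hectares.
ni <- round(pct_increase(81, 118), 2)
ni

# Toy series (illustrative only) that dips below its start and peaks before its end.
toy <- c(10, 9, 12, 11)
first_to_last <- pct_increase(toy[1], toy[length(toy)])  # change between the end points
min_to_max <- pct_increase(min(toy), max(toy))           # trough-to-peak change, which is larger here
c(first_to_last = first_to_last, min_to_max = min_to_max)
```

For a series that rises throughout, the two calculations give the same answer.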

Data Visualisation

Finally, a visualisation of the growth of woodland area within the countries of the United Kingdom from 1998 to 2023 can be created, which displays this development both in terms of numeric and percentage increase.
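Plotting one line per country requires the data in long format, with one row per country per year. A minimal sketch of that reshaping on toy data, using base R's `stack()` so that the number of years never has to be typed in by hand; the values and object names are illustrative assumptions, not the project's data.

```r
# Toy wide table: one column per country, one row per year (values illustrative).
years <- 1998:2000
wide <- data.frame(England = c(10, 11, 12), Wales = c(3, 3, 4))

# stack() gathers the columns into one value column plus a column of source names.
long <- stack(wide)
names(long) <- c("woodland", "country")
long$year <- rep(years, times = ncol(wide))  # repeat the years once per country

long
```

The result has `nrow(wide) * ncol(wide)` rows, so it adapts automatically if more years or countries are added.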

## VISUALISATION:

rm(list=setdiff(ls(),c("woodland_area","year_ending_March_31st","country_names","percentage_results"))) 
# removes all unnecessary variables

woodland_growth_over_time<-data.frame(
  country=c(rep("England",26),rep("Wales",26),rep("Scotland",26),rep("Northern Ireland",26)),
  woodland=c(woodland_area$England,woodland_area$Wales,
             woodland_area$Scotland,woodland_area$Northern_Ireland),
  year=c(year_ending_March_31st)
) # creates a final dataframe amenable to the upcoming visualisation

mapping<-aes(x=year,y=woodland,colour=country) # creates the mapping for the visualisation
fig_path<-here("figs") # creates the necessary path for saving the figure
logo_file<-paste0(here("logo","Picture1.jpg")) # creates the path for applying the logo

# Creates a plot of woodland area from 1998-2023, split by country
# and annotated with each country's percentage increase

FinalPlot<-woodland_growth_over_time %>%
  ggplot(mapping=mapping)+
  geom_smooth(method="gam")+
  labs(x="Year (ending 31 March)",
       y="Woodland area (in thousand hectares)",
       colour="Country",
       title="The Growth of Woodland Area within the United Kingdom",
       subtitle="As annotated with percentage increase from 1998-2023",
       caption="Data retrieved from: Forest Research, 2023")+
  annotate("label",x=2010,y=1210,label=paste(percentage_results[1,2],"%"),
           colour="#EE0000",size=3,fontface="bold")+
  annotate("label",x=2010,y=375,label=paste(percentage_results[2,2],"%"),
           colour="#00CD00",size=3,fontface="bold")+
  annotate("label",x=2010,y=1450,label=paste(percentage_results[3,2],"%"),
           colour="#0000CD",size=3,fontface="bold")+
  annotate("label",x=2010,y=175,label=paste(percentage_results[4,2],"%"),
           colour="#FFA500",size=3,fontface="bold")+
  scale_colour_manual(values=c(England="#EE0000",Wales="#00CD00",
                               Scotland="#0000CD",`Northern Ireland`="#FFA500"))+
  scale_x_continuous(breaks=seq(1998,2023,5))+
  scale_y_continuous(breaks=seq(0,1500,150))+
  theme(panel.border=element_rect(colour="#8B7355",fill=NA,linewidth=2),
        panel.grid.minor=element_line(colour="#CAFF70",linewidth=0.5),
        panel.grid.major=element_line(colour="#CAFF70",linewidth=0.7),
        panel.background=element_rect(fill="#FFFFF0"),
        axis.line=element_line(linewidth=2,colour="#8B7355"),
        plot.title=element_text(face="bold"),
        plot.subtitle=element_text(face="italic"),
        text=element_text(family="serif"),
        legend.title=element_text(face="bold"),
        legend.box.background=element_rect(colour="#8B7355"),
        legend.box.margin=margin(1,1,1,1),
        legend.key=element_rect(colour="#8B7355"))

# Adds the official logo to sit alongside the data source

FinalPlot<-ggdraw(FinalPlot)+
  draw_image(logo_file, scale=.2,x=1,hjust=1,halign=1,valign=0)

## Visualisation of the growth of woodland area within the United Kingdom, from 1998 to 2023

FinalPlot

## SAVES THE PLOT

filename<-paste("The Growth of Woodland Area in the UK from 1998 to 2023.png",sep="")
ggsave(file.path(fig_path,filename),plot=FinalPlot,width=7,height=6.37)

Conclusion

The plot makes clear that the growth of woodland area should be interpreted both in terms of the actual hectares of woodland gained and the proportional increase relative to each country's original woodland area, because reporting either measure without the other could be misleading. For example, concluding that England or Scotland have made superior efforts to increase woodland area compared to Northern Ireland, on the basis that they gained considerably more hectares from 1998-2023, neglects that Northern Ireland has shown greater proportional growth through a higher percentage increase in woodland area. Regardless of these differences, the data and accompanying visual support the promising conclusion that all countries within the United Kingdom have shown a growth in woodland area between 1998 and 2023.

Further research could use data from the original file to map differences in the development of public and private sector woodland, or of deciduous and coniferous woodland. Other data available from Forest Research, such as how the number of sawmills and the amount of wood production changed from 1998-2023, could be used to investigate whether the rate of woodland growth is sustainable relative to the rate of increasing wood production by sawmills in the UK. Beyond the constraints of the available data, taking into account the geographic size of each country would also allow this change to be mapped in relation to both total land area and the percentage of each country covered by woodland (and how this changes).

Full Data Reference

Forest Research (2023, March 31). Tools and Resources: Data Downloads. https://www.forestresearch.gov.uk/tools-and-resources/statistics/data-downloads/