How to Conduct A Content Audit

Auditing a large content website can be a daunting task. Where to start?
Summary:

A content audit isn’t something you’re going to want to tackle. But you can’t undertake a redesign of a content-heavy site without it.

Donna Spencer shows you how to conduct a Content Audit in this sketch video.

If you’re working on any kind of redesign project involving a large amount of content, such as that of a website, intranet or mobile site, one of the first tasks you’ll need to perform is a content audit.

I say need, not want—a content audit isn’t something you’re necessarily going to want to tackle. It’s one of those un-sexy, tedious jobs that hardly anyone talks about. But you can’t undertake a redesign of a content-heavy site without it.

What is a Content Audit?

A content audit is the activity of checking all of the content on a website, and compiling it into a big list. There are three main types of audits you can perform:

  • Full content inventory: A complete listing of every content item on the site. This may include all pages as well as all assets (such as downloadable files and videos).
  • Partial content inventory: A listing of a subset of the site’s content. A partial inventory may include, for example, the top few levels of a hierarchical site or the past six months of articles. All sections of the site will be covered.
  • Content sample: A less detailed collection of example content from the site.

What is a Content Audit Used For?

The main purpose of a content audit is to produce a listing of the site’s content, usually in a big spreadsheet.

This list of content will come in handy at various stages of the project. If you’re re-doing the information architecture, you’ll return to it again and again to remind yourself of the details of each page; you can also use it to talk to authors about managing and rewriting their content; and if you’re going to be moving to a new content management system, you’ll use it to keep note of what you started with, and where you’re up to.

That said, having a comprehensive list of content isn’t the only benefit of this process. Just by taking the audit you’ll get a much better understanding of the content. You may find things you didn’t know existed, spot duplication and identify all kinds of relationships in the content. It can also serve as a precursor to a more comprehensive content analysis, but that’s a topic for another post!

What Does a Content Audit Include?

I always record a content audit in a spreadsheet, mainly because spreadsheets are so flexible. They are also great at holding a large amount of information in a fairly manageable way. Plus they’re easy to share with other people.

I recommend collecting the following information for every page:

  • Navigation title: The name of the main navigation link to the content (e.g. the link title in the main navigation)
  • Page name: The displayed page title
  • URL: You may want to display the URL or just link from the page name
  • Comments: Notes and things for you to remember
  • Content hierarchy: Some way of showing the basic relationship of the content items

You may also like to add information about:

  • Content Type: Is this a basic page, publication, news story, article, technique, FAQ, or something else?
  • Basic content description: A brief reminder about what’s on the page
  • Topic, tags or category: Metadata for products, articles, news, blog posts
  • Author: Who wrote this content?
  • Owner: Who is responsible for the content?
  • Date last updated: When was the content last updated?
  • Attached files: How many files are attached, and what type of files are they?
  • Related: What information is linked from sidebars or Related Links boxes on this page?
  • Availability: Is the content available to desktop, mobile and/or app users? Is the content syndicated to other sites?
  • A numbering system: An index to help you when referring to each content item.

You may need to collect different information for each type of content. For example, you may want to list topics or categories for news content; and only list downloadable files in a publications area.
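
As a rough illustration, the columns described above could be set up as a CSV file, which opens in any spreadsheet program. This is only a sketch: the field names, the choice of optional columns and the sample row are all hypothetical, so add or drop columns to suit your project.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class AuditRow:
    # Core columns recommended for every page
    index: str              # numbering system, e.g. "1.2"
    navigation_title: str
    page_name: str
    url: str
    comments: str = ""
    # Optional columns; keep only the ones you expect to use
    content_type: str = ""
    owner: str = ""
    last_updated: str = ""


def rows_to_csv(rows):
    """Return the audit rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(AuditRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


# Hypothetical example row
sample = [AuditRow("1.0", "About", "About us", "/about", content_type="basic page")]
print(rows_to_csv(sample))
```

Starting with a small set of columns like this makes it easy to add more later if a particular type of content turns out to need them.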

The most important thing to know about a content audit is that there really is no right or wrong way to do it. It’s a tool for you to use throughout your project, so create your content audit in a way that will help you. And don’t be afraid to adapt it after you start: each client and project is different, so each audit will be different.

Where to Begin

Getting started is easy! Here’s how I go about it.

  1. List the main pages or sections of the site in the first column of your spreadsheet (right alongside your index). Here’s an example of a content audit spreadsheet for a site that may look familiar:
    Start your content audit by creating a list of the top-level items—this will often match the primary navigation.
  2. Choose one page to start with and dive into it, capturing the information you’ve decided upon for that page.
  3. If that page has sub-pages, make a list of each of them, and repeat the process for each of these in turn.
    Dive into any list of sub-pages, and complete that section before moving on.
  4. Then just keep going, until you’ve explored and written down everything you need to. That’s really all there is to it.
    Capturing the content of a site in a spreadsheet will help you make informed design decisions.

Auditing your content in this way (writing down details of the current page, then listing the sub-pages, then exploring a page) builds out your list in a way that allows you to come back and explore each section one by one.

If you’re auditing a big site, it can be very easy to get lost—it’s important to take this process step-by-step, and to finish one section before starting another.
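
The steps above amount to a depth-first walk of the site: record a page, list its sub-pages, and finish that section before moving on. Here is a minimal sketch of that idea, assuming the site structure is already known and held in a nested dictionary. The `walk_site` function, the sample site and the dotted numbering scheme are illustrative choices, not a prescribed format.

```python
def walk_site(pages, prefix=""):
    """Yield (index, title) pairs depth-first, finishing each section
    before moving on to the next one."""
    for number, (title, sub_pages) in enumerate(pages.items(), start=1):
        index = f"{prefix}{number}"
        yield index, title
        # Dive into this page's sub-pages before visiting its next sibling
        yield from walk_site(sub_pages, prefix=f"{index}.")


# Hypothetical site structure: each title maps to its sub-pages
site = {
    "Home": {},
    "Articles": {"Design": {"Content audits": {}}, "Research": {}},
    "About": {},
}

for index, title in walk_site(site):
    print(index, title)
```

The dotted index doubles as the numbering system and as a simple way of showing the content hierarchy in a flat spreadsheet.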

Tips

  • If your site is run from a CMS, you should be able to get access to a list of all the pages from the site. If it’s a good CMS, and the content is already fairly well structured, you may even be able to have the CMS generate a good quality starter audit for you. If the CMS can’t do it, a tool like the Content Analysis Tool may help.
  • Don’t capture information you are unlikely to need or use. If you’re unsure whether you need information for a specific page, write it down for a handful of pages, to get a feel for whether it will be useful. You can always come back and fill it in for other pages at a later stage.
  • It can sometimes be difficult to determine how a site is structured. In fact, often the process of figuring out what the main sections of a site are can be a challenge. Don’t worry too much about getting the relationships right and showing how pages are connected at the beginning. Just focus on getting pages written down into the spreadsheet—as you get through the audit, you may find a better way of organising the information.
  • Don’t expect the content audit to be fast. Big sites can take days and days to audit. I use this fact as an excuse to buy new music, then sit down and plough through it!
  • Don’t try to take shortcuts, skip sections or skim through without really looking. It’s important that you understand all of the content before you try to work with it later.
  • If you’re working on a brand new site, a content audit can still be useful. Instead of starting with the current site, make a list of all of the resources you’ll be using—printed procedure manuals, fact sheets, videos, paper forms and other documents that will influence the site.
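
If your CMS can only give you a flat list of page URLs, a few lines of code may be enough to turn it into a starter audit. The sketch below assumes the export is a plain list of URLs and that the URL paths roughly mirror the site structure, which won’t be true of every site. The `starter_rows` function and the example.com addresses are hypothetical.

```python
from urllib.parse import urlparse


def starter_rows(urls):
    """Turn a flat list of page URLs into (depth, path) rows,
    sorted so that each section's pages sit together."""
    rows = []
    for url in urls:
        path = urlparse(url).path.strip("/")
        segments = path.split("/") if path else []
        rows.append((len(segments), "/" + path))
    # Sorting by path segments keeps sub-pages directly under their parent
    return sorted(rows, key=lambda row: row[1].split("/"))


# Hypothetical export from a CMS
exported = [
    "https://example.com/articles/design/",
    "https://example.com/",
    "https://example.com/about",
    "https://example.com/articles",
]

for depth, path in starter_rows(exported):
    print("  " * depth + path)
```

A list like this is only a starting point. You still need to visit each page to capture titles, owners and comments.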

It All Starts with Content

Whether you decide to create a comprehensive list of every item, or just a sample selection, a content audit is a crucial first step in the path to understanding any content-heavy website. While the process may sound tedious (and, granted, often is!), undertaking this process will provide you with the insight and context you need to make informed design decisions.

Creating a content audit doesn’t require years of experience, but it does require patience, persistence, curiosity, and attention to detail—all good traits of a UX Designer!

Download the content inventory spreadsheet used in this example.

Written by Donna Spencer

18 comments
  • Although I agree that understanding your content is an essential step in performing a redesign/relaunch, I have encountered cases where a full content audit is impractical, or even impossible.

    As an example, I worked on redesigning the UX for the European websites of a multinational. There were 18 language variations and each country had 5 separate websites – main site, consumer site, support, professional products, corporate. Every day new articles were being published and older articles were being amended or deleted. In addition new functionality was being implemented on a monthly basis, and this often led to new content types emerging. A conservative estimate was that there were upwards of 500,000 live content pages across the property.

    It wasn’t practical, or indeed necessary, to perform a rigorous audit of every content item.

    Instead, we audited a subset of the English language content – enough to identify the main types of article being published – and circulated this to each country’s web production team to confirm that these types applied equally to them.

    From this we identified 3 broad types of content:
    * Horrible stuff – Content inside systems that could not practically be reorganised within the scope of the project. Solution: design around them and organise a future project to deal with them properly.
    * Boring stuff – Content that, due to time sensitive nature, was not worth spending effort on reorganising. Solution: Created an archiving process that involved minimal metadata changes.
    * Important stuff – Existing or imminent content that either had a long shelf life or would have high visibility at the time of the relaunch.

    The Horrible and the Boring content represented the vast majority of the system and grouping them in this way allowed us to leave them until another day.

    We concentrated our effort on auditing the subtypes of the Important stuff but, again, we did not take a rigorous approach, preferring instead to delegate that responsibility to the individual countries.

    We were able to formulate a migration/reclassification process that each country would need to undertake before they could roll over to the redesigned system. The release was staggered over 6 weeks, with the largest territories (UK, Germany & France) being rolled out last, so that we could identify any glitches when dealing with the smaller territories.

    In short, we performed a thorough audit at the macro level, but devolved responsibility for understanding the micro detail to the respective local owners.

    • Excellent – horrible and boring categories :) I’ve never been that brave.

      I once worked on a university-wide project (actually, in pieces I still am). That was also one that was impossible to take a full audit. There was not even a chance that I’d be able to find ‘everything’. So I took a good audit of most of the important student-focused material and just listed other ‘sites’ that existed in case I needed to know about them one day.

  • Colin,

    I like the ‘horrible/boring/important’ schema. For the content audit I’m doing right now, which is smaller but still daunting, I’ve used two columns:

    Readability: I plug the text into the Gunning fog index, then grade it according to its score. A score of 17+ (i.e., you’d need a PhD to understand it) is graded as a 1, and so on. Readability indexes aren’t watertight, but they are useful for a snapshot.

    Usability: A 10-second expert review. I’m aiming to mirror the user’s initial impressions. Is the layout all over the place? Can I make sense of the headings or the link structure? Again, 1-5.

    This approach helps with reporting to the client on the usability of their content (part of my brief) – and helps me to make sweeping statements such as ‘90% of your content is written at a postgraduate level’.

    • Other dimensions are ‘findability’ and ‘cullability’.

      I have a redesign project about to start for a website that was heavily and clumsily SEOd – keyword stuffed in places to the point that it’s unreadable to humans.

      I’ve just gone through 6 months of Google Analytics data to see which pages are attracting search traffic and which can be chopped. I’m pleased to say, a lot of the clumsy SEO work attracts no traffic so it will be culled.

      However, I’m left with a lot of horribly written content that attracts significant traffic. The client is wary of simply rewriting this content as he is clinging to the fetishism of the SEO witch doctors.

      The plan is to, kind of, sweep it under the carpet – move it into a blog archive structure with 301 redirects so that search engines know where to find it. As they should retain their search goodness they will effectively become landing pages, so we will add prominent waymarkers to these pages that link readers to the new, well written, content.

      If all works well, the new pages should gradually rise in the search rankings as they will have lots of internal links from well-ranking pages.

      • Thanks.

        It *should* work – I did something similar a few years back where Google loved certain articles that had been superseded. By inserting a quite disruptive “This article is now out of date – click here for the latest information” box at the top of the articles we managed to get between 70% and 85% click through to the more relevant content.

  • Hi Donna,
    I wanted to reach out and say that this was a very useful and thorough article. I certainly will be using the tips that you proposed. What especially resonated with me was, “capturing the content of a site in a spreadsheet will help you make informed design decisions.” I have recently been investigating the importance of content in user experience. So, this article gives me the support to further pursue this topic and issue internally for the better.

    Thank you once again!

    Danielle

  • Thanks Donna, I’m going to use this article to try to push the content agenda on a new-build project, it’s never too early to be thinking about REAL content….

  • I just finished the website redesign project I mentioned in an earlier comment on this thread.

    I used the site back-end to produce a complete list of live pages. I then cross-referenced this with Google Analytics data for the previous 6 months to identify the top landing pages, search keywords driving traffic to each page, and popular pages that weren’t landing pages (that users navigated to during their visit).

    This was extremely useful as it allowed me to do the following:
    * Identify the most popular pages on the site from a Search Engine perspective:
    ** Ensure these pages were not lost during the redesign
    ** Prioritise them for rewrites, but with sensitivity to preserving SEO goodness
    * Identify less popular pages that we would have expected to receive visits and ensure they were surfaced and rewritten in the redesign.
    * Identify duplicate content & dedupe – almost half of the site content was duplicate content, created by a pitifully inept SEO agency. I kept the fittest page and made 301 redirects from the previous duplicates to remove the duplication.
    * Last, but by no means least, we were able to kill off pages that had received few or no visits in the previous 6 months.

    This grading and sifting process gave me a much more manageable subset of content pages that could be evaluated and recategorised into structures that we knew, by referring to the search keywords, were aligned to what people are looking for on the site.

    It took about 2 days at the front end of the project to perform the content audit and propose the new categorisation. If it hadn’t been done there would have been a lot of effort wasted on content elements that nobody was interested in.

    • I’ve done a similar thing in the past, colour-coding the spreadsheet according to page popularity, entry points etc. For some sites it isn’t particularly dramatic, but for some doing this clearly showed that almost all the traffic was in one place.

  • Hi Donna,

    This was very helpful, thanks. I recommend taking a look at a tool created for SEO but that has excellent uses for this workflow – Screaming Frog SEO – http://www.screamingfrog.co.uk/seo-spider/

    Basically it collects everything about a website that you have listed here and can then be exported to Excel where you can make adjustments with comments etc.

    It goes a bit further and also analyses some best practices as far as Title and Description tag lengths are concerned and identifies duplicates and additional technical issues pages may encounter when the search engines come along.

  • Great post, Donna. Do many people getting an audit inquire about grammar or accessibility? Or are those typically beyond the scope of the audit?

    • Hi Len. The audit is your tool, so do whatever you need for your project. I have never worried about listing grammar and accessibility, but you could if you’re likely to use them later.

  • Great article!

    One thing I’ve found is that content audits, in my experience, tend to be iterative. I have never just gone all the way through a site one time. I always notice patterns and end up doubling back to check out this or that detail. I’m looking for problems and inconsistencies.

    Thanks for sharing this! It’s really helpful.

  • Hey, Donna: I’m really late to this party, but wanted to chime in, especially along the lines of Colin’s content categorization scheme. It reminds me of @gerrymcgovern’s acrostics for describing online content:

    RICH – Relevant-Interesting-Current-Helpful

    ROT – Redundant-Outdated-Trivial

    Your article is definitely RICH. Thanks for sharing it.

    Sincerely, Tim