Posted on January 18, 2019 by Jason Wright

Understanding the architecture of your website can be a challenging task. There are so many moving parts on really large websites. Running through this exercise, however, gives you insight into how any website is structured. The following walks you through how I do a “teardown” of an active website: run a crawl, filter the data, and then walk away with a rough idea of site recommendations and a new sitemap.

Step 1 – The Crawl

Grab the URL you want to review and throw it into Screaming Frog. You may need a license to crawl a site that’s larger than 500 pages. I’m picking something easier for this example. The crawl should look something like this:

Step 2 – The Export

Export the HTML data from Screaming Frog. Hit the drop-down, select HTML, then export the data to a CSV. I like to name my files something like “Jane Does Plumbing Architecture.csv”. Upload your file to Google Drive (you can also follow along in Excel).

Step 3 – The Initial Grind

IMPORTANT: Open your recently uploaded file and DUPLICATE the sheet. Rename the original tab “CLIENT NAME (Raw)”. Name the duplicate something like “CLIENT NAME (Initial Pass)”.

…and now comes the fun part. Once you have a reliable crawl uploaded, you can start filtering the data down into chunks that your project team can understand.

Prepare the Destruction

There are several key sub-steps in this process:

1. Delete columns you don’t find critical in a typical crawl, such as content type, inlinks, outlinks, ratios, and that kind of stuff.
2. Keep everything you generally use, like titles, descriptions, and status codes (404, 302, etc.).
3. Set the table’s header row as filters for the sheet. This lets you keep, but hide, data like broken or missing pages.
4. Filter out 302s and 404s.

Categorize the Findings

Insert a new column into the sheet and call it “category”.
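If you’d rather script the initial pass, the Prepare the Destruction sub-steps and the new category column can be approximated in Python using only the standard library. This is a minimal sketch, not the workflow described in this post. It assumes the export uses Screaming Frog-style headers such as “Address”, “Status Code”, “Title 1”, “Meta Description 1”, and “Word Count” (check these against your own file), and `KEEP_COLUMNS` and `initial_pass` are hypothetical names. Unlike a sheet filter, it drops the 302 and 404 rows from the working copy rather than hiding them, so keep the untouched export as your “(Raw)” version.

```python
import csv
import io

# Columns to keep in the "Initial Pass" copy. These header names are assumptions
# based on a typical Screaming Frog HTML export; match them to your own CSV.
KEEP_COLUMNS = ["Address", "Status Code", "Title 1", "Meta Description 1", "Word Count"]

# Status codes to leave out of the working copy (redirects and broken pages).
DROP_STATUS_CODES = {"302", "404"}


def initial_pass(raw_csv: str) -> list[dict]:
    """Trim an exported crawl to the useful columns, drop 302/404 rows,
    and add an empty Category column to fill in later."""
    rows = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        status = (row.get("Status Code") or "").strip()
        if status in DROP_STATUS_CODES:
            continue  # skip redirected or missing pages
        # Keep only the chosen columns; missing values become empty strings.
        cleaned = {col: row.get(col) or "" for col in KEEP_COLUMNS}
        cleaned["Category"] = ""  # filled in during categorization
        rows.append(cleaned)
    return rows
```

To use it, read your exported file as text (for example with `open(path, newline="", encoding="utf-8").read()`), pass it to `initial_pass`, and write the resulting list of dicts back out with `csv.DictWriter` to get your “Initial Pass” copy.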
That category column is going to play a major role in breaking down the document. If you’re still with me, it’s time to break down the content via categorization. The naming conventions you use are totally up to you. To get things going, I always sort the sheet by URL. Doing this groups similar pages together, making it easier and faster to make big category updates. The entire goal here is to get a clear idea of what we need to pay attention to and to have this file serve as the foundation for a client-facing sitemap that meets the needs of UX/UI designers and digital marketers.

The Categorizing Approach

This part is rather straightforward. Through various sorting methods, I categorize the pages in the URL column. In the end, I want a sheet that groups all similar pages together via labels and color coding. This lets me understand everything that’s on a live site.

For example, I’ll go through and look for dynamic URLs for blogs, like blogpage/2, and categorize them as such. For content like an “About” section, I generally label all “About” type pages as “Main-About” in the category column. When all pages are categorized, I color code the various category groups just to make things a little easier for me.

What All This Means

When you’re done labeling and coloring the heck out of your sheet, you’re ready to take the next step. Depending on your role, this could mean anything from understanding the word count on specific pages to mapping out a future version of the sitemap through a combination of keyword research, new content plans, and existing content transfer plans.

Hopefully this helps you take a more detailed, forensic approach to website architecture analysis. In a later post, we’ll merge this data with other data sets to give us a powerful view into the performance of a website.
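The sort-then-label routine from The Categorizing Approach, plus the word-count view mentioned under What All This Means, can also be sketched in Python. The rules below are hypothetical examples modeled on this post: a URL whose path ends in a numeric segment (like blogpage/2) is labeled “Dynamic”, and any path starting with /about is labeled “Main-About”. Rows are assumed to be dicts with “Address”, “Word Count”, and “Category” keys, like the ones `csv.DictReader` produces from the cleaned sheet. Color coding is left to the spreadsheet.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical labeling rules, checked in order (first match wins).
CATEGORY_RULES = [
    # Paginated/dynamic listing pages such as /blogpage/2
    (re.compile(r"/\d+/?$"), "Dynamic"),
    # Any "About"-type page, e.g. /about, /about-us, /about/team
    (re.compile(r"^/about", re.IGNORECASE), "Main-About"),
]


def categorize(url: str) -> str:
    """Return the first matching category label for a URL's path."""
    path = urlparse(url).path or "/"
    for pattern, label in CATEGORY_RULES:
        if pattern.search(path):
            return label
    return "Uncategorized"


def categorize_rows(rows: list[dict]) -> list[dict]:
    """Sort rows by URL so similar pages sit together, then set each row's
    Category in place. Returns the sorted list."""
    ordered = sorted(rows, key=lambda row: row["Address"])
    for row in ordered:
        row["Category"] = categorize(row["Address"])
    return ordered


def word_count_by_category(rows: list[dict]) -> dict[str, int]:
    """Total the Word Count column per category (a blank count is treated as 0)."""
    totals: dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row["Category"]] += int(row.get("Word Count") or 0)
    return dict(totals)
```

Anything that lands in “Uncategorized” is a cue to either add a rule or label that page by hand, which keeps the naming conventions in your control.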