If the only tool you have is a hammer, every problem starts to look like a nail. Sure, there are better tools for text splitting and string parsing for people with some programming expertise, but Excel usually acts as the default analysis and data manipulation for most analysts (including myself!) As a quick and dirty analysis and text processing tool, it does ok; the intent of this post is to walk you through some of the most common text functions within excel, and provide some examples of how I’ve used them in a PPC, SEO, and Web analytics context.
The data I’m working with in this example are full referral URLs from Google Analytics, populated into the user defined variable using an Advanced Custom Filter. Referral URLs are particularly pesky to analyze because there’s a lot going on within each. Let’s break them down into more manageable pieces using mid. Here’s what the URLs look like from GA.
Excel’s MID Function.
Mid’s arguments are Text, start number, and number of characters. Rather than hard-coding values like 4 or 10 into the arguments for Mid, it’s better to calculate the start number and characters based off of ‘Landmarks’ within your string. (When working with complex excel formulas for the first time, I like to break them down into individual formulas, then re-nest them all at the end using a good text editor.) Say we want to take a referral string from google that looks like this
http://www.google.com/search?q=jared+huber&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a
and extract the actual search query (it’s the stuff starting at “q=” and ending at the “&” after that). To use Mid, we’d first have to define a start number. In this case we can use the FIND function to help us locate the “q=” within the URL string. (FIND returns a number that corresponds with the first occurrence of one string within another text string). We’ll use:
=FIND("q=",A3)
which has a value of 30, along with
=FIND("&",A3,B3)
where B3 is the output of the previous find function.
Putting these together, we want to use mid with a start of 30, and a length of 43-30 or 13 characters.
Our formula now looks like this:
=MID(A2,B2,C2-B2)
Excel Formula Consolidation using Find & Replace
Now’s the time for that handy text editor. Let’s clean up the formula by substituting all of the values with their original formula. This basically allows us to put it all in one cell, rather than sprawling it in several cells in the spreadsheet. If this seems vaguely reminiscent of something you had to do in Algebra class, it’s because it is. Grab the formula in D2 “=MID(A2,B2,C2-B2)”, and put it into notepad (or textwrangler, if you’re lucky enough to use OS X on a daily basis). Do a find-replace (ctrl+H), and replace all instances of ‘B2′ with the formula we had in cell B2, ‘FIND(“q=”,A2)’. Likewise, find-replace all instances of ‘C2′ with our formula in C2, or FIND(“&”,A2,B2). You’ll probably have to do one more, replacing B2 again with our B2 formula. What we;re left with looks incredibly complicated (and may make you look like an excel guru), but in reality, it’s several systematic steps to produce an error-free result.
=MID(A2,FIND("q=",A2),FIND("&",A2,FIND("q=",A2))-FIND("q=",A2))
Cleaning up the output
You can then adjust your mid to exclude the q= by hardcoding a +2 on the start point and a -2 on the endpoint, or using the right function on the output and lopping off the first 2 characters. Additionally, you may want to substitute the + characters in the output with spaces to reflect the true search query.
=RIGHT(SUBSTITUTE((MID(A2,FIND("q=",A2),FIND("&",A2,FIND("q=",A2))-FIND("q=",A2))),"+"," "),LEN((MID(A2,FIND("q=",A2),FIND("&",A2,FIND("q=",A2))-FIND("q=",A2))))-2)
{ 2 comments }
