We recently discovered an issue where a programming decision made it harder to set up goal funnels in Google Analytics. We also discovered that it was completely avoidable with a little better communication between our Django programming staff and our analytics staff.
As is typical after a new site launch, we create Google Analytics Goal Funnels to track all conversion opportunities. In this particular case, the client is a hospital and the funnel is a first appointment request. We recently upgraded our doctor directory/first appointment system so that it is search engine friendly, which necessitated a change in the URL structure.
By way of example, here is the typical path (funnel) to an appointment request:
Doctor Search Page
Doctor Search Return Page
Doctor Profile Page
Appointment Request Form
Submission Confirmation Page
In the process of creating this system, a staff Django programmer determined the URL structure for the application:
As you can see, it included the physician name in the URL, which enhances search engine ranking. We also needed a primary key (in this case the number 65) in case there are two Betty Smith, MDs.
The primary key can be multiple digits and the physician name is concatenated from the first-middle-last-suffix fields in the database.
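As a rough illustration of how such a path might be assembled, here is a minimal Python sketch. The function name `physician_path` and its parameters are hypothetical, not the actual application code; it only assumes the pieces described above (a numeric primary key followed by a slug built from the name fields).

```python
import re


def physician_path(pk, first, middle, last, suffix):
    """Build a profile path such as /physicians/65/betty-smith-md/.

    Hypothetical helper for illustration; the real application may
    build its URLs differently.
    """
    # Keep only the name fields that are actually filled in.
    parts = [p for p in (first, middle, last, suffix) if p]
    # Lowercase, then collapse anything that is not a letter or digit
    # into a single hyphen and trim stray hyphens from the ends.
    slug = re.sub(r"[^a-z0-9]+", "-", " ".join(parts).lower()).strip("-")
    return f"/physicians/{pk}/{slug}/"


print(physician_path(65, "Betty", "", "Smith", "MD"))
```

The primary key keeps two physicians with the same name from colliding, while the slug carries the search-friendly name.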
So far so good, except when it was time to create the Analytics funnel. It seems as though it would be easy, but we could not easily isolate the 3rd level directory (/physicians/65/betty-smith-md/) from the 4th level directory (/physicians/65/betty-smith-md/appointment-request/).
After knocking our heads against the wall for an hour or so, the best we could come up with was the following regular expression:
However, this included the 4th level as well. The same was true for:
Regular expressions are greedy and obtuse. It seemed as though this should work, but it didn't. Finally we threw up the white flag and asked on the Google Analytics Help Forum. Here's the answer (special thanks to chrislatseer):
This is the regex you want:
The reason [0-9]*\/.*\/$ doesn't work is because it says zero or more numbers,
then a forward slash, then any characters (which includes a forward slash), then
a forward slash. So it never stops. What you wanted it to do was not include
any more slashes.
So [0-9]+\/[A-Za-z0-9-_]+\/$ one or more numbers, then a slash, then one or more
letters, numbers, hyphens, or underscores, then a forward slash.
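To see the difference outside of Google Analytics, the two patterns from the answer can be tried against the two example paths in Python. This is only a sketch: Python's `re` module is not the engine Google Analytics uses, the constant and function names are ours, and the hyphen in the character class has been moved to the end (`[A-Za-z0-9_-]`) so it reads unambiguously as a literal.

```python
import re

# The two example paths from this post.
PROFILE = "/physicians/65/betty-smith-md/"
APPOINTMENT = "/physicians/65/betty-smith-md/appointment-request/"

# ".*" also matches slashes, so this pattern runs through any depth.
TOO_BROAD = re.compile(r"[0-9]*\/.*\/$")

# The character class excludes "/", so exactly one segment may
# follow the numeric key before the final slash.
PROFILE_ONLY = re.compile(r"[0-9]+\/[A-Za-z0-9_-]+\/$")


def matches(pattern, path):
    """Return True if the pattern is found anywhere in the path."""
    return pattern.search(path) is not None


for path in (PROFILE, APPOINTMENT):
    print(path, matches(TOO_BROAD, path), matches(PROFILE_ONLY, path))
```

The broad pattern matches both paths, while the stricter one matches only the profile page, which is what the funnel step needs.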
It works and that's the regular expression we are using, but it was avoidable. If we had been smart enough to map out the URL structure from the beginning, it would have saved time, resulted in a more funnel-friendly URL and avoided the use of regular expressions entirely. A better structure would have been:
URL structure is important for SEO, information architecture, metric and log analysis and, yes, goal funnel creation. With this lesson in mind, we are making URLs a discrete part of the upfront planning process so that all constituents can have a say in their structure.