
Web Analytics


Web Analytics and Data Warehousing



Introduction

In today’s economy, organisations communicate and build relationships with their customers and prospects through online and offline channels. From traditional methods of communication, such as direct mail and storefronts, to sophisticated communication via a company’s e-business, each interaction represents a valuable opportunity to capture information about customer needs, interests and preferences.

More specifically, an e-business offers something other channels do not: near real-time behavioural information, which can indicate future purchasing and browsing behaviour. With each click of the mouse, web site visitors reveal which products interest them, which promotions they respond to and which services they need. This empowers business managers to serve their customers’ needs better, increasing marketing efficiency, reducing operating costs and building stronger, more profitable relationships with customers.

While web data is an invaluable source of customer information, it poses a unique challenge to businesses seeking a more complete view of their customers. Detailed data about a visitor’s actions on a web site is captured either in the log files of the site’s web servers or through page tagging. Clickstream data is raw and massive in scale; although it contains a tremendous amount of information, extracting that information and organising it in a useful way is difficult. Successful analysis of web customer data depends on a robust data warehouse, which serves as a central repository for web data and as a source for further analysis and reporting.

Today, data warehousing and sophisticated analytics put powerful customer information directly into the hands of marketing, e-business and service professionals, who can use it to analyse and measure the return on their web initiatives.

But the power of e-business intelligence does not stop with web data. The deeper value of an e-business is unlocked when behavioural information can be connected to an individual and then integrated with information from other channels. Joining historical information, including transactions through different enterprise touchpoints, marketing campaigns and service contacts, with web site browsing behaviour provides a more complete, multi-channel view of customers and prospects. Browsing behaviour and responses to e-mail campaigns and newsletters indicate what visitors are immediately interested in and, when coupled with historical purchase information, can be a powerful indicator of future behaviour. This information is invaluable to organisations that wish to build stronger, more profitable relationships with their customers by providing personalised service and relevant marketing offers.

The key to detailed web site analysis is extracting and maintaining the pertinent information from the raw clickstream data. The sheer volume of clickstream data, the frequent changes to how that data must be interpreted as site content changes and new campaigns are added, and the wide variety of web site implementations combine to create a constantly evolving environment.

Combining Matraxis’ experience of data warehousing with Webtrends’ ability to analyse web site behavioural data provides:

  • A complete view of web customer behaviour, collecting each complete customer session for aggregate and individual reporting and analysis
  • A schema based on over ten years’ experience in processing and analysing web clickstream data
  • A clickstream analysis application that provides flexibility and demonstrated scalability
  • A browser-based user interface for managing the application and capturing dynamic changes
  • A fully documented schema that captures individual behavioural information
  • A schema that can be extended with other data to serve as a contact or customer oriented warehouse
  • A source of web behavioural data for deeper analysis or integration with other warehouses
  • A continually evolving application that takes advantage of Webtrends’ advances in simplifying the collection and analysis of clickstream data

The combination of Webtrends’ industry-leading web data warehousing with sophisticated analytics and reporting tools gives users a powerful way to analyse and improve customer relationships across all touchpoints in the organisation.

The following sections illustrate how the Webtrends Warehouse resolves the problems associated with collecting, analysing and organising web site behavioural information, and how it can be used to solve business problems.


Transforming Web Logs into Useful Information

Web server log files capture information about each individual transaction or request between a customer’s web browser and the web server. An individual page displayed on a browser may be composed of multiple files returned from several different web servers. The actual content delivered by these web servers may come from several different back-end databases or applications. Each file represents a single, independent transaction that will be logged by the web server that delivered it.

The typical web site is a complex environment with many different sources of information, delivered through a network of load-balanced web servers backed by application servers and databases. A complete visitor session may span many web servers and sources of information. Building the warehouse involves importing these individual transaction records and transforming them into meaningful information about web site customers. The warehouse must also manage the transformation process itself, including:

  • Scheduling of imports
  • Log file import and administration
  • Transformation definitions
  • Transformation
  • Warehouse loading and updating
  • Warehouse administration
  • Stored procedures for managing and integrating data

Turning visitor events into reportable and meaningful information requires knowledge about how you plan to use the information. It also requires information about how the site is constructed and managed so that information can be properly identified. For example, there are a variety of ways in which an individual session, or visit, can be identified. Each method has its own set of challenges and plays a role in determining the accuracy of the identification of the individual visits.

The transformation process captures both default and user-defined, site-specific information. Default information includes basic statistics such as:

  • URLs referenced
  • Referring domains
  • Browser type
  • URL parameters
  • Cookies
  • Authenticated user
  • Time and date
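
To make the default fields concrete, here is a minimal Python sketch that pulls them out of a single log line. It assumes Apache’s Combined Log Format with one extra quoted field appended that holds the Cookie header (the standard format does not record cookies; Apache can log them with `%{Cookie}i`). The field names in the returned dictionary are illustrative, not the Warehouse’s own.

```python
import re
from datetime import datetime
from urllib.parse import parse_qs, urlsplit

# Apache Combined Log Format with an assumed extra quoted Cookie field
# appended (e.g. logged via %{Cookie}i); adjust to the site's real format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)" "(?P<cookie>[^"]*)"'
)


def parse_line(line):
    """Extract the default per-request fields from one log line, or None."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None  # malformed line or a different log format
    url = urlsplit(m["url"])
    referrer = m["referrer"]
    # "name=value; name=value" -> {"name": "value", ...}; "-" yields {}.
    cookies = dict(
        pair.split("=", 1) for pair in m["cookie"].split("; ") if "=" in pair
    )
    return {
        "time": datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"),
        "url": url.path,
        "url_params": {k: v[0] for k, v in parse_qs(url.query).items()},
        "referring_domain": urlsplit(referrer).hostname if referrer != "-" else None,
        "browser": m["agent"],
        "cookies": cookies,
        "authenticated_user": None if m["user"] == "-" else m["user"],
    }
```

Lines that do not match the pattern return `None` rather than raising, so a production import could count and report them separately.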

Site-specific information defines the pages, actions, paths, campaigns and content on the site that are of interest to its various ‘owners’, e.g. business and marketing managers. A typical example of site-specific information is the use of redirect (or ‘landing’) pages for banner advertising and e-mail campaigns. An e-mail campaign or newsletter may include a URL that takes the recipient to a redirect page on the site; the page identifies the campaign, and a parameter attached to the URL may also identify the recipient. The site owners define both the campaign page and the parameter as site-specific information so that they can be captured, identified and classified in the Warehouse. Once in the Warehouse, the information can be reported on, used to further identify the visitor or provide additional personalised information, or used to include the visitor in a group of responders.
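
A minimal sketch of how a landing-page hit might be classified, assuming the campaign definitions are a simple mapping from redirect-page path to campaign name and that the recipient is identified by a query parameter named `rid`. Both the mapping and the parameter name are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical site-specific definitions: redirect-page path -> campaign name.
CAMPAIGNS = {
    "/go/spring-newsletter": "Spring 2012 newsletter",
    "/go/banner-widgets": "Widget banner ad",
}
RECIPIENT_PARAM = "rid"  # assumed name of the recipient-identifying parameter


def classify_campaign(url):
    """Return (campaign, recipient_id) for a landing-page hit, else None."""
    parts = urlsplit(url)
    campaign = CAMPAIGNS.get(parts.path)
    if campaign is None:
        return None  # not a campaign landing page
    values = parse_qs(parts.query).get(RECIPIENT_PARAM)
    recipient = values[0] if values else None  # banner clicks carry no recipient
    return campaign, recipient
```

A banner click typically resolves to a campaign with no recipient, while a newsletter link can resolve to both.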

Content groups are another example of site-specific information. Content groups allow the site owner to identify specific pages, products or combinations of these as interesting behaviour. Whenever a visit matches the combination, the information will be stored as a visit attribute and will be available for later reporting and classification of overall visitor behaviour.
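
Content-group matching can be sketched as a subset test: a visit matches a group when every page in the group’s definition appears somewhere in the visit. This is an assumption about how “combinations” are defined, and the group names and pages are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentGroup:
    name: str
    required_pages: frozenset  # every one of these pages must occur in the visit


def match_content_groups(visit_pages, groups):
    """Return the names of the content groups a visit matches."""
    seen = set(visit_pages)
    return [g.name for g in groups if g.required_pages <= seen]


GROUPS = [
    ContentGroup("Widget researcher",
                 frozenset({"/products/widget.html", "/specs/widget.pdf"})),
    ContentGroup("Support seeker", frozenset({"/support/index.html"})),
]
```

Each returned name would become one visit attribute for that visit.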

As an example, consider the Webtrends Warehouse:

[Figure: Webtrends Warehouse]


Data Collection: From Page Views to Customers

The power of the Warehouse comes from organising the transformed log data into individual visits associated with individual visitors. Using advanced analytics, the Warehouse information can be viewed in aggregate to understand activity on the web site, such as the top pages visited, the most popular paths to or from a page, what content was viewed or what articles were downloaded. Alternatively, you can look at the individuals who actually visited those pages, took a particular path or downloaded a particular article or paper. The ability to view aggregated behaviour and then drill down to the individual, or group of individuals, who exhibited that behaviour is extremely powerful.
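
The aggregate-then-drill-down pattern can be sketched in a few lines of Python. This assumes each visit is already available as a (visitor ID, pages viewed) pair; the data layout and function names are illustrative only.

```python
from collections import Counter


def top_pages(visits, n=3):
    """Aggregate view: how many visits viewed each page."""
    counts = Counter()
    for _, pages in visits:
        counts.update(set(pages))  # count each page once per visit
    return counts.most_common(n)


def visitors_who_viewed(visits, page):
    """Drill down: the individual visitors behind one aggregate count."""
    return sorted({visitor for visitor, pages in visits if page in pages})
```

The drill-down result is exactly the kind of list that could feed an e-mail or direct-mail selection.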

Once the actual visitors are identified, actions such as creating lists for e-mail and direct mail can be targeted at that group of individuals. The information may be used to mark the individuals with a “score” that can be used to identify them for follow-up by a sales person or as part of a personalisation activity when they next visit the web site.

An overview of a Webtrends Warehouse schema and the relationships between the data is illustrated below.

[Figure: Webtrends Warehouse overview]


Identifying Visits

The previous section introduced the concept of tying together all of the pages that a visitor touched during a single visit. Identifying all of the pages in that visit accurately is critical to collecting and analysing visit behaviour consistently. Depending on how the site is constructed, this can be difficult: because each request to the web site is an independent communication, there must be some way of determining that a series of requests came from the same visitor.

There are several well-accepted methods for identifying requests from a single visitor: session and persistent cookies, session IDs in the URL stem or parameters, and authenticated user IDs. IP addresses are generally less valuable, both as a session identifier and as a means of recognising a returning visitor. IP addresses are often dynamic and may even change during a visit; this is especially common with America Online (AOL) users. Conversely, where a business routes its users through a proxy server, all of those users share the same IP address.

The most reliable method of identifying a visit is a first-party cookie, which is then associated with every subsequent page request from that visitor. Cookies are small data files stored on the visitor’s system; they can be persistent (kept across visits) or last only for the duration of the visit (session cookies). Visitors can also configure their browsers to refuse cookies, which introduces some inaccuracy into overall traffic analysis; however, Webtrends’ patented first-party cookie solution provides the best solution available.
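
A minimal sessionisation sketch, assuming hits have already been reduced to (cookie ID, timestamp, URL) tuples. Hits sharing a cookie are attributed to one visitor, and a new visit starts after a period of inactivity; the 30-minute timeout is a common industry convention used here as an assumption, not a documented Webtrends setting.

```python
from datetime import datetime, timedelta
from itertools import groupby

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity timeout


def sessionise(hits, timeout=SESSION_TIMEOUT):
    """Group (cookie_id, timestamp, url) hits into (cookie_id, [urls]) visits.

    Hits sharing a cookie belong to one visitor; a gap longer than
    `timeout` between consecutive hits starts a new visit.
    """
    visits = []
    ordered = sorted(hits, key=lambda h: (h[0], h[1]))
    for cookie_id, cookie_hits in groupby(ordered, key=lambda h: h[0]):
        current, last_time = [], None
        for _, ts, url in cookie_hits:
            if last_time is not None and ts - last_time > timeout:
                visits.append((cookie_id, current))  # close the previous visit
                current = []
            current.append(url)
            last_time = ts
        visits.append((cookie_id, current))
    return visits


# Example: cookie "c1" returns after a two-hour gap, giving two visits.
t0 = datetime(2012, 10, 10, 9, 0)
example_hits = [
    ("c1", t0, "/"),
    ("c1", t0 + timedelta(minutes=5), "/products"),
    ("c2", t0, "/"),
    ("c1", t0 + timedelta(hours=2), "/support"),
]
```

The same grouping works with a session cookie or an authenticated user ID as the key; only the key function changes.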


Collecting Visit Attributes

Once all of the pages and events have been collected and associated with a visit, a customer’s behaviour can be derived and classified. Visit start and end times, referring domain, cookies, authenticated user ID and browser type are all examples of default information that is collected automatically and stored in the Warehouse. While this provides some basic statistical information about the visit, it is the user-defined interpretation of the data contained in the visit that can provide the most relevant information.

Users need a number of mechanisms in the Warehouse for identifying interesting events. Content groups, qualification levels, product views, calls to action, favoured paths and campaign responses are typical events that can be derived from a visit. For example, qualification levels apply an interest level to a particular action, such as the download of an article, while a unique entry page identifies a response to a campaign. These events are based on web site content and organisation, and are defined in the transformation definitions. The events are identified during the log file import and captured in a set of tables known as the visit attribute tables.
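
A sketch of deriving visit attributes from the pages in a visit, assuming that a visit’s qualification level is the highest level reached by any of its actions and that campaigns are recognised by the visit’s entry page. Both definition tables are hypothetical examples of what the transformation definitions might contain.

```python
# Hypothetical transformation definitions.
QUALIFICATION_LEVELS = {
    "/products/widget.html": 1,    # viewed a product page
    "/whitepapers/widget.pdf": 2,  # downloaded an article
    "/contact/request-quote": 3,   # asked for a quote
}
CAMPAIGN_ENTRY_PAGES = {
    "/go/spring-newsletter": "Spring 2012 newsletter",
}


def visit_attributes(visit_pages):
    """Derive visit attribute values from the ordered pages of one visit."""
    # Assumed rule: the visit's level is the highest level of any action in it.
    attrs = {
        "qualification_level": max(
            (QUALIFICATION_LEVELS.get(page, 0) for page in visit_pages), default=0
        )
    }
    # A unique entry page (the first page of the visit) marks a campaign response.
    if visit_pages and visit_pages[0] in CAMPAIGN_ENTRY_PAGES:
        attrs["campaign"] = CAMPAIGN_ENTRY_PAGES[visit_pages[0]]
    return attrs
```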

Web sites are frequently very dynamic: campaigns constantly start and stop, and qualification levels and content groups change to match changes to the site. The Warehouse must keep a record of all these definitions and apply the latest definition at import time. The definitions, maintained in ‘settings’ tables in the Warehouse, record which information was considered important for each version of the web site at the time it was analysed.
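
One way to keep every version of a definition and still apply the right one is an effective-dated history: each version is stored with the date it took effect, and the import looks up the version in effect on a given date. This minimal sketch uses Python’s `bisect` module; the class name and the choice of keying versions by date are assumptions for illustration.

```python
import bisect
from datetime import date


class DefinitionHistory:
    """Every version of one definition, keyed by the date it took effect."""

    def __init__(self):
        self._dates = []   # kept sorted
        self._values = []  # parallel to _dates

    def add(self, effective_from, value):
        i = bisect.bisect_right(self._dates, effective_from)
        self._dates.insert(i, effective_from)
        self._values.insert(i, value)

    def as_of(self, when):
        """Return the version in effect on `when`, or None if none existed yet."""
        i = bisect.bisect_right(self._dates, when)
        return self._values[i - 1] if i else None


# Example: a content group whose page list was extended on 1 March 2012.
widget_group = DefinitionHistory()
widget_group.add(date(2012, 1, 1), {"/products/widget.html"})
widget_group.add(date(2012, 3, 1), {"/products/widget.html", "/products/widget-pro.html"})
```

Because old versions are never overwritten, the settings history also documents what each past import was looking for.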


Associating the Visit with a Visitor

Consistently identifying who the actual visitor is for a given visit can be very difficult. Identification depends heavily on the site’s strategy for identifying visitors. For example, a site that uses persistent cookies to identify unique visitors and to ‘sessionise’ visits cannot always associate the same visitor with all of their visits. If the visitor accesses the site from different computers, each computer will hold a different cookie, and the visits made from each computer will be associated with that computer’s cookie.

To identify a unique user consistently, registration is required. Many sites require registration for access to services, downloads or e-commerce. Registration and login information can then be associated with a visitor and stored in the visitor attribute tables. Registering during just one visit allows all visits associated with that cookie to be tied to the user, including visits made before the visitor registered.
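
The back-filling effect of registration can be sketched as a two-pass process, assuming each visit is a dictionary carrying its cookie ID and, for visits that included a login, the login ID. The first pass learns which cookie belongs to which user; the second tags every visit with that cookie, regardless of when it occurred. The field names are hypothetical.

```python
def link_visits_to_users(visits):
    """Tag each visit with the user ID registered against its cookie.

    visits: list of dicts with a "cookie" key and, for visits that
    included a login, a "login" key holding the login ID.
    """
    # Pass 1: learn cookie -> user ID from any visit containing a login.
    # If several users log in from the same cookie, the last one seen wins.
    cookie_to_user = {}
    for visit in visits:
        if visit.get("login"):
            cookie_to_user[visit["cookie"]] = visit["login"]
    # Pass 2: tag every visit carrying that cookie, including visits
    # that happened before the login.
    return [
        {**visit, "user_id": cookie_to_user.get(visit["cookie"])}
        for visit in visits
    ]
```

The same mechanism also produces the mis-attribution risk: anyone else using the computer that holds a registered cookie would be tagged with that user’s ID.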

As this example shows, identifying unique visitors and tying all of their visits together is not an exact science. In the scenario above, other people using the same computer could also have their visits associated with the cookie on that computer. Such situations typically account for only a small percentage of visits, but they show that site strategy and design are critical to tracking the visitors an organisation is interested in.


Integrating Visitor Behaviour with Other Channels of Information

Once registration information or a login ID is associated with the visitor, the visitor becomes identifiable. Registration provides information that fully identifies the visitor and allows the organisation to communicate with them. It is critical that this registration information be captured and then associated with the visitor, either directly in the Warehouse or indirectly through a login ID generated by the site’s registration server.

The Warehouse must maintain a very flexible visitor attribute mechanism for storing attributes directly associated with a visitor. Information such as name, address, business and e-mail address can be captured in the visitor attribute table.

With the identification of a visitor, it is now possible to associate visitor behaviour with an individual and integrate that behavioural information with other channels of information. The Warehouse should support multiple ways of using the behavioural information. It can serve as a source to other data warehouses or applications, or the Warehouse can be extended to incorporate other channels of data.

For instance, for companies implementing a CRM system in order to get a complete view of customers and prospects, the Warehouse can provide the web behaviour of those individuals in a way that complements information already in the CRM system.


A Wider Audience

The visit and visitor information provide comprehensive behavioural information about web site visitors. However, not all of a business’s customers or contacts use the web site, and it is often desirable to include all contacts when analysing behaviour, drawing on information maintained in other corporate systems.

The audience tables support a much broader concept of an individual. The visitor table in the Warehouse captures information only about web site visitors and has no concept of an individual who has not visited the site. The audience tables, on the other hand, can be populated with a list of contacts from any source and serve as the integration point for all individual information from other corporate sources. Information from other sources is maintained in the audience attribute tables, which are similar to the visitor attribute tables described previously, and in custom tables designed expressly for the contact data.

Visitor information is automatically linked to the audience table based on a common set of attributes, maintained in the audience and visitor attribute tables. Typical attributes that can link a visitor with its audience contact are e-mail address, customer ID number, telephone number or postal address. A single person is frequently captured in the Warehouse as several different visitors, for example one per computer when visitors are identified by cookie. This can happen in a number of situations, but the audience table provides a point where these visitor records can be tied together.
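
Linking visitors to audience contacts can be sketched as an index lookup on normalised attribute values, assuming both sides use the same attribute names (here `email` and `customer_id`, which are hypothetical). Attributes are tried in order and the first match wins, so several visitor records can resolve to the same contact.

```python
def _norm(value):
    """Normalise an attribute value so case and spacing don't block a match."""
    return value.strip().lower()


def link_to_audience(visitor_attrs, audience, keys=("email", "customer_id")):
    """Map visitor IDs to audience contact IDs via shared linking attributes.

    visitor_attrs, audience: dicts of ID -> {attribute name: value}.
    keys: linking attributes, tried in order; the first match wins.
    """
    index = {}
    for contact_id, attrs in audience.items():
        for key in keys:
            if attrs.get(key):
                index[(key, _norm(attrs[key]))] = contact_id
    links = {}
    for visitor_id, attrs in visitor_attrs.items():
        for key in keys:
            if attrs.get(key) and (key, _norm(attrs[key])) in index:
                links[visitor_id] = index[(key, _norm(attrs[key]))]
                break
    return links


def visitors_by_contact(links):
    """Invert the links: contact ID -> every visitor record tied to it."""
    grouped = {}
    for visitor_id, contact_id in links.items():
        grouped.setdefault(contact_id, []).append(visitor_id)
    return grouped
```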

The audience tables are a global resource, and multiple Warehouses can be tied to a single audience table. A company with multiple web sites could therefore analyse the effectiveness of each site, compare results across all of them and combine the analysis with its other enterprise information.


The Performance Advantages of the Warehouse

Many web site analysis vendors take the approach of loading every raw hit, or log file record, into a database and then analysing it with a variety of query tools. They quote very fast load times, but processing the data into sessions and creating meaningful reports is typically much slower and very resource-intensive. A database is not the ideal tool for the basic processing of raw data, and each type of report can be very difficult to define and maintain as the site changes. A good example is the set of basic statistics about a visit, such as start and end times, referrer, search phrase and browser type, all of which the Webtrends Warehouse extracts as part of basic log file processing.

The Webtrends Warehouse does all the basic processing and analysis outside the database and then loads the results into it. Along with the default information it generates, it applies all of the user-defined definitions to capture and store real information about the visit and visitor. Because information is captured at the individual visit and visitor level, subsequent analysis starts from a much higher level. In addition, individual visitors and groups of visitors can be analysed to derive practical information without having to extend the schema.
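
The load step that follows out-of-database processing can be sketched with Python’s built-in `sqlite3` module, assuming the hits have already been sessionised and summarised in application code. The two tables below are a deliberately tiny, hypothetical stand-in; they are not the Webtrends Warehouse schema.

```python
import sqlite3


def load_visits(conn, visits):
    """Load pre-computed visit summaries and attributes (not raw hits)."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS visit (
            visit_id   INTEGER PRIMARY KEY,
            visitor_id TEXT NOT NULL,
            start_time TEXT NOT NULL,
            end_time   TEXT NOT NULL,
            page_count INTEGER NOT NULL
        );
        CREATE TABLE IF NOT EXISTS visit_attribute (
            visit_id INTEGER NOT NULL REFERENCES visit(visit_id),
            name     TEXT NOT NULL,
            value    TEXT NOT NULL
        );
    """)
    with conn:  # commit the whole batch as one transaction
        for v in visits:
            cur = conn.execute(
                "INSERT INTO visit (visitor_id, start_time, end_time, page_count) "
                "VALUES (?, ?, ?, ?)",
                (v["visitor_id"], v["start"], v["end"], v["pages"]),
            )
            # One row per derived attribute (content group, level, campaign...).
            conn.executemany(
                "INSERT INTO visit_attribute (visit_id, name, value) VALUES (?, ?, ?)",
                [(cur.lastrowid, k, str(val)) for k, val in v["attributes"].items()],
            )
```

Reports then query one row per visit and per attribute instead of re-deriving sessions from millions of raw hits.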

Typical import and analysis throughput is about 1.5 to 4 GB of log file data per hour, depending on the complexity of the log data. At the end of the import, however, the data has been broken into individual visits, attached to a visitor and given a full set of visit attributes (standard statistics and user-defined information such as content groups, qualification levels, path analysis, etc.), all of which are stored in the Warehouse. The data is then ready for integration into other warehouses or applications, or for combining with information from other systems in the Warehouse.
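
A quick capacity check makes the quoted rates concrete. Assuming throughput scales linearly with log volume, `log_gb` gigabytes of logs take between `log_gb / 4` and `log_gb / 1.5` hours to import. The function names and the 12 GB example are illustrative.

```python
def import_hours(log_gb, slow_rate=1.5, fast_rate=4.0):
    """Return (best_case, worst_case) import hours for `log_gb` of log data.

    Default rates are the 1.5-4 GB/hour range quoted for the import;
    actual throughput depends on the complexity of the log data.
    """
    return log_gb / fast_rate, log_gb / slow_rate


def fits_window(log_gb, window_hours):
    """True if even the worst-case import finishes within the window."""
    return import_hours(log_gb)[1] <= window_hours
```

For example, 12 GB of daily logs would take roughly 3 to 8 hours, so it would fit an 8-hour overnight window even at the slower rate, whereas 20 GB would not.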


Summary

The focus of this paper is to describe the benefits of transforming raw clickstream data into reportable and actionable information about web customers, and of capturing that information in a robust data warehouse for further analysis using sophisticated reporting and analytics tools.

The Webtrends Warehouse systematises the import and transformation of log files and provides tools to integrate the information with other applications. Webtrends can then extend the view of the customer to include additional information from enterprise data sources such as CRM, ERP and transactional systems, providing a more holistic view of customers across multiple touchpoints. The Warehouse provides comprehensive, rich web and cross-channel data in a scalable repository to power business intelligence tools. Webtrends adds value to business intelligence solutions by providing critical, complex web behavioural information that would otherwise be difficult and expensive to build. The combined solution brings industry-leading web data warehousing and BI analytics together to provide powerful business insight throughout the enterprise.

The advantages of using the Warehouse as the source of web site behavioural data for deeper analysis using various query, reporting, OLAP or data mining techniques, or as the foundation for a multichannel warehouse, include:

  • Automated import and analysis process
  • Simplified entry of business rules for interpreting clickstream behaviour
  • Scalability to very large sites
  • Schema designed to capture aggregate and individual behavioural information
  • Continued evolution to incorporate the latest changes in web site design techniques
  • Leverage of Webtrends’ experience as the leading vendor of web analytics
 
Copyright © 2012 Matraxis. All Rights Reserved.