Business Intelligence Recap

Over the last two months, we have travelled the field of Business Intelligence. The journey began by describing commonalities and differences among the concepts of Business Intelligence, Big Data, Analytics, and Data Science, and by considering the paradigm shift that Big Data represents for data analytics. From there, we moved on to data warehouses, dashboard design, web analytics, and network analysis as ways of harnessing the power of business intelligence. This post is the last one in my series on Business Intelligence. It revisits the main themes of MIS 587 and closes with my final thoughts, including my reflections on best practices for data analysis in business intelligence.

Data Warehouse Design

Module 1 covered a diverse set of topics: performance management (with the Balanced Scorecard as the technique of choice), development of Key Performance Indicators, the fundamentals of Online Analytical Processing (OLAP) and Online Transaction Processing (OLTP), design of data warehouses and data marts, use of Dimensional Modeling to implement star schemas, data profiling, data quality analysis, and visualization of business data through dashboard design in Tableau.
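To make the star-schema idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names are my own invention, not from the course materials; the point is the shape: one fact table surrounded by small, denormalized dimension tables, queried with shallow joins.

```python
import sqlite3

# Minimal star schema: a sales fact table plus two dimension tables.
# All names (dim_date, fact_sales, etc.) are illustrative.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    units INTEGER, revenue REAL
);
""")
con.executemany("INSERT INTO dim_date VALUES (?,?,?)",
                [(20240101, 2024, 1), (20240201, 2024, 2)])
con.executemany("INSERT INTO dim_product VALUES (?,?,?)",
                [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware")])
con.executemany("INSERT INTO fact_sales VALUES (?,?,?,?)",
                [(20240101, 1, 10, 100.0), (20240101, 2, 5, 75.0),
                 (20240201, 1, 8, 80.0)])

# A typical OLAP-style rollup: revenue by month, one shallow join away.
rows = con.execute("""
    SELECT d.month, SUM(f.revenue)
    FROM fact_sales f JOIN dim_date d ON f.date_key = d.date_key
    GROUP BY d.month ORDER BY d.month
""").fetchall()
print(rows)  # [(1, 175.0), (2, 80.0)]
```

Notice how the denormalized dimensions trade storage for query speed, which is exactly the departure from relational normalization discussed below.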



I found dimensional modeling a revealing technique for database design, owing to its formal and conceptual differences from the more classical relational and object-relational database models (especially how normalization loses its importance for the sake of processing performance). Module 3 (more on this below) brought up yet another database model: graph databases. In this model, queries traverse the connections that exist between database items (e.g., x employed-by α).
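As a toy sketch of that querying style (all names invented), relationships can be stored as first-class triples and a query simply walks the matching edges instead of joining tables:

```python
# Graph-style data: (subject, relation, object) triples, like "x employed-by α".
edges = [
    ("ana",  "employed_by", "acme"),
    ("bob",  "employed_by", "acme"),
    ("ana",  "knows",       "bob"),
    ("acme", "located_in",  "tucson"),
]

def match(subject=None, relation=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [(s, r, o) for (s, r, o) in edges
            if subject in (None, s) and relation in (None, r) and obj in (None, o)]

# "Who is employed by acme?" -- a one-hop traversal along edges.
employees = [s for (s, _, _) in match(relation="employed_by", obj="acme")]
print(employees)  # ['ana', 'bob']
```

A real graph database indexes these edges so that multi-hop traversals stay fast, but the query model is the same: follow connections, not foreign keys.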




Web Analytics

In the next module, we continued delving into the analytical possibilities of dashboards and performance metrics, this time for web analytics. Different kinds of web metrics on visitor acquisition, customer behavior, and business outcomes played a central role as Key Performance Indicators (see my blog post for Module 2), which we learned to leverage for analyzing and optimizing websites. Regarding data visualization and dashboard implementation, both Tableau (in the prior module) and Google Analytics (in this one) proved to be user-friendly solutions.
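For the sake of illustration, here is a hand-rolled sketch of one metric from each of those three families, computed over an invented session log (the field names are mine; tools like Google Analytics derive the same quantities from tracked pageview data):

```python
# Hypothetical session log: one record per visit.
sessions = [
    {"visitor": "v1", "pages": 5, "converted": True},
    {"visitor": "v2", "pages": 1, "converted": False},
    {"visitor": "v3", "pages": 3, "converted": False},
    {"visitor": "v1", "pages": 2, "converted": True},
]

n = len(sessions)
unique_visitors = len({s["visitor"] for s in sessions})      # acquisition
bounce_rate = sum(s["pages"] == 1 for s in sessions) / n     # behavior
conversion_rate = sum(s["converted"] for s in sessions) / n  # outcome
print(unique_visitors, bounce_rate, conversion_rate)  # 3 0.25 0.5
```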



As a practitioner of Geographic Information Systems, I see similarities between Google Analytics' or Tableau's business models and the way that ESRI (the long-standing GIS giant) is penetrating the analytics market by turning its flagship product, ArcGIS, into a software-as-a-service, pay-per-use, cloud-based offering. If this model becomes pervasive, I am concerned about the "selective oversimplification" that such cloud-based, pay-per-use BI software could suffer in its analytics functionality. For example, the most interesting or most appropriate metrics and techniques might not be accessible to organizations that cannot afford premium services, whether because of the additional costs incurred or because of latency and security challenges that would make online analysis unfeasible. As a case in point, I missed advanced functionality for spatiotemporal analysis and modeling in both Tableau and Google Analytics. Without such advanced spatiotemporal analytics, it will be difficult to reliably model cause-and-effect processes along the time series of web use and business activity.

If the analytics market eventually becomes inflated by an excessive supply of pay-per-use software solutions, I think that will lead to a reinvigorated expansion of the open-source developer community (which is something I would applaud), at least among mid-size businesses. While large organizations may find operational advantages in transferring the burden of developing IT infrastructure to a specialist partner (say, partnering with Microsoft or Red Hat for database management), and small organizations might only have the capacity to access low-cost analytics services, it is no longer uncommon for mid-size businesses to get around third-party dependence by having data scientists on payroll, implementing ad-hoc metrics and workflows that respond to the needs of the organization.

Network Analysis

The last leg of this fascinating journey was an introduction to the use of network metrics and graph visualization for understanding complex systems of interaction. Creation of network visualizations and training on how to interpret network structural properties were enlightening activities in this module.

Large networks seem to face the same visualization challenge as dense scatter plots: overplotting of the graph data. For visualization purposes, I would consider 3D modeling software like that used for design and manufacturing as a way to facilitate dynamic navigation across networks. This would also add an extra dimension to the renderer, helping to improve visualizations by separating the nodes and edges in a higher-dimensional space (thinking of Support Vector Machines?). Tools for 3D modeling (e.g., ParaView) or for visualizing protein models (e.g., BioBlender) could help with such visualizations, though computing network metrics and node locations in 3D space would still require different software.



Advanced topics in network analytics include comparative analysis of the metric and topological properties of graphs. By comparing graphs on the basis of such attributes (e.g., using connectivity matrices), networks can be clustered and classified. I wonder how molecular and graph mining have been, or could be, used to that end. Similarity techniques such as the Maximum Common Subgraph (Wegner et al. 2006; Rahman et al. 2009) look like effective candidates for adaptation to n-mode networks.
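One crude but cheap version of that comparison, sketched below with two invented toy graphs: summarize each graph as a vector of structural metrics and measure the distance between vectors. This is far simpler than Maximum Common Subgraph detection, but the resulting "fingerprints" are exactly what a clustering algorithm could consume.

```python
import math

# Graphs as plain undirected adjacency dicts; metrics chosen for illustration.
def features(adj):
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # undirected edge count
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    avg_degree = 2 * m / n if n else 0.0
    return (density, avg_degree)

def distance(g1, g2):
    """Euclidean distance between the two graphs' metric fingerprints."""
    return math.dist(features(g1), features(g2))

triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
path     = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(distance(triangle, triangle))  # 0.0 -- identical fingerprints
print(distance(triangle, path) > 0)  # True -- structurally different
```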

Comparative graph analysis also enables the historical analysis of network dynamics. Measuring how a graph changes over a time series can serve to study questions such as evolving density or the covariation of graph metrics over time. For example, tracking the evolution of bridges, eigenvector centrality, and authority in a network of clients can be leveraged to optimize campaign operations: targeting the nodes with the best connectivity properties at the time a campaign is launched, and identifying the most beneficial timing to launch campaigns in a network.
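A minimal sketch of that idea, using two invented "client network" snapshots: recompute eigenvector centrality on each snapshot (here via naive power iteration, with a self-term added so the iteration converges on bipartite graphs) and target whichever node ranks highest at launch time.

```python
def eigenvector_centrality(adj, iters=100):
    """Approximate eigenvector centrality by power iteration on (A + I)."""
    nodes = sorted(adj)
    x = {v: 1.0 for v in nodes}
    for _ in range(iters):
        # x + Ax: the self-term prevents oscillation on bipartite graphs
        nxt = {v: x[v] + sum(x[u] for u in adj[v]) for v in nodes}
        norm = sum(val * val for val in nxt.values()) ** 0.5
        x = {v: val / norm for v, val in nxt.items()}
    return x

snapshot_t0 = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}  # b bridges a and c
snapshot_t1 = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}  # a becomes the hub
for t, snap in enumerate([snapshot_t0, snapshot_t1]):
    cent = eigenvector_centrality(snap)
    print(t, max(cent, key=cent.get))  # best-connected node at time t
```

The interesting output is not the centrality values themselves but how the top-ranked node shifts between snapshots, which is what campaign timing would key on.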



Final thoughts

I will end this post (and my blog for MIS 587!) by sharing my take on what I think could be good practices in data analytics for business intelligence:

  1. An organization should make sure that analytics roles (say, computer scientist, statistician, engineer, data scientist) are part of its workforce, whether internal or provided by a third party.
  2. Data, methods, and techniques should be fit for the organization's analytical and business purposes. Requirements engineering to identify needs, together with planning for quality control at the data, technical, and methodological levels, should minimize potential quality issues and proactively detect them as they arise.
  3. Going beyond descriptive statistics and data visualization, statistical inference, forecasting, and simulation provide strategic knowledge. However, don't trust black boxes! The benefits and disadvantages of each technique, and how the results were computed, should be understood at both the technical and executive levels (at least by the Chief Information Officer and Chief Technology Officer roles) and should inform the organization's plans for quality assurance, introduced in the prior point.
  4. Although adaptation of technical jargon might be needed when transferring analytical information from the technical team up to the executive branch, that should not come at the cost of oversimplifying the knowledge gained at the analysis stage, or of sacrificing the level of quality (i.e., fitness for purpose) required in the analytical workflow. Reductionist and/or technically poor analysis may fail to solidly inform decision-making regardless of the scale of a business, given the high levels of uncertainty, imprecision, or plain inaccuracy that analytical results might suffer from.
  5. Concurrent use of several techniques that are fit for your data and purpose should be beneficial on the path towards knowledge discovery. Different techniques read and use data differently -- e.g., global vs. local techniques, or techniques based on central tendencies vs. others based on non-stationarity. Let's not miss any perspective on the data!
  6. Related to the prior two points, though more focused on tools: a diverse range of software solutions and services may be preferable. I haven't yet found a one-size-fits-all tool, platform, or programming language that can properly handle the full analysis of a complex system (and a business is one of them!). Software should assist in navigating data complexity, simplifying the data space without misspecifying the underlying process. Because vendors can't usually provide a large range of specialist tools, including technical personnel in the organization provides the workforce for adaptive, internally controlled analytical development, as stated in my first point.
  7. Analysis of a business process should be as systemic and holistic as possible. We live in coupled human-nature systems, so failing to take this complexity into account might provide only a partial view of the factors (not only socioeconomic, demographic, or sociological, but also environmental) that may impact a business and help predict its dynamics.
  8. Along the lines of the prior point, temporal and spatial variables are key in business intelligence. Because temporal and spatial data have properties that may fail to meet the assumptions of classical statistical models, specialized analytical techniques are required. At a minimum, the analyst team should be able to address issues of spatial autocorrelation, spatial inhomogeneity, boundary effects, and the Modifiable Areal Unit Problem.
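To make the spatial-autocorrelation point concrete, here is a minimal Moran's I, the classic statistic for it, computed over four invented regions arranged in a row with binary contiguity weights (everything below is illustrative, not tied to any real dataset):

```python
def morans_i(values, weights):
    """Moran's I: (n / sum(W)) * sum_ij(w_ij * dev_i * dev_j) / sum_i(dev_i^2)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    w_sum = sum(sum(row) for row in weights)
    denom = sum(d * d for d in dev)
    return (n / w_sum) * (num / denom)

# Four regions in a row; each cell neighbors the next (binary contiguity).
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
clustered   = [10, 10, 0, 0]  # like values adjacent -> positive I
alternating = [10, 0, 10, 0]  # unlike values adjacent -> negative I
print(round(morans_i(clustered, W), 3))    # 0.333
print(round(morans_i(alternating, W), 3))  # -1.0
```

A near-zero I would indicate no spatial structure; classical models implicitly assume that case, which is exactly why the checks in this point matter.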


References

  • Wegner, J. K.; Fröhlich, H.; Mielenz, H.; Zell, A. (2006). "Data and Graph Mining in Chemical Space for ADME and Activity Data Sets". QSAR & Combinatorial Science. 25: 205–220. doi:10.1002/qsar.200510009.

  • Rahman, S. A.; Bashton, M.; Holliday, G. L.; Schrader, R.; Thornton, J. M. (2009). "Small Molecule Subgraph Detector (SMSD) toolkit". Journal of Cheminformatics. 1: 12. doi:10.1186/1758-2946-1-12.


Comments

  1. Fernando, Thanks for sharing! Great job in summarizing the key topics, moreover your final thoughts.

