Retail Chain Performance Analysis
Summary
This project involved architecting a comprehensive Business Intelligence (BI) system for a retail chain, enabling the analysis and optimization of store performance. Key components, including a data warehouse on Snowflake, an ELT program for transactional datasets, and dimensional data model schemas (STAR & SNOWFLAKE), were designed and implemented. The integration of an OLAP reporting system and a data visualization layer facilitated meaningful insights. By answering critical questions, such as assessing sales performance, recommending bonus amounts, and analyzing sales trends, the project empowered the retail chain to make informed decisions, maximize profits, and strategize for continued success.
Code: Github
Problem Statement
The task at hand is to assess the sales history of a retail company and provide recommendations to enhance its sales profitability based on this analysis. The company's primary goal is generating revenue through the manufacturing and sale of various products. To streamline operations, products are categorized into different groups and types for efficient inventory management, marketing, and overall organization. Each product is associated with a retail price (the price at which it's sold to customers), a wholesale price (a discounted rate for resellers), and a production cost. Sales occur through two main routes: direct sales to customers and sales to resellers who subsequently sell to their own clientele. These sales transactions take place through diverse channels, encompassing both physical retail stores and online platforms via the company's website.
For the sake of this project the scope is narrowed to the following:
-
Overall assessment of Stores number 5 and 8’s sales.
-
Bonus allotment for the years 2013 and 2014 if the total bonus pool for 2013 is $500,000 and the total bonus pool for 2014 is $400,000. Recommendation based on how well the stores are selling Product Types of Men’s Casual and Women’s Casual.
-
Difference in performance of all stores located in states that have more than one store to all stores that are the only store in the state
Data
The data includes the sale records all stores and online platforms foe the retail company for the year 2013 ad 2014.
We are working with the following csv files:
-
Channel- The channels of sale for the retail i.e. Boutiques, On-line, Outlet, Branded Franchise and Department Stores
-
ChannelCategory- Category of channel i.e. Direct or Indirect
-
Customer- Customer information like name, contact, location etc.
-
Product- Individual product information including its price, wholesale price and cost
-
ProductCategory- Category of product eg. accessories, women's/men's apparel etc.
-
ProductType- Type of product eg. women's/men's casuals, women's/men's formals etc.
-
Reseller- Information of their resellers with their location and contact
-
SalesDetail- Contains information of the product sold, qualtity, price and date of sell etc.
-
SalesHeader- Contains additional information about the sale including customer id, channel etc.
-
Store- Information about all their stores and its location
-
Target Data - Channel Reseller and Store- 2013 and 2014 target sales amount for the different channels and stores
-
Target Data - Product- 2013 and 2014 target sales quantity of different products being sold.
Find more about this can be under the data folder on GitHub
Dimensional Data Model and Load
In this section we look at creating a dimensional data model from the normalized data model that we had.
Dimensional data models denormalize data by combining related tables into a star or snowflake schema which simplifies query processing and improves query performance significantly. They are optimized for analytics and reporting by providing a straightforward structure with fact tables at the center, surrounded by dimension tables, making it easier for end-users to create meaningful reports and analyses without needing to understand complex database schemas.
Dimensional models are designed to align with business concepts and user requirements and are more flexible and adaptable to changing business requirements.
For the given data we designed the following schema-

Key Components:
-
Fact tables: They are at the center of a dimensional data model. They contain quantitative data or measures. In the above we have 3 fact tables one of which is the actual sales made and the remaining two are targets.
-
Dimension tables: They provide context and descriptive attributes for the data in the fact tables. These attributes are used for filtering, grouping, and categorizing data. In the one above we have 7 of these for our analysis
After creating the schema we used that as reference to stage and load the facts and the dimension tables to Snowflake and create the views required to answer the questions of interest.
Find the queries in the following file:
In the coming section the analysis will be focused on Store 5 and 8.
Analysis
1. Overall performance of Store 5 & 8
Performance:
In 2013 and 2014, Store 5 achieved Actual Sales that surpassed its Target Sales, while Store 8 fell short of its Target Sales in 2013 but exceeded them in 2014. Consequently, Store 5 outperformed Store 8 in terms of both meeting and surpassing the sales targets for both years, indicating a stronger performance in achieving the sales goals.

The graph provided a projection of Store 5 and Store 8's actual sales for November and December. Based on this earlier visualization, we can deduce that both stores have already achieved their sales targets for 2014, and the anticipated actual sales for November and December are expected to further boost the total sales for both stores.

Maximizing Profit:
Following our in-depth analysis, it is evident that the women's apparel category stands out as the most lucrative segment within our product offerings. Given the insights derived from this analysis, it is recommended that stores consider reallocating and augmenting their investment budgets to focus on categories that exhibit the highest profitability. An illustrative example of such a category within women's apparel is dresses and strapless dresses, which have demonstrated notable profitability. By channeling resources into these high-return areas, stores can potentially enhance their overall financial performance and capitalize on the strengths identified through this analysis.

2. Bonus Allocations
Considering the bonus pool for 2013 is $500,000 and the total bonus pool for 2014 is $400,000 and the allocation to be made by how well the stores are selling Product Types of Men’s Casual and Women’s Casual, I would recommend making the bonus allocation as the following. The net amount to be divided for 2013 will remain the same as the target quantity was met.
For 2014 the sales for the product types of interest have not been met yet till Oct. So only a percentage of the total bonus pool will be allocated currently and the remaining percentage will be looked into at the end of the year while considering the target for the type.

Moving ahead, we can divide the bonus amounts by state-wise percentage of product sale. For instance in 2013, Arkansas contributed 31.73% to the sales quantity, Georgia contributed 18.69% and so on. Store 5 is the only store located in Georgia and hence 18.69% of the bonus for 2013 would go to Store 5. But Arkansas has 2 stores (8 and 21) and thus the bonus allocated to the state would be divided by each stores contribution to the year's sales, as shown below.

Following the same strategy, the bonuses for 2013 and 2014 (till Oct) will be divide as follows-

3. Comparing Performance of States with One Store with Multiple Stores
In the given data, we know that Arkansas(8 & 21) and Missouri(10 & 34) have 2 stores each and Mississippi and Georgia have one store each being 39 and 5 respectively. As a part of this analysis we want to check if having more stores in a state impact performance for the retail chain.
Now as seen below having multiple stores does contribute to overall sales volume and higher cumulative profits, which is expected.


But when we look at average sales for each stores, the stores located in the States where they are the only store arre performing better. Same can be seen for total sales.
Moreover, both groups have stores that met their yearly sales target and stores that did not. For instance Missouri as a state met102.85% of their cumulative sales target but Arkansas only met 86.60%. Similarly, Mississippi met only 98.60% of its sales target but Georgia met 102.87%. This indicates that having more stores in a state does not guarantee consistent positive performance across all the stores.
Meeting sales targets poses a formidable challenge for stores, irrespective of whether they operate in states with numerous branches or those with just a single store. Success in this endeavor hinges on the implementation of effective sales strategies, active customer engagement, and the optimization of operational efficiency.

Considering all, concrete conclusions regarding the superiority of single-store versus multi-store setups within a state are challenging to establish definitively. This complexity arises from the fact that numerous variables come into play, each exerting its own unique influence on store performance.
Firstly, the location of a store within a given state can have a substantial impact. Factors like population density, local economic conditions, and proximity to potential customers all contribute to the overall success of a store.
Secondly, the level of competition in the area can be a significant determinant. High competition can drive stores to innovate and adapt to stand out, whereas lower competition might provide a more stable environment.
Additionally, the demographic composition of the customer base plays a crucial role. Understanding the preferences and purchasing behaviors of the local population is vital for tailoring products and marketing strategies effectively.
Lastly, the management of each store, including its operational efficiency, staff performance, and customer service, is another pivotal factor that can sway the performance in either direction.
Considering these intricate dynamics, it becomes apparent that a more comprehensive analysis, involving a broader dataset and a deeper exploration of these factors, is essential to draw more definitive conclusions regarding the impact of having multiple stores within a state. Such an analysis would provide a more nuanced understanding of the variables at play and facilitate more informed strategic decisions for businesses.