Treating Excel as a Source of Truth
Treating Excel as a Source of Truth
Context: Across multiple client engagements, we've observed that Excel files often contain more operational intelligence than formal databases. Here's our approach to working with, not against, spreadsheet-based operations.
The Pattern
Most businesses run on Excel. This isn't a failure of technology adoption - it's a success of tool-fit. Excel provides:
- Flexibility without programming
- Visual data manipulation
- Formula-based business logic
- Collaborative editing
- Version control (even if it's "v2_FINAL_FINAL")
The Approach
Instead of replacing Excel, we build systems that treat it as a first-class data source:
import pandas as pd
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler
class ExcelWatcher(FileSystemEventHandler):
"""
Watch Excel files and extract patterns without disrupting workflow
"""
def on_modified(self, event):
if event.src_path.endswith('.xlsx'):
self.process_excel_changes(event.src_path)
def process_excel_changes(self, filepath):
# Read with all formatting preserved
wb = pd.ExcelFile(filepath)
# Extract not just data, but patterns
patterns = {
'edit_frequency': self.calculate_edit_patterns(filepath),
'formula_complexity': self.analyze_formulas(wb),
'data_flow': self.trace_cell_dependencies(wb)
}
return patterns
Key Insights
- Cell colors are documentation - They encode decisions, warnings, and tribal knowledge
- Formulas are business logic - Each formula represents a business rule that evolved over time
- File names tell stories - The progression from v1 to v_FINAL reveals process evolution
- Hidden columns hide history - What's hidden often matters more than what's visible
Implementation Patterns
Pattern 1: Shadow Database
Build a database that mirrors Excel state without replacing it:
def sync_excel_to_db(excel_path, db_connection):
"""
Maintain database shadow of Excel without disrupting Excel use
"""
df = pd.read_excel(excel_path, sheet_name=None)
for sheet_name, sheet_data in df.items():
# Preserve everything, including metadata
sheet_data.to_sql(
f'excel_{sheet_name}',
db_connection,
if_exists='replace',
index=True # Even row numbers matter
)
# Store metadata separately
store_excel_metadata(excel_path, db_connection)
Pattern 2: Formula Preservation
Extract and version control the actual business logic:
from openpyxl import load_workbook
def extract_formulas(excel_path):
"""
Formulas are business logic - preserve them
"""
wb = load_workbook(excel_path, data_only=False)
formulas = {}
for sheet in wb.worksheets:
for row in sheet.iter_rows():
for cell in row:
if cell.data_type == 'f': # Formula cell
formulas[f"{sheet.title}!{cell.coordinate}"] = {
'formula': cell.value,
'current_value': cell.value,
'dependencies': extract_dependencies(cell.value)
}
return formulas
The Result
By treating Excel as a legitimate source of truth rather than technical debt:
- Adoption is immediate (nothing changes for users)
- Business logic is preserved, not recreated
- Evolution happens naturally
- Trust is maintained
The Lesson
Sometimes the best system architecture includes Excel as a first-class citizen. The challenge isn't replacing spreadsheets - it's understanding why they persist and building systems that amplify their strengths while mitigating their weaknesses.
Next: How we approach queue management in operations that "don't have queues"