From Access to SQL Server at Scale: How We Automated 130+ DB Migrations with ANTLR
When our organization decided to retire Microsoft Access from our technology stack, we faced a migration challenge at scale: 130+ Access databases, each containing 50–100 queries.
These queries weren’t trivial — they used Access-specific SQL dialect features like `IIF`, `NZ`, and `DateAdd`, alongside joins and stored procedures.
Table of Contents
- Why Not Manual or Existing Tools?
- Why ANTLR?
- How ANTLR Works in Our Migration
- Bulk Processing
- Testing & Validation
- Challenges
- Performance & Outcome
- Lessons Learned
- Final Takeaway
Why Not Manual or Existing Tools?
- Manual conversion = thousands of developer hours.
- Existing migration tools (like SSMS’s upsizing wizard) failed on complex Access syntax.
- We needed fully automated conversion, not a semi-automated solution that still required manual rework.
We chose ANTLR + Java to build a custom, extensible query translator.
Why ANTLR?
ANTLR (ANother Tool for Language Recognition) is a parser generator that turns a formal grammar into Java (or other language) code capable of:
- Lexical Analysis – Splitting the input into tokens.
- Parsing – Building a parse tree from those tokens.
- Tree Walking – Using Listeners/Visitors to traverse the tree and perform transformations.
We considered regex and JavaCC, but:
- Regex fails on nested structures and context-sensitive syntax.
- JavaCC is less modern and lacks the tooling ecosystem.
- ANTLR offered mature tooling (IntelliJ plugin, parse tree visualizer), extensibility, and runtime performance.
We used the latest ANTLR 4 release (as of 2024), writing an Access SQL grammar from scratch.
How ANTLR Works in Our Migration
1. Grammar Definition
We wrote `.g4` grammar files defining:
- Lexer rules – regex-like patterns for tokens (`SELECT`, `FROM`, `WHERE`, identifiers, literals, operators).
- Parser rules – the hierarchical structure of SQL (`selectStatement`, `fromClause`, `functionCall`).
Example (simplified):
```antlr
// AccessSqlLexer.g4 (simplified)
lexer grammar AccessSqlLexer;

SELECT : 'SELECT';
FROM   : 'FROM';
WHERE  : 'WHERE';
IIF    : 'IIF';
STAR   : '*';
COMMA  : ',';
IDENTIFIER : [a-zA-Z_][a-zA-Z0-9_]*;
NUMBER     : [0-9]+;
WS         : [ \t\r\n]+ -> skip;
```

```antlr
// AccessSqlParser.g4 (simplified)
parser grammar AccessSqlParser;

options { tokenVocab=AccessSqlLexer; }

selectStatement
    : SELECT selectElements FROM tableSource (WHERE searchCondition)?
    ;

selectElements
    : STAR
    | IDENTIFIER (COMMA IDENTIFIER)*
    ;

tableSource
    : IDENTIFIER
    ;

// searchCondition and the remaining rules are omitted for brevity
```
Notes:
- The actual grammar used was far larger and covered many Access-specific constructs (function calls, nested selects, JOIN variants, parentheses, aliases, and expression precedence).
- We iteratively added rules as we encountered new patterns in the field.
2. ANTLR Code Generation
We used the ANTLR tool to generate Java classes:
```shell
# Example command used to generate Java parser/listener code
antlr4 -package com.myorg.migration.parser -listener -visitor AccessSqlLexer.g4 AccessSqlParser.g4
```
This produced:
- `AccessSqlLexer.java` – tokenization.
- `AccessSqlParser.java` – parse tree building.
- `AccessSqlBaseListener.java` & `AccessSqlListener.java` – listener hooks for tree traversal.
- Visitor variants when `-visitor` is supplied (we primarily used listeners for event-based transformations).
3. Transformation via Listeners
ANTLR's listeners provide callbacks like `enterXxx` and `exitXxx` for parse tree nodes.
We implemented a transformation layer in Java that:
- Walked the parse tree using the generated listener.
- Built a transformed SQL string (or an intermediate representation) by appending or substituting tokens.
- Applied dialect-specific mappings (functions, types, date arithmetic).
- Collected unparseable or ambiguous queries for manual review.
Example listener snippet (simplified):
```java
public class AccessToSqlServerListener extends AccessSqlBaseListener {

    private final StringBuilder output = new StringBuilder();

    @Override
    public void enterFunctionCall(AccessSqlParser.FunctionCallContext ctx) {
        String funcName = ctx.getChild(0).getText().toUpperCase();
        if ("IIF".equals(funcName)) {
            // ctx.expression(n) is illustrative — the accessor names depend on
            // how the functionCall rule is written in the grammar
            String cond      = ctx.expression(0).getText();
            String whenTrue  = ctx.expression(1).getText();
            String whenFalse = ctx.expression(2).getText();
            output.append("(CASE WHEN ").append(cond)
                  .append(" THEN ").append(whenTrue)
                  .append(" ELSE ").append(whenFalse)
                  .append(" END)");
        } else {
            // default: copy the function call through unchanged,
            // or map other known functions the same way
            output.append(ctx.getText());
        }
    }

    public String getTransformedSql() {
        return output.toString();
    }
}
```
Function & Type Mappings (examples):
- `IIF(cond, t, f)` → `CASE WHEN cond THEN t ELSE f END`
- `NZ(expr, default)` → `ISNULL(expr, default)` (or `COALESCE(expr, default)`)
- `DateAdd(interval, num, date)` → `DATEADD(interval, num, date)`
- Access `TEXT`/`MEMO` → SQL Server `VARCHAR`/`VARCHAR(MAX)` (or `NVARCHAR` when Unicode or collation requirements demanded it)
Reserved keywords & identifiers:
- We detected collisions with SQL Server reserved keywords and applied quoting (`[identifier]`) or renamed columns via a mapping table when schema changes required it.
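One way to drive such rewrites is a simple lookup table from function name to rewrite rule. The sketch below is hypothetical (the class and method names are illustrative, not the tool's actual API); unknown functions pass through unchanged:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class FunctionMappings {

    // Hypothetical mapping table: Access function name -> T-SQL rewrite.
    // The real tool applied these rewrites while walking the parse tree;
    // this only illustrates the shape of the dialect mappings.
    static final Map<String, Function<List<String>, String>> MAPPINGS = Map.of(
        "IIF", args -> "(CASE WHEN " + args.get(0) + " THEN " + args.get(1)
                       + " ELSE " + args.get(2) + " END)",
        "NZ", args -> "ISNULL(" + args.get(0) + ", " + args.get(1) + ")",
        "DATEADD", args -> "DATEADD(" + String.join(", ", args) + ")"
    );

    static String rewrite(String name, List<String> args) {
        Function<List<String>, String> rule = MAPPINGS.get(name.toUpperCase());
        if (rule != null) {
            return rule.apply(args);
        }
        // Unknown function: copy it through unchanged
        return name + "(" + String.join(", ", args) + ")";
    }
}
```

Keeping the mappings in data rather than code made it cheap to add new functions as they turned up in the field.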
4. Pipeline Overview
End-to-end processing (conceptual):
```
[Access SQL Query]
        |
      Lexer
        |
      Parser
        |
   Parse Tree
        |
Listener/Visitor (Transformations)
        |
[SQL Server Compatible Query]
```
Implementation outline (pseudocode):
```
// Pseudocode — error handling and logging omitted
for each database in dbList (in parallel):
    queries = extractQueries(database)
    for each query in queries:
        CharStream input = CharStreams.fromString(query);
        AccessSqlLexer lexer = new AccessSqlLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        AccessSqlParser parser = new AccessSqlParser(tokens);
        ParseTree tree = parser.selectStatement(); // or a general statement rule
        ParseTreeWalker walker = new ParseTreeWalker();
        AccessToSqlServerListener listener = new AccessToSqlServerListener(mappingConfig);
        walker.walk(listener, tree);
        String transformed = listener.getTransformedSql();
        writeOutput(transformed);
```
Bulk Processing
- Implemented as a Java CLI tool.
- Each database processed in its own thread/process to maximize throughput and reduce per-database overhead.
- Conversion logs captured:
- Successfully transformed queries.
- Queries needing manual inspection (errors, ambiguous transforms).
- Metrics for accuracy and time.
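The database-granularity parallelism described above can be sketched with a standard Java `ExecutorService`. This is a simplified, hypothetical outline: `transformQuery` stands in for the full ANTLR lex/parse/walk pipeline, and logging is omitted:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BulkConverter {

    // Placeholder for the real parse-and-rewrite step.
    static String transformQuery(String query) {
        return query.toUpperCase();
    }

    // One task per database: each task converts that database's queries.
    static List<List<String>> convertAll(List<List<String>> databases) {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (List<String> db : databases) {
                futures.add(pool.submit(() -> {
                    List<String> converted = new ArrayList<>();
                    for (String q : db) converted.add(transformQuery(q));
                    return converted;
                }));
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                try {
                    results.add(f.get());
                } catch (Exception e) {
                    // surface the failure; the real tool logged it and continued
                    throw new RuntimeException(e);
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Parallelizing per database (rather than per query) keeps each database's extraction and output handling on a single thread, which avoids cross-thread coordination on shared connections.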
Testing & Validation
To ensure correctness we built an automated validation framework:
- Execute original Access query against Access DB (using ODBC/JDBC).
- Execute transformed query against SQL Server.
- Compare result sets:
- Row counts
- Column-wise value equivalence with type-aware comparisons
- Tolerances for floating point/date formatting where applicable
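A type-aware row comparison of this kind might look like the sketch below (hypothetical helper; the epsilon and the exact tolerance policy are illustrative):

```java
import java.util.List;

public class RowComparator {

    // Tolerance for floating-point comparisons; illustrative, not the
    // actual threshold used by the validation framework.
    static final double EPSILON = 1e-9;

    // Numeric values compare within a tolerance; everything else exactly.
    static boolean valuesMatch(Object a, Object b) {
        if (a == null || b == null) return a == b;
        if (a instanceof Number && b instanceof Number) {
            return Math.abs(((Number) a).doubleValue()
                          - ((Number) b).doubleValue()) < EPSILON;
        }
        return a.equals(b);
    }

    static boolean rowsMatch(List<?> accessRow, List<?> sqlServerRow) {
        if (accessRow.size() != sqlServerRow.size()) return false;
        for (int i = 0; i < accessRow.size(); i++) {
            if (!valuesMatch(accessRow.get(i), sqlServerRow.get(i))) return false;
        }
        return true;
    }
}
```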
Mismatch cases were recorded and used to:
- Update grammar rules.
- Improve transformation logic.
- Add special-case handlers.
This iterative testing loop increased automated accuracy to 96%+.
Challenges
- Incremental grammar growth — we began with a core grammar and expanded as new syntactic variants were discovered.
- Edge function translations — constructs like `Switch(...)`, nested `IIF`, or vendor-specific pseudo-functions needed careful AST handling.
- Ambiguous expressions — when multiple interpretation routes existed, we flagged the query for manual review instead of guessing.
- Schema mapping — required a separate mapping configuration (CSV/DB) linking old table/column names to the new SQL Server schema.
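To illustrate the kind of edge handling involved: Access's `Switch(cond1, val1, cond2, val2, ...)` maps naturally onto a searched `CASE` expression. A hypothetical string-level sketch (the real tool performed this rewrite on the parse tree):

```java
import java.util.List;

public class SwitchRewrite {

    // Rewrite Switch(c1, v1, c2, v2, ...) as a searched CASE expression,
    // consuming condition/value pairs left to right.
    static String rewrite(List<String> args) {
        StringBuilder sb = new StringBuilder("(CASE");
        for (int i = 0; i + 1 < args.size(); i += 2) {
            sb.append(" WHEN ").append(args.get(i))
              .append(" THEN ").append(args.get(i + 1));
        }
        return sb.append(" END)").toString();
    }
}
```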
Performance & Outcome
- 96% automated conversion accuracy (improved over time with added grammar rules and mappings).
- 8,000+ developer hours saved versus manual rewriting.
- Fully migrated 130+ databases with only a small percentage of queries requiring manual tweaks.
Lessons Learned
- Start with a minimal, well-tested grammar and iterate frequently.
- Automate validation — side-by-side execution and result comparison is essential.
- Parallelize at database granularity for best performance versus complexity tradeoffs.
- Keep SME collaboration ongoing for dialect edge cases and mapping decisions.
- Log everything — errors, ambiguous transforms, and metrics to guide further grammar development.
Final Takeaway
ANTLR proved to be a powerful and pragmatic choice for large-scale SQL dialect translation. The combination of:
- a custom Access SQL grammar,
- ANTLR’s lexer/parser/listener architecture,
- a parallelized Java CLI pipeline, and
- an automated validation loop
enabled us to automate a migration that would otherwise have required months of manual effort.