From Access to SQL Server at Scale: How We Automated 130+ DB Migrations with ANTLR
When our organization decided to retire Microsoft Access from our technology stack, we faced a migration challenge at scale: 130+ Access databases, each containing 50–100 queries.
These queries weren’t trivial — they used Access-specific SQL dialect features like `IIF`, `NZ`, and `DateAdd`, alongside joins and stored procedures.
Table of Contents
- Why Not Manual or Existing Tools?
- Why ANTLR?
- How ANTLR Works in Our Migration
- Bulk Processing
- Testing & Validation
- Challenges
- Performance & Outcome
- Lessons Learned
- Final Takeaway
Why Not Manual or Existing Tools?
- Manual conversion = thousands of developer hours.
- Existing migration tools (like SSMS’s upsizing wizard) failed on complex Access syntax.
- We needed fully automated conversion, not a semi-automated solution that still required manual rework.
We chose ANTLR + Java to build a custom, extensible query translator.
Why ANTLR?
ANTLR (ANother Tool for Language Recognition) is a parser generator that turns a formal grammar into Java (or other language) code capable of:
- Lexical Analysis – Splitting the input into tokens.
- Parsing – Building a parse tree from those tokens.
- Tree Walking – Using Listeners/Visitors to traverse the tree and perform transformations.
We considered regex and JavaCC, but:
- Regex fails on nested structures and context-sensitive syntax.
- JavaCC is less modern and lacks the tooling ecosystem.
- ANTLR offered mature tooling (IntelliJ plugin, parse tree visualizer), extensibility, and runtime performance.
We used the latest ANTLR 4 release (as of 2024), writing an Access SQL grammar from scratch.
How ANTLR Works in Our Migration
1. Grammar Definition
We wrote `.g4` grammar files defining:
- Lexer rules – regex-like patterns for tokens (`SELECT`, `FROM`, `WHERE`, identifiers, literals, operators).
- Parser rules – the hierarchical structure of SQL (`selectStatement`, `fromClause`, `functionCall`).
Example (simplified):
```antlr
// AccessSqlLexer.g4 (simplified)
lexer grammar AccessSqlLexer;

SELECT : 'SELECT';
FROM   : 'FROM';
WHERE  : 'WHERE';
IIF    : 'IIF';
STAR   : '*';
COMMA  : ',';
IDENTIFIER : [a-zA-Z_][a-zA-Z0-9_]*;
NUMBER     : [0-9]+;
WS         : [ \t\r\n]+ -> skip;
```

```antlr
// AccessSqlParser.g4 (simplified)
parser grammar AccessSqlParser;

options { tokenVocab=AccessSqlLexer; }

selectStatement
    : SELECT selectElements FROM tableSource (WHERE searchCondition)?
    ;

selectElements
    : STAR
    | IDENTIFIER (COMMA IDENTIFIER)*
    ;

tableSource
    : IDENTIFIER
    ;

// searchCondition and the remaining rules are omitted for brevity
```
Notes:
- The actual grammar used was far larger and covered many Access-specific constructs (function calls, nested selects, JOIN variants, parentheses, aliases, and expression precedence).
- We iteratively added rules as we encountered new patterns in the field.
2. ANTLR Code Generation
We used the ANTLR tool to generate Java classes:
```shell
# Example command used to generate Java parser/listener code
antlr4 -package com.myorg.migration.parser -listener -visitor AccessSqlLexer.g4 AccessSqlParser.g4
```
This produced:
- `AccessSqlLexer.java` – tokenization.
- `AccessSqlParser.java` – parse tree building.
- `AccessSqlBaseListener.java` & `AccessSqlListener.java` – listener hooks for tree traversal.
- Visitor variants when `-visitor` is supplied (we primarily used listeners for event-based transformations).
3. Transformation via Listeners
ANTLR's listeners provide callbacks like `enterXxx` and `exitXxx` for parse tree nodes.
We implemented a transformation layer in Java that:
- Walked the parse tree using the generated listener.
- Built a transformed SQL string (or an intermediate representation) by appending or substituting tokens.
- Applied dialect-specific mappings (functions, types, date arithmetic).
- Collected unparseable or ambiguous queries for manual review.
Example listener snippet (simplified):
```java
public class AccessToSqlServerListener extends AccessSqlBaseListener {

    private final StringBuilder output = new StringBuilder();

    @Override
    public void enterFunctionCall(AccessSqlParser.FunctionCallContext ctx) {
        String funcName = ctx.getChild(0).getText().toUpperCase();
        if ("IIF".equals(funcName)) {
            // ctx.expression(n) is illustrative — the accessor names depend on
            // how the functionCall rule is written in the grammar
            String cond      = ctx.expression(0).getText();
            String whenTrue  = ctx.expression(1).getText();
            String whenFalse = ctx.expression(2).getText();
            output.append("(CASE WHEN ").append(cond)
                  .append(" THEN ").append(whenTrue)
                  .append(" ELSE ").append(whenFalse)
                  .append(" END)");
        } else {
            // default: copy the function call through unchanged,
            // or map other known functions the same way
            output.append(ctx.getText());
        }
    }

    public String getTransformedSql() {
        return output.toString();
    }
}
```
Function & Type Mappings (examples):
- `IIF(cond, t, f)` → `CASE WHEN cond THEN t ELSE f END`
- `NZ(expr, default)` → `ISNULL(expr, default)` (or `COALESCE(expr, default)`)
- `DateAdd(interval, num, date)` → `DATEADD(interval, num, date)`
- Access `TEXT`/`MEMO` → SQL Server `VARCHAR`/`VARCHAR(MAX)` (or `NVARCHAR` when Unicode or collation requirements demanded it)
Reserved keywords & identifiers:
- We detected collisions with SQL Server reserved keywords and applied quoting (`[identifier]`) or renamed columns via a mapping table when schema changes required it.
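One way to drive such rewrites is a simple lookup table from function name to rewrite rule. The sketch below is hypothetical (the class and method names are illustrative, not the tool's actual API); unknown functions pass through unchanged:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class FunctionMappings {

    // Hypothetical mapping table: Access function name -> T-SQL rewrite.
    // The real tool applied these rewrites while walking the parse tree;
    // this only illustrates the shape of the dialect mappings.
    static final Map<String, Function<List<String>, String>> MAPPINGS = Map.of(
        "IIF", args -> "(CASE WHEN " + args.get(0) + " THEN " + args.get(1)
                       + " ELSE " + args.get(2) + " END)",
        "NZ", args -> "ISNULL(" + args.get(0) + ", " + args.get(1) + ")",
        "DATEADD", args -> "DATEADD(" + String.join(", ", args) + ")"
    );

    static String rewrite(String name, List<String> args) {
        Function<List<String>, String> rule = MAPPINGS.get(name.toUpperCase());
        if (rule != null) {
            return rule.apply(args);
        }
        // Unknown function: copy it through unchanged
        return name + "(" + String.join(", ", args) + ")";
    }
}
```

Keeping the mappings in data rather than code made it cheap to add new functions as they turned up in the field.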
4. Pipeline Overview
End-to-end processing (conceptual):
```
[Access SQL Query]
        |
      Lexer
        |
      Parser
        |
   Parse Tree
        |
Listener/Visitor (Transformations)
        |
[SQL Server Compatible Query]
```
Implementation outline (pseudocode):
```
// Pseudocode — error handling and logging omitted
for each database in dbList (in parallel):
    queries = extractQueries(database)
    for each query in queries:
        CharStream input = CharStreams.fromString(query);
        AccessSqlLexer lexer = new AccessSqlLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        AccessSqlParser parser = new AccessSqlParser(tokens);
        ParseTree tree = parser.selectStatement(); // or a general statement rule
        ParseTreeWalker walker = new ParseTreeWalker();
        AccessToSqlServerListener listener = new AccessToSqlServerListener(mappingConfig);
        walker.walk(listener, tree);
        String transformed = listener.getTransformedSql();
        writeOutput(transformed);
```
Bulk Processing
- Implemented as a Java CLI tool.
- Each database processed in its own thread/process to maximize throughput and reduce per-database overhead.
- Conversion logs captured:
- Successfully transformed queries.
- Queries needing manual inspection (errors, ambiguous transforms).
- Metrics for accuracy and time.
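The database-granularity parallelism described above can be sketched with a standard Java `ExecutorService`. This is a simplified, hypothetical outline: `transformQuery` stands in for the full ANTLR lex/parse/walk pipeline, and logging is omitted:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BulkConverter {

    // Placeholder for the real parse-and-rewrite step.
    static String transformQuery(String query) {
        return query.toUpperCase();
    }

    // One task per database: each task converts that database's queries.
    static List<List<String>> convertAll(List<List<String>> databases) {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (List<String> db : databases) {
                futures.add(pool.submit(() -> {
                    List<String> converted = new ArrayList<>();
                    for (String q : db) converted.add(transformQuery(q));
                    return converted;
                }));
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                try {
                    results.add(f.get());
                } catch (Exception e) {
                    // surface the failure; the real tool logged it and continued
                    throw new RuntimeException(e);
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Parallelizing per database (rather than per query) keeps each database's extraction and output handling on a single thread, which avoids cross-thread coordination on shared connections.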
Testing & Validation
To ensure correctness we built an automated validation framework:
- Execute original Access query against Access DB (using ODBC/JDBC).
- Execute transformed query against SQL Server.
- Compare result sets:
- Row counts
- Column-wise value equivalence with type-aware comparisons
- Tolerances for floating point/date formatting where applicable
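A type-aware row comparison of this kind might look like the sketch below (hypothetical helper; the epsilon and the exact tolerance policy are illustrative):

```java
import java.util.List;

public class RowComparator {

    // Tolerance for floating-point comparisons; illustrative, not the
    // actual threshold used by the validation framework.
    static final double EPSILON = 1e-9;

    // Numeric values compare within a tolerance; everything else exactly.
    static boolean valuesMatch(Object a, Object b) {
        if (a == null || b == null) return a == b;
        if (a instanceof Number && b instanceof Number) {
            return Math.abs(((Number) a).doubleValue()
                          - ((Number) b).doubleValue()) < EPSILON;
        }
        return a.equals(b);
    }

    static boolean rowsMatch(List<?> accessRow, List<?> sqlServerRow) {
        if (accessRow.size() != sqlServerRow.size()) return false;
        for (int i = 0; i < accessRow.size(); i++) {
            if (!valuesMatch(accessRow.get(i), sqlServerRow.get(i))) return false;
        }
        return true;
    }
}
```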
Mismatch cases were recorded and used to:
- Update grammar rules.
- Improve transformation logic.
- Add special-case handlers.
This iterative testing loop increased automated accuracy to 96%+.
Challenges
- Incremental grammar growth — we began with a core grammar and expanded as new syntactic variants were discovered.
- Edge function translations — constructs like `Switch(...)`, nested `IIF`, or vendor-specific pseudo-functions needed careful AST handling.
- Ambiguous expressions — when multiple interpretation routes existed, we flagged the query for manual review instead of guessing.
- Schema mapping — required a separate mapping configuration (CSV/DB) linking old table/column names to the new SQL Server schema.
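To illustrate the kind of edge handling involved: Access's `Switch(cond1, val1, cond2, val2, ...)` maps naturally onto a searched `CASE` expression. A hypothetical string-level sketch (the real tool performed this rewrite on the parse tree):

```java
import java.util.List;

public class SwitchRewrite {

    // Rewrite Switch(c1, v1, c2, v2, ...) as a searched CASE expression,
    // consuming condition/value pairs left to right.
    static String rewrite(List<String> args) {
        StringBuilder sb = new StringBuilder("(CASE");
        for (int i = 0; i + 1 < args.size(); i += 2) {
            sb.append(" WHEN ").append(args.get(i))
              .append(" THEN ").append(args.get(i + 1));
        }
        return sb.append(" END)").toString();
    }
}
```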
Performance & Outcome
- 96% automated conversion accuracy (improved over time with added grammar rules and mappings).
- 8,000+ developer hours saved versus manual rewriting.
- Fully migrated 130+ databases with only a small percentage of queries requiring manual tweaks.
Lessons Learned
- Start with a minimal, well-tested grammar and iterate frequently.
- Automate validation — side-by-side execution and result comparison is essential.
- Parallelize at database granularity for best performance versus complexity tradeoffs.
- Keep SME collaboration ongoing for dialect edge cases and mapping decisions.
- Log everything — errors, ambiguous transforms, and metrics to guide further grammar development.
Final Takeaway
ANTLR proved to be a powerful and pragmatic choice for large-scale SQL dialect translation. The combination of:
- a custom Access SQL grammar,
- ANTLR’s lexer/parser/listener architecture,
- a parallelized Java CLI pipeline, and
- an automated validation loop
enabled us to automate a migration that would otherwise have required months of manual effort.