From Access to SQL Server at Scale: How We Automated 130+ DB Migrations with ANTLR

Tushar,javadatabase

When our organization decided to retire Microsoft Access from our technology stack, we faced a migration challenge of scale: 130+ Access databases, each containing 50–100 queries.

These queries weren’t trivial — they used Access-specific SQL dialect features like IIF, NZ, and DateAdd, alongside joins and stored procedures.


Table of Contents


Why Not Manual or Existing Tools?

We chose ANTLR + Java to build a custom, extensible query translator.


Why ANTLR?

ANTLR (Another Tool for Language Recognition) is a parser generator that turns a formal grammar into Java (or other language) code capable of:

We considered regex and JavaCC, but:

We used the latest ANTLR release (2024), writing an Access SQL grammar from scratch.


How ANTLR Works in Our Migration

1. Grammar Definition

We wrote .g4 grammar files defining:

Example (simplified):

// AccessSql.g4 (simplified)
lexer grammar AccessSqlLexer;

SELECT : 'SELECT';
FROM   : 'FROM';
IIF    : 'IIF';
IDENTIFIER : [a-zA-Z_][a-zA-Z0-9_]*;
NUMBER : [0-9]+;
WS     : [ \t\r\n]+ -> skip;

parser grammar AccessSqlParser;

options { tokenVocab=AccessSqlLexer; }

selectStatement
    : SELECT selectElements FROM tableSource (WHERE searchCondition)?
    ;

selectElements
    : '*'
    | IDENTIFIER (',' IDENTIFIER)*
    ;

tableSource
    : IDENTIFIER
    ;

Notes:

  • The actual grammar used was far larger and covered many Access-specific constructs (function calls, nested selects, JOIN variants, parentheses, aliases, and expression precedence).
  • We iteratively added rules as we encountered new patterns in the field.

2. ANTLR Code Generation

We used the ANTLR tool to generate Java classes:

# Example command used to generate Java parser/listener code
antlr4 AccessSql.g4 -package com.myorg.migration.parser -listener -visitor

This produced:


3. Transformation via Listeners

ANTLR's listeners provide callbacks like enterXxx and exitXxx for parse tree nodes.

We implemented a transformation layer in Java that:

  1. Walked the parse tree using the generated listener.
  2. Built a transformed SQL string (or an intermediate representation) by appending or substituting tokens.
  3. Applied dialect-specific mappings (functions, types, date arithmetic).
  4. Collected unparseable or ambiguous queries for manual review.

Example listener snippet (simplified):

public class AccessToSqlServerListener extends AccessSqlBaseListener {
    private StringBuilder output = new StringBuilder();
 
    @Override
    public void enterFunctionCall(AccessSqlParser.FunctionCallContext ctx) {
        String funcName = ctx.getChild(0).getText().toUpperCase();
        if ("IIF".equals(funcName)) {
            // ctx.arguments() is illustrative — actual parse tree navigation uses context methods
            String arg1 = ctx.args().get(0).getText();
            String arg2 = ctx.args().get(1).getText();
            String arg3 = ctx.args().get(2).getText();
            output.append("(CASE WHEN ").append(arg1)
                  .append(" THEN ").append(arg2)
                  .append(" ELSE ").append(arg3)
                  .append(" END)");
        } else {
            // default: copy function as-is or map other known functions
            output.append(ctx.getText());
        }
    }
 
    public String getTransformedSql() {
        return output.toString();
    }
}

Function & Type Mappings (examples):

Reserved keywords & identifiers:


4. Pipeline Overview

End-to-end processing (conceptual):

[Access SQL Query]
      |
   Lexer
      |
   Parser
      |
  Parse Tree
      |
 Listener/Visitor (Transformations)
      |
[SQL Server Compatible Query]

Implementation outline (pseudocode):

for each database in dbList parallel:
    queries = extractQueries(database);
    for each query in queries:
        CharStream input = CharStreams.fromString(query);
        AccessSqlLexer lexer = new AccessSqlLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        AccessSqlParser parser = new AccessSqlParser(tokens);
        ParseTree tree = parser.selectStatement(); // or general statement rule
        ParseTreeWalker walker = new ParseTreeWalker();
        AccessToSqlServerListener listener = new AccessToSqlServerListener(mappingConfig);
        walker.walk(listener, tree);
        String transformed = listener.getTransformedSql();
        writeOutput(transformed);

Bulk Processing


Testing & Validation

To ensure correctness we built an automated validation framework:

  1. Execute original Access query against Access DB (using ODBC/JDBC).
  2. Execute transformed query against SQL Server.
  3. Compare result sets:
    • Row counts
    • Column-wise value equivalence with type-aware comparisons
    • Tolerances for floating point/date formatting where applicable

Mismatch cases were recorded and used to:

This iterative testing loop increased automated accuracy to 96%+.


Challenges


Performance & Outcome


Lessons Learned

  1. Start with a minimal, well-tested grammar and iterate frequently.
  2. Automate validation — side-by-side execution and result comparison is essential.
  3. Parallelize at database granularity for best performance versus complexity tradeoffs.
  4. Keep SME collaboration ongoing for dialect edge cases and mapping decisions.
  5. Log everything — errors, ambiguous transforms, and metrics to guide further grammar development.

Final Takeaway

ANTLR proved to be a powerful and pragmatic choice for large-scale SQL dialect translation. The combination of:

enabled us to automate a migration that would otherwise have required months of manual effort.