Connect Four Ply Dataset

2025-04-22

Connect 4 is a solved game in the m,n,k family. On a 7 column, 6 row board, players alternately drop a token into a column, attempting to establish four tokens of the same color along any row, column, or diagonal. John Tromp created a labeled dataset of all valid eight ply positions with their perfect play win results. We present a dataset that presents all sequences of plies that lead to each position along with their associated win result class.

Download Link

csv.gz (~8m) sha1sum: 1e683b7af9496f171ad7f1bed67491708f7de55b

Background

Connect Four is a two-player, perfect knowledge game invented by Howard Wexler and sold by the Milton Bradley firm starting in 1974. As of 2025, the game is sold by Hasbro. The game has been solved both weakly (James D. Allen and Victor Allis, independently, in 1988) and strongly (John Tromp in 1995). Tromp donated his dataset of “all legal 8-ply positions in the game of connect-4 in which neither player has won yet, and in which the next move is not forced.”

A given board position may have multiple sequences of plies (individual moves) that result in that configuration. For example, the board:

6       o       
5       x       
4       o       
3       x       
2     o o       
1     x x       
  a b c d e f g

may be created with any of the four sequences:

x:c1 o:c2 x:d1 o:d2 x:d3 o:d4 x:d5 o:d6 
x:d1 o:d2 x:c1 o:c2 x:d3 o:d4 x:d5 o:d6 
x:d1 o:d2 x:d3 o:d4 x:c1 o:c2 x:d5 o:d6 
x:d1 o:d2 x:d3 o:d4 x:d5 o:d6 x:c1 o:c2

The constraint is that, since pieces fall to the lowest row, a piece underneath another piece must have been dropped earlier in the sequence. Thus, c1 must be before any c2, but c1 may be before or after d1. Within this dataset, x always indicates the first player and o the second player.

As another example, this board configuration features an o move that is not dependent on an x move:

6       o       
5       x       
4       o       
3       x       
2       o       
1   x   x   o   
  a b c d e f g

Within the list of sequences, the f1 move can fall into any of the o moves (2nd, 4th, 6th, or 8th):

x:b1 o:f1 x:d1 o:d2 x:d3 o:d4 x:d5 o:d6 
x:d1 o:f1 x:b1 o:d2 x:d3 o:d4 x:d5 o:d6 
x:d1 o:d2 x:b1 o:f1 x:d3 o:d4 x:d5 o:d6 
x:d1 o:d2 x:d3 o:f1 x:b1 o:d4 x:d5 o:d6 
x:d1 o:d2 x:d3 o:d4 x:b1 o:f1 x:d5 o:d6 
x:d1 o:d2 x:d3 o:d4 x:d5 o:f1 x:b1 o:d6 
x:d1 o:d2 x:d3 o:d4 x:d5 o:d6 x:b1 o:f1

Dataset Information

The dataset consists of 2,031,148 data rows and one header row. The columns are separated by commas (,) and the columns consist of:

row [1, 67557] one-indexed source row identifier for the Connect-4 dataset. Numbers do not include any separators.
ply1 [a-g][1-6] two characters identifying the column and row of the ply
ply2 same
ply3
ply4
ply5
ply6
ply7
ply8 same
x_result one of: win, loss, or draw, indicating the first player’s perfect play result

The board is configured with the lower-left position being a1 and the upper-right position being g6. Pictorially:

6
5
4
3
2
1
  a b c d e f g

License

This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.

Methodology

To construct this dataset, we transform each row of Tromp’s eight ply position dataset into a series of ply sequences. We use a breadth-first search to build all possible sequences using the position board as a constraint. Pieces from the first player along the bottom are used as the initial plies. As pieces are played, we add additional possible plies that were dependent on previous plies.

We wrote the program in Rust. The key data structures for modeling the game are:

#[derive(Debug, Copy, Clone, Hash, PartialEq, Eq)]
pub enum Player {
    X,
    O
}

#[derive(Debug, Copy, Clone, Hash, PartialEq, Eq)]
pub struct Placement {
    // zero-indexed; printed as 0 => row 1 is at the bottom, row 6 is the top row
    row: u8,
    // piece is placed on top of this column, [0-6] corresponding to [a-g]
    column: u8
}

#[derive(Debug, Copy, Clone, Hash, PartialEq, Eq)]
pub struct Ply {
    player: Player,
    placement: Placement
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Board {
    // columns[0] is column 'a' while columns[6] is 'g', with plies extending from row 1..6
    columns: [Vec<Player>; 7]
}

Although memory consumption for each of these key structures could be reduced further (e.g. a Placement and Ply could be represented within a single u8), memory is not a constraint and convenience wins out. The Copy traits tell the compiler that these structures can be cheaply copied.

After building a Board from the Tromp dataset, we create a Constraints instance to record the potential first plies for each player that have no predecessor requirements and a map of the predecessor requirements for other plies.

struct Constraints {
    starts_x: Vec<Ply>,
    starts_o: Vec<Ply>,
    precedes: HashMap<Ply, Ply>
}

Idiomatically, the initial potential plies for the two players would be better represented as a slice than as a vector.

impl Constraints {
    fn from_board(board: Board) -> Constraints {
        let mut starts_x: Vec<Ply> = vec![];
        let mut starts_o: Vec<Ply> = vec![];
        let mut precedes: HashMap<Ply, Ply> = HashMap::new();

        for (col, column) in board.columns.iter().enumerate() {
            for (row, player) in column.iter().enumerate() {
                if row == 0 {
                    if *player == Player::X {
                        starts_x.push(Ply::new(Player::X, row, col))
                    } else if *player == Player::O {
                        starts_o.push(Ply::new(Player::O, row, col))
                    }
                } else {
                    let prev = board.at(row-1, col).unwrap();
                    precedes.insert(Ply::new(*prev, row-1, col), Ply::new(*player, row, col));
                }
            }
        }

        Constraints { starts_x, starts_o, precedes }
    }
}

We iteratively build up solutions one layer at a time. The PartialSolution struct stores the chosen plies so far and x_moves and o_moves tracks potential plies for each player.

struct PartialSolution {
    chosen: Vec<Ply>,
    x_moves: Vec<Ply>,
    o_moves: Vec<Ply>
}

PartialSolution features two methods to simplify state management. The moves method returns a slice of potential plies, calculating the current player based on the number of plies already taken. The next method creates a new PartialSolution instance based on applying a ply to the current state. The ply must be removed from the lists of available plies and plies freed up must be added to the potential list.

impl PartialSolution {

    fn moves(self: &PartialSolution) -> &[Ply] {
        if self.chosen.len() % 2 == 0 {
            self.x_moves.as_slice()
        } else {
            self.o_moves.as_slice()
        }
    }

    fn next(self: &PartialSolution, ply: Ply, precedes: &HashMap<Ply, Ply>) -> PartialSolution {
        let mut new_chosen = Vec::from_iter(self.chosen.iter().cloned());
        new_chosen.push(ply);

        let mut new_x_moves: Vec<Ply> = Vec::from_iter(self.x_moves.iter().cloned());
        let mut new_o_moves: Vec<Ply> = Vec::from_iter(self.o_moves.iter().cloned());
        match precedes.get(&ply) {
            Some(p) if p.player == Player::X => new_x_moves.push(*p),
            Some(p) => new_o_moves.push(*p),
            None => ()
        }
        if ply.player == Player::X {
            let pos = new_x_moves.iter().position(|p| *p == ply).unwrap();
            new_x_moves.remove(pos);
        } else {
            let pos = new_o_moves.iter().position(|p| *p == ply).unwrap();
            new_o_moves.remove(pos);
        }

        PartialSolution { chosen: new_chosen, x_moves: new_x_moves, o_moves: new_o_moves }
    }
}

The function all_solutions implements the breadth-first search and returns a vector of a sequence of plies (Vec<Ply>). For a given board, the number of possible sequences is fairly small (an average of 30 solutions and max of 360 for each board), so we are not obligated to conserve memory and use an iterator or depth-first search approach to reduce storage of intermediate data.

pub fn all_solutions(board: Board) -> Vec<Vec<Ply>> {
    let constraints = Box::new(Constraints::from_board(board));
    let base = Box::new(PartialSolution::new(constraints.starts_x.as_slice(), constraints.starts_o.as_slice()));
    let mut prev_partials: Vec<PartialSolution> = vec![];
    let mut curr_partials: Vec<PartialSolution> = vec![];

    for initial in constraints.starts_x.iter() {
        let partial = base.next(*initial, &constraints.precedes);
        curr_partials.push(partial);
    }

    for _ in 1..8 {
        swap(&mut prev_partials, &mut curr_partials);
        curr_partials.clear();

        for partial in prev_partials.iter() {
            for ply in partial.moves().iter() {
                let new_partial = partial.next(*ply, &constraints.precedes);
                curr_partials.push(new_partial);
            }
        }
    }

    let mut solutions: Vec<Vec<Ply>> = vec![];
    for partial in curr_partials.iter() {
        solutions.push(partial.chosen.clone());
    }
    solutions
}