Solving Sudoku with Computer Vision
I decided today that it was time to get some mental exercise, and that I wanted to create a program that could take a picture of a Sudoku puzzle, and then solve it. A quick Google search will reveal that this idea is pretty far from being novel. I'm not the first person to do this, and I surely will not be the last. Nonetheless, I think working through the problem was an enjoyable experience, and a nice way to stretch my problem-solving skills. So let's just hop into how I did it.
First, I wanted to preprocess the image. The first step in reading the digits from the board is to actually find where the board is in the image. There are tools we have at our disposal for doing that, but none of them will perform well on an image like this. Instead, we should binarize the image, and clean it up as best we can.
#-- Preprocess the image so that it is a clean black and white image
#-- that the puzzle can be extracted from
def preprocessImage(img, visualize=False):
#-- Cnvert the image to grayscale, blur it, and apply adaptive thresholding
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blur = cv2.medianBlur(gray, 3)
thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 15, 15)
#-- Invert the image and apply morphological transformations to fill in small gaps in the lines
inverted = cv2.bitwise_not(thresh)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
closed = cv2.dilate(inverted, kernel, iterations=1)
if visualize: cv2.imshow("Preprocessed", closed)
return closed
This process will get as a black and white image that looks like the following:
Okay, good start. Next, what we want to do is apply OpenCV's findContours() function. But we don't just want to rely on the contours that we get -- we can do better than that. So, what we'll do is rely on the approxPolyDP() function to simplify the contours. Then, we will sort the contours by size, and pick the biggest contour that has exactly four vertices. Consider what this approach is doing: we are seeking a rectangle in the greater image. Moreover, we are seeking the largest rectangle that we can find, because that's likely to be the puzzle within the image. I wrote the following code to do this:
#-- Find the warped rectangles in the image, and then return the largest
#-- of these rectangles, assuming it meets a minimum size requirement
def isolatePuzzle(image, visualize=False):
CONTOUR_MINIMUM_SIZE = 10000
color = cv2.cvtColor(image, cv2.COLOR_GRAY2BGR)
blurred = cv2.GaussianBlur(image, (5,5), 0) #-- Blurring the puzzle can help with warped "wavy" edges
#-- Apply canny edge detection, and then find the contours in the image
edges = cv2.Canny(blurred, 50, 150)
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = sorted(contours, key=cv2.contourArea, reverse=True)
for contour in contours:
#-- Simplify the contour while retaining its shape
epsilon = 0.02 * cv2.arcLength(contour, True)
approx = cv2.approxPolyDP(contour, epsilon, True)
area = cv2.contourArea(contour)
#-- Disregard small contours, or contours without 4 vertices
if len(approx) == 4 and area >= CONTOUR_MINIMUM_SIZE:
if visualize:
cv2.drawContours(color, [approx], -1, (0, 0, 255), 2)
cv2.imshow("Isolated Puzzle", color)
return approx
return None
This works well, as you can see:
We have detected the puzzle in the image! The next step is to warp it such that we get a new image containing only this puzzle. Moreover, the puzzle should be fit into the shape of a rectangle. We can use OpenCV's warpPerspective function to do this, with only a little bit of data-wrangling.
#-- Take an image, and the contour defining where the puzzle is located on the page, and warp
#-- that portion of the image such that the puzzle is in a straight-on view, in its own mat
def warpPuzzle(image, contour):
# Convert the approximated contour to a 4-point format, in the right order
rect = np.array(contour).reshape(4, 2)
rect = sortRectanglePoints(rect)
# Compute the width and height of the new image
(tl, tr, br, bl) = rect
widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
maxWidth = max(int(widthA), int(widthB))
heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
maxHeight = max(int(heightA), int(heightB))
# Destination points for the warp
dst = np.array([
[0, 0],
[maxWidth - 1, 0],
[maxWidth - 1, maxHeight - 1],
[0, maxHeight - 1]], dtype="float32")
#-- Apply transformation and return
transform_matrix = cv2.getPerspectiveTransform(rect, dst)
warped = cv2.warpPerspective(image, transform_matrix, (maxWidth, maxHeight))
return warped
#-- Helper function for warpPuzzle function
#-- This makes sure the four coordinates of the puzzle's bounding rectangle
#-- are in the right order.
def sortRectanglePoints(pts):
rect = np.zeros((4, 2), dtype="float32")
s = pts.sum(axis=1)
rect[0] = pts[np.argmin(s)]
rect[2] = pts[np.argmax(s)]
diff = np.diff(pts, axis=1)
rect[1] = pts[np.argmin(diff)]
rect[3] = pts[np.argmax(diff)]
return rect
This gives us a nice, front-facing puzzle mat (although we are still using the pre-processed version of it because that will come in handy in the next step):
Let's change gears, now. We have extracted and pre-processed the puzzle. Now we need to get the digits. This is actually pretty straight-forward because... well, because we can rely on a pre-trained model for these purposes. I chose EasyOCR. The code to extract text using EasyOCR is dead simple, just two lines. Far more code is spent defining how and where to store them.
Moving into the next step, we need to actually write an algorithm to solve the Sudoku puzzles. That means we need a way to store the puzzles as data. Because a Sudoku puzzle has 81 boxes, it makes sense to store it as an array of length 81. With that in mind, the following function detects numbers in the image, and places them in the correct spots of an array. All blank values are represented by zeroes.
#-- Function takes an image (in the format produced by warpPuzzle()), and returns an array of
#-- length 81 representing the Sudoku board that was read in from the image
def detectNumbers(img, verbose=False):
# Enlarge the image slightly for better detection
copied = img.copy()
height, width = img.shape[:2]
copied = cv2.resize(img, (int(1.5*width) ,int(1.5*height)), interpolation=cv2.INTER_LINEAR)
# Divide the image into 81 cells
buckets = [0] * 81
height, width = copied.shape[:2]
cell_width = width // 9
cell_height = height // 9
# Make a temporary image file and read it with EasyOCR
cv2.imwrite('temp_image.png', copied)
reader = easyocr.Reader(['en'])
result = reader.readtext('temp_image.png', allowlist='0123456789')
for each in result:
# Get the center coordinates of the detected text box, and use it to find the row/column of the digit
center_x = int((each[0][0][0] + each[0][3][0]) / 2)
center_y = int((each[0][0][1] + each[0][3][1]) / 2)
col = center_x // cell_width
row = center_y // cell_height
# Store the detected value in the appropriate bucket
bucket_index = row * 9 + col
value = each[1]
buckets[bucket_index] = value
if verbose: print(f"Detected value '{value}' at bucket {bucket_index} (row {row}, col {col})")
os.remove('temp_image.png')
return buckets
We are now almost all of the way home. Solving a Sudoku is actually not a difficult task. It's little more than running a depth-first search of a tree. We just need a function that validates whether a given Sudoku board is valid or invalid. So with that, here are the two final pieces of important code:
#-- Takes a 1-Dimensional array of numbers representing a Sudoku puzzle,
#-- going rows first then columns, with 0s representing blank spots, and
#-- solves it. If it cannot be solved, returns the original array
def solveSudoku(puzzle_arr):
def solve():
for i in range(9):
for j in range(9):
if puzzle_arr[i * 9 + j] == 0:
for num in range(1, 10):
if isValidSudoku(puzzle_arr, i, j, num):
puzzle_arr[i * 9 + j] = num
if solve(): return True
puzzle_arr[i * 9 + j] = 0
return False # No valid solutions
return True
done = solve()
if done: return puzzle_arr
#-- Helper function returns True if puzzle is valid, or False if it is invalid
def isValidSudoku(board, row, col, num):
# Check the row
for i in range(9):
if board[row * 9 + i] == num:
return False
# Check the column
for i in range(9):
if board[i * 9 + col] == num:
return False
# Check the 3x3 subgrid
start_row = (row // 3) * 3
start_col = (col // 3) * 3
for i in range(3):
for j in range(3):
if board[(start_row + i) * 9 + (start_col + j)] == num:
return False
return True
And that's how it works! I wrote a quick function to format and print the solution. If you were curious, the puzzle we've been looking at has the following solution:
If you'd like to take a look at any of the code, I have the whole thing uploaded to GitHub here.
Comments
Post a Comment