Sudoku: Bagging a Difficulty Metric & Building Up Puzzles
|
|
|
- Coleen Cross
- 9 years ago
- Views:
Transcription
1 Sudoku: Bagging a Difficulty Metric & Building Up Puzzles February 18, 2008 Abstract Since its inception in 1979, Sudoku has experienced rapidly rising popularity and global interest. We present a difficulty rating metric and three puzzle generation algorithms for the popular Sudoku puzzle. Our metric focuses on the human difficulty of a puzzle, predicting the time an average Sudoku solver will require with R 2 = 0.71, rather than calculating an arbitrary computational metric. Furthermore, in predicting a continuous parameter, the metric is both extensible and versatile. It outperforms all other publicly available metrics we considered in predicting human difficulty of a large set of puzzles. We also present 3 puzzle generation algorithms. The first, a library based algorithm, provides puzzles in O(1) time. The second uses Latin Squares to construct a valid solution grid, and then builds up the puzzle leading to that solution from a blank grid. The third is a template-based generator that can be used to create puzzles with a desired initial configuration, mimicing popular human-constructed Sudoku puzzles. Each of these algorithms, particularly the first two, rapidly provides puzzles with a broad range of difficulties. Furthermore, each method is capable of generating a large number of new puzzles. Finally, we propose a rapid method of transforming a single valid puzzle into many. The set of valid Sudoku puzzles is closed under several transformations, including row permutation within a band, for example. Using these transformations, we can convert a single puzzle into over 1 trillion, further expanding the efficacy of our generators
2 Team 2280 Page 2 of 22 Contents 1 Introduction History Popularity Definition of Sudoku 4 3 Prior Work 5 4 Goals and Motivation Difficulty Metric Puzzle Generation Data 6 6 Our Difficulty Metric - Predicted Solve Time Preliminary Model and Complexity Analysis via Bootstrapping Meta-Model via Pseudo-Bagging Analysis of Meta-Model Practical Analysis of our metric Applicability Comparison to Other Difficulty Metrics Our Puzzle Generation Algorithms Puzzle Transformation Generator A: Library-based Generator Algorithm Applicability Complexity Sample Puzzles Generator B: Build-up from Solution Latin Squares for Solution Generation Intelligent Brute Force Solver Algorithm Complexity Sample Puzzles Generator C: Template Generation Algorithm Complexity Sample Puzzles Conclusion 21
3 Team 2280 Page 3 of 22 List of Figures 1 Sudoku Puzzle and Solution Box plots of Distribution of Difficulty Measures for Puzzles of 4 Difficulties. Average Solve Time better differentiates between puzzle difficulties Relationship between Actual Solve Time and Predicted Solve Time, in minutes. Solve time is our difficulty metric Bootstrapping Results: Performance of Model on Test Set Bootstrapping Results: Performance of Meta-Model on Test Set Valid 9x9 Sudoku Permutations Transformation of a Generated Puzzle Library Generator - Difficulty Distribution Library Generator - Sample Puzzles of 4 Different Difficulties Producing a Sudoku Solution from Latin Squares Flow Chart of Bottom Up Generator Build up Generator - Sample Puzzles of 4 Different Difficulties Build Up Generator - Difficulty Distribution Template Generator - Sample MCM Puzzles of 4 Different Difficulties Template Generator - Difficulty Distribution for MCM puzzles
4 Team 2280 Page 4 of 22 1 Introduction Since its inception in 1979, Sudoku has experienced rapidly rising popularity and global interest. Sudoku puzzles are simple to understand, simple to try, though often not simple to solve. 1.1 History Sudoku puzzles are related to number puzzles throughout history. The set of valid 9x9 Sudoku solutions are a subset of the 9x9 Latin Squares (invented in 1983 by Euler) and similar to the set of 9x9 Magic Squares, a puzzle invented prior to the 10th century AD. The modern puzzle was first seen in 1979 in Dell Magazines called Number Place. Invented by the American architect Howard Garns, the puzzles soon became popular in Japan after being published by Nikoli, the first Japanese puzzle magazine [9]. Sudoku is short for Su-ji wa dokushin ni kagiru in Japanese, meaning the numbers must be single. 1.2 Popularity Sudoku has become a pop culture phenomenon dubbed the fasting growing puzzle in the world in 2005 by the world media [11]. Hundreds, if not thousands, of web sites exist to provide puzzle-addicts with their daily fix; nearly every modern newspaper includes puzzles on a daily or weekly basis. As a result, the need for efficient and plentiful generation of puzzles has become a particularly profitable venture - CNN Money reports that one firm received well over $1 million in revenue in less than a year from a puzzle generating program! [8] 2 Definition of Sudoku A traditional Sudoku puzzle (see Fig 1(a)) consists of a 9x9 grid of cells divided into 9 3x3 subsections called blocks. A Sudoku solution (see Fig 1(b)) is such a grid with each cell containing a single number, meeting the follwoing conditions: Every cell contains a single number between 1 and 9 Every row contains each of the numbers 1-9 exactly once Every column contains each of the numbers 1-9 exactly once Every box contains each of the numbers 1-9 exactly once A Sudoku puzzle is a Sudoku grid with some of the numbers filled in, called givens. The givens in a valid Sudoku puzzle must satisfy the following conditions: The givens do not violate any of the above rules of Sudoku
5 Team 2280 Page 5 of 22 There must be a single Sudoku solution that contains the set of givens, i.e. the set of givens must lead to a unique solution (a) The Puzzle (b) The Solution Figure 1: Sudoku Puzzle and Solution While variants abound (e.g. Killer-, Samurai-, and Hyper-Sudoku), for simplicity we consider only traditional 9x9 Sudoku puzzles. 3 Prior Work The combinatorics, algorithms, and solving techniques of Sudoku are well studied. In 2005, Felgenhauer and Jarvis showed that of the x9 Latin Squares, form valid Sudoku grids [3]. Royle hypothesizes the minimum number of clues required to produce a puzzle with unique solution is 17 [10]. Yato and Seta show that solving a Sudoku puzzle is NP-complete [12]. Solving techniques for humans and computers abound. Fowler details many logical strategies humans use to solve Sudoku puzzles, in 7 categories. We omit a lengthy discussion of these strategies. Suffice it to say that they range from the simple - examination of other elements in a given row/column - to the exotic - Swordfish, X-wing, and Forcing Chains techniques [4]. Computational solving techniques range from brute force algorithms and recursive backtracking (a classic introductory computer science assignment), to stochastic methods, constraint solving algorithms, integer programming, genetic algorithms, and computer learning [1] [5] [6] [7]. Difficulty rating algorithms are widely acknowledged to be more effective at determining computational difficulty rather than apparent difficulty to a human solver. Metrics for determining human-difficulty tend to calculate a function of the techniques described by Fowler - more difficult puzzles will tend to require more difficult techniques more times. However, the choice of weights is often arbitrarily set by the programmer.
6 Team 2280 Page 6 of 22 Publicly available generating algorithms vary in their approaches. The grand majority, however, use a randomized approach to generation. They repeatedly place a random number in a random position and then check the solvability of the puzzle until the puzzle has a unique solution. According to Fowler, solving a Sudoku puzzle is easier than generating one, and generating a puzzle is easier than evaluating its difficulty [4]. This is confirmed by Sudoku aficionados online forums. 4 Goals and Motivation Goal - to provide a difficulty metric and puzzle generation algorithm that are humanapplicable rather than arbitrary and esoteric. 4.1 Difficulty Metric We seek to provide a difficultly metric that is: accurate - well-correlated to human difficulty data versatile - widely distributes Sudoku puzzles into difficulty levels extensible - the number and difficulty of the levels can be changed. This can be achieved by making the metric continuous rather than discrete, and using thresholding to group into discrete levels 4.2 Puzzle Generation We seek to provide a puzzle generation algorithm that is: efficient - provides puzzles quickly customizable - the user should be able to specify difficulty and receive a puzzle of that difficulty in response versatile - offers a range of difficulty levels scalable - can provide a few or many puzzles of a given difficulty 5 Data Actual data derived from many individuals solving many puzzles is necessary to effectively evaluate the human-difficulty of a puzzle. Fortunately, the popularity of Sudoku puzzles makes this data easily available online. The web site sudoku.org.uk records average solve time and the number of individuals to solve the Daily Sudoku puzzles, with puzzles rated for 4 difficulties. First, we consider whether number of solvers or average solve time is a more effective measure of difficulty.
7 Team 2280 Page 7 of 22 (a) Number of Solvers (b) Average Solve Time Figure 2: Box plots of Distribution of Difficulty Measures for Puzzles of 4 Difficulties. Average Solve Time better differentiates between puzzle difficulties As seen in Fig 2, the time taken to solve a puzzle better differentiates between puzzle difficulties. Time to solve also provides a more widely distributed true difficulty. Intuitively, solve time is a reasonable, accurate measure. Puzzle solvers often to consider a puzzle s difficulty to be the amount of time they spend working on it. 6 Our Difficulty Metric - Predicted Solve Time We provide an predictor of Average Solve Time as a measure of difficulty, using a multiple regression model considering the number and type of solving logical strategies used to solve a puzzle. The program serate (Sudoku Explainer) by Nicolas Juillerat is used to provide a list of strategies used. 6.1 Preliminary Model and Complexity Our algorithm is as follows: Generation of the Model Use serate to generate a list of the usage of 31 logical strategies for each of 915 puzzles from sudoku.org.uk Use linear multiple regression to calculate model coefficients Application of the Model on a Puzzle P Use serate to generate a list of the usage of 31 logical strategies required to solve P
8 Team 2280 Page 8 of 22 Apply the linear model to P The algorithm is efficient in that it precomputes the 32 model coefficients. Calculation of the predictor given the strategies used is O(1). Computation of the metric is dominated by the determination of the logical strategies used to solve it, computed by serate. Serate considers each possible logical strategy at each stage of solving the puzzle, hence is computationally intensive. On a modern computer, 15 puzzles are processed in 1 second. 6.2 Analysis via Bootstrapping The model predicts solve time well when tested on the training data (see Fig 3, R 2 = 0.711). Figure 3: Relationship between Actual Solve Time and Predicted Solve Time, in minutes. Solve time is our difficulty metric Bootstrapping is a technique in which a set of data is repeatedly divided into two groups - a training set and a test set. The training set is used to build a model, which is evaluated on the test set. By generating a distribution of performance, the technique provides a measure of the performance of a model on unseen data, that is, data not used to build the model. Bootstrapping was used to evaluate the efficacy of the model in predicting the difficulty previously unseen data. The training set size was 604 samples, or 2/3 of the dataset.
9 Team 2280 Page 9 of 22 Performance on the test set, as measured by the coefficient of determination R 2, can be seen in Fig 6.2. The mean R 2 is , with a 95% confidence interval CI = [0.6078, ], indicating a strong relationship between the predicted and actual solve time on the test set. Figure 4: Bootstrapping Results: Performance of Model on Test Set 6.3 Meta-Model via Pseudo-Bagging Bagging, or bootstrap aggregating, can be used to improve the stability and accuracy of regression models, reducing variance and helping to avoid overfitting [2]. We use a slight modification to the traditional bagging procedure here - instead of having each bootstrap model contribute equally to the final predictor, we weight the contributions by the R 2 of that model, so better models contribute more. Because our model is linear, this does NOT increase the complexity of the meta-model. The coefficient of a predictor x i in the meta-model is the weighted sum of the coefficients in each model. The final meta-model is only a list of 32 coefficients. The meta-model was generated from the 1 million bootstrap iterations described in the previous section.
10 Team 2280 Page 10 of Analysis of Meta-Model Further bootstrapping was used to evaluate the efficacy of the meta-model. Due to computational limits, we could not iterate over 1-million-iteration meta-models. Rather, to estimate the effectiveness of this technique in general, 100 bootstrap iterations were conducted, generating a 10,000-iteration meta-model each time. In each outer iteration, 311 random samples were reserved for testing and 604 samples were passed to the meta-model generator. The generator used 398 of those samples for training and the remaining 206 samples for test, to determine the model weights. Each meta-model was then evaluated on the 311 test samples. The distribution of performance on the test set can be seen in Fig 6.4 Figure 5: Bootstrapping Results: Performance of Meta-Model on Test Set 7 Practical Analysis of our metric 7.1 Applicability Our metric is accurate - well-correlated to human solve times, with R 2 = 0.71 versatile - widely distributes Sudoku puzzles into difficulty, as solve times are well distributed
11 Team 2280 Page 11 of 22 extensible - by computing a continuous parameter, allowing specification of difficult bounds 7.2 Comparison to Other Difficulty Metrics We consider the correlation between the difficulty rating and actual average solve time for four publicly available metrics and four naive metrics we designed in Table 1. Table 2 describes the function of these metrics. Our metric focuses on human difficulty rather than arbitrary computational metrics, so out performs the other metrics we consider in predicting difficult for a human solver. Metric R 2 Our Final Metric mcm Others Metrics serate suexrat suexrate Fowler #r Our Naive Metrics Singles Cell# Singles SumPoss Intelligent BruteForce Seed Count Table 1: Performance of Various Metrics
12 Team 2280 Page 12 of 22 Metric mcm2280 serate suexrat9 suexrate Fowler #r Singles Cell# Singles SumPoss Intelligent BruteForce Seed Count Description Uses a linear meta-model considering the number and type of strategy necessary to solve a puzzle Uses the number and type of strategy necessary to solve a puzzle, with weights set by the programmer Improved version of suexrate, but faster and less canonical Conducts a depth-first search with minimal logic and takes the number of nodes in the search tree Uses the number and type of strategy necessary to solve a puzzle, grouping some strategies together, with the weights set by the programmer Counts the number of cells solvable by the simplest techniques Evaluates the number of possibilities remaining for each cell using the simplest techniques and takes the sum over all cells Considers the number of recursive calls when solving a puzzle using recursive backtracking Counts the number of givens in the puzzle Table 2: Description of Metrics 8 Our Puzzle Generation Algorithms An effective way to generate many puzzles is to transform an existing one. We first discuss the valid transformations on a Sudoku puzzle and then provide 3 algorithms to generate puzzles of desired difficulties. 8.1 Puzzle Transformation Much as a Latin Square can be permuted to form a new Latin Square, so can a completed Sudoku be permuted to form a new Sudoku. However, care must be taken to ensure that the permutations do not place more than one of a given number into a block. We restrict permuting rows or columns to permutations within a band or stack to prevent this from occurring. A band is a set of three blocks that are horizontally contiguous, and a stack is the vertical equivalent. This results in the following valid permutations: (see Fig 6) 1. Row permutation within a band 2. Column permutation within a stack 3. Symbol permutation: exchanging all instances of a given number with another, for instance replacing all 1 s with 4 s and all 4 s with 1 s
13 Team 2280 Page 13 of Transposition In addition, we allow the following two permutations: 5. Band Permutation: Swapping two of the bands 6. Stack Permutation: Swapping two of the stacks Figure 6: Valid 9x9 Sudoku Permutations Transforming a puzzle, while maintaining its difficulty and other properties, dramatically alters its appearance and creates a new puzzle, different from the original. The magnitude of change can be seen in Fig 7, a puzzle we generate using the Build Up Generator B, before and after transformation. 8.2 Generator A: Library-based Generator A library based puzzle generator is extremely efficient and generates a large spread of puzzle difficulties Algorithm Preprocessing Large sets of valid Sudoku puzzles are available. We produce a library from 100,000 puzzles available online by preprocessing the difficulty of each puzzle. Generation Using a Hash Table, a puzzle of requested difficulty can be retrieved and transformed nearly instantaneously.
14 Team 2280 Page 14 of (a) Before Transformation (b) After Transformation Figure 7: Transformation of a Generated Puzzle Applicability Our library consists of 100,000 puzzles with a range of difficulties (Fig 8). Figure 8: Library Generator - Difficulty Distribution
15 Team 2280 Page 15 of 22 Furthermore, each puzzle can be transformed to generate 2 9! 6 8 = 1, 218, 998, 108, 160 other puzzles. At a rate of one puzzle per second, it would take a solver 38 millenia to complete these puzzles. Note that our library consists of 100,000 puzzles before transformation Complexity By creating a library of puzzles with their difficulties, we avoid having to compute difficulty and checking validity, both time-consuming operations, on fly. If the puzzles are stored in a Hash Table indexed by their difficulty, retrieving a puzzle of desired difficulty is O(1). Transforming the puzzle also takes O(1) time Sample Puzzles (a) Predicted Difficulty: 2.81 minutes (c) Predicted Difficulty: minutes (b) Predicted Difficulty: minutes (d) Predicted Difficulty: minutes Figure 9: Library Generator - Sample Puzzles of 4 Different Difficulties
16 Team 2280 Page 16 of Generator B: Build-up from Solution Instead of picking random numbers to seed a blank puzzle, we start with a valid solution and generate a puzzle leading to that solution Latin Squares for Solution Generation To quickly and effectively generate a solution, we use the 12 unique 3x3 Latin Squares. Select nine 3x3 Latin Squares, with replacement. Place each of these squares into one of the blocks in a blank grid. Select another 3x3 Latin Square and match each cell with the corresponding block in the Sudoku grid. (Fig 10(a)). Each cell now has a pair of numbers. Treat these pairs as base 3 numbers, and convert to base 10, adding 1 (Fig 10(b)). Each cell now has the numbers 1-9. However blocks contain duplicates. Swap the 2nd & 4th rows, 3rd & 7th rows, and 6th & 8th rows preserving the row and column properties, and adding the desired property for blocks. This produces a valid Sudoku solution (Fig 10(c)). (a) Selection of Latin Squares (b) Conversion to Base 10 (c) Sudoku Solution Figure 10: Producing a Sudoku Solution from Latin Squares Intelligent Brute Force Solver We use a created an intelligent recursive-backtracking brute-force sudoku solver to determine the number of solutions for a set of givens to ensure uniqueness. Instead of recursing on the next cell sequentially, it calculates the number of possible values in all cells via the simplest logic techniques, and recurses on the most constrained cell. In practice, this Intelligent Brute Force algorithm performs on average 300 times fewer recursive calls than a conventional recursive-backtracking brute-force algorithm.
17 Team 2280 Page 17 of Algorithm This generator Uses latin squares to generate a valid Sudoku solution Places 40 values from that solution on a blank board Iteratively removes values, checking that the puzzle is still valid, until the appropriate difficulty is generated. Transforms the generated puzzle Figure 11: Flow Chart of Bottom Up Generator Complexity The complexity of this algorithm is defined by the number of calls to the Intelligent Brute Forcer to check solution uniqueness. We have calculated amortized statistics for the generation of a puzzle: 3 puzzles generated per second, before transformation More than 1.2 trillion puzzles can be produced from each nearly instantaneously 103 calls to the Intelligent Brute Forcer per generated puzzle 2.3 failed placements of the initial 40 values, per puzzle generated
18 Team 2280 Page 18 of Sample Puzzles (a) Predicted Difficulty: 8.23 minutes (c) Predicted Difficulty: minutes (b) Predicted Difficulty: minutes (d) Predicted Difficulty: minutes Figure 12: Build up Generator - Sample Puzzles of 4 Different Difficulties This algorithm generates puzzles with a range of difficulties (Fig 13). Figure 13: Build Up Generator - Difficulty Distribution
19 Team 2280 Page 19 of Generator C: Template Generation Puzzles provided by the puzzle magazine Nikoli are prefered by some Sudoku enthusiasts because they are all human-generated. This is often reflected in their symmetry or their conformation to a specific pattern of locations of givens. We provide an algorithm with this functionality Algorithm Read a template from the user Iterate: Generate a valid solution using Latin Squares Place the appropriate values in the template locations Return if the puzzle has a unique solution Tranform the puzzle ONLY with symbol permutation Complexity The performance of this algorithm varies substantially with the template provided. In the worst case, if there are no valid puzzles with that pattern, it will run in O(9 n ) time, where n is the number of locations in the template. If sufficiently many cells are given in the template, however, it is efficient. The MCM puzzle on the first page and those below were generated via this algorithm. It produced approximately 1 puzzle per minute, before transformation. Each puzzle could be symbol permuted to 9! = 362, 880 other valid puzzles. We note that we generate puzzles from one class of solution grids, namely those generated by Latin Squares. It is possible that there exists a valid puzzle of another class that our algorithm will not find Sample Puzzles Because these puzzles are more constrained, this algorithm generates puzzles with a smaller range of difficulties.
20 Team 2280 Page 20 of (a) Predicted Difficulty: minutes (c) Predicted Difficulty: minutes (b) Predicted Difficulty: minutes (d) Predicted Difficulty: minutes Figure 14: Template Generator - Sample MCM Puzzles of 4 Different Difficulties Figure 15: Template Generator - Difficulty Distribution for MCM puzzles
21 Team 2280 Page 21 of 22 9 Conclusion Sudoku puzzle rating and generation algorithms abound. Our model differs in that we focus on the human-applicability of our rating and generation algorithms. Rather than defining parameters used to rate difficulty by hand, we use multiple regression and bootstrap aggregation to better model the predicted time an average solver will require to complete a puzzle. We present difficulty rating metric that outperforms all other metrics we consider in predicting the average solve time for a set of 915 puzzles. Furthermore, we demonstrate that this metric is extensible and versatile. We also provide a suite of puzzle generation algorithms. We provide a library based algorithm that provides many puzzles in O(1) time; a generator that uses a solution to build up a puzzle; and a template-based generator that can be used to mimic popular human-constructed Sudoku puzzles. Each of these algorithms, particularly the first two, rapidly provide puzzles with a broad range of difficulties. Furthermore, each method is capable of generating a large number of new puzzles. Further work could explore a more direct puzzle generation algorithm that makes use of the unique nature of our difficulty metric. By applying the logical strategies we consider in reverse to selectively remove givens from a completed grid, a puzzle that requires certain strategies to solve could be constructed, effectively tailoring the puzzle to conform to a desired difficulty. This could potentially generate puzzles of a desired difficulty more efficiently.
22 Team 2280 Page 22 of 22 References [1] C. Agerbeck and M. O. Hansen. A multi-agent approach to solving NP-complete problems. Master s thesis, Informatics and Mathematical Modelling, Technical University of Denmark, DTU, Richard Petersens Plads, Building 321, DK-2800 Kgs. Lyngby, Supervised by Assoc. Prof. Thomas Bolander, IMM, DTU. [2] Leo Breiman. Bagging predictors. Machine Learning, 24(2): , [3] B. Felgenhauer and F. Jarvis. Enumerating possible sudoku grids [4] G. Fowler. A 9x9 sudoku solver and generator, Jan [5] Martin Henz and Hoang-Minh Tuong. Sudokusat a tool for analyzing difficult sudoku puzzles. In Proceedings of the First International Workshop on Applications with Artificial Intelligence, Studies in Computational Intelligence, Patras, Greece, Springer-Verlag, Berlin. to appear. [6] Ines Lynce Ist and Joel Ouaknine. Sudoku as a sat problem. [7] Timo Mantere and Janne Koljonen. Solving and rating sudoku puzzles with genetic algorithms. In Proceedings of the 12th Finnish Artificial Intelligence Conference STeP, pages 86 92, [8] P. La Monica. Much ado about sudoku. CNNMoney.com, Sept [9] Ltd Nikoli Co. web nikoli - enjoy pencil puzzles!, Feb [10] G. Royle. Minimum sudoku. [11] sud0ku.com. Sudoku - popularity in the media, Feb [12] T. Yato and T. Seta. Complexity and completeness of finding another solution and its application to puzzles.
Genetic Algorithms and Sudoku
Genetic Algorithms and Sudoku Dr. John M. Weiss Department of Mathematics and Computer Science South Dakota School of Mines and Technology (SDSM&T) Rapid City, SD 57701-3995 [email protected] MICS 2009
Mathematics of Sudoku I
Mathematics of Sudoku I Bertram Felgenhauer Frazer Jarvis January, 00 Introduction Sudoku puzzles became extremely popular in Britain from late 00. Sudoku, or Su Doku, is a Japanese word (or phrase) meaning
Enumerating possible Sudoku grids
Enumerating possible Sudoku grids Bertram Felgenhauer Department of Computer Science TU Dresden 00 Dresden Germany [email protected] Frazer Jarvis Department of Pure Mathematics University of Sheffield,
Methods for Solving Sudoku Puzzles
Methods for Solving Sudoku Puzzles CSCI - 5454, CU Boulder Anshul Kanakia [email protected] John Klingner [email protected] 1 Introduction 1.1 A Brief History This number puzzle is relatively
Sudoku puzzles and how to solve them
Sudoku puzzles and how to solve them Andries E. Brouwer 2006-05-31 1 Sudoku Figure 1: Two puzzles the second one is difficult A Sudoku puzzle (of classical type ) consists of a 9-by-9 matrix partitioned
Sudoku Puzzles Generating: from Easy to Evil
Team # 3485 Page 1 of 20 Sudoku Puzzles Generating: from Easy to Evil Abstract As Sudoku puzzle becomes worldwide popular among many players in different intellectual levels, the task is to devise an algorithm
PARALLELIZED SUDOKU SOLVING ALGORITHM USING OpenMP
PARALLELIZED SUDOKU SOLVING ALGORITHM USING OpenMP Sruthi Sankar CSE 633: Parallel Algorithms Spring 2014 Professor: Dr. Russ Miller Sudoku: the puzzle A standard Sudoku puzzles contains 81 grids :9 rows
An Integer Programming Model for the Sudoku Problem
An Integer Programming Model for the Sudoku Problem Andrew C. Bartlett Timothy P. Chartier Amy N. Langville Timothy D. Rankin May 3, 2008 Abstract Sudoku is the recent craze in logic puzzles. Players must
A Review of Sudoku Solving using Patterns
International Journal of Scientific and Research Publications, Volume 3, Issue 5, May 2013 1 A Review of Sudoku Solving using Patterns Rohit Iyer*, Amrish Jhaveri*, Krutika Parab* *B.E (Computers), Vidyalankar
A search based Sudoku solver
A search based Sudoku solver Tristan Cazenave Labo IA Dept. Informatique Université Paris 8, 93526, Saint-Denis, France, [email protected] Abstract. Sudoku is a popular puzzle. In this paper we
Kenken For Teachers. Tom Davis [email protected] http://www.geometer.org/mathcircles June 27, 2010. Abstract
Kenken For Teachers Tom Davis [email protected] http://www.geometer.org/mathcircles June 7, 00 Abstract Kenken is a puzzle whose solution requires a combination of logic and simple arithmetic skills.
Sudoku Puzzles: Medium
Puzzles: Medium The modern version of was invented in 1979 by Howard Garns in USA (where it was called `Number Place'). It became really popular in Japan in the 1980s and in the UK since late 2004. It
Decision Trees from large Databases: SLIQ
Decision Trees from large Databases: SLIQ C4.5 often iterates over the training set How often? If the training set does not fit into main memory, swapping makes C4.5 unpractical! SLIQ: Sort the values
Sudoku as a SAT Problem
Sudoku as a SAT Problem Inês Lynce IST/INESC-ID, Technical University of Lisbon, Portugal [email protected] Joël Ouaknine Oxford University Computing Laboratory, UK [email protected] Abstract Sudoku
Determining Degree Of Difficulty In Rogo, A TSP-based Paper Puzzle
Determining Degree Of Difficulty In Rogo, A TSP-based Paper Puzzle Dr Nicola Petty, Dr Shane Dye Department of Management University of Canterbury New Zealand {shane.dye,nicola.petty}@canterbury.ac.nz
Counting, Generating, and Solving Sudoku
Counting, Generating, and Solving Sudoku Mathias Weller April 21, 2008 Abstract In this work, we give an overview of the research done so far on the topic of Sudoku grid enumeration, solving, and generating
Chapter 6. The stacking ensemble approach
82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described
Analysis of Micromouse Maze Solving Algorithms
1 Analysis of Micromouse Maze Solving Algorithms David M. Willardson ECE 557: Learning from Data, Spring 2001 Abstract This project involves a simulation of a mouse that is to find its way through a maze.
IBM SPSS Direct Marketing 23
IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release
IBM SPSS Direct Marketing 22
IBM SPSS Direct Marketing 22 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 22, release
Data Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
Follow links Class Use and other Permissions. For more information, send email to: [email protected]
COPYRIGHT NOTICE: David A. Kendrick, P. Ruben Mercado, and Hans M. Amman: Computational Economics is published by Princeton University Press and copyrighted, 2006, by Princeton University Press. All rights
Data Visualization Techniques
Data Visualization Techniques From Basics to Big Data with SAS Visual Analytics WHITE PAPER SAS White Paper Table of Contents Introduction.... 1 Generating the Best Visualizations for Your Data... 2 The
Multiple Linear Regression in Data Mining
Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple
1 Review of Newton Polynomials
cs: introduction to numerical analysis 0/0/0 Lecture 8: Polynomial Interpolation: Using Newton Polynomials and Error Analysis Instructor: Professor Amos Ron Scribes: Giordano Fusco, Mark Cowlishaw, Nathanael
Knowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right
Model Combination. 24 Novembre 2009
Model Combination 24 Novembre 2009 Datamining 1 2009-2010 Plan 1 Principles of model combination 2 Resampling methods Bagging Random Forests Boosting 3 Hybrid methods Stacking Generic algorithm for mulistrategy
Random Forest Based Imbalanced Data Cleaning and Classification
Random Forest Based Imbalanced Data Cleaning and Classification Jie Gu Software School of Tsinghua University, China Abstract. The given task of PAKDD 2007 data mining competition is a typical problem
COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction
COMP3420: Advanced Databases and Data Mining Classification and prediction: Introduction and Decision Tree Induction Lecture outline Classification versus prediction Classification A two step process Supervised
RevoScaleR Speed and Scalability
EXECUTIVE WHITE PAPER RevoScaleR Speed and Scalability By Lee Edlefsen Ph.D., Chief Scientist, Revolution Analytics Abstract RevoScaleR, the Big Data predictive analytics library included with Revolution
Data Mining Methods: Applications for Institutional Research
Data Mining Methods: Applications for Institutional Research Nora Galambos, PhD Office of Institutional Research, Planning & Effectiveness Stony Brook University NEAIR Annual Conference Philadelphia 2014
Data Visualization Techniques
Data Visualization Techniques From Basics to Big Data with SAS Visual Analytics WHITE PAPER SAS White Paper Table of Contents Introduction.... 1 Generating the Best Visualizations for Your Data... 2 The
L25: Ensemble learning
L25: Ensemble learning Introduction Methods for constructing ensembles Combination strategies Stacked generalization Mixtures of experts Bagging Boosting CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna
Software that writes Software Stochastic, Evolutionary, MultiRun Strategy Auto-Generation. TRADING SYSTEM LAB Product Description Version 1.
Software that writes Software Stochastic, Evolutionary, MultiRun Strategy Auto-Generation TRADING SYSTEM LAB Product Description Version 1.1 08/08/10 Trading System Lab (TSL) will automatically generate
Beating the MLB Moneyline
Beating the MLB Moneyline Leland Chen [email protected] Andrew He [email protected] 1 Abstract Sports forecasting is a challenging task that has similarities to stock market prediction, requiring time-series
Design and Analysis of ACO algorithms for edge matching problems
Design and Analysis of ACO algorithms for edge matching problems Carl Martin Dissing Söderlind Kgs. Lyngby 2010 DTU Informatics Department of Informatics and Mathematical Modelling Technical University
D-optimal plans in observational studies
D-optimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational
Data Mining. Nonlinear Classification
Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15
NP-Completeness and Cook s Theorem
NP-Completeness and Cook s Theorem Lecture notes for COM3412 Logic and Computation 15th January 2002 1 NP decision problems The decision problem D L for a formal language L Σ is the computational task:
COPYRIGHT 2006 SCIENTIFIC AMERICAN, INC.
YOUR CHALLENGE: Insert a number from to in each cell without repeating any of those digits in the same row, column or subgrid ( box). The solutions to the moderately difficult Sudoku in the center and
Gamesman: A Graphical Game Analysis System
Gamesman: A Graphical Game Analysis System Dan Garcia Abstract We present Gamesman, a graphical system for implementing, learning, analyzing and playing small finite two-person
Sudoku Creation and Grading
Sudoku Creation and Grading Andrew C. Stuart February 2007 Updated Jan 2012 Introduction The purpose of this paper is to explain how my sudoku puzzles are created and how they are graded; the two most
Product Geometric Crossover for the Sudoku Puzzle
Product Geometric Crossover for the Sudoku Puzzle Alberto Moraglio Dept. of Computer Science University of Essex, UK [email protected] Julian Togelius Dept. of Computer Science University of Essex, UK
New Improvements in Optimal Rectangle Packing
New Improvements in Optimal Rectangle Packing Eric Huang and Richard E. Korf Computer Science Department University of California, Los Angeles Los Angeles, CA 90095 [email protected], [email protected]
Benchmarking Open-Source Tree Learners in R/RWeka
Benchmarking Open-Source Tree Learners in R/RWeka Michael Schauerhuber 1, Achim Zeileis 1, David Meyer 2, Kurt Hornik 1 Department of Statistics and Mathematics 1 Institute for Management Information Systems
Sudoku Madness. Team 3: Matt Crain, John Cheng, and Rabih Sallman
Sudoku Madness Team 3: Matt Crain, John Cheng, and Rabih Sallman I. Problem Description Standard Sudoku is a logic-based puzzle in which the user must fill a 9 x 9 board with the appropriate digits so
Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms
Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail
Semantic Analysis of Business Process Executions
Semantic Analysis of Business Process Executions Fabio Casati, Ming-Chien Shan Software Technology Laboratory HP Laboratories Palo Alto HPL-2001-328 December 17 th, 2001* E-mail: [casati, shan] @hpl.hp.com
Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary
Shape, Space, and Measurement- Primary A student shall apply concepts of shape, space, and measurement to solve problems involving two- and three-dimensional shapes by demonstrating an understanding of:
Introduction Solvability Rules Computer Solution Implementation. Connect Four. March 9, 2010. Connect Four
March 9, 2010 is a tic-tac-toe like game in which two players drop discs into a 7x6 board. The first player to get four in a row (either vertically, horizontally, or diagonally) wins. The game was first
NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier
A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier G.T. Prasanna Kumari Associate Professor, Dept of Computer Science and Engineering, Gokula Krishna College of Engg, Sullurpet-524121,
Note on growth and growth accounting
CHAPTER 0 Note on growth and growth accounting 1. Growth and the growth rate In this section aspects of the mathematical concept of the rate of growth used in growth models and in the empirical analysis
Data Mining Techniques for Prognosis in Pancreatic Cancer
Data Mining Techniques for Prognosis in Pancreatic Cancer by Stuart Floyd A Thesis Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUE In partial fulfillment of the requirements for the Degree
Big Data Analytics CSCI 4030
High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising
Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski [email protected]
Introduction to Machine Learning and Data Mining Prof. Dr. Igor Trakovski [email protected] Neural Networks 2 Neural Networks Analogy to biological neural systems, the most robust learning systems
CART 6.0 Feature Matrix
CART 6.0 Feature Matri Enhanced Descriptive Statistics Full summary statistics Brief summary statistics Stratified summary statistics Charts and histograms Improved User Interface New setup activity window
A Note on Maximum Independent Sets in Rectangle Intersection Graphs
A Note on Maximum Independent Sets in Rectangle Intersection Graphs Timothy M. Chan School of Computer Science University of Waterloo Waterloo, Ontario N2L 3G1, Canada [email protected] September 12,
Azure Machine Learning, SQL Data Mining and R
Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:
Lecture 3: Finding integer solutions to systems of linear equations
Lecture 3: Finding integer solutions to systems of linear equations Algorithmic Number Theory (Fall 2014) Rutgers University Swastik Kopparty Scribe: Abhishek Bhrushundi 1 Overview The goal of this lecture
A Sarsa based Autonomous Stock Trading Agent
A Sarsa based Autonomous Stock Trading Agent Achal Augustine The University of Texas at Austin Department of Computer Science Austin, TX 78712 USA [email protected] Abstract This paper describes an autonomous
OPTIMIZATION AND FORECASTING WITH FINANCIAL TIME SERIES
OPTIMIZATION AND FORECASTING WITH FINANCIAL TIME SERIES Allan Din Geneva Research Collaboration Notes from seminar at CERN, June 25, 2002 General scope of GRC research activities Econophysics paradigm
Multivariate Analysis of Ecological Data
Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology
Excel -- Creating Charts
Excel -- Creating Charts The saying goes, A picture is worth a thousand words, and so true. Professional looking charts give visual enhancement to your statistics, fiscal reports or presentation. Excel
Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05
Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification
Using multiple models: Bagging, Boosting, Ensembles, Forests
Using multiple models: Bagging, Boosting, Ensembles, Forests Bagging Combining predictions from multiple models Different models obtained from bootstrap samples of training data Average predictions or
Make Better Decisions with Optimization
ABSTRACT Paper SAS1785-2015 Make Better Decisions with Optimization David R. Duling, SAS Institute Inc. Automated decision making systems are now found everywhere, from your bank to your government to
Don t Slow Me Down with that Calculator Cliff Petrak (Teacher Emeritus) Brother Rice H.S. Chicago [email protected]
Don t Slow Me Down with that Calculator Cliff Petrak (Teacher Emeritus) Brother Rice H.S. Chicago [email protected] In any computation, we have four ideal objectives to meet: 1) Arriving at the correct
Introduction to Regression and Data Analysis
Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it
Kapitel 1 Multiplication of Long Integers (Faster than Long Multiplication)
Kapitel 1 Multiplication of Long Integers (Faster than Long Multiplication) Arno Eigenwillig und Kurt Mehlhorn An algorithm for multiplication of integers is taught already in primary school: To multiply
SUDOKU SOLVING TIPS GETTING STARTED
TTN STRT SUOKU SOLVN TPS Terms you will need to know lock: One of the nine x sections that make up the Sudoku grid. andidate(s): The possible numbers that could be in a cell. ell: single square in a Sudoku
Outline. NP-completeness. When is a problem easy? When is a problem hard? Today. Euler Circuits
Outline NP-completeness Examples of Easy vs. Hard problems Euler circuit vs. Hamiltonian circuit Shortest Path vs. Longest Path 2-pairs sum vs. general Subset Sum Reducing one problem to another Clique
Comparison of Data Mining Techniques used for Financial Data Analysis
Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification
Predict the Popularity of YouTube Videos Using Early View Data
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
Data Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
Solving convex MINLP problems with AIMMS
Solving convex MINLP problems with AIMMS By Marcel Hunting Paragon Decision Technology BV An AIMMS White Paper August, 2012 Abstract This document describes the Quesada and Grossman algorithm that is implemented
The correlation coefficient
The correlation coefficient Clinical Biostatistics The correlation coefficient Martin Bland Correlation coefficients are used to measure the of the relationship or association between two quantitative
A simple three dimensional Column bar chart can be produced from the following example spreadsheet. Note that cell A1 is left blank.
Department of Library Services Creating Charts in Excel 2007 www.library.dmu.ac.uk Using the Microsoft Excel 2007 chart creation system you can quickly produce professional looking charts. This help sheet
Simple linear regression
Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between
An Empirical Study of Two MIS Algorithms
An Empirical Study of Two MIS Algorithms Email: Tushar Bisht and Kishore Kothapalli International Institute of Information Technology, Hyderabad Hyderabad, Andhra Pradesh, India 32. [email protected],
COMPUTING CLOUD MOTION USING A CORRELATION RELAXATION ALGORITHM Improving Estimation by Exploiting Problem Knowledge Q. X. WU
COMPUTING CLOUD MOTION USING A CORRELATION RELAXATION ALGORITHM Improving Estimation by Exploiting Problem Knowledge Q. X. WU Image Processing Group, Landcare Research New Zealand P.O. Box 38491, Wellington
Guessing Game: NP-Complete?
Guessing Game: NP-Complete? 1. LONGEST-PATH: Given a graph G = (V, E), does there exists a simple path of length at least k edges? YES 2. SHORTEST-PATH: Given a graph G = (V, E), does there exists a simple
CALCULATIONS & STATISTICS
CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be
INTEGER PROGRAMMING. Integer Programming. Prototype example. BIP model. BIP models
Integer Programming INTEGER PROGRAMMING In many problems the decision variables must have integer values. Example: assign people, machines, and vehicles to activities in integer quantities. If this is
Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),
Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables
2) Write in detail the issues in the design of code generator.
COMPUTER SCIENCE AND ENGINEERING VI SEM CSE Principles of Compiler Design Unit-IV Question and answers UNIT IV CODE GENERATION 9 Issues in the design of code generator The target machine Runtime Storage
AP: LAB 8: THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics
Ms. Foglia Date AP: LAB 8: THE CHI-SQUARE TEST Probability, Random Chance, and Genetics Why do we study random chance and probability at the beginning of a unit on genetics? Genetics is the study of inheritance,
Irrational Numbers. A. Rational Numbers 1. Before we discuss irrational numbers, it would probably be a good idea to define rational numbers.
Irrational Numbers A. Rational Numbers 1. Before we discuss irrational numbers, it would probably be a good idea to define rational numbers. Definition: Rational Number A rational number is a number that
HYBRID GENETIC ALGORITHMS FOR SCHEDULING ADVERTISEMENTS ON A WEB PAGE
HYBRID GENETIC ALGORITHMS FOR SCHEDULING ADVERTISEMENTS ON A WEB PAGE Subodha Kumar University of Washington [email protected] Varghese S. Jacob University of Texas at Dallas [email protected]
Artificial Intelligence in Retail Site Selection
Artificial Intelligence in Retail Site Selection Building Smart Retail Performance Models to Increase Forecast Accuracy By Richard M. Fenker, Ph.D. Abstract The term Artificial Intelligence or AI has been
White Paper. Data Mining for Business
White Paper Data Mining for Business January 2010 Contents 1. INTRODUCTION... 3 2. WHY IS DATA MINING IMPORTANT?... 3 FUNDAMENTALS... 3 Example 1...3 Example 2...3 3. OPERATIONAL CONSIDERATIONS... 4 ORGANISATIONAL
Operation Count; Numerical Linear Algebra
10 Operation Count; Numerical Linear Algebra 10.1 Introduction Many computations are limited simply by the sheer number of required additions, multiplications, or function evaluations. If floating-point
Decision Mathematics D1 Advanced/Advanced Subsidiary. Tuesday 5 June 2007 Afternoon Time: 1 hour 30 minutes
Paper Reference(s) 6689/01 Edexcel GCE Decision Mathematics D1 Advanced/Advanced Subsidiary Tuesday 5 June 2007 Afternoon Time: 1 hour 30 minutes Materials required for examination Nil Items included with
Conducting an empirical analysis of economic data can be rewarding and
CHAPTER 10 Conducting a Regression Study Using Economic Data Conducting an empirical analysis of economic data can be rewarding and informative. If you follow some basic guidelines, it is possible to use
