Programming project

Task

Option 1: empirical network analysis. Task: find data, analyze data (and visualize it), then interpret.

Obtaining data

This project focuses on cocktail ingredients. Data was sourced from the ingredients of the cocktail recipes on the International Bartenders Association official cocktails list (http://en.wikipedia.org/wiki/iba_official_cocktail). The data already existed as ingredient lists on the Wikipedia page for each cocktail, and needed to be translated into a format suitable for social network analysis. I entered the data into the Data Laboratory in Gephi myself. Each ingredient was entered as a node, and an edge was added each time a pair of ingredients was used together in a cocktail. The graph was undirected, and comprised a total of 73 nodes and 227 edges.

Data analysis

Three questions were used to structure the application of a selection of metrics to the data.

How does the network compare to models of social networks?

The network consists of a single connected component. It has a diameter of 6 and an average path length of 2.66.
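The graph construction described under Obtaining data, and the connectivity and path-length figures just reported, could be reproduced in code along the following lines. This is a minimal sketch in plain Python, not the Gephi workflow used for the project. The `recipes` dictionary is a small, simplified, hypothetical sample rather than the IBA data, the function names are illustrative, and repeated ingredient pairs are assumed to collapse into a single unweighted edge.

```python
from collections import deque
from itertools import combinations

# Simplified, illustrative recipes only (NOT the IBA dataset used in the report).
recipes = {
    "Margarita": ["Tequila", "Cointreau", "Lime juice"],
    "Daiquiri": ["White rum", "Lime juice", "Sugar syrup"],
    "Sidecar": ["Cognac", "Cointreau", "Lemon juice"],
    "Cuba Libre": ["White rum", "Cola", "Lime juice"],
}


def build_cooccurrence_graph(recipes):
    """Return an undirected graph as {ingredient: set of co-used ingredients}."""
    graph = {}
    for ingredients in recipes.values():
        for ingredient in ingredients:
            graph.setdefault(ingredient, set())
        # One edge per pair of ingredients sharing a recipe.
        # Assumption: a pair that recurs in several recipes is stored once.
        for a, b in combinations(set(ingredients), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def bfs_distances(graph, source):
    """Shortest-path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def path_metrics(graph):
    """Return (is_connected, diameter, average shortest path length)."""
    n = len(graph)
    connected, longest, total, pairs = True, 0, 0, 0
    for source in graph:
        dist = bfs_distances(graph, source)
        connected = connected and len(dist) == n
        longest = max(longest, max(dist.values()))
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return connected, longest, (total / pairs if pairs else 0.0)


graph = build_cooccurrence_graph(recipes)
n_edges = sum(len(neighbours) for neighbours in graph.values()) // 2
connected, diameter, avg_path = path_metrics(graph)

print(f"{len(graph)} nodes, {n_edges} edges, connected={connected}")
print(f"diameter={diameter}, average path length={avg_path:.2f}")
```

On the full dataset, the same functions would be expected to report the node count, edge count, diameter and average path length given above, provided the recipes are entered exactly as they were in Gephi.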

The network has a clustering coefficient of 0.65. A random graph with 73 nodes and the corresponding wiring probability for 227 edges (0.043188737, i.e. 227/(73 × 72)) would have a much lower clustering coefficient (~0.015), a larger diameter (~10), and a longer average path length (~3.85).

Which ingredients are the most useful to have in my cocktail cabinet?

This question was first addressed by looking at the degree of each ingredient, on the assumption that a higher degree (being used in combination with a greater number of other ingredients) indicates that an ingredient appears in a greater number of cocktail recipes. Average degree was calculated in Gephi, and the nodes were sorted by degree in the Data Laboratory view. The five highest-degree ingredients were as follows:

Ingredient           Degree
Cointreau            23
Lemon juice          23
Vodka                21
White rum            21
Angostura bitters    19

The question could also be addressed by considering the centrality of nodes within the network. Eigenvector centrality was calculated in Gephi and yielded a similar top five, although gin takes the place of vodka:

Ingredient           Eigenvector centrality
Lemon juice          1
Cointreau            0.952
Gin                  0.904
White rum            0.858
Angostura bitters    0.749

Are there groups of different cocktail types?

This was addressed by running modularity in Gephi, which yielded seven communities within the graph. Modularity class was then used to colour-code nodes on the network graph:

This clearly shows different communities of cocktail types, including, for example, wine cocktails in red, rum cocktails in turquoise, whiskey cocktails in green, and savoury sauce cocktails in yellow.
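The measures used for the three questions above (clustering compared with a random baseline, degree, eigenvector centrality, and modularity) can also be computed outside Gephi. The following is a minimal sketch in plain Python under stated assumptions: the edge list is a hypothetical toy graph rather than the project data, the function names are illustrative, the random baseline fixes the number of edges rather than using a wiring probability, and `modularity` only scores a partition supplied by hand; it does not search for communities as Gephi's modularity tool does.

```python
import random
from itertools import combinations

# Hypothetical toy co-occurrence edges for illustration (NOT the report's data).
edges = [
    ("Gin", "Lemon juice"), ("Gin", "Cointreau"), ("Lemon juice", "Cointreau"),
    ("White rum", "Lime juice"), ("White rum", "Sugar syrup"),
    ("Lime juice", "Sugar syrup"), ("Cointreau", "Lime juice"),
]


def to_graph(edge_list, nodes=()):
    """Undirected graph as {node: set of neighbours}."""
    graph = {node: set() for node in nodes}
    for a, b in edge_list:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def average_clustering(graph):
    """Mean of local clustering coefficients (nodes of degree < 2 count as 0)."""
    total = 0.0
    for neighbours in graph.values():
        k = len(neighbours)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
        total += 2 * links / (k * (k - 1))
    return total / len(graph)


def random_graph_like(graph, seed=0):
    """Random graph with the same nodes and the same number of edges."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    m = sum(len(neighbours) for neighbours in graph.values()) // 2
    return to_graph(rng.sample(list(combinations(nodes, 2)), m), nodes)


def eigenvector_centrality(graph, iterations=1000, tol=1e-12):
    """Power iteration on A + I, rescaled so that the largest score is 1."""
    score = {node: 1.0 for node in graph}
    for _ in range(iterations):
        new = {n: score[n] + sum(score[nb] for nb in graph[n]) for n in graph}
        top = max(new.values())
        new = {n: value / top for n, value in new.items()}
        converged = max(abs(new[n] - score[n]) for n in graph) < tol
        score = new
        if converged:
            break
    return score


def modularity(graph, communities):
    """Modularity Q of a given partition (a list of disjoint node sets)."""
    m = sum(len(neighbours) for neighbours in graph.values()) / 2
    q = 0.0
    for community in communities:
        degree_sum = sum(len(graph[n]) for n in community)
        internal = sum(1 for n in community for nb in graph[n] if nb in community) / 2
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q


graph = to_graph(edges)
degree = {node: len(neighbours) for node, neighbours in graph.items()}
by_degree = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
centrality = eigenvector_centrality(graph)
clustering = average_clustering(graph)
baseline = random_graph_like(graph, seed=42)
communities = [
    {"Gin", "Lemon juice", "Cointreau"},
    {"White rum", "Lime juice", "Sugar syrup"},
]
q = modularity(graph, communities)

print("Highest degree:", by_degree[:2])
print("Clustering:", round(clustering, 3), "vs random:", round(average_clustering(baseline), 3))
print("Modularity of the hand-made partition:", round(q, 3))
```

Scaling the eigenvector scores so that the maximum is 1 mirrors the way the top-ranked ingredient is reported with a centrality of 1 in the table above. Iterating on A + I rather than A leaves the ranking unchanged and avoids oscillation on bipartite graphs.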

Interpretation

A limitation of this data is that, by looking at links between ingredients, information about specific cocktails is lost. The information presented here has the potential to be used to recommend novel cocktails based on a person's current favourite cocktails. It could also be used to experiment with new cocktail recipes; the communities give an indication of which spirits go best with which mixers and other ingredients, giving inspiration for new combinations to try (e.g. by looking for structural holes; Burt, 1995).

References

Burt, R.S. (1995) Structural holes: the social structure of competition. Harvard University Press.