Computational Tools for Big Data Python Libraries
|
|
|
- Alexandra Harrison
- 10 years ago
- Views:
Transcription
1 Computational Tools for Big Data Python Libraries Finn Årup Nielsen DTU Compute Technical University of Denmark September 15, 2015
2 Overview Numpy numerical arrays with fast computation Scipy computation science functions Scikit-learn (sklearn) machine learning Pandas Annotated numpy arrays Cython write C program in Python Finn Årup Nielsen 1 September 15, 2015
3 Python numerics The problem with Python: >>> [1, 2, 3] * 3 Finn Årup Nielsen 2 September 15, 2015
4 Python numerics The problem with Python: >>> [1, 2, 3] * 3 [1, 2, 3, 1, 2, 3, 1, 2, 3] # Not [3, 6, 9]! Finn Årup Nielsen 3 September 15, 2015
5 Python numerics The problem with Python: >>> [1, 2, 3] * 3 [1, 2, 3, 1, 2, 3, 1, 2, 3] # Not [3, 6, 9]! >>> [1, 2, 3] + 1 # Wants [2, 3, 4]... Finn Årup Nielsen 4 September 15, 2015
6 Python numerics The problem with Python: >>> [1, 2, 3] * 3 [1, 2, 3, 1, 2, 3, 1, 2, 3] # Not [3, 6, 9]! >>> [1, 2, 3] + 1 # Wants [2, 3, 4]... Traceback ( most recent call last ): File "<stdin >", line 1, in <module > TypeError : can only concatenate list ( not " int ") to list Finn Årup Nielsen 5 September 15, 2015
7 Python numerics The problem with Python: >>> [1, 2, 3] * 3 [1, 2, 3, 1, 2, 3, 1, 2, 3] # Not [3, 6, 9]! >>> [1, 2, 3] + 1 # Wants [2, 3, 4]... Traceback ( most recent call last ): File "<stdin >", line 1, in <module > TypeError : can only concatenate list ( not " int ") to list >>> # Matrix multiplication >>> [[1, 2], [3, 4]] * [[5, 6], [7, 8]] Finn Årup Nielsen 6 September 15, 2015
8 Python numerics The problem with Python: >>> [1, 2, 3] * 3 [1, 2, 3, 1, 2, 3, 1, 2, 3] # Not [3, 6, 9]! >>> [1, 2, 3] + 1 # Wants [2, 3, 4]... Traceback ( most recent call last ): File "<stdin >", line 1, in <module > TypeError : can only concatenate list ( not " int ") to list >>> # Matrix multiplication >>> [[1, 2], [3, 4]] * [[5, 6], [7, 8]] Traceback ( most recent call last ): File "<stdin >", line 1, in <module > TypeError : can t multiply sequence by non - int of type list Finn Årup Nielsen 7 September 15, 2015
9 map(), reduce() and filter() Poor-man s vector operations: the map() built-in function: >>> map ( lambda x: 3*x, [1, 2, 3]) [3, 6, 9] lambda is for an anonymous function, x the input argument, and 3*x the function and the return argument. Also possible with ordinary functions: >>> from math import sin, pow >>> map (sin, [1, 2, 3]) [ , , ] >>> map (pow, [1, 2, 3], [3, 2, 1]) [1.0, 4.0, 3.0] Finn Årup Nielsen 8 September 15, 2015
10 ... or with comprehensions List comprehensions: >>> [3* x for x in [1, 2, 3]] [3, 6, 9] Generator >>> (3* x for x in [1, 2, 3]) < generator object < genexpr > at 0 x7f9060a72eb0 > Finn Årup Nielsen 9 September 15, 2015
11 Problem But control structures are usually slow in Python. Finn Årup Nielsen 10 September 15, 2015
12 Solution: Numpy >>> from numpy import * Finn Årup Nielsen 11 September 15, 2015
13 Solution: Numpy >>> from numpy import * >>> array ([1, 2, 3]) * 3 Finn Årup Nielsen 12 September 15, 2015
14 Solution: Numpy >>> from numpy import * >>> array ([1, 2, 3]) * 3 array ([3, 6, 9]) Finn Årup Nielsen 13 September 15, 2015
15 Solution: Numpy >>> from numpy import * >>> array ([1, 2, 3]) * 3 array ([3, 6, 9]) >>> array ([1, 2, 3]) + 1 Finn Årup Nielsen 14 September 15, 2015
16 Solution: Numpy >>> from numpy import * >>> array ([1, 2, 3]) * 3 array ([3, 6, 9]) >>> array ([1, 2, 3]) + 1 array ([2, 3, 4]) Finn Årup Nielsen 15 September 15, 2015
17 Solution: Numpy >>> from numpy import * >>> array ([1, 2, 3]) * 3 array ([3, 6, 9]) >>> array ([1, 2, 3]) + 1 array ([2, 3, 4]) >>> array ([[1, 2], [3, 4]]) * [[5, 6], [7, 8]] Finn Årup Nielsen 16 September 15, 2015
18 Solution: Numpy >>> from numpy import * >>> array ([1, 2, 3]) * 3 array ([3, 6, 9]) >>> array ([1, 2, 3]) + 1 array ([2, 3, 4]) >>> array ([[1, 2], [3, 4]]) * [[5, 6], [7, 8]] array ([[ 5, 12], [21, 32]]) Finn Årup Nielsen 17 September 15, 2015
19 Solution: Numpy >>> from numpy import * >>> array ([1, 2, 3]) * 3 array ([3, 6, 9]) >>> array ([1, 2, 3]) + 1 array ([2, 3, 4]) >>> array ([[1, 2], [3, 4]]) * [[5, 6], [7, 8]] array ([[ 5, 12], [21, 32]]) This is elementwise multiplication... Finn Årup Nielsen 18 September 15, 2015
20 Solution: Numpy >>> from numpy import * >>> array ([1, 2, 3]) * 3 array ([3, 6, 9]) >>> array ([1, 2, 3]) + 1 array ([2, 3, 4]) >>> array ([[1, 2], [3, 4]]) * [[5, 6], [7, 8]] array ([[ 5, 12], [21, 32]]) This is elementwise multiplication... >>> matrix ([[1, 2], [3, 4]]) * [[5, 6], [7, 8]] Finn Årup Nielsen 19 September 15, 2015
21 Solution: Numpy >>> from numpy import * >>> array ([1, 2, 3]) * 3 array ([3, 6, 9]) >>> array ([1, 2, 3]) + 1 array ([2, 3, 4]) >>> array ([[1, 2], [3, 4]]) * [[5, 6], [7, 8]] array ([[ 5, 12], [21, 32]]) This is elementwise multiplication... >>> matrix ([[1, 2], [3, 4]]) * [[5, 6], [7, 8]] matrix ([[19, 22], [43, 50]]) Finn Årup Nielsen 20 September 15, 2015
22 Numpy arrays... There are two basic types array and matrix: An array may be a vector (one-dimensional array) >>> from numpy import * >>> array ([1, 2, 3, 4]) array ([1, 2, 3, 4]) Or a matrix (a two-dimensional array) >>> array ([[1, 2], [3, 4]]) array ([[1, 2], [3, 4]]) Finn Årup Nielsen 21 September 15, 2015
23 ... Numpy arrays... Or a higher-order tensor. Here a 2-by-2-by-2 tensor: >>> array ([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]) array ([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]) N, 1 N and N 1 array-s are different: >>> array ([[1, 2, 3, 4]])[2] # There is no third row Traceback ( most recent call last ): File "<stdin >", line 1, in? IndexError : index out of bounds Finn Årup Nielsen 22 September 15, 2015
24 ... Numpy arrays A matrix is always two-dimensional, e.g., this >>> matrix ([1, 2, 3, 4]) matrix ([[1, 2, 3, 4]]) is a two-dimensional data structure with one row and four columns... and with a 2-by-2 matrix: >>> matrix ([[1, 2], [3, 4]]) matrix ([[1, 2], [3, 4]]) Finn Årup Nielsen 23 September 15, 2015
25 Numpy arrays copy To copy by reference use asmatrix() or mat() (from an array) or asarray() (from a matrix) >>> a = array ([1, 2, 3, 4]) >>> m = asmatrix ( a) # Copy as reference >>> a [0] = 1000 >>> m matrix ([[1000, 2, 3, 4]]) To copy elements use matrix() or array() >>> a = array ([1, 2, 3, 4]) >>> m = matrix ( a) # Copy elements >>> a [0] = 1000 >>> m matrix ([[1, 2, 3, 4]]) Finn Årup Nielsen 24 September 15, 2015
26 Datatypes Elements are 4 bytes integers or 8 bytes float per default: >>> array ([1, 2, 3, 4]). itemsize # Number of bytes for each 4 >>> array ([1., 2., 3., 4.]). itemsize 8 >>> array ([1., 2, 3, 4]). itemsize # Not heterogeneous 8 array and matrix can be called with datatype to set it otherwise: >>> array ([1, 2], int8 ). itemsize 1 >>> array ([1, 2], float32 ). itemsize 4 Finn Årup Nielsen 25 September 15, 2015
27 Initialization of arrays Functions ones(), zeros(), eye() (also identity()), linspace() work as expected (from Matlab), though the first argument for ones() and zeros() should contain the size in a list or tuple: >>> zeros ((1, 2)) # zeros (1, 2) doesn t work array ([[ 0., 0.]]) To generated a list of increasing numbers: >>> r_ [1:11] array ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) >>> arange (1, 11) array ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) Finn Årup Nielsen 26 September 15, 2015
28 Diagonal... The diagonal() function works both for matrix and two-dimensional array: >>> diagonal ( matrix ([[1, 2], [3, 4]])) array ([1, 4]) Note now the vector/matrix is an one-dimensional array It works for higher order arrays: >>> diagonal ( array ([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])) array ([[1, 7], [2, 8]]) diagonal() does not work for one-dimensional arrays. Finn Årup Nielsen 27 September 15, 2015
29 ... Diagonal Yet another function: diag(). Works for one- and two-dimensional matrix and array >>> m = matrix ([[1, 2], [3, 4]]) >>> d = diag (m) >>> d array ([1, 4]) >>> diag (d) array ([[1, 0], [0, 4]]) Like Matlab s diag(). It is also possible to specify the diagonal: diag(m,1) Finn Årup Nielsen 28 September 15, 2015
30 Matrix transpose Matrix transpose is different with Python s array and matrix and Matlab >>> A = array ([[1+1 j, 1+2 j], [2+1j, 2+2 j ]]); A array ([[ 1.+1.j, j], [ 2.+1.j, j ]]).T for array and matrix (like Matlab. ) and.h for matrix (like Matlab : >>> A. T # No conjucation. Also : A. transpose () or transpose ( A) array ([[ 1.+1.j, j], [ 1.+2.j, j ]]) >>> mat ( A). H # Complex conjugate transpose. Or: A. conj (). T matrix ([[ j, j], [ j, j ]]) Finn Årup Nielsen 29 September 15, 2015
31 Matrix transpose and copy >>> A = np. array ([[1, 2], [3, 4]]) >>> B = A.T >>> B [0,0] = 45 >>> A array ([[45, 2], [ 3, 4]]) B is a reference to the transposed version. Finn Årup Nielsen 30 September 15, 2015
32 Matrix transpose and copy >>> A = np. array ([[1, 2], [3, 4]]) >>> B = A.T >>> B [0,0] = 45 >>> A array ([[45, 2], [ 3, 4]]) B is a reference to the transposed version. >>> A = np. array ([[1, 2], [3, 4]]) >>> B = A.T. copy () # Copy! >>> B [0,0] = 45 >>> A array ([[1, 2], [3, 4]]) Finn Årup Nielsen 31 September 15, 2015
33 Sizes and reshaping A 2-by-2-by-2 tensor: >>> a = array ([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]) >>> a. ndim # Number of dimensions 3 >>> a. shape # Number of elements in each dimension (2, 2, 2) >>> a. shape = (1, 8); a # Reshaping : 1- by -8 matrix array ([[1, 2, 3, 4, 5, 6, 7, 8]]) >>> a. shape = 8; a # 8- element vector array ([1, 2, 3, 4, 5, 6, 7, 8]) There is a related function for the last line: >>> a. flatten () # Always copy array ([1, 2, 3, 4, 5, 6, 7, 8]) Finn Årup Nielsen 32 September 15, 2015
34 Indexing... Ordinary lists: >>> L = [[1, 2], [3, 4]] >>> L[1, 1] What happens here? Finn Årup Nielsen 33 September 15, 2015
35 Indexing... Ordinary lists: >>> L = [[1, 2], [3, 4]] >>> L[1, 1] What happens here? Traceback ( most recent call last ): File "<stdin >", line 1, in? TypeError : list indices must be integers Should have been L[1][1]. With matrices: >>> mat (L)[1, 1] Finn Årup Nielsen 34 September 15, 2015
36 Indexing... Ordinary lists: >>> L = [[1, 2], [3, 4]] >>> L[1, 1] What happens here? Traceback ( most recent call last ): File "<stdin >", line 1, in? TypeError : list indices must be integers Should have been L[1][1]. With Numpy matrices: >>> mat (L)[1, 1] 4 Finn Årup Nielsen 35 September 15, 2015
37 ... Indexing... >>> mat (L )[1][1] What happens here? Finn Årup Nielsen 36 September 15, 2015
38 ... Indexing... >>> mat (L )[1][1] What happens here? Error message: IndexError : index out of bounds >>> mat (L )[1] Finn Årup Nielsen 37 September 15, 2015
39 ... Indexing... >>> mat (L )[1][1] What happens here? Error message: IndexError : index out of bounds >>> mat (L )[1] matrix ([[3, 4]]) >>> asarray (L )[1] array ([3, 4]) >>> asarray (L )[1][1] 4 >>> asarray (L )[1,1] 4 Finn Årup Nielsen 38 September 15, 2015
40 Indexing with multiple indices >>> A = matrix ([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) >>> i = [1, 2] # Second and third row / column >>> A[i,i] matrix ([[5, 9]]) # Elements from diagonal >>> A[ ix_ (i,i)] # Block matrix matrix ([[5, 6], [8, 9]]) Take out third, fourth and fifth column with wrapping: >>> A. take ([2, 3, 4], axis =1, mode = wrap ) matrix ([[3, 1, 2], [6, 4, 5], [9, 7, 8]]) Finn Årup Nielsen 39 September 15, 2015
41 Conditionals & concatenation Construct a matrix based on a condition (first input argument) >>> where ( mod (A. flatten (), 2), 1, 0) # Find odd numbers matrix ([[1, 0, 1, 0, 1, 0, 1, 0, 1]]) Horizontal concatenate as in Matlab [ A A ] and note tuple input! >>> B = concatenate ((A, A), axis =1) # Or: >>> B = bmat ( A A ) # Note string input. Or: >>> hstack ((A, A)) matrix ([[1, 2, 3, 1, 2, 3], [4, 5, 6, 4, 5, 6], [7, 8, 9, 7, 8, 9]]) For concatenating rows use vstack(), bmat( A ; A ) or concatenate() Finn Årup Nielsen 40 September 15, 2015
42 Random initialization with random Python with one element at a time with the random module: >>> import random >>> random. random () # Float between 0 and >>> random. gauss (mu =10, sigma =3) Other probability functions: beta, exponential, gamma, lognormal, Pareto, randint, randrange, uniform, Von Mises and Weibull. >>> a = [1, 2, 3, 4]; random. shuffle (a); a [3, 1, 4, 2] Other functions choice, sample, etc. Finn Årup Nielsen 41 September 15, 2015
43 Random initialization with numpy Initialize an array with random numbers between zero and one: >>> import numpy. random >>> numpy. random. random ((2,3)) # Tuple input! array ([[ , , ], [ , , ]]) Standard Gaussian (normal distribution): >>> numpy. random. randn (2, 3) # Individual inp array ([[ , , ], [ , , ]]) >>> N = numpy. random. standard_normal ((2, 3)) # Tuple input! >>> N = numpy. random. normal (0, 1, (2, 3)) Finn Årup Nielsen 42 September 15, 2015
44 Multiplications and divisions... Numpy multiplication and divisions are confusing. >>> from numpy import * >>> A = array ([[1, 2], [3, 4]]) With array the default is elementwise: >>> A * A # * is elementwise, Matlab : A.* A array ([[ 1, 4], [ 9, 16]]) >>> dot (A, A) # dot () is matrix multiplication array ([[ 7, 10], [15, 22]]) Finn Årup Nielsen 43 September 15, 2015
45 ... Multiplications and divisions With matrix the default is matrix multiplication >>> mat (A) * mat (A) # Here * is matrix multiplicatio matrix ([[ 7, 10], [15, 22]]) >>> multiply ( mat ( A), mat ( A)) # multiply is elementwise multip matrix ([[ 1, 4], [ 9, 16]]) >>> dot ( mat (A), mat (A)) # dot () is matrix multiplication matrix ([[ 7, 10], [15, 22]]) >>> mat ( A) / mat ( A) # Division always elementwise matrix ([[1, 1], [1, 1]]) Finn Årup Nielsen 44 September 15, 2015
46 Matrix inversion Ordinary matrix inversion available as inv() in the linalg module of numpy >>> linalg. inv ( mat ([[2, 1], [1, 2]])) matrix ([[ , ], [ , ]]) Pseudo-inverse linalg.pinv() for singular matrices: >>> linalg. pinv ( mat ([[2, 0], [0, 0]])) matrix ([[ 0.5, 0. ], [ 0., 0. ]]) Finn Årup Nielsen 45 September 15, 2015
47 Singular value decomposition svd() function in the linalg module returns by default three argument: >>> from numpy. linalg import svd >>> U, s, V = svd ( mat ([[1, 0, 0], [0, 0, 0]])) gives a 2-by-2 matrix, a 2-vector with singular values and a 3-by-3 matrix: >>> U * diag ( s) * V # Gives not aligned error >>> U, s, V = svd ( mat ([[1, 0, 0], [0, 0, 0]]), full_matrices = False ) >>> U * diag (s) * V # Now ok: V. shape == (2, 3) Note V is transposed compared to Matlab! If only the singular values are required use compute uv argument: >>> s = svd ( mat ([[1, 0, 0], [0, 0, 0]]), compute_uv = False ) Finn Årup Nielsen 46 September 15, 2015
48 Non-negative matrix factorization from numpy import mat, random, multiply def nmf (M, components =5, iterations =5000): """ Factorize non - negative matrix. """ W = mat ( random. rand (M. shape [0], components )) H = mat ( random. rand ( components, M. shape [1])) for n in range ( iterations ): H = multiply (H, (W.T * M) / (W.T * W * H )) W = multiply (W, (M * H.T) / (W * (H * H.T) )) print "%d/%d" % (n, iterations ) return (W, H) Two matrices are returned. Note needs to be set to some appropriate for the dataset. Finn Årup Nielsen 47 September 15, 2015
49 Scipy Finn Årup Nielsen 48 September 15, 2015
50 Scipy Scipy (scientific python) has many submodules: Subpackage Function examples Description cluster vq.keans, vq.vq, hierarchy.dendrogram Clustering algorithms fftpack fft, ifft, fftfreq, convolve Fast Fourier transform, etc. io loadmat Input/output functions optimize fmin, fmin cg, brent Function optimization signal butter, welch Signal processing spatial ConvexHull, Voronoi, distance.cityblock Functions for spatial data stats nanmean, chi2, kendalltau Statistical functions... Finn Årup Nielsen 49 September 15, 2015
51 Optimization with scipy.optimize scipy.optimize contains functions for mathematical function optimization: fmin() is Nelder-Mead simplex algorithm for function optimization without derivatives. >>> import scipy. optimize, math >>> scipy. optimize. fmin ( math.cos, [1]) Optimization terminated successfully. Current function value : Iterations : 19 Function evaluations : 38 array ([ ]) Other optimizers: fmin cg(), leastsq(),... Finn Årup Nielsen 50 September 15, 2015
52 scipy optimization Custom multidimensional function (the Rosenbrock function aka banana function): def rosenbrock ((x, y )): a, b = 1, 100 return (a - x) ** 2 + b * (y - x ** 2) ** 2 Minimum of the function is at (1, 1). Optimize with Scipy and its general minimize function: >>> from scipy. optimize import minimize >>> result = minimize ( rosenbrock, x0 =(0, 0)) >>> result. x array ([ , ]) Here the gradient is estimated numerically. Finn Årup Nielsen 51 September 15, 2015
53 >>> result status : 2 success : False njev : 32 nfev : 140 hess_inv : array ([[ , ], [ , ]]) fun : e -11 x: array ([ , ]) message : Desired error not necessarily achieved due to precision loss. jac : array ([ e -05, e -06]) Not quite there yet, trying Nelder-Mead method with many iterations: >>> minimize ( rosenbrock, x0 =(0, 0), method = Nelder - Mead, tol =0, options ={ maxiter : 1000}) status : 0 nfev : 324 success : True fun : 0.0 x: array ([ 1., 1.]) message : Optimization terminated successfully. nit : 168 Finn Årup Nielsen 52 September 15, 2015
54 Statistics with scipy.stats Showing the χ 2 probablity density function with different degrees of freedom: from pylab import * from scipy.stats import chi2 x = linspace(0.1, 25, 200) for dof in [1, 2, 3, 5, 10, 50]: plot(x, chi2.pdf(x, dof)) Other functions such as missing data mean: >>> x = [1, nan, 3, 4, 5, 6, 7, 8, nan, 10] >>> scipy.stats.stats.nanmean(x) 5.5 Finn Årup Nielsen 53 September 15, 2015
55 Random initialization with scipy.stats 10 discrete distributions and 81 continuous distributions >>> scipy. stats. uniform. rvs ( size =(2, 3)) # Uniform array ([[ , , ], [ , , ]]) >>> scipy. stats. norm. rvs ( size =(2, 3)) # Gaussian array ([[ , , ], [ , , ]]) Finn Årup Nielsen 54 September 15, 2015
56 Kernel density estimation from scipy.stats... from pylab import *; import scipy. stats x0 = array ([1.1, 1.2, 2, 6, 6, 5, 6.3]) x = arange (-3, 10, 0.1) plot (x, scipy. stats. gaussian_kde (x0 ). evaluate (x)) Bandwidth computed via scott s factor...? Estimation in higher dimensions possible Finn Årup Nielsen 55 September 15, 2015
57 ... Kernel density estimation >>> class kde ( scipy. stats. gaussian_kde ):... def covariance_factor ( dummy ): return >>> plot (x, kde (x0 ). evaluate (x)) Defining a new class kde from the gaussian kde class from the scipy.stats.gaussian kde module with Finn Årup Nielsen 56 September 15, 2015
58 Fourier transformation >>> x = asarray ([[1] * 4, [0] * 4] * 100). flatten () What is x? Finn Årup Nielsen 57 September 15, 2015
59 Fourier transformation... >>> x = asarray ([[1] * 4, [0] * 4] * 100). flatten () What is x? >>> x [:20] array ([1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, Zoom in on the first 51 data points of the square wave signal With the pylab module: >>> plot(x[:50], linewidth=2) >>> axis((0, 50, -1, 2)) Finn Årup Nielsen 58 September 15, 2015
60 ... Fourier transformation with FFTPACK Forward Fourier transformation with the fft() from scipy.fftpack module (also loaded with numpy.fft, pylab: >>> abs(fft(x)) Back to time domain with inverse Fourier transformation: >>> from scipy.fftpack import * >>> abs(ifft(fft(x)))[:6].round() array([ 1., 1., 1., 1., 0., 0.]) Other: fft2(), fftn() Finn Årup Nielsen 59 September 15, 2015
61 sklearn aka scikit-learn Finn Årup Nielsen 60 September 15, 2015
62 Machine learning in Python Name KLoC GS-cites Reference SciPy.linalg Statsmodels (Seabold and Perktold, 2010) Scikit-learn (Pedregosa et al., 2011) PyMVPA (Hanke et al., 2009a; Hanke et al., 2009b) Orange (Demšar et al., 2013) Mlpy 75 8 (Albanese et al., 2012) MDP (Zito et al., 2008) PyBrain (Schaul et al., 2010) Pylearn2 61 Bob 31 Gensim (Řehůřek and Sojka, 2010) NLTK (Bird et al., 2009) PyPR? Caffe... (Statistics not completely updated) Finn Årup Nielsen 61 September 15, 2015
63 Scikit-learn/sklearn In sklearn each algorithm is implemented as an object. Here, e.g., non-negative matrix factorization (NMF) with data in a numpy.array called datamatrix: from sklearn. decomposition import NMF decomposer = NMF ( n_components =2) # Instancing an algorithm decomposer. fit ( datamatrix ) # Parameter estimation or K-means clustering algorithm: from sklearn. cluster import KMeans clusterer = KMeans ( n_clusters =2) clusterer. fit ( datamatrix ) Finn Årup Nielsen 62 September 15, 2015
64 sklearn object methods Sklearn algorithm objects shares methods, such as: Name Input Description get params Get parameter set params Parameters Set parameters decision function X fit X, y Estimate model parameters fit predict X Performs clustering and return cluster labels fit transform X, (y) Fit and transform inverse transform Y Opposite operation of transform predict X Estimate output predict proba X score X, y Coefficient of determination transform X Transform data, e.g., through dimensionality reduction Estimated parameters and estimation metadata is available in attributes with trailing underscore, e.g., components_. Finn Årup Nielsen 63 September 15, 2015
65 sklearn object methods The common interface means that machine learning classifiers can be used interchangeably, see Classifier comparison SciKit-learn example. From an example by Gael Varoquaux and Andreas Muller. Finn Årup Nielsen 64 September 15, 2015
66 sklearn example Non-negative matrix factorization with a data matrix called feature_matrix: from sklearn. decomposition import NMF decomposer = NMF ( n_components =2) decomposer. fit ( feature_matrix ) transformed = decomposer. transform ( feature_matrix ) Sizes of data matrix and resulting factoritation matrices: >>> fe ature_ma trix. shape (94, 256) >>> transformed. shape (94, 2) >>> decomposer. components_. shape (2, 256) Finn Årup Nielsen 65 September 15, 2015
67 Pandas Finn Årup Nielsen 66 September 15, 2015
68 Pandas dataframe Index a b c A = yes no ok Table represented in a Pandas DataFrame: >>> import pandas as pd >>> A = pd. DataFrame ([[4, 5, yes ], [6.5, 7, no ], [8, 9, ok ]], index =[2, 3, 6], columns =[ a, b, c ]) >>> A a b c yes no ok Finn Årup Nielsen 67 September 15, 2015
69 Pandas column indexing >>> A.c 2 yes 3 no 6 ok Name : c, dtype : object >>> A[ c ] 2 yes 3 no 6 ok Name : c, dtype : object >>> A.ix [:, c ] 2 yes 3 no 6 ok Name : c, dtype : object Finn Årup Nielsen 68 September 15, 2015
70 Pandas row indexing >>> A.ix [6, :] a 8 b 9 c ok Name : 6, dtype : object >>> A. iloc [2, :] a 8 b 9 c ok Name : 6, dtype : object And an element: >>> A.ix [6, b ] 9 Finn Årup Nielsen 69 September 15, 2015
71 Conditional indexing >>> A a b c yes no ok >>> A.ix [(A.a > 7.0) (A.c == yes ), [ b, c ]] b c 2 5 yes 6 9 ok Finn Årup Nielsen 70 September 15, 2015
72 Pandas join and merge Index a b Index a c A = B = (1) In two Pandas DataFrames: >>> import pandas as pd >>> A = pd. DataFrame ([[4, 5], [6, 7]], index =[1, 2], columns =[ a, b ]) >>> B = pd. DataFrame ([[8, 9], [10, 11]], index =[1, 3], columns =[ a, c ]) Finn Årup Nielsen 71 September 15, 2015
73 Pandas concat and merge >>> pd. concat (( A, B)) # implicit outer join a b c NaN NaN 1 8 NaN NaN 11 >>> pd. concat ((A, B), join = inner ) a Finn Årup Nielsen 72 September 15, 2015
74 Pandas concat and merge >>> A. merge (B, how = inner, left_index =True, right_index = True ) a_x b a_y c >>> A. merge (B, how = outer, left_index =True, right_index = True ) a_x b a_y c NaN NaN 3 NaN NaN >>> A. merge (B, how = left, left_index =True, right_index = True ) a_x b a_y c NaN NaN Finn Årup Nielsen 73 September 15, 2015
75 Descriptive statistics >>> A. describe () # see also.sum,.std,... a b count mean std min % % % max Note the result is only for numerical columns Finn Årup Nielsen 74 September 15, 2015
76 Pandas input and output read_csv Comma-separated values file, works for URLs read_sql Read from SQL database read_excel Microsoft Excel and a number of other formats. Also note that the db.py package can interface between a SQL database and Pandas. Finn Årup Nielsen 75 September 15, 2015
77 Other Pandas datatypes Pandas Series is an annotated vector: >>> pd. Series ([1, 2, 4], index =[ a, b, f ]) a 1 b 2 f 4 dtype : int64 Pandas Panel is a three-dimensional structure: >>> pd. Panel ([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]) <class pandas. core. panel. Panel > Dimensions : 2 ( items ) x 2 ( major_axis ) x 2 ( minor_axis ) Items axis : 0 to 1 Major_axis axis : 0 to 1 Minor_axis axis : 0 to 1 Finn Årup Nielsen 76 September 15, 2015
78 Pandas conversion to Numpy >>> A a b c yes no ok >>> A. values array ([[4.0, 5, yes ], [6.5, 7, no ], [8.0, 9, ok ]], dtype = object ) >>> A.ix [:, :2]. values array ([[ 4., 5. ], [ 6.5, 7. ], [ 8., 9. ]]) Finn Årup Nielsen 77 September 15, 2015
79 Cython Finn Årup Nielsen 78 September 15, 2015
80 Calling C et al. You can call C, C++ and Fortran functions from Python: Either with manual wrapping or by using a automated wrapper: SWIG, Boost.Python, CFFI. or by direct calling existing libraries via ctypes You can make C-programs in Python with Cython or Pyrex and calling compiled modules from Python Finn Årup Nielsen 79 September 15, 2015
81 Why calling C et al.? Why calling C et al.? Because you already have code in that language. Because ordinary Python is not fast enough. Finn Årup Nielsen 80 September 15, 2015
82 Why calling C et al.? Why calling C et al.? Because you already have code in that language. Because ordinary Python is not fast enough. Cython useful here! Finn Årup Nielsen 81 September 15, 2015
83 Cython Write a Python file (possibly with extended Cython syntax for static types), compile to C and compile the C. Cython is a fork of Pyrex. Simplest example with compilation of a python file helloworld.py, containing print("hello, World"): $ cython --embed helloworld.py $ gcc -I/usr/include/python2.7 -o helloworld helloworld.c -lpython2.7 $./helloworld More: You can compile to a module instead (callable from Python); you can include static types in the Python code to make it faster (often these files have the extension *.pyx). Finn Årup Nielsen 82 September 15, 2015
84 Calling Cython module from Python hello.pyx Python file with def world (): return " Hello, World " Compile to C and compile to module (here for Linux): $ cython hello.pyx $ gcc -shared -pthread -fpic -fwrapv -O2 -Wall -fno-strict-aliasing \ -I/usr/include/python2.7 -o hello.so hello.c -lpython2.7 Back in Python use the hello module: >>> import hello >>> hello. world () # Call the function in the module Hello, World Finn Årup Nielsen 83 September 15, 2015
85 Other compilation methods For simple Cython modules compiling automagically: >>> import pyximport >>> pyximport. install () ( None, < pyximport. pyximport. PyxImporter object at 0 x7f7dc39f20d >>> import hello >>> hello. world () Hello, World Finn Årup Nielsen 84 September 15, 2015
86 Other compilation methods For simple Cython modules compiling automagically: >>> import pyximport >>> pyximport. install () ( None, < pyximport. pyximport. PyxImporter object at 0 x7f7dc39f20d >>> import hello >>> hello. world () Hello, World Otherwise construct a setup.py file with: from distutils. core import setup from Cython. Build import cythonize setup ( name = Hello, ext_modules = cythonize (" hello. pyx ")) and compile with $ python setup.py build_ext --inplace Finn Årup Nielsen 85 September 15, 2015
87 Optional types... slow.py with plain python def count_ascendings ( alist ): count = 0 for first, second in zip ( alist [: -1], alist [1:]): if first < second : count += 1 return count fast.pyx with integer type (note cdef int count!): def count_ascendings ( alist ): cdef int count = 0 for first, second in zip ( alist [: -1], alist [1:]): if first < second : count += 1 return count Finn Årup Nielsen 86 September 15, 2015
88 ... Optional types Profiling the plain Python and the Cython version: >>> import timeit >>> import pyximport ; pyximport. install () >>> timeit. timeit (" slow. count_ascendings ( range (1000)) ", setup =" import slow ", number =10000) >>> timeit. timeit (" fast. count_ascendings ( range (1000)) ", setup =" import fast ", number =10000) The Cython version without static types yield a timing between the plain Python and the statically typed Cython (1.06 seconds here). Finn Årup Nielsen 87 September 15, 2015
89 Six advices from Hans Petter Langtangen Section Optimization of Python code (Langtangen, 2005, p. 426+) Avoid loops, use NumPy (see also my blog) Avoid prefix in often called functions, i.e., sin instead of math.sin Plain functions run faster than class methods Don t use NumPy for scalar arguments Use xrange instead of range (in Python < 3) if-else is faster than try-except (sometimes!) Finn Årup Nielsen 88 September 15, 2015
90 More information for Matlab Users MATLAB commands in numerical Python (Numpy), Vidar Bronken Gundersen, mathesaurus.sf.net Guide to NumPy (Oliphant, 2006) Videolectures.net: John D. Hunter overview of Matplotlib hunter mat/ Finn Årup Nielsen 89 September 15, 2015
91 Emerging Python modules for big data blaze and dask castra pyspark Finn Årup Nielsen 90 September 15, 2015
92 References References Albanese, D., Visintainer, R., Merler, S., Riccadonna, S., Jurman, G., and Furlanello, C. (2012). mlpy: machine learning Python. ArXiv. Bird, S., Klein, E., and Loper, E. (2009). Natural Language Processing with Python. O Reilly, Sebastopol, California. ISBN The canonical book for the NLTK package for natural language processing in the Python programming language. Corpora, part-of-speech tagging and machine learning classification are among the topics covered. Demšar, J., Curk, T., Erjavec, A., Črt Gorup, Hočevar, T., Milutinovi, M., Možina, M., Polajnar, M., Toplak, M., Stari, A., Štajdohar, M., Umek, L., Žagar, L., Žbontar, J., Žitnik, M., and Zupan, B. (2013). Orange: data mining toolbox in Python. Journal of Machine Learning Research, 14: PMID:. DOI:. WOBIB:. Hanke, M., Halchenko, Y. O., Sederberg, P. B., Hanson, S. J., Haxby, J. V., and Pollmann, S. (2009a). PyMVPA: a Python toolbox for multivariate pattern analysis of fmri data. Neuroinformatics, 7(1): PMID:. DOI: /s y. WOBIB:. Hanke, M., Halchenko, Y. O., Sederberg, P. B., Olivetti, E., Frund, I., Rieger, J. W., Herrmann, C. S., Haxby, J. V., Hanson, S. J., and Pollmann, S. (2009b). PyMVPA: a unifying approach to the analysis of neuroscientific data. Frontiers in neuroinformatics, 3:3. PMID: DOI: /neuro WOBIB:. Langtangen, H. P. (2005). Python Scripting for Computational Science, volume 3 of Texts in Computational Science and Engineering. Springer. ISBN Lee, D. D. and Seung, H. S. (2001). Algorithms for non-negative matrix factorization. In Leen, T. K., Dietterich, T. G., and Tresp, V., editors, Advances in Neural Information Processing Systems 13: Proceedings of the 2000 Conference, pages , Cambridge, Massachusetts. MIT Press. CiteSeer: lee00algorithms.html. Finn Årup Nielsen 91 September 15, 2015
93 References Oliphant, T. E. (2006). Guide to NumPy. Trelgol Publishing. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Édouard Duchesnay (2011). Scikit-learn: machine learning in Python. Journal of Machine Learning Research, 12: PMID:. DOI:. WOBIB:. Schaul, T., Bayer, J., Wierstra, D., Sun, Y., Felder, M., Sehnke, F., Rückstie, T., and Schmidhuber, J. (2010). Pybrain. Journal of Machine Learning Research, 11: PMID:. DOI:. WOBIB:. Presents the PyBrain Python machine learning package. Seabold, S. and Perktold, J. (2010). Statsmodels: econometric and statistical modeling with python. In Proceedings of the 9th Python in Science Conference. Description of the statsmodels package for Python. Segaran, T. (2007). Programming Collective Intelligence. O Reilly, Sebastopol, California. Zito, T., Wilbert, N., Wiskott, L., and Berkes, P. (2008). Modular toolkit for data processing (mdp): a python data processing framework. Frontiers in Neuroinformatics, 2:8. PMID:. DOI: /neuro WOBIB:. Řehůřek, R. and Sojka, P. (2010). Software framework for topic modelling with large corpora. In Proceedings of LREC 2010 workshop New Challenges for NLP Frameworks. Finn Årup Nielsen 92 September 15, 2015
Orange: Data Mining Toolbox in Python
Journal of Machine Learning Research 14 (2013) 2349-2353 Submitted 3/13; Published 8/13 Orange: Data Mining Toolbox in Python Janez Demšar Tomaž Curk Aleš Erjavec Črt Gorup Tomaž Hočevar Mitar Milutinovič
Part VI. Scientific Computing in Python
Part VI Scientific Computing in Python Compact Course @ GRS, June 03-07, 2013 80 More on Maths Module math Constants pi and e Functions that operate on int and float All return values float ceil (x) floor
Intro to scientific programming (with Python) Pietro Berkes, Brandeis University
Intro to scientific programming (with Python) Pietro Berkes, Brandeis University Next 4 lessons: Outline Scientific programming: best practices Classical learning (Hoepfield network) Probabilistic learning
Scientific Programming in Python
UCSD March 9, 2009 What is Python? Python in a very high level (scripting) language which has gained widespread popularity in recent years. It is: What is Python? Python in a very high level (scripting)
CRASH COURSE PYTHON. Het begint met een idee
CRASH COURSE PYTHON nr. Het begint met een idee This talk Not a programming course For data analysts, who want to learn Python For optimizers, who are fed up with Matlab 2 Python Scripting language expensive
CUDAMat: a CUDA-based matrix class for Python
Department of Computer Science 6 King s College Rd, Toronto University of Toronto M5S 3G4, Canada http://learning.cs.toronto.edu fax: +1 416 978 1455 November 25, 2009 UTML TR 2009 004 CUDAMat: a CUDA-based
Exercise 0. Although Python(x,y) comes already with a great variety of scientic Python packages, we might have to install additional dependencies:
Exercise 0 Deadline: None Computer Setup Windows Download Python(x,y) via http://code.google.com/p/pythonxy/wiki/downloads and install it. Make sure that before installation the installer does not complain
(!' ) "' # "*# "!(!' +,
MATLAB is a numeric computation software for engineering and scientific calculations. The name MATLAB stands for MATRIX LABORATORY. MATLAB is primarily a tool for matrix computations. It was developed
Tentative NumPy Tutorial
Tentative NumPy Tutorial This tutorial is unfinished. The original authors were not NumPy experts nor native English speakers so it needs reviewing. Please do not hesitate to click the edit button. You
Simulation Tools. Python for MATLAB Users I. Claus Führer. Automn 2009. Claus Führer Simulation Tools Automn 2009 1 / 65
Simulation Tools Python for MATLAB Users I Claus Führer Automn 2009 Claus Führer Simulation Tools Automn 2009 1 / 65 1 Preface 2 Python vs Other Languages 3 Examples and Demo 4 Python Basics Basic Operations
Linear Algebra Review. Vectors
Linear Algebra Review By Tim K. Marks UCSD Borrows heavily from: Jana Kosecka [email protected] http://cs.gmu.edu/~kosecka/cs682.html Virginia de Sa Cogsci 8F Linear Algebra review UCSD Vectors The length
Introduction to Matlab
Introduction to Matlab Social Science Research Lab American University, Washington, D.C. Web. www.american.edu/provost/ctrl/pclabs.cfm Tel. x3862 Email. [email protected] Course Objective This course provides
CIS 192: Lecture 13 Scientific Computing and Unit Testing
CIS 192: Lecture 13 Scientific Computing and Unit Testing Lili Dworkin University of Pennsylvania Scientific Computing I Python is really popular in the scientific and statistical computing world I Why?
Advanced Functions and Modules
Advanced Functions and Modules CB2-101 Introduction to Scientific Computing November 19, 2015 Emidio Capriotti http://biofold.org/ Institute for Mathematical Modeling of Biological Systems Department of
Unlocking the True Value of Hadoop with Open Data Science
Unlocking the True Value of Hadoop with Open Data Science Kristopher Overholt Solution Architect Big Data Tech 2016 MinneAnalytics June 7, 2016 Overview Overview of Open Data Science Python and the Big
Car Insurance. Prvák, Tomi, Havri
Car Insurance Prvák, Tomi, Havri Sumo report - expectations Sumo report - reality Bc. Jan Tomášek Deeper look into data set Column approach Reminder What the hell is this competition about??? Attributes
6.170 Tutorial 3 - Ruby Basics
6.170 Tutorial 3 - Ruby Basics Prerequisites 1. Have Ruby installed on your computer a. If you use Mac/Linux, Ruby should already be preinstalled on your machine. b. If you have a Windows Machine, you
Assignment 2: Option Pricing and the Black-Scholes formula The University of British Columbia Science One CS 2015-2016 Instructor: Michael Gelbart
Assignment 2: Option Pricing and the Black-Scholes formula The University of British Columbia Science One CS 2015-2016 Instructor: Michael Gelbart Overview Due Thursday, November 12th at 11:59pm Last updated
Introduction to Python
1 Daniel Lucio March 2016 Creator of Python https://en.wikipedia.org/wiki/guido_van_rossum 2 Python Timeline Implementation Started v1.0 v1.6 v2.1 v2.3 v2.5 v3.0 v3.1 v3.2 v3.4 1980 1991 1997 2004 2010
Programming Exercise 3: Multi-class Classification and Neural Networks
Programming Exercise 3: Multi-class Classification and Neural Networks Machine Learning November 4, 2011 Introduction In this exercise, you will implement one-vs-all logistic regression and neural networks
2+2 Just type and press enter and the answer comes up ans = 4
Demonstration Red text = commands entered in the command window Black text = Matlab responses Blue text = comments 2+2 Just type and press enter and the answer comes up 4 sin(4)^2.5728 The elementary functions
Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning
Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning SAMSI 10 May 2013 Outline Introduction to NMF Applications Motivations NMF as a middle step
Python programming guide for Earth Scientists. Maarten J. Waterloo and Vincent E.A. Post
Python programming guide for Earth Scientists Maarten J. Waterloo and Vincent E.A. Post Amsterdam Critical Zone Hydrology group September 2015 Cover page: Meteorological tower in an abandoned agricultural
Chapter 7: Additional Topics
Chapter 7: Additional Topics In this chapter we ll briefly cover selected advanced topics in fortran programming. All the topics come in handy to add extra functionality to programs, but the feature you
Machine Learning in Python with scikit-learn. O Reilly Webcast Aug. 2014
Machine Learning in Python with scikit-learn O Reilly Webcast Aug. 2014 Outline Machine Learning refresher scikit-learn How the project is structured Some improvements released in 0.15 Ongoing work for
December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B. KITCHENS
December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B KITCHENS The equation 1 Lines in two-dimensional space (1) 2x y = 3 describes a line in two-dimensional space The coefficients of x and y in the equation
Sources: On the Web: Slides will be available on:
C programming Introduction The basics of algorithms Structure of a C code, compilation step Constant, variable type, variable scope Expression and operators: assignment, arithmetic operators, comparison,
Analysis of System Performance IN2072 Chapter M Matlab Tutorial
Chair for Network Architectures and Services Prof. Carle Department of Computer Science TU München Analysis of System Performance IN2072 Chapter M Matlab Tutorial Dr. Alexander Klein Prof. Dr.-Ing. Georg
Introduction to Python for Text Analysis
Introduction to Python for Text Analysis Jennifer Pan Institute for Quantitative Social Science Harvard University (Political Science Methods Workshop, February 21 2014) *Much credit to Andy Hall and Learning
Beginner s Matlab Tutorial
Christopher Lum [email protected] Introduction Beginner s Matlab Tutorial This document is designed to act as a tutorial for an individual who has had no prior experience with Matlab. For any questions
The Image Deblurring Problem
page 1 Chapter 1 The Image Deblurring Problem You cannot depend on your eyes when your imagination is out of focus. Mark Twain When we use a camera, we want the recorded image to be a faithful representation
Python and Cython. a dynamic language. installing Cython factorization again. working with numpy
1 2 Python and 3 MCS 275 Lecture 39 Programming Tools and File Management Jan Verschelde, 19 April 2010 Python and 1 2 3 The Development Cycle compilation versus interpretation Traditional build cycle:
Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C
Python for Chemistry in 21 days
minutes Python for Chemistry in 21 days Dr. Noel O'Boyle Dr. John Mitchell and Prof. Peter Murray-Rust UCC Talk, Sept 2005 Available at http://www-mitchell.ch.cam.ac.uk/noel/ Introduction This talk will
13 MATH FACTS 101. 2 a = 1. 7. The elements of a vector have a graphical interpretation, which is particularly easy to see in two or three dimensions.
3 MATH FACTS 0 3 MATH FACTS 3. Vectors 3.. Definition We use the overhead arrow to denote a column vector, i.e., a linear segment with a direction. For example, in three-space, we write a vector in terms
AMATH 352 Lecture 3 MATLAB Tutorial Starting MATLAB Entering Variables
AMATH 352 Lecture 3 MATLAB Tutorial MATLAB (short for MATrix LABoratory) is a very useful piece of software for numerical analysis. It provides an environment for computation and the visualization. Learning
Gamma Distribution Fitting
Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics
THE NAS KERNEL BENCHMARK PROGRAM
THE NAS KERNEL BENCHMARK PROGRAM David H. Bailey and John T. Barton Numerical Aerodynamic Simulations Systems Division NASA Ames Research Center June 13, 1986 SUMMARY A benchmark test program that measures
How long is the vector? >> length(x) >> d=size(x) % What are the entries in the matrix d?
MATLAB : A TUTORIAL 1. Creating vectors..................................... 2 2. Evaluating functions y = f(x), manipulating vectors. 4 3. Plotting............................................ 5 4. Miscellaneous
1. Classification problems
Neural and Evolutionary Computing. Lab 1: Classification problems Machine Learning test data repository Weka data mining platform Introduction Scilab 1. Classification problems The main aim of a classification
Python for Scientific Computing. http://bender.astro.sunysb.edu/classes/python-science
http://bender.astro.sunysb.edu/classes/python-science Course Goals Simply: to learn how to use python to do Numerical analysis Data analysis Plotting and visualizations Symbol mathematics Write applications...
Simple big data, in Python. Gaël Varoquaux
Simple big data, in Python Gaël Varoquaux Simple big data, in Python Gaël Varoquaux This is a lie! Please allow me to introduce myself Physicist gone bad I m a man of wealth and taste I ve been around
The NumPy array: a structure for efficient numerical computation
The NumPy array: a structure for efficient numerical computation Stéfan van der Walt, Stellenbosch University South Africa S. Chris Colbert, Enthought USA Gael Varoquaux, INRIA Saclay France arxiv:1102.1523v1
Improving Credit Card Fraud Detection with Calibrated Probabilities
Improving Credit Card Fraud Detection with Calibrated Probabilities Alejandro Correa Bahnsen, Aleksandar Stojanovic, Djamila Aouada and Björn Ottersten Interdisciplinary Centre for Security, Reliability
Optimizing and interfacing with Cython. Konrad HINSEN Centre de Biophysique Moléculaire (Orléans) and Synchrotron Soleil (St Aubin)
Optimizing and interfacing with Cython Konrad HINSEN Centre de Biophysique Moléculaire (Orléans) and Synchrotron Soleil (St Aubin) Extension modules Python permits modules to be written in C. Such modules
Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree of PhD of Engineering in Informatics
INTERNATIONAL BLACK SEA UNIVERSITY COMPUTER TECHNOLOGIES AND ENGINEERING FACULTY ELABORATION OF AN ALGORITHM OF DETECTING TESTS DIMENSIONALITY Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree
Programming Using Python
Introduction to Computation and Programming Using Python Revised and Expanded Edition John V. Guttag The MIT Press Cambridge, Massachusetts London, England CONTENTS PREFACE xiii ACKNOWLEDGMENTS xv 1 GETTING
u = [ 2 4 5] has one row with three components (a 3 v = [2 4 5] has three rows separated by semicolons (a 3 w = 2:5 generates the row vector w = [ 2 3
MATLAB Tutorial You need a small numb e r of basic commands to start using MATLAB. This short tutorial describes those fundamental commands. You need to create vectors and matrices, to change them, and
MACHINE LEARNING IN HIGH ENERGY PHYSICS
MACHINE LEARNING IN HIGH ENERGY PHYSICS LECTURE #1 Alex Rogozhnikov, 2015 INTRO NOTES 4 days two lectures, two practice seminars every day this is introductory track to machine learning kaggle competition!
Big Data Paradigms in Python
Big Data Paradigms in Python San Diego Data Science and R Users Group January 2014 Kevin Davenport! http://kldavenport.com [email protected] @KevinLDavenport Thank you to our sponsors: Setting up
Component Ordering in Independent Component Analysis Based on Data Power
Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals
Classification of Electrocardiogram Anomalies
Classification of Electrocardiogram Anomalies Karthik Ravichandran, David Chiasson, Kunle Oyedele Stanford University Abstract Electrocardiography time series are a popular non-invasive tool used to classify
SmartArrays and Java Frequently Asked Questions
SmartArrays and Java Frequently Asked Questions What are SmartArrays? A SmartArray is an intelligent multidimensional array of data. Intelligent means that it has built-in knowledge of how to perform operations
William E. Hart Carl Laird Jean-Paul Watson David L. Woodruff. Pyomo Optimization. Modeling in Python. ^ Springer
William E Hart Carl Laird Jean-Paul Watson David L Woodruff Pyomo Optimization Modeling in Python ^ Springer Contents 1 Introduction 1 11 Mathematical Modeling 1 12 Modeling Languages for Optimization
APPM4720/5720: Fast algorithms for big data. Gunnar Martinsson The University of Colorado at Boulder
APPM4720/5720: Fast algorithms for big data Gunnar Martinsson The University of Colorado at Boulder Course objectives: The purpose of this course is to teach efficient algorithms for processing very large
Basic Concepts in Matlab
Basic Concepts in Matlab Michael G. Kay Fitts Dept. of Industrial and Systems Engineering North Carolina State University Raleigh, NC 769-7906, USA [email protected] September 00 Contents. The Matlab Environment.
Computational Mathematics with Python
Boolean Arrays Classes Computational Mathematics with Python Basics Olivier Verdier and Claus Führer 2009-03-24 Olivier Verdier and Claus Führer Computational Mathematics with Python 2009-03-24 1 / 40
We will learn the Python programming language. Why? Because it is easy to learn and many people write programs in Python so we can share.
LING115 Lecture Note Session #4 Python (1) 1. Introduction As we have seen in previous sessions, we can use Linux shell commands to do simple text processing. We now know, for example, how to count words.
MPI Hands-On List of the exercises
MPI Hands-On List of the exercises 1 MPI Hands-On Exercise 1: MPI Environment.... 2 2 MPI Hands-On Exercise 2: Ping-pong...3 3 MPI Hands-On Exercise 3: Collective communications and reductions... 5 4 MPI
Computational Mathematics with Python
Computational Mathematics with Python Basics Claus Führer, Jan Erik Solem, Olivier Verdier Spring 2010 Claus Führer, Jan Erik Solem, Olivier Verdier Computational Mathematics with Python Spring 2010 1
Performance Monitoring using Pecos Release 0.1
Performance Monitoring using Pecos Release 0.1 Apr 19, 2016 Contents 1 Overview 1 2 Installation 1 3 Simple example 2 4 Time series data 5 5 Translation dictionary 5 6 Time filter 6 7 Quality control tests
Structural Health Monitoring Tools (SHMTools)
Structural Health Monitoring Tools (SHMTools) Getting Started LANL/UCSD Engineering Institute LA-CC-14-046 c Copyright 2014, Los Alamos National Security, LLC All rights reserved. May 30, 2014 Contents
CME 193: Introduction to Scientific Python Lecture 8: Unit testing, more modules, wrap up
CME 193: Introduction to Scientific Python Lecture 8: Unit testing, more modules, wrap up Sven Schmit stanford.edu/~schmit/cme193 8: Unit testing, more modules, wrap up 8-1 Contents Unit testing More modules
K-Means Clustering Tutorial
K-Means Clustering Tutorial By Kardi Teknomo,PhD Preferable reference for this tutorial is Teknomo, Kardi. K-Means Clustering Tutorials. http:\\people.revoledu.com\kardi\ tutorial\kmean\ Last Update: July
Financial Econometrics MFE MATLAB Introduction. Kevin Sheppard University of Oxford
Financial Econometrics MFE MATLAB Introduction Kevin Sheppard University of Oxford October 21, 2013 2007-2013 Kevin Sheppard 2 Contents Introduction i 1 Getting Started 1 2 Basic Input and Operators 5
PGR Computing Programming Skills
PGR Computing Programming Skills Dr. I. Hawke 2008 1 Introduction The purpose of computing is to do something faster, more efficiently and more reliably than you could as a human do it. One obvious point
Modeling with Python
H Modeling with Python In this appendix a brief description of the Python programming language will be given plus a brief introduction to the Antimony reaction network format and libroadrunner. Python
Introduction to Python
Introduction to Python Sophia Bethany Coban Problem Solving By Computer March 26, 2014 Introduction to Python Python is a general-purpose, high-level programming language. It offers readable codes, and
Package MDM. February 19, 2015
Type Package Title Multinomial Diversity Model Version 1.3 Date 2013-06-28 Package MDM February 19, 2015 Author Glenn De'ath ; Code for mdm was adapted from multinom in the nnet package
Introduction Our choice Example Problem Final slide :-) Python + FEM. Introduction to SFE. Robert Cimrman
Python + FEM Introduction to SFE Robert Cimrman Department of Mechanics & New Technologies Research Centre University of West Bohemia Plzeň, Czech Republic April 3, 2007, Plzeň 1/22 Outline 1 Introduction
Data Visualization. Christopher Simpkins [email protected]
Data Visualization Christopher Simpkins [email protected] Data Visualization Data visualization is an activity in the exploratory data analysis process in which we try to figure out what story
15.062 Data Mining: Algorithms and Applications Matrix Math Review
.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop
Python. Python. 1 Python. M.Ulvrova, L.Pouilloux (ENS LYON) Informatique L3 Automne 2011 1 / 25
Python 1 Python M.Ulvrova, L.Pouilloux (ENS LYON) Informatique L3 Automne 2011 1 / 25 Python makes you fly M.Ulvrova, L.Pouilloux (ENS LYON) Informatique L3 Automne 2011 2 / 25 Let s start ipython vs python
Lecture 5: Singular Value Decomposition SVD (1)
EEM3L1: Numerical and Analytical Techniques Lecture 5: Singular Value Decomposition SVD (1) EE3L1, slide 1, Version 4: 25-Sep-02 Motivation for SVD (1) SVD = Singular Value Decomposition Consider the system
G563 Quantitative Paleontology. SQL databases. An introduction. Department of Geological Sciences Indiana University. (c) 2012, P.
SQL databases An introduction AMP: Apache, mysql, PHP This installations installs the Apache webserver, the PHP scripting language, and the mysql database on your computer: Apache: runs in the background
Python programming Testing
Python programming Testing Finn Årup Nielsen DTU Compute Technical University of Denmark September 8, 2014 Overview Testing frameworks: unittest, nose, py.test, doctest Coverage Testing of numerical computations
1 Topic. 2 Scilab. 2.1 What is Scilab?
1 Topic Data Mining with Scilab. I know the name "Scilab" for a long time (http://www.scilab.org/en). For me, it is a tool for numerical analysis. It seemed not interesting in the context of the statistical
1 Introduction to Matrices
1 Introduction to Matrices In this section, important definitions and results from matrix algebra that are useful in regression analysis are introduced. While all statements below regarding the columns
Exercise 4 Learning Python language fundamentals
Exercise 4 Learning Python language fundamentals Work with numbers Python can be used as a powerful calculator. Practicing math calculations in Python will help you not only perform these tasks, but also
WESTMORELAND COUNTY PUBLIC SCHOOLS 2011 2012 Integrated Instructional Pacing Guide and Checklist Computer Math
Textbook Correlation WESTMORELAND COUNTY PUBLIC SCHOOLS 2011 2012 Integrated Instructional Pacing Guide and Checklist Computer Math Following Directions Unit FIRST QUARTER AND SECOND QUARTER Logic Unit
Computational Optical Imaging - Optique Numerique. -- Deconvolution --
Computational Optical Imaging - Optique Numerique -- Deconvolution -- Winter 2014 Ivo Ihrke Deconvolution Ivo Ihrke Outline Deconvolution Theory example 1D deconvolution Fourier method Algebraic method
Python Loops and String Manipulation
WEEK TWO Python Loops and String Manipulation Last week, we showed you some basic Python programming and gave you some intriguing problems to solve. But it is hard to do anything really exciting until
Predict Influencers in the Social Network
Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, [email protected] Department of Electrical Engineering, Stanford University Abstract Given two persons
Computational Foundations of Cognitive Science
Computational Foundations of Cognitive Science Lecture 15: Convolutions and Kernels Frank Keller School of Informatics University of Edinburgh [email protected] February 23, 2010 Frank Keller Computational
Package PRISMA. February 15, 2013
Package PRISMA February 15, 2013 Type Package Title Protocol Inspection and State Machine Analysis Version 0.1-0 Date 2012-10-02 Depends R (>= 2.10), Matrix, gplots, ggplot2 Author Tammo Krueger, Nicole
CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #18: Dimensionality Reduc7on
CS 5614: (Big) Data Management Systems B. Aditya Prakash Lecture #18: Dimensionality Reduc7on Dimensionality Reduc=on Assump=on: Data lies on or near a low d- dimensional subspace Axes of this subspace
Java Modules for Time Series Analysis
Java Modules for Time Series Analysis Agenda Clustering Non-normal distributions Multifactor modeling Implied ratings Time series prediction 1. Clustering + Cluster 1 Synthetic Clustering + Time series
Package MixGHD. June 26, 2015
Type Package Package MixGHD June 26, 2015 Title Model Based Clustering, Classification and Discriminant Analysis Using the Mixture of Generalized Hyperbolic Distributions Version 1.7 Date 2015-6-15 Author
MATLAB Tutorial. Chapter 6. Writing and calling functions
MATLAB Tutorial Chapter 6. Writing and calling functions In this chapter we discuss how to structure a program with multiple source code files. First, an explanation of how code files work in MATLAB is
Computational Mathematics with Python
Numerical Analysis, Lund University, 2011 1 Computational Mathematics with Python Chapter 1: Basics Numerical Analysis, Lund University Claus Führer, Jan Erik Solem, Olivier Verdier, Tony Stillfjord Spring
SQL Server Array Library 2010-11 László Dobos, Alexander S. Szalay
SQL Server Array Library 2010-11 László Dobos, Alexander S. Szalay The Johns Hopkins University, Department of Physics and Astronomy Eötvös University, Department of Physics of Complex Systems http://voservices.net/sqlarray,
Time Domain and Frequency Domain Techniques For Multi Shaker Time Waveform Replication
Time Domain and Frequency Domain Techniques For Multi Shaker Time Waveform Replication Thomas Reilly Data Physics Corporation 1741 Technology Drive, Suite 260 San Jose, CA 95110 (408) 216-8440 This paper
Programming Languages & Tools
4 Programming Languages & Tools Almost any programming language one is familiar with can be used for computational work (despite the fact that some people believe strongly that their own favorite programming
Matrix Multiplication
Matrix Multiplication CPS343 Parallel and High Performance Computing Spring 2016 CPS343 (Parallel and HPC) Matrix Multiplication Spring 2016 1 / 32 Outline 1 Matrix operations Importance Dense and sparse
Acknowledgments. Data Mining with Regression. Data Mining Context. Overview. Colleagues
Data Mining with Regression Teaching an old dog some new tricks Acknowledgments Colleagues Dean Foster in Statistics Lyle Ungar in Computer Science Bob Stine Department of Statistics The School of the
Chemical and Biological Engineering Calculations using Python 3. Jeffrey J. Heys
Chemical and Biological Engineering Calculations using Python 3 Jeffrey J. Heys Copyright c 2014 Jeffrey Heys All rights reserved. This version is being made available at no cost. Please acknowledge access
