Amis Consulting LLP
1977-1981 : Research in Combustion/Fluids
1983-1991 : Scientific computing, image processing
1992-1997 : UK Healthcare / Imperial College
1997-2003 : Dotcom Boom (and bust) !!!
2003- : Financial Systems
Currently involved in High Performance Computing and (of course) Big Data, as well as all the other stuff.
Worked with a variety of technologies
• Languages (in anger): Fortran / C / Ada / Perl / Python / Lisp / Java / PHP / Groovy / NodeJS … our GOTO languages remain C and Perl but ???
• Back-ends: Unix (not just Linux) and Windows (so some .NET)
• Databases: both relational and NoSQL (Redis, Mongo, Neo4j)
• Moving into the cloud: AWS (Map-reduce, Redshift), Google App Server
Then along came R …
• At Kings in the late 2000s
• Interest was in HPC (mainly CUDA) applied to financial systems.
• Started using Matlab but was looking for a similar type of package for personal/company usage.
• GNU Octave and R both fitted the bill; R won, at the time.
• Looked at (and was impressed by) Python
History
Gang of “four”: Jeff Bezanson, Viral Shah, Stefan Karpinski, Alan Edelman
Started at MIT in 2010
First release February 2012
Still actively maintained by the G4
MIT is using Julia in courses (on YouTube)
What happened to Ada?
Designed 1977/83 for the US DoD in order to supersede the hundreds of languages the DoD used.
The DoD mandated its use in 1987 and dropped the mandate in 1997.
Still used in air traffic control systems such as iFacts, GNATS.
Nearest meetup group is in Stockholm.
Runners and Riders
Current field:
• Runners: Matlab, R, Python
• Riders: C/C++, Java
• Outsiders: Scala, Clojure
• Non-starter: Perl
What makes a good Data Science Language? (1)
• Be a general purpose language with a sizable user community and an array of general purpose libraries, including good GUI libraries, networking and web frameworks.
• Be free, open-source and platform independent.
• Be fast and efficient.
• Have a good, well-designed library for scientific computing, including non-uniform random number generation and linear algebra.
• Have a strong type system, and be statically typed with good compile-time type checking and type safety.
• Have reasonable type inference.
• Have a REPL for interactive use.
What makes a good Data Science Language? (2)
• Have good tool support, including build tools, doc tools, testing tools, and an intelligent IDE.
• Have excellent support for functional programming, including support for immutability and immutable data structures and “monadic” design.
• Allow imperative programming for occasions where it makes sense.
• Be designed with concurrency and parallelism in mind, having excellent language and library support for building really scalable concurrent and parallel applications.
• Have excellent built-in data capabilities.
• Have comprehensive math and statistical routines.
Comparison with Matlab
• Julia syntax is similar to Matlab but its construction is purposely very different.
• Matlab has only one data structure (the matrix) and is optimised for matrix operations. Other native computations can be very slow.
• The focus on matrices led to some important differences in Matlab's design compared to general-purpose programming languages such as Julia.
• Julia uses similar matrix syntax to Matlab but also incorporates list comprehensions.
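To make the last point concrete, here is a minimal sketch of Matlab-style matrix literals next to comprehensions. It assumes Julia 1.x syntax, and the variable names are illustrative only.

```julia
# Matlab-style matrix literals: spaces separate columns, semicolons separate rows.
A = [1 2; 3 4]
B = [A A]        # horizontal concatenation, giving a 2x4 matrix
C = [A; A]       # vertical concatenation, giving a 4x2 matrix

# Transpose and matrix product use the familiar operators.
G = A' * A

# Comprehensions build arrays from a rule.
squares = [i^2 for i in 1:5]               # 1-D array
table = [i * j for i in 1:3, j in 1:3]     # 3x3 multiplication table
```

The comprehension with two iteration variables returns a matrix directly, which is often a readable alternative to preallocating and filling inside nested loops.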
Comparison with R
• Origins as open-source clone of S+.
• Still seen as a “statistical” DSL.
• R is single threaded and hard to speed up.
• Introduced the data frame structure, which is also present in Julia.
• Julia also has an RDatasets package.
• R has very good graphics and data visualisation support.
• Julia has a Google group: julia-stats.
• Julia can call R modules using the Rif package.
Comparison with Python
• Python is now seen by many as the Data Science language.
• Strength lies in its community support.
• Modules such as numpy, scipy, matplotlib and pandas are very powerful.
• Speed up using PyPy
• Mature frameworks such as Django
• Julia's approach is co-operation, not confrontation, via PyCall and also IJulia (IPython)
What makes Julia special?
• It is written in Julia, apart from a small core, and the code is available to look at.
• The designers are data scientists and not tied to companies such as Google (Go) or Mozilla (Rust).
• It has been designed for parallelism / distributed computation.
• It takes every opportunity to cooperate rather than confront.
• Julia intends to combine the best from MATLAB, R and Python into one language that is to be consistent, well designed and fast.
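As an illustration of the parallelism point, the following is a rough sketch using the Distributed standard library as it exists in Julia 1.x (in the Julia version current when these slides were written, the same functionality lived in Base). The function hits_in_circle and the batch sizes are made up for the example.

```julia
using Distributed   # standard library in Julia 1.x

# Uncomment to start two local worker processes. Without workers, the calls
# below still run, just on the master process.
# addprocs(2)

# @everywhere defines the function on every process.
@everywhere function hits_in_circle(n)
    hits = 0
    for _ in 1:n
        x, y = rand(), rand()
        hits += (x^2 + y^2 <= 1.0)   # Bool is added as 0 or 1
    end
    return hits
end

# pmap hands one call per element to the available processes.
batches = fill(250_000, 4)
hits = pmap(hits_in_circle, batches)
pi_estimate = 4 * sum(hits) / sum(batches)

# @distributed with a reducer splits the loop range and combines partial results.
total = @distributed (+) for i in 1:100
    i^2
end
```

Both constructs keep the serial code shape, so the same script can be tried on a laptop first and then run with more workers.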
Special features
• Easy installation
• JIT compilation
• Built-in package manager
• Coroutines and green threads
• Multiple dispatch
• Dynamic type system
• Meta programming with Lisp-like macros
• Call C functions directly
• Call Python functions (PyCall)
• Best-of-breed C and Fortran libraries
• Unicode support
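Three of the features above (multiple dispatch, macros, and direct C calls) can be sketched in a few lines. This assumes Julia 1.x syntax; the Shape types, the overlap function and the @twice macro are hypothetical names invented for the example.

```julia
# Multiple dispatch: one generic function, with the method chosen from the
# runtime types of all arguments. (Shape, Circle, Square are hypothetical.)
abstract type Shape end
struct Circle <: Shape
    r::Float64
end
struct Square <: Shape
    side::Float64
end

overlap(a::Circle, b::Circle) = "circle/circle"
overlap(a::Circle, b::Square) = "circle/square"
overlap(a::Shape, b::Shape) = "generic"      # fallback for any other pair

# Lisp-like macro: receives the unevaluated expression and returns new code
# that evaluates it twice.
macro twice(ex)
    return quote
        $(esc(ex))
        $(esc(ex))
    end
end

counter = Ref(0)
@twice counter[] += 1

# Call a C function directly, with no wrapper code: strlen from the C library.
len = ccall(:strlen, Csize_t, (Cstring,), "Julia")
```

Note that overlap(Square(1.0), Circle(1.0)) falls through to the generic method, because no method was defined for that particular ordering of argument types.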
The ones to read …
• Parallel computing
  http://julia.readthedocs.org/en/latest/manual/parallel-computing
• Metaprogramming
  http://docs.julialang.org/en/latest/manual/metaprogramming
• Networking and streams
  http://docs.julialang.org/en/latest/manual/networking-and-streams
• Calling C and Fortran code
  http://julia.readthedocs.org/en/latest/manual/calling-c-and-fortran-code
Modules and packages
• Julia has its own built-in package manager
• There are (now) 250+ packages.
• These include:
  • Statistics
  • Graphics
  • System tools
  • Database
  • Web and Cloud
  • Simulation
• It's quite easy to add your own package (via GitHub)
100+ contributors, 1000+ mailing list subscribers, 175+ packages
AWS, ArgParse, BSplines, Benchmark, BinDeps, BioSeq, BloomFilters, Cairo, Calculus, Calendar, Cartesian, Catalan, ChainedVectors, ChemicalKinetics, Clang, Clp, ClusterManagers, Clustering, Codecs, CoinMP, Color, Compose, ContinuedFractions, Cpp, Cubature, Curl, DICOM, DWARF, DataFrames, DataStructures, Datetime, Debug, DecisionTree, Devectorize, DictUtils, DictViews, DiscreteFactor, Distance, Distributions, DualNumbers, ELF, Elliptic, Example, ExpressionUtils, FITSIO, FactCheck, FastaIO, FastaRead, FileFind, FunctionalCollections, FunctionalUtils, GLFW, GLM, GLPK, GLUT, GSL, GZip, Gadfly, Gaston, GeoIP, GeometricMCMC, GetC, GoogleCharts, Graphs, Grid, Gtk, Gurobi, HDF5, HDFS, HTTP, HTTPClient, Hadamard, HttpCommon, HttpParser, HttpServer, HypothesisTests, ICU, ImageView, Images, ImmutableArrays, IniFile, Iterators, Ito, JSON, JudyDicts, JuliaWebRepl, KLDivergence, LIBSVM, Languages, LazySequences, LibCURL, LibExpat, LinProgGLPK, Loss, MAT, MATLAB, MCMC, MDCT, MLBase, MNIST, MarketTechnicals, MathProg, MathProgBase, Meddle, Memoize, Meshes, Metis, MixedModels, Monads, Mongo, Mongrel2, Morsel, Mustache, NHST, NIfTI, NLopt, Named, NetCDF, NumericExtensions, NumericFunctors, ODBC, ODE, OpenGL, OpenSSL, Optim, Options, PLX, PTools, PatternDispatch, Phylo, Phylogenetics, Polynomial, Profile, ProgressMeter, ProjectTemplate, PyCall, PyPlot, PySide, Quandl, QuickCheck, RDatasets, REPL, RNGTest, RPMmd, RandomMatrices, Readline, Regression, Resampling, Rif, Rmath, RobustStats, Roots, SDE, SDL, SVM, SemidefiniteProgramming, SimJulia, SimpleMCMC, Sims, Sodium, Soundex, Sqlite, Stats, StrPack, Sundials, SymPy, TOML, Terminals, TextAnalysis, TextWrap, TimeModels, TimeSeries, Tk, TopicModels, TradingInstrument, Trie, URLParse, UTF16, Units, ValueDispatch, WAV, WebSockets, Winston, YAML, ZMQ, Zlib
Julia does have graphics!
Winston (standard 2D graphics)
Gadfly (like 'ggplot2')
Gaston (uses gnuplot as graphics engine)
PyPlot (uses IPython/matplotlib)
Plotly (http://plot.ly/api)
Simulated Stock Market
julia> plothist(randn(100000), 100)
julia> plot(cumsum(randn(10000)))
What’s missing?
• Cached package loading
  • At present all modules are compiled on the fly
  • Preloading would reduce startup times
• Better database connectivity
  • Uses ODBC
  • Simple database support via SQLite
  • No native Oracle, MySQL or PostgreSQL
• More comprehensive NoSQL support
  • Packages for Mongo, Redis.
  • JSON package helps with CouchDB, Neo4j
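Familiar syntax for Matlab/Octave users
using LinearAlgebra, Statistics   # Julia 1.x: tr, mean and std live in these standard libraries
function randmatstat(t; n=10)
    v = zeros(t)
    w = zeros(t)
    for i = 1:t
        a = randn(n,n)
        b = randn(n,n)
        c = randn(n,n)
        d = randn(n,n)
        P = [a b c d]      # n x 4n block row
        Q = [a b; c d]     # 2n x 2n block matrix
        v[i] = tr((P'*P)^4)   # tr was called trace before Julia 1.0
        w[i] = tr((Q'*Q)^4)
    end
    std(v)/mean(v), std(w)/mean(w)
end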
Simulating an Asian Option
using Winston   # provides FramedPlot, Curve and add
S0 = 100;    # Spot price
K = 102;     # Strike price
r = 0.05;    # Risk free rate
q = 0.0;     # Dividend yield
v = 0.2;     # Volatility
tma = 0.25;  # Time to maturity
T = 100;     # Number of time steps
dt = tma/T;  # Time increment
S = zeros(Float64,T);
S[1] = S0;
dW = randn(T)*sqrt(dt);
for t = 2:T
    S[t] = S[t-1] * (1 + (r - q - 0.5*v*v)*dt + v*dW[t] + 0.5*v*v*dW[t]*dW[t])
end
x = collect(1:T);   # one x value per time step
p = FramedPlot(title = "Random Walk, drift 5%, volatility 20%")
add(p, Curve(x,S,color="red"))
display(p)
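The slide simulates and plots a single price path. As a hedged sketch of how that path could feed an actual price, the code below averages the discounted payoff of an arithmetic-average Asian call over many paths. It assumes Julia 1.x; the helper names simulate_path and asian_call, the seed, and the number of paths are choices made for this example and are not taken from the slides.

```julia
using Random, Statistics   # standard libraries in Julia 1.x

# One path of T points, using the same update rule as the slide.
function simulate_path(rng, S0, r, q, v, tma, T)
    dt = tma / T
    S = zeros(Float64, T)
    S[1] = S0
    for t in 2:T
        dW = randn(rng) * sqrt(dt)
        S[t] = S[t-1] * (1 + (r - q - 0.5 * v^2) * dt + v * dW + 0.5 * v^2 * dW^2)
    end
    return S
end

# Monte Carlo estimate of an arithmetic-average Asian call (hypothetical helper):
# the payoff is max(mean(path) - K, 0), discounted at the risk free rate.
function asian_call(rng, S0, K, r, q, v, tma, T, npaths)
    payoffs = zeros(Float64, npaths)
    for i in 1:npaths
        S = simulate_path(rng, S0, r, q, v, tma, T)
        payoffs[i] = max(mean(S) - K, 0.0)
    end
    return exp(-r * tma) * mean(payoffs)
end

rng = MersenneTwister(42)   # fixed seed so runs are repeatable
price = asian_call(rng, 100.0, 102.0, 0.05, 0.0, 0.2, 0.25, 100, 20_000)
println("Estimated Asian call price: ", round(price, digits = 3))
```

Passing the random number generator explicitly keeps the functions free of global state, which makes results repeatable and leaves room to give each worker its own generator in a parallel run.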
Going further …
• Start with the julialang.org website
• Install Julia and read the documentation
• Look at the training material
  • http://julialang.org/teaching/
• Try the Julia Studio
• Read/subscribe to Google-groups sites
  • julia-users, julia-stats, julia-opt, julia-dev
• Join the LJuUG
  • http://www.meetup.com/London-Julia-User-Group
My Benchmarks
Results for 100,000 runs of 100 steps, (c ~ 0.73 s)
Samsung RV711 laptop with an i5 processor and 4 GB RAM running CentOS 6.5 (Final)