It's alive!

Machine Learning writes your code

Dominic Elm

Uri Shaked

@elmd_

@UriShaked

How Everything Started

ngVikings 2019

Angular Connect 2018

How to AI in JS? - Assim Hussain

Thank You Assim!

RESEARCH QUESTION

Given a function signature, can we create a model that predicts the body of that function?

Who Are We?

Dominic Elm
  • Software Engineer
  • Trainer & Consultant
  • @thoughtram, @stackblitz

Uri Shaked
  • Google Developer Expert
  • Community Organizer

Machine Learning 101

Traditional Program

email = 'How to be a Millionaire in 4 weeks'

if (email contains 'Millionaire')
  markAsSpam(email)
else if (email contains '...')
  ...
else if (email contains '...')
  ...

ML Program

data = [
  ('How to be a Millionaire in 4 weeks', SPAM),
  ('...', NO_SPAM),
  ('...', NO_SPAM),
  ('...', SPAM),
  ...
]

for example in data:
  classify example
  optimize
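The contrast between the two programs can be sketched in Python. This is a minimal perceptron over words, written for illustration and not the speakers' code: every email except the first one is made up, and the names `train` and `classify` are hypothetical. The loop mirrors the pseudocode: classify each example, then adjust the weights when the guess was wrong.

```python
from collections import defaultdict

SPAM, NO_SPAM = 1, 0

# Toy dataset: only the first email comes from the slide, the rest are invented.
DATA = [
    ("How to be a Millionaire in 4 weeks", SPAM),
    ("Meeting notes from Monday", NO_SPAM),
    ("Lunch at noon tomorrow", NO_SPAM),
    ("Become a Millionaire today", SPAM),
]


def score(words, weights, bias):
    """Sum the learned weight of every word, plus a bias term."""
    return bias + sum(weights.get(word, 0.0) for word in words)


def train(data, epochs=10):
    """Learn one weight per word instead of hand-writing rules."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in data:
            words = text.lower().split()
            # classify
            predicted = SPAM if score(words, weights, bias) > 0 else NO_SPAM
            # optimize: nudge the weights only when the prediction was wrong
            error = label - predicted
            if error:
                for word in words:
                    weights[word] += error
                bias += error
    return weights, bias


def classify(text, weights, bias):
    """Apply the learned weights to an unseen email."""
    return SPAM if score(text.lower().split(), weights, bias) > 0 else NO_SPAM


weights, bias = train(DATA)
```

No rule mentioning 'Millionaire' is written anywhere; the weight for that word is learned from the labeled examples.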

Neural Networks???

Inputs: 120 (square meters), 4 (#bedrooms)
Weights: 0.2, 0.1
Prediction: 120 x 0.2 + 4 x 0.1 = 24.4
Target: 15
Error: 24.4 - 15 = 9.4

Inputs: 120 (square meters), 4 (#bedrooms)
Weights: 0.1, 0.05
Prediction: 120 x 0.1 + 4 x 0.05 = 12.2
Target: 15
Error: 12.2 - 15 = -2.8
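The arithmetic on these two slides can be reproduced in a few lines of Python. This is only a sketch of a single neuron's weighted sum, assuming no bias term and no activation function; the name `predict` is hypothetical, and the value 15 is read as the expected output. How the weights get from one slide to the next is not shown in the deck, so no update rule is implemented here.

```python
def predict(inputs, weights):
    """Weighted sum of the inputs: one neuron without bias or activation."""
    return sum(x * w for x, w in zip(inputs, weights))


inputs = [120, 4]   # square meters, number of bedrooms
target = 15         # expected output, as read from the slide

# First guess at the weights, then the adjusted weights from the next slide.
for weights in ([0.2, 0.1], [0.1, 0.05]):
    prediction = predict(inputs, weights)
    error = prediction - target
    print(f"weights={weights} prediction={prediction:.1f} error={error:.1f}")
```

Training amounts to repeating this loop and changing the weights so that the error moves toward zero.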

Layers: Input -> Hidden -> Output

HOW DO WE PREDICT FUNCTION BODIES?

MODEL

function greet(name: string)

?

function greet(name: string) {
  const prefix = name.length < 10 ? 'Hi' : 'Hello';
  return prefix + name;
}

The model predicts the body one token at a time:

function greet(name: string)  ->  MODEL  ->  {
function greet(name: string)  ->  MODEL  ->  const
function greet(name: string)  ->  MODEL  ->  prefix
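Token-by-token prediction boils down to a loop that keeps asking the model for the next token. The sketch below assumes a hypothetical `predict_next(signature, tokens_so_far)` callable and uses the START and END markers introduced in the data cleaning step; a canned stand-in replays the example function so the loop can run without a trained model.

```python
def generate_body(signature, predict_next, max_tokens=50):
    """Ask the model for one token at a time until it emits END."""
    tokens = ["START"]
    while len(tokens) < max_tokens:
        next_token = predict_next(signature, tokens)
        tokens.append(next_token)
        if next_token == "END":
            break
    return tokens


# Stand-in for a trained model: it simply replays the example body.
CANNED = (
    "{ const prefix = name . length < 10 ? 'Hi' : 'Hello' ; "
    "return prefix + name ; } END"
).split()


def replay_model(signature, tokens_so_far):
    """Hypothetical model: returns the canned token for the current position."""
    return CANNED[len(tokens_so_far) - 1]


generated = generate_body("function greet(name: string)", replay_model)
print(" ".join(generated))
```

The `max_tokens` cap is an assumption added so that a model that never emits END cannot loop forever.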

ML Approach

  1. Gather Data
  2. Clean Data
  3. Choose Model
  4. Training
  5. Evaluation

1. Gathering Data

How can we quickly gather a lot of function examples?

Look at open source projects on GitHub

Using Google BigQuery, we can run a SQL query to fetch all the code on GitHub in under a minute!

We kept only the TypeScript files, extracted 324,280 TypeScript functions, and collected them in one huge JSON file.

2. Cleaning Data

function greet(name: string) {
  const prefix = name.length < 10 ? 'Hi' : 'Hello';
  return prefix + name;
}

  1. Preprocess raw dataset
  2. Prepare model inputs

Split signature from body

Signature:
function greet(name: string)

Body:
{
  const prefix = name.length < 10 ? 'Hi' : 'Hello';
  return prefix + name;
}
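Splitting a function into signature and body could look roughly like this in Python. It is a sketch under simplifying assumptions, not the preprocessing code used for the talk: it finds the end of the parameter list by counting parentheses and then cuts at the next opening brace, so a return type written as an object literal would be split in the wrong place. The name `split_signature` is hypothetical.

```python
def split_signature(function_source):
    """Return (signature, body) for a single function declaration."""
    depth = 0
    for position, char in enumerate(function_source):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth == 0:
                break  # end of the parameter list
    else:
        raise ValueError("no parameter list found")

    # The body starts at the first '{' after the parameter list.
    brace = function_source.index("{", position)
    return function_source[:brace].rstrip(), function_source[brace:]


source = (
    "function greet(name: string) {\n"
    "  const prefix = name.length < 10 ? 'Hi' : 'Hello';\n"
    "  return prefix + name;\n"
    "}"
)
signature, body = split_signature(source)
print(signature)
print(body)
```

A real pipeline would more likely use the TypeScript compiler's syntax tree than string scanning; the string version only illustrates the idea.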

Rename function parameters

function greet($arg0$: string)

{
  const prefix = $arg0$.length < 10 ? 'Hi' : 'Hello';
  return prefix + $arg0$;
}
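Parameter renaming can be approximated with a regular expression. The sketch below is an illustration under stated assumptions, with a hypothetical `rename_parameters` helper: parameters are separated by plain commas, each one has the form `name: type`, and any whole-word occurrence of a parameter name in the body refers to that parameter (so matches inside string literals would be renamed too).

```python
import re


def rename_parameters(signature, body):
    """Replace each parameter name with a positional placeholder like $arg0$."""
    params_text = signature[signature.index("(") + 1 : signature.rindex(")")]
    names = []
    for part in params_text.split(","):
        name = part.split(":")[0].strip().rstrip("?")
        if name:
            names.append(name)
    if not names:
        return signature, body

    placeholders = {name: f"$arg{i}$" for i, name in enumerate(names)}
    # One pass over the text, so a placeholder is never renamed a second time.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")

    def rename(match):
        return placeholders[match.group(1)]

    return pattern.sub(rename, signature), pattern.sub(rename, body)


new_signature, new_body = rename_parameters(
    "function greet(name: string)",
    "{\n  const prefix = name.length < 10 ? 'Hi' : 'Hello';\n  return prefix + name;\n}",
)
print(new_signature)
print(new_body)
```

Normalizing the names this way means the model sees the same placeholder no matter what the original author called the parameter.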

Rename identifiers and literals

function greet($arg0$: string)

{
  const id0 = $arg0$.id1 < 2 ? '3' : '4';
  return id0 + $arg0$;
}
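One way to get this result is a regex pass that numbers identifiers and literals in order of first appearance. This is a sketch that reproduces the slide's example, not the original implementation: the shared counter is inferred from the example (`prefix` and `length` become `id0` and `id1`, then `10`, `'Hi'` and `'Hello'` become `2`, `3` and `4`), and the keyword list is a small assumed subset.

```python
import re

# Assumed subset of reserved words that must stay untouched.
KEYWORDS = {
    "const", "let", "var", "return", "if", "else", "for", "while",
    "function", "new", "true", "false", "null", "undefined",
}

TOKEN = re.compile(
    r"""
      (?P<arg>\$arg\d+\$)             # parameter placeholders stay as they are
    | (?P<string>'[^']*'|"[^"]*")     # simple string literals without escapes
    | (?P<number>\d+(?:\.\d+)?)       # numeric literals
    | (?P<ident>[A-Za-z_]\w*)         # identifiers and keywords
    """,
    re.VERBOSE,
)


def rename_identifiers_and_literals(body):
    """Replace identifiers with idN and literals with a running number."""
    seen = {}

    def index_of(key):
        if key not in seen:
            seen[key] = len(seen)
        return seen[key]

    def replace(match):
        text, kind = match.group(0), match.lastgroup
        if kind == "arg" or (kind == "ident" and text in KEYWORDS):
            return text
        if kind == "ident":
            return f"id{index_of((kind, text))}"
        if kind == "string":
            quote = text[0]
            return f"{quote}{index_of((kind, text))}{quote}"
        return str(index_of((kind, text)))

    return TOKEN.sub(replace, body)


renamed = rename_identifiers_and_literals(
    "{\n  const prefix = $arg0$.length < 10 ? 'Hi' : 'Hello';\n  return prefix + $arg0$;\n}"
)
print(renamed)
```

The same identifier always maps to the same number, which is why both uses of `prefix` end up as `id0`.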

Add spaces

function greet ( $arg0$ : string )

{
  const id0 = $arg0$ . id1 < 2 ? '3' : '4' ;
  return id0 + $arg0$ ;
}
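Adding spaces makes every token separable by whitespace. A minimal Python sketch, assuming a hypothetical `add_spaces` helper: it keeps placeholders, simple string literals and words intact and treats every other punctuation character as its own token. Multi-character operators such as `===` or `=>` would be split into single characters, and indentation is dropped.

```python
import re

# Placeholders, quoted strings, words/numbers, then any single punctuation mark.
TOKEN = re.compile(r"""\$arg\d+\$|'[^']*'|"[^"]*"|\w+|[^\w\s]""")


def add_spaces(code):
    """Separate all tokens on each line with exactly one space."""
    return "\n".join(" ".join(TOKEN.findall(line)) for line in code.splitlines())


print(add_spaces("function greet($arg0$: string)"))
print(add_spaces("const id0 = $arg0$.id1 < 2 ? '3' : '4';"))
```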

Add START and END symbols

function greet ( $arg0$ : string )

START {
  const id0 = $arg0$ . id1 < 2 ? '3' : '4' ;
  return id0 + $arg0$ ;
} END

Prepare model inputs

  • Tokenize
  • Text to Sequence
  • Add padding
  • Create inputs and outputs

Tokenization = Chopping the function body into pieces (tokens)

function greet ( $arg0$ : string )
START {
  const id0 = $arg0$ . id1 < 2 ? '3' : '4' ;
  return id0 + $arg0$ ;
} END

dict = {
  'function': 1,
  'greet': 2,
  '(': 3,
  '$arg0$': 4,
  ':': 5,
  'string': 6,
  ')': 7,
  'START': 8,
  '{': 9,
  ...
}
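Because every token is already separated by whitespace, building the dictionary can be sketched with plain string splitting. The helper name `build_vocabulary` is hypothetical, and reserving index 0 for padding is an assumption based on the padding step that follows.

```python
def build_vocabulary(texts):
    """Map each distinct token to an integer, in order of first appearance."""
    vocabulary = {}
    for text in texts:
        for token in text.split():
            if token not in vocabulary:
                # Index 0 is left free so it can be used for padding later.
                vocabulary[token] = len(vocabulary) + 1
    return vocabulary


signature = "function greet ( $arg0$ : string )"
body = "START {\n  const id0 = $arg0$ . id1 < 2 ? '3' : '4' ;\n  return id0 + $arg0$ ;\n} END"

vocabulary = build_vocabulary([signature, body])
print(vocabulary)
```

Libraries such as Keras ship a tokenizer for this job; the manual version is shown only to make the mapping explicit.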

Text to Sequence

function greet ( $arg0$ : string )
[1, 2, 3, 4, 5, 6, 7]

function isPrime ( $arg0$ : number )
[1, 13, 3, 4, 5, 23, 7]

Add Padding

[1, 2, 3, 4, 5, 6, 7]
[0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7]
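Both steps are small enough to sketch in plain Python. The vocabulary below copies the indices shown on the slides; the helper names `text_to_sequence` and `pad_left` are hypothetical, and the target length of 14 is simply the length of the padded example.

```python
# Indices as shown on the slides.
VOCABULARY = {
    "function": 1, "greet": 2, "(": 3, "$arg0$": 4, ":": 5,
    "string": 6, ")": 7, "isPrime": 13, "number": 23,
}


def text_to_sequence(text, vocabulary):
    """Replace every whitespace-separated token with its integer index."""
    return [vocabulary[token] for token in text.split()]


def pad_left(sequence, length, pad_value=0):
    """Left-pad with zeros (or cut from the front) to a fixed, positive length."""
    sequence = sequence[-length:]
    return [pad_value] * (length - len(sequence)) + sequence


greet = text_to_sequence("function greet ( $arg0$ : string )", VOCABULARY)
is_prime = text_to_sequence("function isPrime ( $arg0$ : number )", VOCABULARY)

print(greet)
print(is_prime)
print(pad_left(greet, 14))
```

Padding gives every sequence the same length, which lets them be stacked into a single fixed-shape batch for training.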

Create Model Inputs and Outputs (X1, X2 and Y)

Inputs: Signature (X1) and Sequence (X2). Output: Next Token (Y).

Signature (X1)                        Sequence (X2)    Next Token (Y)
function greet ( $arg0$ : string )    START            {
function greet ( $arg0$ : string )    START {          const
function greet ( $arg0$ : string )    START { const    id0
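Each function therefore yields one training example per body token. A short sketch of that expansion, with a hypothetical `make_training_examples` helper that works on token strings before they are converted to integers:

```python
def make_training_examples(signature, body_tokens):
    """Yield (X1, X2, Y): signature, body tokens so far, and the next token."""
    examples = []
    for position in range(1, len(body_tokens)):
        examples.append((signature, body_tokens[:position], body_tokens[position]))
    return examples


signature = "function greet ( $arg0$ : string )"
body_tokens = (
    "START { const id0 = $arg0$ . id1 < 2 ? '3' : '4' ; "
    "return id0 + $arg0$ ; } END"
).split()

examples = make_training_examples(signature, body_tokens)
for x1, x2, y in examples[:3]:
    print(x1, "|", " ".join(x2), "|", y)
```

A body with n tokens (including START and END) produces n - 1 examples, so the dataset grows much faster than the number of functions.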

Encode Output (One Hot Encoding)

Next Token (Y)    Index    One-Hot Vector
{                 9        [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
string            6        [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
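One-hot encoding turns a token index into a vector with a single 1 at that position. A minimal sketch, assuming a vocabulary of 10 entries (indices 0 to 9) as in the example vectors; the name `one_hot` is hypothetical.

```python
def one_hot(index, vocabulary_size):
    """Return a list of zeros with a single 1 at position `index`."""
    if not 0 <= index < vocabulary_size:
        raise ValueError(f"index {index} is outside 0..{vocabulary_size - 1}")
    vector = [0] * vocabulary_size
    vector[index] = 1
    return vector


print(one_hot(9, 10))  # token '{'
print(one_hot(6, 10))  # token 'string'
```

The model's output layer produces one score per vocabulary entry, so a one-hot target lines up with that output position by position.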

3. Choose Model

Look at Similar Problems

Machine Translation

Using TensorFlow

4. Training the Model

Google Colab

Google Cloud TPU (Tensor Processing Unit)

5. Evaluation

Evaluating the performance of the model

DEMO TIME

Takeaways

  • Take advantage of the cloud
  • Look for solutions to similar problems
  • Data processing makes up a big chunk of the work

Thank You

Enjoy your lunch

🍱

Backlog

AI

ML

Deep Learning

Artificial Intelligence?

Machine Learning?

Deep Learning?

Artificial Intelligence

...is the science of making things smart.

Machine Learning

...an approach to achieve AI.

Learning from data and recognizing patterns, rather than being explicitly programmed.

Deep Learning

...a specific technique for implementing ML.

Typically we use Neural Networks to implement ML and achieve AI.

Rule-based Systems

vs.

Learning from Data and recognizing patterns

Choose Model

Sequence to Sequence Model (Seq2Seq)

Sequence -> Encoder -> Decoder -> Sequence
