Movie casts and actor filmographies from tmdb.org

Categories: code, Julia, HTTP, JSON, tmdb.org

Author: Douglas Bates

Published: December 9, 2022

In the previous post I described downloading and parsing files containing summary data from files.tmdb.org, then saving the results as tables in Arrow format.

Here I will show how to access the cast of a movie and the filmography of an actor using the tmdb.org public API.

These steps are preparation for developing a method of linking actors through appearances in the same movie, which is the underlying relationship in the parlor game Six Degrees of Kevin Bacon.

As described in the Wikipedia entry

Six Degrees of Kevin Bacon or Bacon’s Law is a parlor game where players challenge each other to arbitrarily choose an actor and then connect them to another actor via a film that both actors have appeared in together, repeating this process to try to find the shortest path that ultimately leads to prolific American actor Kevin Bacon. It rests on the assumption that anyone involved in the Hollywood film industry can be linked through their film roles to Bacon within six steps. The game’s name is a reference to “six degrees of separation”, a concept that posits that any two people on Earth are six or fewer acquaintance links apart.

Automating the process of assigning “Bacon numbers”, the distance through appearances in the same movie from Kevin Bacon to an actor, can quickly become a massive undertaking, putting a premium on efficiency in time and storage on the intermediate steps.

Load packages

First load the packages to be used

using Arrow          # Apache Arrow storage format for Tables
using Dates          # time/date formats
using HTTP           # HTTP communication for clients or servers
using JSON3          # read and write JSON with a slick interface for structs
using TypedTables    # low-overhead Table wrapper
using URIs           # construct or deconstruct URI strings

API key

Queries on the tmdb.org database (as opposed to the summary file downloads described previously) require a (free) tmdb account and an API Key.

This key is usually printed as an unsigned 128-bit integer written as 32 hexadecimal digits. That is, it looks like

rand(UInt128)
0xa6bb36f3ef7832ce4a637a7c3f309ea0

but written without the 0x prefix.

Assume that this API key is stored in an environment variable TMDB_API_KEY, which is accessed as ENV["TMDB_API_KEY"]. If you prefer not to use an environment variable you can instead assign this value in your Julia startup file.
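A quick sanity check on the key can catch a misspelled variable name or a truncated value before any request is made. The following is a minimal sketch: the helper name looks_like_api_key is hypothetical, and the 32-hexadecimal-digit format is taken from the description above, not from any guarantee made by tmdb.org.

```julia
# Hypothetical helper: true if `key` consists of exactly 32 hexadecimal digits.
looks_like_api_key(key::AbstractString) = occursin(r"^[0-9a-fA-F]{32}$", key)

# A random stand-in with the same shape as a real key (it is not a valid key).
fakekey = string(rand(UInt128); base=16, pad=32)

looks_like_api_key(fakekey)                        # true
looks_like_api_key("0x" * fakekey)                 # false: the 0x prefix is not part of the key
looks_like_api_key(get(ENV, "TMDB_API_KEY", ""))   # false if the variable is unset
```

Using get with a default avoids the KeyError that ENV["TMDB_API_KEY"] throws when the variable is missing.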

Requests to the HTTP server at tmdb.org require the API key as a query parameter in the URI so we build a base query URI as

const quri = URI(;       # named arguments only
    scheme="https",
    host="api.themoviedb.org",
    path="/3",           # URIs for version 3 of the API begin with /3
    query=["api_key" => ENV["TMDB_API_KEY"]]
);

Because I don’t want to reveal my API key, I will suppress printing of URIs containing this key.
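To see what a request URI looks like without exposing a real key, the same construction can be repeated with an obviously fake key. This is only a sketch: the name demo and the all-zeros key are placeholders, and the request path is the one used for a person's movie credits later in this post.

```julia
using URIs

# Same construction as quri, but with a fake key so that it is safe to print.
demo = URI(; scheme="https", host="api.themoviedb.org", path="/3",
    query=["api_key" => "0"^32])

# joinpath appends to the path and leaves the query (the key) in place.
req = joinpath(demo, string("person/", 4724, "/movie_credits"))

req.path             # "/3/person/4724/movie_credits"
queryparams(req)     # a Dict with the single key "api_key"
```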

Form of a person’s movie credits

The tmdb.org API documentation for a person’s movie credits shows how to structure the URI for the HTTP request and what the fields in the response are.

We will only be interested in credits as a cast member. The fields of interest to us are id, release_date, original_title, and popularity, all referring to the movie, and character, which refers to the actor/movie combination.

function getcastcredits(
    id::Integer;                # person id
    iob::IOBuffer=IOBuffer(),   # for streaming the response body
    selector::Function=@Select( # select and modify fields of each row
        id=Int32($id),
        release_date,
        original_title,
        character,
        popularity=Float32($popularity),
    ),
    template=(id=Int32(0), release_date="", original_title="", character="", popularity=0.0f0),
)
    take!(iob)                  # clear the buffer
    try
        HTTP.get(
            joinpath(quri, string("person/", id, "/movie_credits"));
            response_stream=iob,
        )
    catch
        take!(iob)              # clear the buffer which may be non-empty
        return Table(typeof(template)[])
    end
    return selector.(Table(JSON3.read(take!(iob)).cast))
end
getcastcredits (generic function with 1 method)

The body of the HTTP response is streamed through an IOBuffer which, by default, is a freshly allocated IOBuffer. The take! method for an IOBuffer returns the contents of the stream and resets the stream to be empty. However, it does not free the memory allocated for the buffer, which means that subsequent uses of the buffer reuse that memory and only allocate additional memory if needed.

To take advantage of this we pass a pre-allocated IOBuffer. (It is not declared const here; note that in Julia const fixes only the binding of the name, so a const IOBuffer could still be written to and emptied.)

Assign an IOBuffer named iob, which can be re-used in our HTTP requests.

iob = IOBuffer()
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=0, maxsize=Inf, ptr=1, mark=-1)
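The observable behaviour of take! can be checked without any HTTP traffic. The sketch below uses a separate buffer, buf, and plain strings as stand-ins for response bodies; it shows only that take! returns the contents and leaves the buffer empty, and makes no claim about memory allocation.

```julia
buf = IOBuffer()

write(buf, "first response body")
body1 = String(take!(buf))     # the contents written so far

isempty(take!(buf))            # true: nothing is left after the first take!

write(buf, "second")
body2 = String(take!(buf))     # only the second write, not a concatenation
```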

The try/catch block wrapping the call to HTTP.get is to catch errors from the request and return an empty Table of the same type as a valid response, instead of many, many lines of traceback from the failed request. For example, there is no person with an id of 15 so the HTTP.get request will fail but, in this case, the function fails gracefully.

getcastcredits(15; iob)
Table with 5 columns and 0 rows:
     id  release_date  original_title  character  popularity
   ┌────────────────────────────────────────────────────────

In the function call, id is a person id. In the response table, the contents of the id column are movie ids.
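The fallback pattern in getcastcredits can be isolated from the HTTP details. In this sketch rows_or_empty and its fetch argument are hypothetical names; fetch stands in for any zero-argument function that either returns rows or throws.

```julia
# A template row fixes the element type of the empty result.
template = (id=Int32(0), original_title="")

function rows_or_empty(fetch::Function)
    try
        return fetch()
    catch
        return typeof(template)[]   # empty vector with a concrete element type
    end
end

ok = rows_or_empty(() -> [(id=Int32(1788), original_title="Footloose")])
failed = rows_or_empty(() -> error("simulated failed request"))

length(ok), length(failed)      # (1, 0)
eltype(failed) == eltype(ok)    # true
```

Because the empty result has the same element type as a successful one, code that consumes the rows does not need a special case for failures.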

Next get Kevin Bacon’s cast credits

kbcredits = getcastcredits(4724; iob)
Table with 5 columns and 108 rows:
      id     release_date  original_title        character             ⋯
    ┌───────────────────────────────────────────────────────────────────
 1  │ 617    1998-03-20    Wild Things           Sergeant Ray Duquet…  ⋯
 2  │ 819    1996-10-18    Sleepers              Sean Nokes            ⋯
 3  │ 1788   1984-02-17    Footloose             Ren McCormack         ⋯
 4  │ 21032  1995-12-22    Balto                 Balto (voice)         ⋯
 5  │ 21332  1986-02-14    Quicksilver           Jack Casey            ⋯
 6  │ 11601  1999-09-10    Stir of Echoes        Tom Witzky            ⋯
 7  │ 11835  2007-08-31    Death Sentence        Nick Hume             ⋯
 8  │ 25405  2009-09-21    Taking Chance         LtCol Mike Strobl     ⋯
 9  │ 9362   1990-01-19    Tremors               Valentine McKee       ⋯
 10 │ 12714  1988-02-05    She's Having a Baby   Jake Briggs           ⋯
 11 │ 9692   2004-12-24    The Woodsman          Walter                ⋯
 12 │ 9729   2005-10-07    Where the Truth Lies  Lanny                 ⋯
 13 │ 15006  2007-09-01    Rails & Ties          Tom Stark             ⋯
 14 │ 61772  2008-01-01    Connected: The Powe…  Kevin Bacon           ⋯
 15 │ 63437  1997-08-02    Telling Lies in Ame…  Billy Magic           ⋯
 16 │ 45225  1989-09-15    The Big Picture       Nick Chapman          ⋯
 17 │ 46094  1994-01-07    The Air Up There      Jimmy Dolan           ⋯
 18 │ 17956  1991-02-22    He Said, She Said     Dan Hanson            ⋯
 19 │ 49792  1991-02-01    Queens Logic          Dennis                ⋯
 20 │ 49855  1987-08-28    End of the Line       Everett               ⋯
 21 │ 19209  1987-07-10    White Water Summer    Vic                   ⋯
 22 │ 76168  1983-03-06    The Demon Murder Ca…  Kenny Miller          ⋯
 23 │ 57585  2011-05-17    Elephant White        Jimmy the Brit        ⋯
 ⋮  │   ⋮         ⋮                 ⋮                     ⋮            ⋱

Julia provides a convenient shorthand for named (keyword) arguments, available since v1.5. The key is the ; separator between the positional arguments and the named arguments in the function call. Because the ; separator is used here to indicate that only named arguments follow, passing iob without a name is equivalent to passing iob=iob. In other words, the default name of an argument after the ; is the name of the variable itself in the context of the caller. (A related destructuring syntax for structs and NamedTuples, (; a, b) = x, was added in v1.7.)
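A small self-contained illustration of this shorthand, using a hypothetical function label and local variables named after its keyword arguments:

```julia
label(; title, released) = string(title, " (", released, ")")

title = "Footloose"
released = 1984

# After the semicolon, a bare variable name is passed under its own name.
label(; title, released) == label(; title=title, released=released)   # true

# The same shorthand builds a NamedTuple from local variables.
nt = (; title, released)
nt == (title="Footloose", released=1984)                               # true
```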

Filtering the table

The first couple of dozen movies in Kevin Bacon’s filmography are of the sort we would expect, except for the movie (#14 in this list) where his character is named “Kevin Bacon”.

There are more unusual entries toward the end of the list

last(kbcredits, 23)
Table with 5 columns and 23 rows:
      id      release_date  original_title        character             ⋯
    ┌────────────────────────────────────────────────────────────────────
 1  │ 586482  2019-03-01    Holly Near: Singing…  Self                  ⋯
 2  │ 691677  2011-06-05    X-Men: First Class …  Self - Sebastian Sh…  ⋯
 3  │ 726209                Leave the World Beh…                        ⋯
 4  │ 10944   2003-10-22    In the Cut            John Graham           ⋯
 5  │ 708353  2020-03-24    Find Your Groove      Self                  ⋯
 6  │ 950707  2022-03-20    Step Into… The Movi…  Self                  ⋯
 7  │ 319070  2015-01-25    Drunk Stoned Brilli…  Self / Actor          ⋯
 8  │ 459956  2017-06-20    Story of a Girl       Michael               ⋯
 9  │ 458506  2017-07-08    Tour de Pharmacy      Ditmer Klerken        ⋯
 10 │ 280180                Beverly Hills Cop: …                        ⋯
 11 │ 141498  1996-01-01    Lost Moon: The Triu…  Himself               ⋯
 12 │ 462323  1986-04-07    The Tender Age        Probation Officer (…  ⋯
 13 │ 774752  2022-11-25    The Guardians of th…  Kevin Bacon           ⋯
 14 │ 658142  1998-10-13    The Yearbook: An An…  Himself - 'Chip Dil…  ⋯
 15 │ 745713  2008-08-13    Animal House: The I…  Self                  ⋯
 16 │ 657148  2019-12-18    Live in Front of a …  Pinky Peterson        ⋯
 17 │ 881931  2021-09-09    Clint Eastwood: A C…  Self                  ⋯
 18 │ 8469    1978-07-27    Animal House          Chip Diller           ⋯
 19 │ 42146   1981-09-25    Only When I Laugh     Don                   ⋯
 20 │ 714841  2020-06-11    Picture Show: A Tri…  Self                  ⋯
 21 │ 20794   2001-11-23    Novocaine             Actor Lance Phelps    ⋯
 22 │ 54663   1979-10-05    Starting Over         Husband - Young Cou…  ⋯
 23 │ 37757   2010-05-04    Never Sleep Again: …  Self (archive foota…  ⋯

These include movies with no release date or, sometimes but not in this collection, movies with a release date in the future. Also, a list like this can include movies with the character either not named or not credited or named Self or Himself.

We will declare that such movies are not acceptable for our purposes and remove them using filter. Generally the first argument to filter is a function which, in this case, we pass as an anonymous function written as a do-block.

filter!(kbcredits) do r
    (isempty(r.release_date) || string(today()) < r.release_date) && return false
    r.character == "Kevin Bacon" && return false
    occursin(r"^\s*$|^Self|^Himself|\(uncredited\)$", r.character) && return false
    return true
end
Table with 5 columns and 79 rows:
      id     release_date  original_title        character             ⋯
    ┌───────────────────────────────────────────────────────────────────
 1  │ 617    1998-03-20    Wild Things           Sergeant Ray Duquet…  ⋯
 2  │ 819    1996-10-18    Sleepers              Sean Nokes            ⋯
 3  │ 1788   1984-02-17    Footloose             Ren McCormack         ⋯
 4  │ 21032  1995-12-22    Balto                 Balto (voice)         ⋯
 5  │ 21332  1986-02-14    Quicksilver           Jack Casey            ⋯
 6  │ 11601  1999-09-10    Stir of Echoes        Tom Witzky            ⋯
 7  │ 11835  2007-08-31    Death Sentence        Nick Hume             ⋯
 8  │ 25405  2009-09-21    Taking Chance         LtCol Mike Strobl     ⋯
 9  │ 9362   1990-01-19    Tremors               Valentine McKee       ⋯
 10 │ 12714  1988-02-05    She's Having a Baby   Jake Briggs           ⋯
 11 │ 9692   2004-12-24    The Woodsman          Walter                ⋯
 12 │ 9729   2005-10-07    Where the Truth Lies  Lanny                 ⋯
 13 │ 15006  2007-09-01    Rails & Ties          Tom Stark             ⋯
 14 │ 63437  1997-08-02    Telling Lies in Ame…  Billy Magic           ⋯
 15 │ 45225  1989-09-15    The Big Picture       Nick Chapman          ⋯
 16 │ 46094  1994-01-07    The Air Up There      Jimmy Dolan           ⋯
 17 │ 17956  1991-02-22    He Said, She Said     Dan Hanson            ⋯
 18 │ 49792  1991-02-01    Queens Logic          Dennis                ⋯
 19 │ 49855  1987-08-28    End of the Line       Everett               ⋯
 20 │ 19209  1987-07-10    White Water Summer    Vic                   ⋯
 21 │ 76168  1983-03-06    The Demon Murder Ca…  Kenny Miller          ⋯
 22 │ 57585  2011-05-17    Elephant White        Jimmy the Brit        ⋯
 23 │ 94671  2013-07-25    Jayne Mansfield's C…  Carroll Caldwell      ⋯
 ⋮  │   ⋮         ⋮                 ⋮                     ⋮            ⋱

Here we use the mutating form of filter, named filter!, which modifies the table in place. The ! at the end of the name is an indication that this is a mutating function. This is merely a naming convention, designed to alert the user that this function may change one or more of its arguments. The ! at the end of the name has no syntactic significance.
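The difference between filter and filter! can be seen on a plain vector of rows. In this sketch the last two rows are hypothetical, the cutoff date is fixed so that the result is reproducible, and the dates are compared as strings, which works because ISO 8601 dates sort lexicographically.

```julia
using Dates

credits = [
    (original_title="Footloose", release_date="1984-02-17"),
    (original_title="Tremors", release_date="1990-01-19"),
    (original_title="Hypothetical undated project", release_date=""),
    (original_title="Hypothetical future release", release_date="2999-01-01"),
]

cutoff = string(Date(2022, 12, 9))    # "2022-12-09"

keep(r) = !(isempty(r.release_date) || cutoff < r.release_date)

kept = filter(keep, credits)          # non-mutating: credits is unchanged
length(credits), length(kept)         # (4, 2)

filter!(credits) do r                 # mutating, with the predicate as a do-block
    keep(r)
end
length(credits)                       # 2
```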

last(kbcredits, 23)
Table with 5 columns and 23 rows:
      id      release_date  original_title        character             ⋯
    ┌────────────────────────────────────────────────────────────────────
 1  │ 881     1992-12-11    A Few Good Men        Capt. Jack Ross       ⋯
 2  │ 13776   1982-04-02    Diner                 Timothy Fenwick Jr.   ⋯
 3  │ 45132   2010-11-26    Super                 Jacques               ⋯
 4  │ 17908   2000-01-12    My Dog Skip           Jack Morris           ⋯
 5  │ 58462   2005-01-24    Loverboy              Marty                 ⋯
 6  │ 131010  1997-06-16    Destination Anywhere  Mike                  ⋯
 7  │ 388399  2016-12-12    Patriots Day          Richard DesLauriers   ⋯
 8  │ 462318  1979-12-15    The Gift              Teddy                 ⋯
 9  │ 261023  2015-09-04    Black Mass            FBI Agent Charles M…  ⋯
 10 │ 44004   1980-02-08    Hero at Large         2nd Teenager          ⋯
 11 │ 747803  2022-09-02    One Way               Fred Sr.              ⋯
 12 │ 843889  2022-06-12    Space Oddity          Jeff McAllister       ⋯
 13 │ 2609    1987-11-26    Planes, Trains and …  Taxi Racer            ⋯
 14 │ 4488    1980-05-09    Friday the 13th       Jack Burrell          ⋯
 15 │ 50646   2011-07-29    Crazy, Stupid, Love.  David Lindhagen       ⋯
 16 │ 10944   2003-10-22    In the Cut            John Graham           ⋯
 17 │ 459956  2017-06-20    Story of a Girl       Michael               ⋯
 18 │ 458506  2017-07-08    Tour de Pharmacy      Ditmer Klerken        ⋯
 19 │ 657148  2019-12-18    Live in Front of a …  Pinky Peterson        ⋯
 20 │ 8469    1978-07-27    Animal House          Chip Diller           ⋯
 21 │ 42146   1981-09-25    Only When I Laugh     Don                   ⋯
 22 │ 20794   2001-11-23    Novocaine             Actor Lance Phelps    ⋯
 23 │ 54663   1979-10-05    Starting Over         Husband - Young Cou…  ⋯

The call to occursin in the filter! block above creates a regular expression by prepending an r to a string literal, a so-called string macro call (also known as a non-standard string literal).
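The pattern used in the filter can be tried on a few character names on its own. The name unwanted is just a label for this sketch, and "Officer (uncredited)" is a made-up example of a name that ends in (uncredited).

```julia
unwanted = r"^\s*$|^Self|^Himself|\(uncredited\)$"

typeof(unwanted)                              # Regex

occursin(unwanted, "")                        # true: empty character name
occursin(unwanted, "Self / Actor")            # true: starts with Self
occursin(unwanted, "Himself")                 # true
occursin(unwanted, "Officer (uncredited)")    # true: ends with (uncredited)
occursin(unwanted, "Chip Diller")             # false: an ordinary role
```

Note that the ^Self alternative matches any name that merely begins with those four letters, which is a deliberate simplification here.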

Saving the results

The movies in kbcredits have a “Bacon number” of 1, by definition. Because in later code we will want to check if a movie has already been assigned a Bacon number, we save the information on these movies in a Dict, which is a key-value dictionary, where the key is the movie id. The keys in the dictionary are hashed, making for quick lookups. We will augment the information in each row with its depth and the id of the actor who caused the movie to be added.
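As a minimal sketch of this kind of lookup table, consider a Dict with far fewer fields than the one declared below. The value type here is a simplification chosen for the example; the ids and titles are taken from the table above.

```julia
movies = Dict{Int32,NamedTuple{(:original_title, :lv),Tuple{String,Int8}}}()

movies[1788] = (original_title="Footloose", lv=Int8(1))
movies[9362] = (original_title="Tremors", lv=1)   # lv is converted to Int8 on insertion

haskey(movies, 1788)        # true
haskey(movies, 16)          # false: this movie has not been assigned a level
movies[9362].lv isa Int8    # true
```

Declaring the key and value types up front means that values are converted on insertion, so every entry has the same concrete type.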

In the next stage of additions we reverse the role of person and movie and obtain the cast for each of the movies that have been added in this round. Information on these people will be added to a Dict. To keep these two Dicts associated and to record the identity of the actor from which the Dicts are derived we declare a CastChain struct

struct CastChain
    moviedict::Dict{
        Int32,
        NamedTuple{
            (:id, :release, :original_title, :lv, :pop, :addedby, :character),
            Tuple{Int32, Date, String, Int8, Float32, Int32, String}
        }
    }
    persondict::Dict{
        Int32,
        NamedTuple{
            (:id, :name, :lv, :pop, :addedby, :character),
            Tuple{Int32, String, Int8, Float32, Int32, String}
        }
    }
end

and generics addmovies! and addpersons! that have methods for this struct.

function addmovies!(ch::CastChain)
    iob = IOBuffer()
    prd = ch.persondict
    mvd = ch.moviedict
    plv = maximum(getproperty.(values(prd), :lv))  # maximum level of a person
    lv = plv + one(Int8)                           # level to be assigned to movies
    for p in values(prd)
        p.lv == plv || continue                    # skip people added in earlier rounds
        try
            HTTP.get(joinpath(quri, string("person/", p.id, "/movie_credits")), response_stream=iob)
        catch
            throw(ArgumentError("failure to get movie credits for person $(p.id)"))
        end
        recs = filter(JSON3.read(take!(iob)).cast) do r
            haskey(mvd, r.id) && return false      # skip if movie was already entered
            haskey(r, :release_date) || return false  # there are a few that don't have release_date
            (isempty(r.release_date) || string(today()) < r.release_date) && return false
            haskey(r, :character) || return false
            (isnothing(r.character) || r.character == p.name) && return false
            occursin(r"^\s*$|^Self|^Himself|^Herself|\(uncredited\)$", r.character) && return false
            return true
        end
        addedby = p.id
        for r in recs
            mvd[r.id] = (;
                id=r.id,
                release=Date(r.release_date),
                original_title=r.original_title,
                lv,
                pop=r.popularity,
                addedby,
                character=r.character,
            )
        end
    end
    return ch
end
addmovies! (generic function with 1 method)
function addpersons!(ch::CastChain)
    iob = IOBuffer()
    prd = ch.persondict
    mvd = ch.moviedict
    lv = maximum(getproperty.(values(mvd), :lv))
    for m in values(mvd)
        m.lv == lv || continue
        try
            HTTP.get(joinpath(quri, string("movie/", m.id, "/credits")), response_stream=iob)
        catch
            throw(ArgumentError("failure to get credits for movie $(m.id)"))
        end
        recs = filter(JSON3.read(take!(iob)).cast) do r
            haskey(prd, r.id) && return false
            (isnothing(r.character) || r.character == r.name) && return false
            occursin(r"^\s*$|^Self|^Himself|^Herself|\(uncredited\)$", r.character) && return false
            return true
        end
        addedby = m.id
        for r in recs
            prd[r.id] = (; id=r.id, name=r.name, lv, pop=r.popularity, addedby, character=r.character)
        end
    end
    return ch
end
addpersons! (generic function with 1 method)

The constructor takes a person id (e.g. 4724 for Kevin Bacon), creates the structure, and populates the level-1 movies and persons.

function CastChain(id::Integer)
    iob = IOBuffer()
    id = Int32(id)
    try
        HTTP.get(joinpath(quri, "person/$id"); response_stream=iob)
    catch
        throw(ArgumentError("unable to retrieve person details for id = $id"))
    end
    pr = JSON3.read(take!(iob))  # details of this person - we only use the name
    return addpersons!(
        addmovies!(
            CastChain(
                fieldtype(CastChain, 1)(),   # an empty Dict of the correct type
                Dict(
                    id => (;                 # a Dict with a single entry
                        id,
                        pr.name,
                        lv=Int8(0),
                        pop=Float32(pr.popularity),
                        addedby=Int32(0),
                        character=""
                    ),
                ),
            ),
        ),
    )
end
CastChain
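The expression fieldtype(CastChain, 1)() in the constructor avoids retyping the long Dict type. The same idiom is shown below on a hypothetical, much smaller struct named Ledger.

```julia
struct Ledger   # hypothetical struct, a cut-down analogue of CastChain
    entries::Dict{Int32,String}
    notes::Vector{String}
end

fieldtype(Ledger, 1)                     # Dict{Int32, String}
empty_entries = fieldtype(Ledger, 1)()   # calling the type constructs an empty Dict

led = Ledger(empty_entries, String[])
isempty(led.entries)                     # true
```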

The level-1 information for Kevin Bacon, which is the movies in which he has appeared (and that satisfy our criteria) and his fellow cast members in those movies, is created as

@time kbch = CastChain(4724);
  2.400300 seconds (148.17 k allocations: 24.157 MiB, 2.51% compilation time)

On the day that I ran this version it returned 79 movies and about 1850 actors who have a Bacon number of 1.

(length(kbch.moviedict), length(kbch.persondict))
(79, 1849)
l1ptbl = Table(collect(values(kbch.persondict))) 
Table with 6 columns and 1849 rows:
      id       name              lv  pop     addedby  character
    ┌───────────────────────────────────────────────────────────────────────────
 1  │ 14888    Bruce McGill      1   21.99   8469     Daniel Simpson Day
 2  │ 7144     Michael Flynn     1   3.34    1788     Policeman
 3  │ 9140     James Faulkner    1   8.58    49538    Swiss Bank Manager
 4  │ 60876    Walter Breaux     1   0.6     820      Vernon Bundy
 5  │ 1847933  Mat Langford      1   0.6     121950   Lester
 6  │ 951993   Mary Klug         1   2.245   261023   Mom Bulger
 7  │ 51582    Ned Vaughn        1   3.405   568      CAPCOM 2
 8  │ 1207720  Olga Barbato      1   0.6     131010   Bella
 9  │ 128629   Crystal Reed      1   11.996  50646    Amy Johnson
 10 │ 2876     Matt Dillon       1   22.736  58462    Mark
 11 │ 2447500  Danny Bohnen      1   0.692   747803   Oleg
 12 │ 2057258  Tom Maier         1   0.6     45225    Building Manager
 13 │ 4135     Robert Redford    1   18.675  464655   Reader - Declaration of I…
 14 │ 56152    Kari Wuhrer       1   15.938  13641    Correspondent
 15 │ 67773    Steve Martin      1   21.55   2609     Neal Page
 16 │ 1470474  Krista Marie Yu   1   1.51    257345   Tasha
 17 │ 1370     Brad Dourif       1   21.746  8438     Byron Stamphill
 18 │ 95982    Danae Nason       1   2.199   261023   McGuire's Secretary
 19 │ 1089410  Rebecca White     1   0.6     617      Policewoman #2
 20 │ 1133460  Paul T. Taylor    1   1.38    45132    Frank Sr.
 21 │ 7420     Harvey Fierstein  1   3.288   76168    The Demon (voice)
 22 │ 21708    Tomas Milian      1   8.374   820      Leopoldo
 23 │ 1043304  Andy Milder       1   4.635   568      GUIDO White
 ⋮  │    ⋮            ⋮          ⋮     ⋮        ⋮                 ⋮
sort!(l1ptbl; by=getproperty(:id))
Table with 6 columns and 1849 rows:
      id   name                lv  pop     addedby  character
    ┌───────────────────────────────────────────────────────────────────────────
 1  │ 20   Elizabeth Perkins   1   17.115  17956    Lorraine "Lorie" Bryer
 2  │ 31   Tom Hanks           1   96.986  574379   Narrator
 3  │ 33   Gary Sinise         1   16.466  574379   Ernie Pyle
 4  │ 64   Gary Oldman         1   32.179  8438     Glenn
 5  │ 85   Johnny Depp         1   26.459  261023   James 'Whitey' Bulger
 6  │ 103  Mark Ruffalo        1   28.859  10944    Giovanni A. Malloy
 7  │ 109  Elijah Wood         1   16.778  574379   Corp. Wilfred Hanson / Capt…
 8  │ 114  Orlando Bloom       1   15.738  458506   JuJu Pepe
 9  │ 133  Peter Sarsgaard     1   16.463  261023   Brian Halloran
 10 │ 192  Morgan Freeman      1   73.802  464655   Reader - Declaration of Ind…
 11 │ 213  Gerry Robert Byrne  1   1.16    131010   Bar Patron #2
 12 │ 228  Ed Harris           1   36.464  568      Gene Kranz
 13 │ 237  Aidan Devine        1   3.857   41952    Reporter
 14 │ 287  Brad Pitt           1   48.921  574379   Sgt. Bill Mauldin
 15 │ 335  Michael Shannon     1   40.914  9692     Rosen
 16 │ 342  Eugene Byrd         1   7.868   819      Rizzo
 17 │ 350  Laura Linney        1   16.805  322      Annabeth Markum
 18 │ 380  Robert De Niro      1   64.375  819      Father Bobby
 19 │ 382  Bob Hoskins         1   8.338   21032    Boris the Goose (voice)
 20 │ 418  Robert Patrick      1   19.357  94671    Jimbo Caldwell
 21 │ 425  Dan John Miller     1   0.84    26820    Mickey
 22 │ 429  Lucas Till          1   16.653  49538    Alex Summers / Havok
 23 │ 450  Mike Colter         1   11.571  25405    MGySgt Demetry
 ⋮  │  ⋮           ⋮           ⋮     ⋮        ⋮                  ⋮
sort!(l1ptbl; by=getproperty(:addedby))
Table with 6 columns and 1849 rows:
      id      name                lv  pop     addedby  character
    ┌───────────────────────────────────────────────────────────────────────────
 1  │ 4724    Kevin Bacon         0   29.145  0        
 2  │ 350     Laura Linney        1   16.805  322      Annabeth Markum
 3  │ 504     Tim Robbins         1   19.331  322      Dave Boyle
 4  │ 2228    Sean Penn           1   12.458  322      Jimmy Markum
 5  │ 4728    Kevin Chapman       1   7.015   322      Val Savage
 6  │ 4729    Tom Guiry           1   9.299   322      Brendan Harris
 7  │ 4730    Emmy Rossum         1   11.579  322      Katie Markum
 8  │ 4731    Andrew Mackin       1   0.6     322      John O'Shea
 9  │ 4732    Adam Nelson         1   0.692   322      Nick Savage
 10 │ 4733    Robert Wahlberg     1   6.359   322      Kevin Savage
 11 │ 4734    Jenny O'Hara        1   7.048   322      Esther Harris
 12 │ 4735    John Doman          1   8.005   322      Driver
 13 │ 4736    Cameron Bowen       1   1.387   322      Young Dave
 14 │ 4737    Jason Kelly         1   0.84    322      Young Jimmy
 15 │ 4738    Connor Paolo        1   4.644   322      Young Sean
 16 │ 4739    T. Bruce Page       1   0.6     322      Jimmy's Father
 17 │ 4740    Miles Herter        1   0.994   322      Sean's Father
 18 │ 4741    Cayden Boyd         1   3.251   322      Michael Boyle
 19 │ 4742    Tori Davis          1   1.102   322      Lauren Devine
 20 │ 4743    Jonathan Togo       1   6.139   322      Pete
 21 │ 71552   Ari Graynor         1   9.23    322      Eve Pigeon
 22 │ 97267   Susan Willis        1   3.869   322      Mrs. Prior
 23 │ 130735  José Ramón Rosario  1   2.154   322      Lt. Friel (as Jose Ramon…
 ⋮  │   ⋮             ⋮           ⋮     ⋮        ⋮                 ⋮

Early in the process a single movie, like

kbch.moviedict[322]
(id = 322, release = Date("2003-01-01"), original_title = "Mystic River", lv = 1, pop = 28.921f0, addedby = 4724, character = "Sean Devine")

can cause many actors to be added to this table.
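One way to quantify this is to count the persons added by each movie. The sketch below does so for a handful of rows copied from the tables above; applying the same loop to values(kbch.persondict) would cover the full table.

```julia
# A few rows from the level-1 person table (id, name, addedby).
people = [
    (id=350, name="Laura Linney", addedby=322),
    (id=504, name="Tim Robbins", addedby=322),
    (id=2228, name="Sean Penn", addedby=322),
    (id=67773, name="Steve Martin", addedby=2609),
    (id=14888, name="Bruce McGill", addedby=8469),
]

added = Dict{Int,Int}()          # movie id => number of persons it added
for p in people
    added[p.addedby] = get(added, p.addedby, 0) + 1
end

added[322]     # 3
```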

Higher-level Bacon numbers

It takes only a few more seconds to add the movies in which the level-1 actors have appeared (via addmovies!(kbch)), producing about 24,000 level-2 movies (i.e. movies without Kevin Bacon in the cast but with a cast member who appeared in a movie with Kevin Bacon). The next stage of adding any cast members in these movies who are not already in the persondict takes about 15 to 20 minutes, not because it is compute intensive but because access to the API gets throttled when you are making many HTTP requests in quick succession.
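When requests are throttled, one common approach is to retry a failed request after a growing delay. The sketch below uses retry and ExponentialBackOff from Base with a simulated request; it is not what was done for this post, the delays are arbitrary, and the actual rate limits of the tmdb.org API are not assumed here.

```julia
attempts = Ref(0)

# Hypothetical stand-in for an HTTP request that is rejected twice, then succeeds.
function flaky_request()
    attempts[] += 1
    attempts[] < 3 && error("simulated throttling")
    return "ok"
end

# Retry up to 5 times, starting at about 0.01 s and roughly doubling each time.
patient_request = retry(flaky_request;
    delays=ExponentialBackOff(n=5, first_delay=0.01, factor=2.0))

patient_request()    # "ok"
attempts[]           # 3
```

In addmovies! and addpersons! the call wrapped this way would be the HTTP.get request, leaving the rest of each function unchanged.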

The result is over 175,000 actors with a Bacon number of 2.

I saved the values in the persondict and moviedict as Arrow tables, from which the chain can be reconstructed.

l2chain = CastChain(
    Dict(r.id => r for r in Table(Arrow.Table("./kbmovies.arrow"))),
    Dict(r.id => r for r in Table(Arrow.Table("./kbpersons.arrow")))
);
sort!(Table(collect(values(l2chain.persondict))); by=getproperty(:id))
Table with 6 columns and 178745 rows:
      id  name                lv  pop     addedby  character
    ┌──────────────────────────────────────────────────────────────────────────
 1  │ 1   George Lucas        2   12.179  306      Disappointed Man
 2  │ 2   Mark Hamill         2   29.562  23738    Void (voice)
 3  │ 3   Harrison Ford       2   35.51   109410   Branch Rickey
 4  │ 4   Carrie Fisher       2   5.444   278427   Angela as Mon Mothma (voice)
 5  │ 5   Peter Cushing       2   13.677  26680    SS Commander
 6  │ 6   Anthony Daniels     2   4.244   348350   Tak
 7  │ 7   Andrew Stanton      2   13.095  127380   Crush (voice)
 8  │ 8   Lee Unkrich         2   2.051   12       Additional Voices  (voice)
 9  │ 10  Bob Peterson        2   2.909   127380   Mr. Ray (voice)
 10 │ 11  David Reynolds      2   2.107   12278    Ernie
 11 │ 12  Alexander Gould     2   7.957   127380   Passenger Carl (voice)
 12 │ 13  Albert Brooks       2   11.123  127380   Marlin (voice)
 13 │ 14  Ellen DeGeneres     2   5.079   127380   Dory (voice)
 14 │ 18  Brad Garrett        2   11.973  127380   Bloat (voice)
 15 │ 19  Allison Janney      2   21.146  429473   Lou
 16 │ 20  Elizabeth Perkins   1   17.115  17956    Lorraine "Lorie" Bryer
 17 │ 22  Barry Humphries     2   6.786   12       Bruce (voice)
 18 │ 23  Bill Hunter         2   6.993   6972     Skipper (Qantas Sloop)
 19 │ 29  Steve Tisch         2   2.692   188222   Board Member
 20 │ 31  Tom Hanks           1   96.986  574379   Narrator
 21 │ 32  Robin Wright        2   16.604  13791    Melanie McGowan
 22 │ 33  Gary Sinise         1   16.466  574379   Ernie Pyle
 23 │ 34  Mykelti Williamson  2   13.31   666219   Truesdale
 ⋮  │ ⋮           ⋮           ⋮     ⋮        ⋮                  ⋮
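The step that wrote the Arrow files is not shown in this post. A minimal sketch of such a round trip, assuming Arrow.write was used and substituting a two-row toy table and a temporary file for the real data and file names, might look like this:

```julia
using Arrow, TypedTables

# Toy stand-in for Table(collect(values(kbch.persondict))), with fewer columns.
persons = Table([
    (id=Int32(4724), name="Kevin Bacon", lv=Int8(0)),
    (id=Int32(31), name="Tom Hanks", lv=Int8(1)),
])

path = joinpath(mktempdir(), "persons.arrow")   # temporary file for the example
Arrow.write(path, persons)

# Read the file back and rebuild a Dict keyed on id, as done for l2chain.
restored = Dict(r.id => r for r in Table(Arrow.Table(path)))
restored[31].name
```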

We can use these two tables to track from Björk (person id = 47) to Kevin Bacon in two steps.

l2chain.persondict[47]
(id = 47, name = "Björk", lv = 2, pop = 4.426f0, addedby = 16, character = "Selma Jezkova")
l2chain.moviedict[16]
(id = 16, release = Date("2000-06-30"), original_title = "Dancer in the Dark", lv = 2, pop = 15.257f0, addedby = 6758, character = "Detective")
l2chain.persondict[6758]
(id = 6758, name = "John Randolph Jones", lv = 1, pop = 1.694f0, addedby = 2609, character = "Cab Dispatcher")
l2chain.moviedict[2609]
(id = 2609, release = Date("1987-11-26"), original_title = "Planes, Trains and Automobiles", lv = 1, pop = 22.595f0, addedby = 4724, character = "Taxi Racer")

And, in case you had forgotten,

l2chain.persondict[4724]
(id = 4724, name = "Kevin Bacon", lv = 0, pop = 29.145f0, addedby = 0, character = "")

So Björk appeared as “Selma Jezkova” in the movie “Dancer in the Dark”, released in 2000, in which John Randolph Jones appeared as “Detective”. He also appeared as “Cab Dispatcher” in the 1987 release “Planes, Trains and Automobiles”, in which Kevin Bacon appeared as “Taxi Racer”.

Similarly, Rachel Weisz appeared as “Dr. Lily Sinclair” in the 1996 movie “Chain Reaction” in which Eddie Bo Smith, Jr. appeared as “Yusef Reed”. He also appeared as “Motel Security Guard” in the 2001 release “Novocaine” in which Kevin Bacon appeared as “Actor Lance Phelps”.

It is unlikely in a real game of “Six Degrees of Kevin Bacon”, without access to a database like this, that the connecting characters would be those with such minor roles.
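The manual lookups above follow the addedby field alternately through the person and movie dictionaries, and that walk is easy to automate. The function tracepath below is a hypothetical helper, shown on small dictionaries that contain only the entries printed above, with fewer fields.

```julia
persons = Dict(
    47 => (name="Björk", lv=2, addedby=16),
    6758 => (name="John Randolph Jones", lv=1, addedby=2609),
    4724 => (name="Kevin Bacon", lv=0, addedby=0),
)
movies = Dict(
    16 => (original_title="Dancer in the Dark", lv=2, addedby=6758),
    2609 => (original_title="Planes, Trains and Automobiles", lv=1, addedby=4724),
)

# Alternate person -> movie -> person until the level-0 person is reached.
function tracepath(persons, movies, id)
    p = persons[id]
    path = [p.name]
    while p.lv > 0
        m = movies[p.addedby]      # the movie that caused this person to be added
        push!(path, m.original_title)
        p = persons[m.addedby]     # the person who caused that movie to be added
        push!(path, p.name)
    end
    return path
end

tracepath(persons, movies, 47)
```

For Björk this returns the five-element path from her name, through the two movies and John Randolph Jones, to Kevin Bacon.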

Conclusion

This exercise was mostly to show HTTP requests with query strings and more Julia programming idioms. Quite a bit of the actual code is devoted to somewhat minor details but that is often the case. I have said that the most valuable character trait for a programmer is unbounded pessimism because you spend so much of your time thinking “What can go wrong here?”, and then being surprised when something you didn’t think of goes wrong.

Putting on my statistician’s hat for a moment, consider the number of actors at each level in the person table.

counts = zeros(Int, 3)
for l in Table(collect(values(l2chain.persondict))).lv
    counts[l + 1] += 1
end
counts
3-element Vector{Int64}:
      1
   1848
 176896

If we plot these counts on a logarithmic scale, as in Figure 1

using AlgebraOfGraphics
using CairoMakie
CairoMakie.activate!(; type="svg")
draw(
    data((x = 0:2, y = counts)) * mapping(:x => "Bacon number", :y => "Number of actors") * visual(Scatter);
    axis=(; yscale=log10, xticks=0:2,
    yminorticks = IntervalsBetween(8),
    yminorticksvisible=true,
    yminorgridvisible=true),
    figure=(; resolution=(800,500),)
)

Figure 1: Number of actors versus Bacon number, up to a Bacon number of 2

we see the pattern of exponential growth.
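On a logarithmic scale, equal vertical steps correspond to equal ratios, so the growth factor from each level to the next is a useful companion to the plot. The sketch below computes those factors from the counts shown above; with only three levels it is a rough description, not a fitted growth rate.

```julia
counts = [1, 1848, 176896]                   # actors at Bacon numbers 0, 1 and 2

ratios = counts[2:end] ./ counts[1:end-1]    # growth factor from each level to the next
round.(ratios; digits=1)                     # [1848.0, 95.7]

log10.(counts)                               # the heights shown on a log10 axis
```

The second factor is considerably smaller than the first, which is worth keeping in mind when extrapolating to higher Bacon numbers.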