In the previous post I described downloading and parsing files containing summary data from files.tmdb.org, then saving the results as tables in Arrow format.
Here I will show how to access the cast of a movie and the filmography of an actor using the tmdb.org public API.
These steps are preparation for developing a method of linking actors through appearances in the same movie, which is the underlying relationship in the parlor game Six Degrees of Kevin Bacon.
As described in the Wikipedia entry:
Six Degrees of Kevin Bacon or Bacon’s Law is a parlor game where players challenge each other to arbitrarily choose an actor and then connect them to another actor via a film that both actors have appeared in together, repeating this process to try to find the shortest path that ultimately leads to prolific American actor Kevin Bacon. It rests on the assumption that anyone involved in the Hollywood film industry can be linked through their film roles to Bacon within six steps. The game’s name is a reference to “six degrees of separation”, a concept that posits that any two people on Earth are six or fewer acquaintance links apart.
Automating the process of assigning “Bacon numbers”, the distance through appearances in the same movie from Kevin Bacon to an actor, can quickly become a massive undertaking, putting a premium on efficiency in time and storage on the intermediate steps.
Load packages
First load the packages to be used
using Arrow        # Apache Arrow storage format for Tables
using Dates        # time/date formats
using HTTP         # HTTP communication for clients or servers
using JSON3        # read and write JSON with a slick interface for structs
using TypedTables  # low-overhead Table wrapper
using URIs         # construct or deconstruct URI strings
API key
Queries on the tmdb.org database (as opposed to the summary file downloads described previously) require a (free) tmdb account and an API Key.
This key is usually printed as an unsigned 128-bit integer written as 32 hexadecimal digits. That is, it looks like
rand(UInt128)
0xa6bb36f3ef7832ce4a637a7c3f309ea0
but written without the 0x prefix.
Assume that this API key is stored in an environment variable TMDB_API_KEY, which is accessed as ENV["TMDB_API_KEY"]. If you prefer not to use an environment variable you can instead assign this value in your Julia startup file.
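A minimal sketch of reading and sanity-checking the key before using it. The helper name tmdb_api_key and the 32-hex-digit check are assumptions made for illustration (based on the description of the key above), not part of the tmdb.org API or of the code in this post.

```julia
# Hypothetical helper (not from the original post): fetch the key and check its shape.
# `env` defaults to ENV but any dictionary-like object works, which helps with testing.
function tmdb_api_key(env=ENV)
    key = get(env, "TMDB_API_KEY", nothing)
    key === nothing && error("environment variable TMDB_API_KEY is not set")
    # assumption: the key is written as 32 hexadecimal digits, as described above
    occursin(r"^[0-9a-fA-F]{32}$", key) || error("TMDB_API_KEY does not look like 32 hex digits")
    return key
end

# Try it with a made-up key generated like the rand(UInt128) example (not a real key)
fakekey = string(rand(UInt128); base=16, pad=32)
tmdb_api_key(Dict("TMDB_API_KEY" => fakekey)) == fakekey
```

Failing early with a clear message is usually easier to diagnose than an authorization error returned by the server later on.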
Requests to the HTTP server at tmdb.org require the API key as a query parameter in the URI so we build a base query URI as
const quri = URI(;   # named arguments only
    scheme="https",
    host="api.themoviedb.org",
    path="/3",   # URIs for version 3 of the API begin with 3
    query=["api_key" => ENV["TMDB_API_KEY"]],
);
Suppress printing of URIs for queries
Because I don’t want to reveal my API key, I will suppress printing of URIs containing this key.
We will only be interested in credits as a cast member. The fields of interest to us are id, release_date, original_title, and popularity, all referring to the movie, and character, which refers to the actor/movie combination.
function getcastcredits(
    id::Integer;                  # person id
    iob::IOBuffer=IOBuffer(),     # for streaming the response body
    selector::Function=@Select(   # select and modify fields of each row
        id=Int32($id),
        release_date,
        original_title,
        character,
        popularity=Float32($popularity),
    ),
    template=(id=Int32(0), release_date="", original_title="", character="", popularity=0.0f0),
)
    take!(iob)   # clear the buffer
    try
        HTTP.get(
            joinpath(quri, string("person/", id, "/movie_credits"));
            response_stream=iob,
        )
    catch
        take!(iob)   # clear the buffer which may be non-empty
        return Table(typeof(template)[])
    end
    return selector.(Table(JSON3.read(take!(iob)).cast))
end
getcastcredits (generic function with 1 method)
Streaming the response body through an IOBuffer
The body of the HTTP response is streamed through an IOBuffer which, by default, is a freshly allocated IOBuffer. The take! method for an IOBuffer returns the contents of the stream and resets the stream to be empty. However, it does not free the memory allocated for the buffer, which means that subsequent uses of the buffer reuse that memory and only allocate additional memory if needed.
To take advantage of this we pass a pre-declared (but not const because fields in the struct must be mutable) IOBuffer.
Assign an IOBuffer named iob, which can be re-used in our HTTP requests.
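As a small illustration using only Base Julia, the sketch below shows the behavior that getcastcredits relies on: take! hands back whatever has been written and leaves the same buffer object empty and ready for the next response. The strings stand in for HTTP response bodies and are made up for this example.

```julia
iob = IOBuffer()                     # allocated once, reused below

write(iob, "first response")         # stand-in for a streamed response body
first_body = String(take!(iob))      # returns the bytes and empties the buffer

nothing_left = bytesavailable(iob) == 0   # nothing remains to be read

write(iob, "second")                 # the same buffer object accepts new data
second_body = String(take!(iob))

(first_body, second_body, nothing_left)
```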
The try/catch block wrapping the call to HTTP.get is to catch errors from the request and return an empty Table of the same type as a valid response, instead of many, many lines of traceback from the failed request. For example, there is no person with an id of 15 so the HTTP.get request will fail but, in this case, the function fails gracefully.
getcastcredits(15; iob)
Table with 5 columns and 0 rows:
id release_date original_title character popularity
┌────────────────────────────────────────────────────────
In the function call, id is a person id. In the response table, the contents of the id column are movie ids.
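The fallback pattern can be sketched without any network access. Here fetchrows is a hypothetical stand-in for the request, with the failure simulated for id 15; the point is only that typeof(template)[] creates an empty vector whose element type matches that of a successful result.

```julia
template = (id=Int32(0), release_date="", original_title="", character="", popularity=0.0f0)

# Hypothetical stand-in for an HTTP request that may fail (no network involved)
function fetchrows(id; template=template)
    try
        id == 15 && error("simulated failure for person $id")
        return [template]            # pretend that one row came back
    catch
        return typeof(template)[]    # empty, but with the same element type
    end
end

empty_rows = fetchrows(15)
(isempty(empty_rows), eltype(empty_rows) == typeof(template))
```

Because the empty and non-empty results share an element type, code further down the line does not need a special case for failed requests.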
Next get Kevin Bacon’s cast credits
kbcredits = getcastcredits(4724; iob)
Table with 5 columns and 108 rows:
id release_date original_title character ⋯
┌───────────────────────────────────────────────────────────────────
1 │ 617 1998-03-20 Wild Things Sergeant Ray Duquet… ⋯
2 │ 819 1996-10-18 Sleepers Sean Nokes ⋯
3 │ 1788 1984-02-17 Footloose Ren McCormack ⋯
4 │ 21032 1995-12-22 Balto Balto (voice) ⋯
5 │ 21332 1986-02-14 Quicksilver Jack Casey ⋯
6 │ 11601 1999-09-10 Stir of Echoes Tom Witzky ⋯
7 │ 11835 2007-08-31 Death Sentence Nick Hume ⋯
8 │ 25405 2009-09-21 Taking Chance LtCol Mike Strobl ⋯
9 │ 9362 1990-01-19 Tremors Valentine McKee ⋯
10 │ 12714 1988-02-05 She's Having a Baby Jake Briggs ⋯
11 │ 9692 2004-12-24 The Woodsman Walter ⋯
12 │ 9729 2005-10-07 Where the Truth Lies Lanny ⋯
13 │ 15006 2007-09-01 Rails & Ties Tom Stark ⋯
14 │ 61772 2008-01-01 Connected: The Powe… Kevin Bacon ⋯
15 │ 63437 1997-08-02 Telling Lies in Ame… Billy Magic ⋯
16 │ 45225 1989-09-15 The Big Picture Nick Chapman ⋯
17 │ 46094 1994-01-07 The Air Up There Jimmy Dolan ⋯
18 │ 17956 1991-02-22 He Said, She Said Dan Hanson ⋯
19 │ 49792 1991-02-01 Queens Logic Dennis ⋯
20 │ 49855 1987-08-28 End of the Line Everett ⋯
21 │ 19209 1987-07-10 White Water Summer Vic ⋯
22 │ 76168 1983-03-06 The Demon Murder Ca… Kenny Miller ⋯
23 │ 57585 2011-05-17 Elephant White Jimmy the Brit ⋯
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋱
How is iob recognized as a named argument?
Since v1.5 Julia has provided a convenient shorthand for named arguments and NamedTuples (the related syntax for destructuring the properties of a struct or NamedTuple on the left-hand side of an assignment arrived in v1.7). The key is the optional ; separator between the positional arguments and the named arguments in the function call. Because the ; separator is used here to indicate that only named arguments follow, passing iob without a name is equivalent to passing iob=iob. In other words, the default name of an argument after the ; is the name of the argument itself in the context of the caller.
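A short sketch of the shorthand in isolation. The function describe and the variable label are made up for this illustration.

```julia
# Hypothetical function with two named (keyword) arguments
describe(x; iob::IOBuffer=IOBuffer(), label="none") = (x, label, iob)

iob = IOBuffer()
label = "reused"

a = describe(1; iob, label)              # shorthand: names are taken from the variables
b = describe(1; iob=iob, label=label)    # the equivalent explicit form

same = a[2] == b[2] && a[3] === b[3]     # both calls received the same values

nt = (; label, n=2)                      # the same shorthand builds a NamedTuple
(same, nt.label, nt.n)
```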
Filtering the table
The first couple of dozen movies in Kevin Bacon’s filmography are of the sort we would expect, except for the movie (#14 in this list) where his character is named “Kevin Bacon”.
There are more unusual entries toward the end of the list
last(kbcredits, 23)
Table with 5 columns and 23 rows:
id release_date original_title character ⋯
┌────────────────────────────────────────────────────────────────────
1 │ 586482 2019-03-01 Holly Near: Singing… Self ⋯
2 │ 691677 2011-06-05 X-Men: First Class … Self - Sebastian Sh… ⋯
3 │ 726209 Leave the World Beh… ⋯
4 │ 10944 2003-10-22 In the Cut John Graham ⋯
5 │ 708353 2020-03-24 Find Your Groove Self ⋯
6 │ 950707 2022-03-20 Step Into… The Movi… Self ⋯
7 │ 319070 2015-01-25 Drunk Stoned Brilli… Self / Actor ⋯
8 │ 459956 2017-06-20 Story of a Girl Michael ⋯
9 │ 458506 2017-07-08 Tour de Pharmacy Ditmer Klerken ⋯
10 │ 280180 Beverly Hills Cop: … ⋯
11 │ 141498 1996-01-01 Lost Moon: The Triu… Himself ⋯
12 │ 462323 1986-04-07 The Tender Age Probation Officer (… ⋯
13 │ 774752 2022-11-25 The Guardians of th… Kevin Bacon ⋯
14 │ 658142 1998-10-13 The Yearbook: An An… Himself - 'Chip Dil… ⋯
15 │ 745713 2008-08-13 Animal House: The I… Self ⋯
16 │ 657148 2019-12-18 Live in Front of a … Pinky Peterson ⋯
17 │ 881931 2021-09-09 Clint Eastwood: A C… Self ⋯
18 │ 8469 1978-07-27 Animal House Chip Diller ⋯
19 │ 42146 1981-09-25 Only When I Laugh Don ⋯
20 │ 714841 2020-06-11 Picture Show: A Tri… Self ⋯
21 │ 20794 2001-11-23 Novocaine Actor Lance Phelps ⋯
22 │ 54663 1979-10-05 Starting Over Husband - Young Cou… ⋯
23 │ 37757 2010-05-04 Never Sleep Again: … Self (archive foota… ⋯
These include movies with no release date or, sometimes but not in this collection, movies with a release date in the future. Also, a list like this can include movies with the character either not named or not credited or named Self or Himself.
We will declare that such movies are not acceptable for our purposes and remove them using filter. Generally the first argument to filter is a function, which, in this case, we pass as an anonymous function written as a do-block:
filter!(kbcredits) do r
    (isempty(r.release_date) || string(today()) ≤ r.release_date) && return false
    r.character == "Kevin Bacon" && return false
    occursin(r"^\s*$|^Self|^Himself|\(uncredited\)$", r.character) && return false
    return true
end
Table with 5 columns and 79 rows:
id release_date original_title character ⋯
┌───────────────────────────────────────────────────────────────────
1 │ 617 1998-03-20 Wild Things Sergeant Ray Duquet… ⋯
2 │ 819 1996-10-18 Sleepers Sean Nokes ⋯
3 │ 1788 1984-02-17 Footloose Ren McCormack ⋯
4 │ 21032 1995-12-22 Balto Balto (voice) ⋯
5 │ 21332 1986-02-14 Quicksilver Jack Casey ⋯
6 │ 11601 1999-09-10 Stir of Echoes Tom Witzky ⋯
7 │ 11835 2007-08-31 Death Sentence Nick Hume ⋯
8 │ 25405 2009-09-21 Taking Chance LtCol Mike Strobl ⋯
9 │ 9362 1990-01-19 Tremors Valentine McKee ⋯
10 │ 12714 1988-02-05 She's Having a Baby Jake Briggs ⋯
11 │ 9692 2004-12-24 The Woodsman Walter ⋯
12 │ 9729 2005-10-07 Where the Truth Lies Lanny ⋯
13 │ 15006 2007-09-01 Rails & Ties Tom Stark ⋯
14 │ 63437 1997-08-02 Telling Lies in Ame… Billy Magic ⋯
15 │ 45225 1989-09-15 The Big Picture Nick Chapman ⋯
16 │ 46094 1994-01-07 The Air Up There Jimmy Dolan ⋯
17 │ 17956 1991-02-22 He Said, She Said Dan Hanson ⋯
18 │ 49792 1991-02-01 Queens Logic Dennis ⋯
19 │ 49855 1987-08-28 End of the Line Everett ⋯
20 │ 19209 1987-07-10 White Water Summer Vic ⋯
21 │ 76168 1983-03-06 The Demon Murder Ca… Kenny Miller ⋯
22 │ 57585 2011-05-17 Elephant White Jimmy the Brit ⋯
23 │ 94671 2013-07-25 Jayne Mansfield's C… Carroll Caldwell ⋯
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋱
The mutating form of filter, named filter!
Here we use the mutating form of filter, named filter!, which modifies the table in place. The ! at the end of the name is an indication that this is a mutating function. This is merely a naming convention, designed to alert the user that this function may change one or more of its arguments. The ! at the end of the name has no syntactic significance.
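The difference between the two forms, and the date test used in the filter, can be seen on a toy example. The rows below are invented for illustration; the comparison works on strings because dates written as yyyy-mm-dd sort in the same order as the dates themselves.

```julia
using Dates

rows = [
    (title="Released long ago", release_date="1984-02-17"),
    (title="No date yet", release_date=""),
    (title="Far future", release_date="9999-12-31"),
]

# keep only rows with a non-empty release date that is already in the past
released(r) = !isempty(r.release_date) && r.release_date < string(today())

kept = filter(released, rows)   # non-mutating: `rows` still has 3 elements here
before = length(rows)

filter!(rows) do r              # mutating form with a do-block: `rows` itself shrinks
    released(r)
end

(length(kept), before, length(rows))
```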
last(kbcredits, 23)
Table with 5 columns and 23 rows:
id release_date original_title character ⋯
┌────────────────────────────────────────────────────────────────────
1 │ 881 1992-12-11 A Few Good Men Capt. Jack Ross ⋯
2 │ 13776 1982-04-02 Diner Timothy Fenwick Jr. ⋯
3 │ 45132 2010-11-26 Super Jacques ⋯
4 │ 17908 2000-01-12 My Dog Skip Jack Morris ⋯
5 │ 58462 2005-01-24 Loverboy Marty ⋯
6 │ 131010 1997-06-16 Destination Anywhere Mike ⋯
7 │ 388399 2016-12-12 Patriots Day Richard DesLauriers ⋯
8 │ 462318 1979-12-15 The Gift Teddy ⋯
9 │ 261023 2015-09-04 Black Mass FBI Agent Charles M… ⋯
10 │ 44004 1980-02-08 Hero at Large 2nd Teenager ⋯
11 │ 747803 2022-09-02 One Way Fred Sr. ⋯
12 │ 843889 2022-06-12 Space Oddity Jeff McAllister ⋯
13 │ 2609 1987-11-26 Planes, Trains and … Taxi Racer ⋯
14 │ 4488 1980-05-09 Friday the 13th Jack Burrell ⋯
15 │ 50646 2011-07-29 Crazy, Stupid, Love. David Lindhagen ⋯
16 │ 10944 2003-10-22 In the Cut John Graham ⋯
17 │ 459956 2017-06-20 Story of a Girl Michael ⋯
18 │ 458506 2017-07-08 Tour de Pharmacy Ditmer Klerken ⋯
19 │ 657148 2019-12-18 Live in Front of a … Pinky Peterson ⋯
20 │ 8469 1978-07-27 Animal House Chip Diller ⋯
21 │ 42146 1981-09-25 Only When I Laugh Don ⋯
22 │ 20794 2001-11-23 Novocaine Actor Lance Phelps ⋯
23 │ 54663 1979-10-05 Starting Over Husband - Young Cou… ⋯
Creating a regular expression
The filter above creates a regular expression by prefixing a string literal with r, a so-called string macro call (also known as a non-standard string literal).
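To see what this pattern accepts and rejects, it can be applied to a few sample character names. The samples are a mix of values from the tables above and made-up strings.

```julia
pat = r"^\s*$|^Self|^Himself|\(uncredited\)$"   # the r prefix builds a Regex

samples = [
    "",                          # character not named
    "Self",
    "Himself - 'Chip Diller'",
    "Guard (uncredited)",        # made-up example
    "Ren McCormack",
    "Kevin Bacon",               # not caught by the pattern; handled by a separate test
]

rejected = [occursin(pat, s) for s in samples]
(pat isa Regex, rejected)
```

Only the first four samples match, which is why the filter needs the additional comparison with "Kevin Bacon".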
Saving the results
The movies in kbcredits have a “Bacon number” of 1, by definition. Because in later code we will want to check if a movie has already been assigned a Bacon number, we save the information on these movies in a Dict, which is a key-value dictionary, where the key is the movie id. The keys in the dictionary are hashed, making for quick lookups. We will augment the information in each row with its depth and the id of the actor who caused the movie to be added.
In the next stage of additions we reverse the role of person and movie and obtain the cast for each of the movies that have been added in this round. Information on these people will be added to a Dict. To keep these two Dicts associated and to record the identity of the actor from which the Dicts are derived we declare a CastChain struct
and generics addmovies! and addpersons! that have methods for this struct.
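A minimal sketch of what such a struct might look like. The field names moviedict and persondict and the order of the fields follow their use in the code in this post; the row types are assumptions inferred from the values printed later, and the name CastChainSketch is hypothetical, chosen so that it does not clash with the actual CastChain.

```julia
using Dates

# Assumed row types, inferred from the fields and values printed in this post
const MovieRow = @NamedTuple{
    id::Int32, release::Date, original_title::String,
    lv::Int8, pop::Float32, addedby::Int32, character::String,
}
const PersonRow = @NamedTuple{
    id::Int32, name::String, lv::Int8, pop::Float32, addedby::Int32, character::String,
}

struct CastChainSketch                  # hypothetical stand-in for CastChain
    moviedict::Dict{Int32,MovieRow}     # key is the movie id
    persondict::Dict{Int32,PersonRow}   # key is the person id
end

ch = CastChainSketch(fieldtype(CastChainSketch, 1)(), fieldtype(CastChainSketch, 2)())

# Values are converted to the declared field types when they are stored
ch.moviedict[322] = (;
    id=322, release=Date("2003-01-01"), original_title="Mystic River",
    lv=1, pop=28.921, addedby=4724, character="Sean Devine",
)
typeof(ch.moviedict[322].pop)
```

With concrete value types in the Dicts, the 64-bit integers and floats coming from the parsed JSON are narrowed on insertion, which helps keep storage down as the tables grow.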
function addmovies!(ch::CastChain)
    iob = IOBuffer()
    prd = ch.persondict
    mvd = ch.moviedict
    plv = maximum(getproperty.(values(prd), :lv))   # maximum level of a person
    lv = plv + one(Int8)                            # level to be assigned to movies
    for p in values(prd)
        p.lv == plv || continue   # skip people added in earlier rounds
        try
            HTTP.get(joinpath(quri, string("person/", p.id, "/movie_credits")), response_stream=iob)
        catch
            throw(ArgumentError("failure to get movie credits for person $(p.id)"))
        end
        recs = filter(JSON3.read(take!(iob)).cast) do r
            haskey(mvd, r.id) && return false           # skip if movie was already entered
            haskey(r, :release_date) || return false    # there are a few that don't have release_date
            (isempty(r.release_date) || string(today()) ≤ r.release_date) && return false
            haskey(r, :character) || return false
            (isnothing(r.character) || r.character == p.name) && return false
            occursin(r"^\s*$|^Self|^Himself|^Herself|\(uncredited\)$", r.character) && return false
            return true
        end
        addedby = p.id
        for r in recs
            mvd[r.id] = (;
                id=r.id,
                release=Date(r.release_date),
                original_title=r.original_title,
                lv,
                pop=r.popularity,
                addedby,
                character=r.character,
            )
        end
    end
    return ch
end
addmovies! (generic function with 1 method)
function addpersons!(ch::CastChain)
    iob = IOBuffer()
    prd = ch.persondict
    mvd = ch.moviedict
    lv = maximum(getproperty.(values(mvd), :lv))
    for m in values(mvd)
        m.lv == lv || continue
        try
            HTTP.get(joinpath(quri, string("movie/", m.id, "/credits")), response_stream=iob)
        catch
            throw(ArgumentError("failure to get credits for movie $(m.id)"))
        end
        recs = filter(JSON3.read(take!(iob)).cast) do r
            haskey(prd, r.id) && return false
            (isnothing(r.character) || r.character == r.name) && return false
            occursin(r"^\s*$|^Self|^Himself|^Herself|\(uncredited\)$", r.character) && return false
            return true
        end
        addedby = m.id
        for r in recs
            prd[r.id] = (; id=r.id, name=r.name, lv, pop=r.popularity, addedby, character=r.character)
        end
    end
    return ch
end
addpersons! (generic function with 1 method)
The constructor takes a person id (e.g. 4724 for Kevin Bacon), creates the structure and populates the level-1 movies and persons.
function CastChain(id::Integer)
    iob = IOBuffer()
    id = Int32(id)
    try
        HTTP.get(joinpath(quri, "person/$id"); response_stream=iob)
    catch
        throw(ArgumentError("unable to retrieve person details for id = $id"))
    end
    pr = JSON3.read(take!(iob))   # details of this person - we only use the name
    return addpersons!(
        addmovies!(
            CastChain(
                fieldtype(CastChain, 1)(),   # an empty Dict of the correct type
                Dict(
                    id => (;                 # a Dict with a single entry
                        id,
                        pr.name,
                        lv=Int8(0),
                        pop=Float32(pr.popularity),
                        addedby=Int32(0),
                        character="",
                    ),
                ),
            ),
        ),
    )
end
CastChain
The level-1 information for Kevin Bacon, which is the movies in which he has appeared (and that satisfy our criteria) and his fellow cast members in those movies, is created as
@time kbch = CastChain(4724);
2.400300 seconds (148.17 k allocations: 24.157 MiB, 2.51% compilation time)
On the day that I ran this version it returned 79 movies and about 1850 actors who have a Bacon number of 1.
(length(kbch.moviedict), length(kbch.persondict))
(79, 1849)
l1ptbl = Table(collect(values(kbch.persondict)))
Table with 6 columns and 1849 rows:
id name lv pop addedby character
┌───────────────────────────────────────────────────────────────────────────
1 │ 14888 Bruce McGill 1 21.99 8469 Daniel Simpson Day
2 │ 7144 Michael Flynn 1 3.34 1788 Policeman
3 │ 9140 James Faulkner 1 8.58 49538 Swiss Bank Manager
4 │ 60876 Walter Breaux 1 0.6 820 Vernon Bundy
5 │ 1847933 Mat Langford 1 0.6 121950 Lester
6 │ 951993 Mary Klug 1 2.245 261023 Mom Bulger
7 │ 51582 Ned Vaughn 1 3.405 568 CAPCOM 2
8 │ 1207720 Olga Barbato 1 0.6 131010 Bella
9 │ 128629 Crystal Reed 1 11.996 50646 Amy Johnson
10 │ 2876 Matt Dillon 1 22.736 58462 Mark
11 │ 2447500 Danny Bohnen 1 0.692 747803 Oleg
12 │ 2057258 Tom Maier 1 0.6 45225 Building Manager
13 │ 4135 Robert Redford 1 18.675 464655 Reader - Declaration of I…
14 │ 56152 Kari Wuhrer 1 15.938 13641 Correspondent
15 │ 67773 Steve Martin 1 21.55 2609 Neal Page
16 │ 1470474 Krista Marie Yu 1 1.51 257345 Tasha
17 │ 1370 Brad Dourif 1 21.746 8438 Byron Stamphill
18 │ 95982 Danae Nason 1 2.199 261023 McGuire's Secretary
19 │ 1089410 Rebecca White 1 0.6 617 Policewoman #2
20 │ 1133460 Paul T. Taylor 1 1.38 45132 Frank Sr.
21 │ 7420 Harvey Fierstein 1 3.288 76168 The Demon (voice)
22 │ 21708 Tomas Milian 1 8.374 820 Leopoldo
23 │ 1043304 Andy Milder 1 4.635 568 GUIDO White
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
sort!(l1ptbl; by=getproperty(:id))
Table with 6 columns and 1849 rows:
id name lv pop addedby character
┌───────────────────────────────────────────────────────────────────────────
1 │ 20 Elizabeth Perkins 1 17.115 17956 Lorraine "Lorie" Bryer
2 │ 31 Tom Hanks 1 96.986 574379 Narrator
3 │ 33 Gary Sinise 1 16.466 574379 Ernie Pyle
4 │ 64 Gary Oldman 1 32.179 8438 Glenn
5 │ 85 Johnny Depp 1 26.459 261023 James 'Whitey' Bulger
6 │ 103 Mark Ruffalo 1 28.859 10944 Giovanni A. Malloy
7 │ 109 Elijah Wood 1 16.778 574379 Corp. Wilfred Hanson / Capt…
8 │ 114 Orlando Bloom 1 15.738 458506 JuJu Pepe
9 │ 133 Peter Sarsgaard 1 16.463 261023 Brian Halloran
10 │ 192 Morgan Freeman 1 73.802 464655 Reader - Declaration of Ind…
11 │ 213 Gerry Robert Byrne 1 1.16 131010 Bar Patron #2
12 │ 228 Ed Harris 1 36.464 568 Gene Kranz
13 │ 237 Aidan Devine 1 3.857 41952 Reporter
14 │ 287 Brad Pitt 1 48.921 574379 Sgt. Bill Mauldin
15 │ 335 Michael Shannon 1 40.914 9692 Rosen
16 │ 342 Eugene Byrd 1 7.868 819 Rizzo
17 │ 350 Laura Linney 1 16.805 322 Annabeth Markum
18 │ 380 Robert De Niro 1 64.375 819 Father Bobby
19 │ 382 Bob Hoskins 1 8.338 21032 Boris the Goose (voice)
20 │ 418 Robert Patrick 1 19.357 94671 Jimbo Caldwell
21 │ 425 Dan John Miller 1 0.84 26820 Mickey
22 │ 429 Lucas Till 1 16.653 49538 Alex Summers / Havok
23 │ 450 Mike Colter 1 11.571 25405 MGySgt Demetry
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
sort!(l1ptbl; by=getproperty(:addedby))
Table with 6 columns and 1849 rows:
id name lv pop addedby character
┌───────────────────────────────────────────────────────────────────────────
1 │ 4724 Kevin Bacon 0 29.145 0
2 │ 350 Laura Linney 1 16.805 322 Annabeth Markum
3 │ 504 Tim Robbins 1 19.331 322 Dave Boyle
4 │ 2228 Sean Penn 1 12.458 322 Jimmy Markum
5 │ 4728 Kevin Chapman 1 7.015 322 Val Savage
6 │ 4729 Tom Guiry 1 9.299 322 Brendan Harris
7 │ 4730 Emmy Rossum 1 11.579 322 Katie Markum
8 │ 4731 Andrew Mackin 1 0.6 322 John O'Shea
9 │ 4732 Adam Nelson 1 0.692 322 Nick Savage
10 │ 4733 Robert Wahlberg 1 6.359 322 Kevin Savage
11 │ 4734 Jenny O'Hara 1 7.048 322 Esther Harris
12 │ 4735 John Doman 1 8.005 322 Driver
13 │ 4736 Cameron Bowen 1 1.387 322 Young Dave
14 │ 4737 Jason Kelly 1 0.84 322 Young Jimmy
15 │ 4738 Connor Paolo 1 4.644 322 Young Sean
16 │ 4739 T. Bruce Page 1 0.6 322 Jimmy's Father
17 │ 4740 Miles Herter 1 0.994 322 Sean's Father
18 │ 4741 Cayden Boyd 1 3.251 322 Michael Boyle
19 │ 4742 Tori Davis 1 1.102 322 Lauren Devine
20 │ 4743 Jonathan Togo 1 6.139 322 Pete
21 │ 71552 Ari Graynor 1 9.23 322 Eve Pigeon
22 │ 97267 Susan Willis 1 3.869 322 Mrs. Prior
23 │ 130735 José Ramón Rosario 1 2.154 322 Lt. Friel (as Jose Ramon…
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
Early in the process a single movie, like
kbch.moviedict[322]
(id = 322, release = Date("2003-01-01"), original_title = "Mystic River", lv = 1, pop = 28.921f0, addedby = 4724, character = "Sean Devine")
can cause many actors to be added to this table.
Higher-level Bacon numbers
It takes only a few more seconds to add the movies in which the level-1 actors have appeared (via addmovies!(kbch)), producing about 24,000 level-2 movies (i.e. movies without Kevin Bacon in the cast but with a cast member who appeared in a movie with Kevin Bacon). The next stage of adding any cast members in these movies who are not already in the persondict takes about 15 to 20 minutes, not because it is compute intensive but because access to the API gets throttled when you are making many HTTP requests in quick succession.
The result is over 175,000 actors with a Bacon number of 2.
I saved the values in the persondict and moviedict as Arrow tables.
l2chain = CastChain(
    Dict(r.id => r for r in Table(Arrow.Table("./kbmovies.arrow"))),
    Dict(r.id => r for r in Table(Arrow.Table("./kbpersons.arrow"))),
);
Table with 6 columns and 178745 rows:
id name lv pop addedby character
┌──────────────────────────────────────────────────────────────────────────
1 │ 1 George Lucas 2 12.179 306 Disappointed Man
2 │ 2 Mark Hamill 2 29.562 23738 Void (voice)
3 │ 3 Harrison Ford 2 35.51 109410 Branch Rickey
4 │ 4 Carrie Fisher 2 5.444 278427 Angela as Mon Mothma (voice)
5 │ 5 Peter Cushing 2 13.677 26680 SS Commander
6 │ 6 Anthony Daniels 2 4.244 348350 Tak
7 │ 7 Andrew Stanton 2 13.095 127380 Crush (voice)
8 │ 8 Lee Unkrich 2 2.051 12 Additional Voices (voice)
9 │ 10 Bob Peterson 2 2.909 127380 Mr. Ray (voice)
10 │ 11 David Reynolds 2 2.107 12278 Ernie
11 │ 12 Alexander Gould 2 7.957 127380 Passenger Carl (voice)
12 │ 13 Albert Brooks 2 11.123 127380 Marlin (voice)
13 │ 14 Ellen DeGeneres 2 5.079 127380 Dory (voice)
14 │ 18 Brad Garrett 2 11.973 127380 Bloat (voice)
15 │ 19 Allison Janney 2 21.146 429473 Lou
16 │ 20 Elizabeth Perkins 1 17.115 17956 Lorraine "Lorie" Bryer
17 │ 22 Barry Humphries 2 6.786 12 Bruce (voice)
18 │ 23 Bill Hunter 2 6.993 6972 Skipper (Qantas Sloop)
19 │ 29 Steve Tisch 2 2.692 188222 Board Member
20 │ 31 Tom Hanks 1 96.986 574379 Narrator
21 │ 32 Robin Wright 2 16.604 13791 Melanie McGowan
22 │ 33 Gary Sinise 1 16.466 574379 Ernie Pyle
23 │ 34 Mykelti Williamson 2 13.31 666219 Truesdale
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
We can use these two tables to track from Björk (person id = 47) to Kevin Bacon in two steps.
l2chain.persondict[47]
(id = 47, name = "Björk", lv = 2, pop = 4.426f0, addedby = 16, character = "Selma Jezkova")
l2chain.moviedict[16]
(id = 16, release = Date("2000-06-30"), original_title = "Dancer in the Dark", lv = 2, pop = 15.257f0, addedby = 6758, character = "Detective")
l2chain.persondict[6758]
(id = 6758, name = "John Randolph Jones", lv = 1, pop = 1.694f0, addedby = 2609, character = "Cab Dispatcher")
l2chain.moviedict[2609]
(id = 2609, release = Date("1987-11-26"), original_title = "Planes, Trains and Automobiles", lv = 1, pop = 22.595f0, addedby = 4724, character = "Taxi Racer")
And, in case you had forgotten,
l2chain.persondict[4724]
(id = 4724, name = "Kevin Bacon", lv = 0, pop = 29.145f0, addedby = 0, character = "")
So Björk appeared as “Selma Jezkova” in the movie “Dancer in the Dark”, released in 2000, in which John Randolph Jones appeared as “Detective”. He also appeared as “Cab Dispatcher” in the 1987 release “Planes, Trains and Automobiles”, in which Kevin Bacon appeared as “Taxi Racer”.
Similarly, Rachel Weisz appeared as “Dr. Lily Sinclair” in the 1996 movie “Chain Reaction” in which Eddie Bo Smith, Jr. appeared as “Yusef Reed”. He also appeared as “Motel Security Guard” in the 2001 release “Novocaine” in which Kevin Bacon appeared as “Actor Lance Phelps”.
It is unlikely in a real game of “Six Degrees of Kevin Bacon”, without access to a database like this, that the connecting characters would be those with such minor roles.
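The lookups for Björk followed the addedby links by hand. A minimal sketch of automating that walk is given below; the function tracechain is hypothetical, and the two small Dicts contain only the rows printed above (with some fields omitted) so that the example runs without the saved Arrow files.

```julia
# Rows copied from the output shown above, reduced to the fields needed here
persondict = Dict(
    47 => (id=47, name="Björk", lv=2, addedby=16, character="Selma Jezkova"),
    6758 => (id=6758, name="John Randolph Jones", lv=1, addedby=2609, character="Cab Dispatcher"),
    4724 => (id=4724, name="Kevin Bacon", lv=0, addedby=0, character=""),
)
moviedict = Dict(
    16 => (id=16, original_title="Dancer in the Dark", lv=2, addedby=6758, character="Detective"),
    2609 => (id=2609, original_title="Planes, Trains and Automobiles", lv=1, addedby=4724, character="Taxi Racer"),
)

# Hypothetical helper: alternate person -> movie -> person until level 0 is reached
function tracechain(persondict, moviedict, id)
    p = persondict[id]
    steps = [p.name]
    while p.lv > 0
        m = moviedict[p.addedby]     # the movie that caused this person to be added
        push!(steps, m.original_title)
        p = persondict[m.addedby]    # the person who caused that movie to be added
        push!(steps, p.name)
    end
    return steps
end

tracechain(persondict, moviedict, 47)
```

The number of movies in the returned chain is the Bacon number of the starting actor, which is 2 for Björk.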
Conclusion
This exercise was mostly to show HTTP requests with query strings and more Julia programming idioms. Quite a bit of the actual code is devoted to somewhat minor details but that is often the case. I have said that the most valuable character trait for a programmer is unbounded pessimism because you spend so much of your time thinking “What can go wrong here?”, and then being surprised when something you didn’t think of goes wrong.
Putting on my statistician’s hat for a moment, consider the number of actors at each level in the person table.
counts = zeros(Int, 3)
for l in Table(collect(values(l2chain.persondict))).lv
    counts[l + 1] += 1
end
counts
3-element Vector{Int64}:
1
1848
176896
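As a quick numerical companion, the sketch below takes the three counts printed above and computes their base-10 logarithms (the quantity a logarithmic axis displays) along with the growth factor from one level to the next. The variable names are made up for this illustration.

```julia
levelcounts = [1, 1848, 176896]     # the counts printed above, for levels 0, 1 and 2

logcounts = log10.(levelcounts)     # positions on a base-10 logarithmic scale

# growth factor from each level to the next
ratios = levelcounts[2:end] ./ levelcounts[1:end-1]

(logcounts, ratios)
```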
If we plot these counts on a logarithmic scale, as in Figure 1