MapReduce Schematics

Purpose

This application takes a list of images and a list of keywords as input. It ranks the images by their relevance to the keywords and returns the ranked list of images.

Approach

The list of image URLs and the list of keywords are packaged as a JSON object and sent as a POST request to the server's splitter function. The splitter function unpacks the JSON from the front-end and splits the image URLs into smaller lists. It then sends each mapper function a JSON object containing a unique map key, the complete list of keywords, and one partition of the image URLs.
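The partitioning step can be sketched as follows. This is a minimal illustration, not the project's actual code: the names SplitterSketch, MapTask, and chunkSize are hypothetical, fixed-size consecutive chunks are an assumed splitting rule, and JSON handling is left out. It uses a record, so it assumes Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the splitter's partitioning step (names are assumptions).
final class SplitterSketch {

    /** One unit of work for a mapper: map key, all keywords, one slice of the URLs. */
    record MapTask(int mapKey, List<String> keywords, List<String> imageUrls) {}

    /** Splits imageUrls into consecutive chunks of at most chunkSize URLs each. */
    static List<MapTask> partition(List<String> imageUrls, List<String> keywords, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<MapTask> tasks = new ArrayList<>();
        int mapKey = 0; // keys are assigned in partition order: 0, 1, 2, ...
        for (int start = 0; start < imageUrls.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, imageUrls.size());
            // Every task carries the full keyword list but only its own URLs.
            tasks.add(new MapTask(mapKey++,
                    List.copyOf(keywords),
                    List.copyOf(imageUrls.subList(start, end))));
        }
        return tasks;
    }
}
```

With five URLs and a chunk size of two, this produces three tasks with keys 0, 1, and 2, the last of which holds a single URL.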

In the multithreaded approach, Java's ExecutorService interface enables concurrent POST requests from the splitter to each of the mapper functions. Each mapper function calls GCP's Vision API to obtain a textual classification (labels) for each image it is responsible for. The mapper function then assigns a relevance rank number to each image based on the keywords and packages the results into a JSON object containing the unique map key, the list of image URLs, and their relevance rankings. The JSON object is then sent to the reducer function.
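The document does not say how the relevance rank number is computed. One plausible rule, shown here purely as an assumption, is to count how many keywords appear among the labels returned for an image, ignoring case. The class and method names are hypothetical, and the Vision API call itself is omitted; the labels are passed in as plain strings.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical scoring rule: not taken from the project, only an illustration.
final class MapperScoringSketch {

    /** Returns the number of keywords that match one of the image's labels, case-insensitively. */
    static int relevance(List<String> labels, List<String> keywords) {
        // Normalize labels once so each keyword lookup is a set membership test.
        Set<String> normalized = labels.stream()
                .map(label -> label.toLowerCase(Locale.ROOT))
                .collect(Collectors.toSet());
        int score = 0;
        for (String keyword : keywords) {
            if (normalized.contains(keyword.toLowerCase(Locale.ROOT))) {
                score++; // a repeated keyword is counted each time it appears
            }
        }
        return score;
    }
}
```

Under this rule, an image labelled "Dog", "Grass", and "Pet" scores 2 against the keywords "dog", "pet", and "car".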

The single-threaded approach is the same as the multithreaded approach except in how it issues POST requests to the mapper functions. It sends the requests from a for loop, so the call to the next mapper function does not start until the previous mapper function has finished calling GCP's API, assigning relevance rankings, and repackaging the response to be sent back.
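A minimal sketch of the sequential dispatch, assuming the blocking POST to a mapper is represented by a plain function. The names SequentialDispatchSketch and callMapper are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: callMapper stands in for the blocking POST to one mapper.
final class SequentialDispatchSketch {

    /** Calls the mapper for each partition in turn and collects the responses in order. */
    static <T, R> List<R> dispatch(List<T> partitions, Function<T, R> callMapper) {
        List<R> results = new ArrayList<>();
        for (T partition : partitions) {
            // apply() returns only when this mapper is done, so the next call cannot start early.
            results.add(callMapper.apply(partition));
        }
        return results;
    }
}
```

Because each call blocks, the total time is roughly the sum of the individual mapper times.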

The reducer function collects the outputs of the mapper functions in a master array of LabelRelevance objects (image URL, relevance ranking). Each mapper's data is unpacked into LabelRelevance objects, and the reducer's master array is updated at the position determined by that mapper's unique map key. Although the array is not thread-safe, the map keys restrict each mapper function to its own part of the master array, so no two mappers write to the same location. The array is then sorted by relevance, and the results are packaged into a JSON object and sent back to the front-end.
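The keyed-slot idea can be sketched as below. Several details are assumptions: the master array is modelled as one slot per map key, a higher relevance number is taken to mean a more relevant image, and the class and method names other than LabelRelevance are hypothetical. The sketch uses a record, so it assumes Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of a reducer that gives each mapper its own slot.
final class ReducerSketch {

    record LabelRelevance(String imageUrl, int relevance) {}

    // slots[k] belongs exclusively to the mapper whose map key is k.
    private final LabelRelevance[][] slots;

    ReducerSketch(int mapperCount) {
        this.slots = new LabelRelevance[mapperCount][];
    }

    /** Stores one mapper's results; distinct keys write to distinct slots. */
    void accept(int mapKey, List<LabelRelevance> partitionResults) {
        slots[mapKey] = partitionResults.toArray(new LabelRelevance[0]);
    }

    /** Flattens all filled slots and sorts by relevance, highest first (assumed ordering). */
    List<LabelRelevance> ranked() {
        List<LabelRelevance> all = new ArrayList<>();
        for (LabelRelevance[] slot : slots) {
            if (slot != null) {
                Collections.addAll(all, slot);
            }
        }
        all.sort(Comparator.comparingInt(LabelRelevance::relevance).reversed());
        return all;
    }
}
```

Writing to disjoint slots avoids conflicting writes, but it does not by itself make those writes visible to the thread that sorts. In this sketch, ranked() should be called only after the worker threads have been joined, for example after ExecutorService.invokeAll returns or after Future.get on each task.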

Multithreaded Approach Diagram

  1. Image URLs and keywords are sent via POST request to the splitter function
  2. Splitter function partitions requests by unique keys mapped to each partition
  3. Splitter function initiates a threadpool
  4. Each thread from the threadpool initializes a mapper function with its respective partition
  5. Mapper function calls Vision API for its partition's image labels
  6. Mapper function forwards its ranked partition and key to the reducer function
  7. Reducer function collects each mapper function's results
  8. Points 5-7 occur concurrently
  9. Results from each mapper function are aggregated into an array, at index positions based on their keys
  10. After all mapper functions have completed their processes, the reducer function will rank all results based on keyword relevance
  11. Ranked results are returned to the user
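Steps 3 through 8 can be sketched with ExecutorService as follows. This is an illustration under assumptions rather than the project's code: the names ConcurrentDispatchSketch and callMapper are hypothetical, the POST to a mapper is represented by a plain function, and a fixed-size pool is one possible choice of threadpool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: callMapper stands in for the POST request to one mapper.
final class ConcurrentDispatchSketch {

    /** Runs one mapper call per partition on a pool and returns results in partition order. */
    static <T, R> List<R> dispatch(List<T> partitions, Function<T, R> callMapper, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<R>> calls = new ArrayList<>();
            for (T partition : partitions) {
                calls.add(() -> callMapper.apply(partition));
            }
            List<R> results = new ArrayList<>();
            // invokeAll blocks until every call has completed and returns
            // the futures in the same order as the submitted calls.
            for (Future<R> future : pool.invokeAll(calls)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting for mappers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a mapper call failed", e.getCause());
        } finally {
            pool.shutdown(); // release the worker threads
        }
    }
}
```

Since the calls overlap, the total time is closer to that of the slowest mapper than to the sum of all mappers, provided the pool has enough threads.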

Single-threaded Approach Diagram

  1. Image URLs and keywords are sent via POST request to the splitter function
  2. Splitter function partitions requests by unique keys mapped to each partition
  3. Splitter function distributes each partition to a mapper function in a loop
  4. Mapper function calls Vision API for its partition's image labels
  5. Mapper function forwards its ranked partition and key to the reducer function
  6. Reducer function collects each mapper function's results
  7. Points 4-6 complete for one mapper function before the next mapper function can begin
  8. Results from each mapper function are aggregated into an array, at index positions based on their keys
  9. After all mapper functions have completed their processes, the reducer function will rank all results based on keyword relevance
  10. Ranked results are returned to the user
