API

GAMA

GamaClassifier

class gama.GamaClassifier(config=None, scoring='neg_log_loss', *args, **kwargs)

Gama with adaptations for (multi-class) classification.

Parameters
  • scoring (str, Metric or Tuple) – Specifies the metric or metrics to optimize. A string will be converted to a Metric. A tuple must specify every metric with the same type (e.g. all str). See Metrics for built-in metrics.

  • regularize_length (bool (default=True)) – If True, add pipeline length as an optimization metric. Short pipelines should then be preferred over long ones.

  • max_pipeline_length (int, optional (default=None)) – If set, limit the maximum number of steps in any evaluated pipeline. Encoding and imputation are excluded.

  • config (Dict) – Specifies available components and their valid hyperparameter settings. For more information, see GAMA Search Space Configuration.

  • random_state (int, optional (default=None)) – Seed for the random number generators used in the process. However, with n_jobs > 1, there will be randomization introduced by multi-processing. For reproducible results, set this and use n_jobs=1.

  • max_total_time (positive int (default=3600)) – Time in seconds that can be used for the fit call.

  • max_eval_time (positive int, optional (default=None)) – Time in seconds that can be used to evaluate a single individual. If None, set to 0.1 * max_total_time.

  • n_jobs (int, optional (default=None)) – The number of parallel processes that may be created to speed up fit. Accepted values are positive integers, -1 or None. If -1 is specified, multiprocessing.cpu_count() processes are created. If None is specified, multiprocessing.cpu_count() / 2 processes are created.

  • max_memory_mb (int, optional (default=None)) – Sets the total amount of memory GAMA is allowed to use (in megabytes). If not set, GAMA will use as much as it needs. GAMA is not guaranteed to respect this limit at all times, but it should never violate it for too long.

  • verbosity (int (default=logging.WARNING)) – Sets the level of log messages to be automatically output to terminal.

  • search (BaseSearch (default=AsyncEA())) – Search method to use to find good pipelines. Should be instantiated.

  • post_processing (BasePostProcessing (default=BestFitPostProcessing())) – Post-processing method to create a model after the search phase. Should be an instantiated subclass of BasePostProcessing.

  • output_directory (str, optional (default=None)) – Directory to use to save GAMA output. This includes both intermediate results during search and logs. If set to None, generate a unique name (“gama_HEXCODE”).

  • store (str (default='logs')) –

    Determines which data is stored after each run:

    • 'nothing': keep nothing from this run

    • 'models': keep only cache with models and predictions

    • 'logs': keep only the logs

    • 'all': keep logs and cache with models and predictions
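The default rules for max_eval_time and n_jobs described above can be summarized in a short sketch. The helper resolve_limits is hypothetical and not part of GAMA. Rounding the halved CPU count down and keeping at least one process are assumptions, since the text only states multiprocessing.cpu_count() / 2.

```python
import multiprocessing


def resolve_limits(max_total_time=3600, max_eval_time=None, n_jobs=None):
    """Hypothetical helper mirroring the documented defaults; not GAMA code."""
    if max_total_time <= 0:
        raise ValueError("max_total_time must be a positive integer")
    if max_eval_time is None:
        # Documented default: 10% of the total time budget.
        max_eval_time = 0.1 * max_total_time
    cpus = multiprocessing.cpu_count()
    if n_jobs is None:
        # Documented default: half of the available cores.
        # Assumption: round down, but never go below one process.
        n_jobs = max(1, cpus // 2)
    elif n_jobs == -1:
        n_jobs = cpus
    elif n_jobs < 1:
        raise ValueError("n_jobs must be a positive integer, -1 or None")
    return {"max_eval_time": max_eval_time, "n_jobs": n_jobs}
```

For example, resolve_limits(max_total_time=600) yields an evaluation limit of about 60 seconds per individual.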

GamaRegressor

class gama.GamaRegressor(config=None, scoring='neg_mean_squared_error', *args, **kwargs)

Gama with adaptations for regression.

Metrics

If you have a custom scoring function, you can define your own Metric.

Metric

class gama.utilities.metrics.Metric(scorer: Union[sklearn.metrics._scorer._BaseScorer, str])

A thin layer around the scorer class of scikit-learn.

MetricType

class gama.utilities.metrics.MetricType(value)

Metric types supported by GAMA.

CLASSIFICATION: int = 1

discrete target

REGRESSION: int = 2

continuous target


Search Methods

AsynchronousSuccessiveHalving

class gama.search_methods.AsynchronousSuccessiveHalving(reduction_factor: Optional[int] = None, minimum_resource: Optional[int] = None, maximum_resource: Optional[int] = None, minimum_early_stopping_rate: Optional[int] = None)

Asynchronous Successive Halving Algorithm (ASHA) by Li et al.

paper: https://arxiv.org/abs/1810.05934

Parameters
  • reduction_factor (int, optional (default=3)) – Reduction factor of candidates between each rung.

  • minimum_resource (int, optional (default=100)) – Number of samples to use in the lowest rung.

  • maximum_resource (int, optional (default=number of samples in the dataset)) – Number of samples to use in the top rung. This should not exceed the number of samples in the data.

  • minimum_early_stopping_rate (int (default=1)) – Number of lowest rungs to skip.
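To make the interplay of these four parameters concrete, the sketch below computes how many samples each rung would use if the resource grows by reduction_factor per rung, as successive halving is usually described. The function rung_resources is hypothetical, the maximum_resource of 10,000 is an assumed dataset size, and whether GAMA derives its schedule in exactly this way is an assumption.

```python
def rung_resources(
    reduction_factor=3,
    minimum_resource=100,
    maximum_resource=10_000,  # assumed dataset size, for illustration only
    minimum_early_stopping_rate=1,
):
    """Map each used rung to the number of samples evaluated on it."""
    if reduction_factor < 2 or minimum_resource < 1:
        raise ValueError("need reduction_factor >= 2 and minimum_resource >= 1")
    schedule = {}
    rung = 0
    while True:
        # Resource grows geometrically and is capped at the maximum.
        resource = min(minimum_resource * reduction_factor ** rung, maximum_resource)
        if rung >= minimum_early_stopping_rate:
            # Rungs below the early stopping rate are skipped.
            schedule[rung] = resource
        if resource >= maximum_resource:
            break
        rung += 1
    return schedule
```

With the values above, rung 0 (100 samples) is skipped and the top rung is capped at the assumed dataset size.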

AsyncEA

class gama.search_methods.AsyncEA(population_size: Optional[int] = None, max_n_evaluations: Optional[int] = None, restart_callback: Optional[Callable[[], bool]] = None)

Perform asynchronous evolutionary optimization.

Parameters
  • population_size (int, optional (default=50)) – Maximum number of individuals in the population at any time.

  • max_n_evaluations (int, optional (default=None)) – If specified, only a maximum of max_n_evaluations individuals are evaluated. If None, the algorithm will be run until interrupted by the user or a timeout.

  • restart_callback (Callable[[], bool], optional (default=None)) – Function which takes no arguments and returns True if the search should be restarted.
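A restart_callback only has to satisfy the Callable[[], bool] contract. The sketch below builds such a callable with a closure. The factory make_restart_callback is hypothetical, and how often AsyncEA invokes the callback is not specified here, so the "every n-th call" policy is purely illustrative.

```python
def make_restart_callback(every_n_calls):
    """Return a zero-argument callable that is True on every n-th call."""
    if every_n_calls < 1:
        raise ValueError("every_n_calls must be at least 1")
    calls = 0

    def should_restart() -> bool:
        nonlocal calls
        calls += 1
        # Signal a restart on the n-th, 2n-th, 3n-th, ... invocation.
        return calls % every_n_calls == 0

    return should_restart
```

The returned function could then be passed as the restart_callback argument when instantiating AsyncEA.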

RandomSearch

class gama.search_methods.RandomSearch

Perform random search over all possible pipelines.


Post-Processing

NoPostProcessing

class gama.postprocessing.NoPostProcessing(time_fraction: float = 0.0)

Does nothing; no time is reserved for post-processing.

Parameters

time_fraction (float) – Fraction of total time to be reserved for this post-processing step.

BestFitPostProcessing

class gama.postprocessing.BestFitPostProcessing(time_fraction: float = 0.1)

Post-processing technique which trains the single best pipeline found.

Parameters

time_fraction (float) – Fraction of total time to be reserved for this post-processing step.

EnsemblePostProcessing

class gama.postprocessing.EnsemblePostProcessing(time_fraction: float = 0.3, ensemble_size: Optional[int] = 25, hillclimb_size: Optional[int] = 10000, max_models: Optional[int] = 200)

Ensemble construction per Caruana et al.

Parameters
  • time_fraction (float (default=0.3)) – Fraction of total time reserved for Ensemble building.

  • ensemble_size (int, optional (default=25)) – Total number of models in the ensemble. When a single model is chosen more than once, its weight in the ensemble increases and each selection counts towards this maximum.

  • hillclimb_size (int, optional (default=10_000)) – Number of predictions that are used to determine the ensemble score during hillclimbing. If None, use all.

  • max_models (int, optional (default=200)) – Only consider the best max_models number of models. If None, use all. Consequently also sets the max number of unique models in the ensemble.
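The role of ensemble_size is easier to see in a small sketch of greedy ensemble selection with replacement, the general idea behind Caruana-style ensembles. This is a simplified illustration and not GAMA's implementation: greedy_ensemble and mse are hypothetical names, predictions are plain lists of numbers, mean squared error is an assumed loss, and hillclimb_size and max_models are not modelled.

```python
from collections import Counter


def mse(prediction, target):
    """Mean squared error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(target)


def greedy_ensemble(predictions, target, ensemble_size=25):
    """Pick models one at a time, with replacement, to minimize the loss.

    predictions maps a model name to its list of predicted values.
    Returns a Counter with how often each model was selected (its weight).
    """
    weights = Counter()
    total = [0.0] * len(target)  # running sum of the selected models' predictions
    for n_selected in range(1, ensemble_size + 1):
        best_name, best_loss = None, float("inf")
        for name, prediction in predictions.items():
            # Averaged prediction if this model were added to the ensemble.
            candidate = [(s + p) / n_selected for s, p in zip(total, prediction)]
            loss = mse(candidate, target)
            if loss < best_loss:
                best_name, best_loss = name, loss
        weights[best_name] += 1
        total = [s + p for s, p in zip(total, predictions[best_name])]
    return weights
```

A model that is selected twice simply carries twice the weight in the averaged prediction, which is why repeated selections count towards ensemble_size.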


Genetic Programming

Components

Defines the building blocks for Individuals. Individuals represent machine learning pipelines in a back-end agnostic way. An Individual can be converted to its back-end specific representation (e.g. a scikit-learn Pipeline) through its pipeline property, provided that a conversion function was supplied when the Individual was created.

Individuals are built with:

  • Terminals. Definition of a specific value for a specific hyperparameter. Immutable.

  • Primitives. Definition of a specific algorithm. Immutable.

    Defined by Terminal input, output type and operation.

  • PrimitiveNodes. Mutable for easy operations (e.g. mutation).

    An instantiated Primitive with specific Terminals.

  • Fitness. Stores information about the evaluation of the individual.

Individual

class gama.genetic_programming.components.Individual(main_node: gama.genetic_programming.components.primitive_node.PrimitiveNode, to_pipeline: Optional[Callable] = None)

Collection of PrimitiveNodes which together specify a machine learning pipeline.

Parameters
  • main_node (PrimitiveNode) – The first node of the individual (the estimator node).

  • to_pipeline (Callable, optional (default=None)) – A function which can convert this individual into a machine learning pipeline. If not provided, the pipeline property will be unavailable.

Primitive

class gama.genetic_programming.components.Primitive(input: Tuple[str], output: str, identifier: Callable)

Defines an operator which takes input and produces output.

E.g. a preprocessing or classification algorithm.

Create new instance of Primitive(input, output, identifier)

PrimitiveNode

class gama.genetic_programming.components.PrimitiveNode(primitive: gama.genetic_programming.components.primitive.Primitive, data_node: Union[gama.genetic_programming.components.primitive_node.PrimitiveNode, str], terminals: List[gama.genetic_programming.components.terminal.Terminal])

An instantiation for a Primitive with specific Terminals.

Parameters
  • primitive (Primitive) – The Primitive type of this PrimitiveNode.

  • data_node (PrimitiveNode) – The PrimitiveNode that specifies all preprocessing before this PrimitiveNode.

  • terminals (List[Terminal]) – A list of terminals matching the primitive.

Terminal

class gama.genetic_programming.components.Terminal(value: object, output: str, identifier: str)

Specifies a value for a particular type or input.

E.g. a value for a hyperparameter for an algorithm.

Create new instance of Terminal(value, output, identifier)
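How these components fit together can be sketched with simplified stand-ins. The classes below reuse the documented names and fields but are not GAMA's classes. The strings used for input, output and identifier, the functions scale and classify, and the helper primitives_from_main_node are all hypothetical and chosen only for illustration.

```python
from typing import Callable, List, NamedTuple, Tuple, Union


class Terminal(NamedTuple):
    """Immutable: one value for one hyperparameter (simplified stand-in)."""
    value: object
    output: str
    identifier: str


class Primitive(NamedTuple):
    """Immutable: one algorithm with its terminal inputs (simplified stand-in)."""
    input: Tuple[str, ...]
    output: str
    identifier: Callable


class PrimitiveNode:
    """Mutable: a Primitive instantiated with specific Terminals."""

    def __init__(self, primitive: Primitive,
                 data_node: Union["PrimitiveNode", str],
                 terminals: List[Terminal]):
        self.primitive = primitive
        self.data_node = data_node  # preceding node, or a str marking the raw data
        self.terminals = terminals


def primitives_from_main_node(main_node: PrimitiveNode) -> List[str]:
    """Walk from the estimator node back towards the raw data."""
    names = []
    node = main_node
    while isinstance(node, PrimitiveNode):
        names.append(node.primitive.identifier.__name__)
        node = node.data_node
    return names


def scale():  # placeholder for a preprocessing algorithm
    pass


def classify():  # placeholder for an estimator
    pass


scaler = PrimitiveNode(Primitive((), "data", scale), data_node="data", terminals=[])
depth = Terminal(value=3, output="max_depth", identifier="classify.max_depth")
classifier = PrimitiveNode(
    Primitive(("max_depth",), "prediction", classify),
    data_node=scaler,
    terminals=[depth],
)
```

Because the nodes are linked through data_node, the estimator node is enough to reach every preprocessing step, which is why an Individual only needs its main_node.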

Mutation

Contains mutation functions for genetic programming. Each mutation function takes an individual and modifies it in-place.

gama.genetic_programming.mutation.mut_insert(individual: gama.genetic_programming.components.individual.Individual, primitive_set: dict) → None

Mutate an Individual in-place by inserting a PrimitiveNode at a random location.

The new PrimitiveNode will not be inserted as root node.

Parameters
  • individual (Individual) – Individual to mutate in-place.

  • primitive_set (dict) – A dictionary defining the set of primitives and terminals.

gama.genetic_programming.mutation.mut_replace_primitive(individual: gama.genetic_programming.components.individual.Individual, primitive_set: dict) → None

Mutates an Individual in-place by replacing one of its Primitives.

Parameters
  • individual (Individual) – Individual to mutate in-place.

  • primitive_set (dict) – A dictionary defining the set of primitives and terminals.

gama.genetic_programming.mutation.mut_replace_terminal(individual: gama.genetic_programming.components.individual.Individual, primitive_set: dict) → None

Mutates an Individual in-place by replacing one of its Terminals.

Parameters
  • individual (Individual) – Individual to mutate in-place.

  • primitive_set (dict) – A dictionary defining the set of primitives and terminals.

gama.genetic_programming.mutation.mut_shrink(individual: gama.genetic_programming.components.individual.Individual, primitive_set: Optional[dict] = None, shrink_by: Optional[int] = None) → None

Mutates an Individual in-place by removing any number of primitive nodes.

Primitive nodes are removed from the preprocessing end.

Parameters
  • individual (Individual) – Individual to mutate in-place.

  • primitive_set (dict, optional) – Not used. Present to create a matching function signature with other mutations.

  • shrink_by (int, optional (default=None)) – Number of primitives to remove. Must be smaller than the number of primitives in individual, so that at least one primitive remains. If None, a random number of primitives is removed.
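The shrinking behaviour can be illustrated on a plain list that stands in for an Individual. This is a sketch under stated assumptions, not GAMA's code: the list starts with the estimator and ends with the step furthest from it, "the preprocessing end" is taken to be the tail of that list, at least the estimator must remain, and raising ValueError for an invalid shrink_by is an assumption. The function shrink is hypothetical.

```python
import random


def shrink(steps, shrink_by=None, rng=random):
    """Remove steps in-place from the preprocessing end of a pipeline list."""
    n_primitives = len(steps)
    if n_primitives < 2:
        raise ValueError("need at least two primitives to shrink")
    if shrink_by is None:
        # Remove a random number of steps, always keeping the estimator.
        shrink_by = rng.randint(1, n_primitives - 1)
    if not 1 <= shrink_by < n_primitives:
        raise ValueError(f"cannot remove {shrink_by} of {n_primitives} primitives")
    del steps[n_primitives - shrink_by:]
```

Like the documented mutation functions, the sketch modifies its argument in-place and returns None.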

gama.genetic_programming.mutation.random_valid_mutation_in_place(individual: gama.genetic_programming.components.individual.Individual, primitive_set: dict, max_length: Optional[int] = None) → Callable

Apply a random valid mutation in place.

The random mutation can be one of:

  • mut_replace_primitive

  • mut_replace_terminal, if the individual has at least one terminal

  • mut_shrink, if the individual has at least two primitives

  • mut_insert, if it would not exceed max_length when specified.

Parameters
  • individual (Individual) – An individual to be mutated in-place.

  • primitive_set (dict) – A dictionary defining the set of primitives and terminals.

  • max_length (int, optional (default=None)) – If specified, impose a maximum length on the new individual.

Returns

The mutation function used.

Return type

Callable
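The eligibility rules listed for this function can be written down as a small filter. The sketch works on counts instead of a real Individual and returns the names of the documented mutation functions. valid_mutations is a hypothetical helper, and reading "would not exceed" as n_primitives + 1 <= max_length is an assumption.

```python
def valid_mutations(n_primitives, n_terminals, max_length=None):
    """List the mutations that are applicable under the documented conditions."""
    options = ["mut_replace_primitive"]  # always applicable
    if n_terminals >= 1:
        options.append("mut_replace_terminal")
    if n_primitives >= 2:
        options.append("mut_shrink")
    if max_length is None or n_primitives + 1 <= max_length:
        # Inserting adds one primitive, so it must still fit within max_length.
        options.append("mut_insert")
    return options
```

Picking one entry of the returned list at random, for instance with random.choice, corresponds to the random selection step.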

Crossover

Functions which take two Individuals and produce at least one new Individual.

gama.genetic_programming.crossover.crossover_primitives(ind1: gama.genetic_programming.components.individual.Individual, ind2: gama.genetic_programming.components.individual.Individual) → Tuple[gama.genetic_programming.components.individual.Individual, gama.genetic_programming.components.individual.Individual]

Crossover two individuals by exchanging any number of preprocessing steps.

Parameters
  • ind1 (Individual) – The individual to crossover with ind2.

  • ind2 (Individual) – The individual to crossover with ind1.
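Exchanging preprocessing steps can be pictured as swapping the tails of two pipelines. The sketch below uses lists that start with the estimator, takes the cut points as explicit arguments instead of drawing them at random, and returns new lists. All of these are simplifying assumptions, and swap_preprocessing is a hypothetical function rather than GAMA's implementation.

```python
def swap_preprocessing(steps1, steps2, cut1, cut2):
    """Swap everything after the cut points between two pipeline lists.

    Index 0 holds the estimator, so a cut of at least 1 keeps each
    estimator with its original pipeline.
    """
    if not (1 <= cut1 <= len(steps1) and 1 <= cut2 <= len(steps2)):
        raise ValueError("cut points must keep the estimator in place")
    child1 = steps1[:cut1] + steps2[cut2:]
    child2 = steps2[:cut2] + steps1[cut1:]
    return child1, child2
```

Depending on the cut points, any number of preprocessing steps, including none, moves from one pipeline to the other.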

gama.genetic_programming.crossover.crossover_terminals(ind1: gama.genetic_programming.components.individual.Individual, ind2: gama.genetic_programming.components.individual.Individual) → Tuple[gama.genetic_programming.components.individual.Individual, gama.genetic_programming.components.individual.Individual]

Crossover two individuals in-place by exchanging two Terminals.

Terminals must share output type but have different values.

Parameters
  • ind1 (Individual) – The individual to crossover with ind2.

  • ind2 (Individual) – The individual to crossover with ind1.

gama.genetic_programming.crossover.random_crossover(ind1: gama.genetic_programming.components.individual.Individual, ind2: gama.genetic_programming.components.individual.Individual, max_length: Optional[int] = None) → Tuple[gama.genetic_programming.components.individual.Individual, gama.genetic_programming.components.individual.Individual]

Random valid crossover between two individuals in-place, if it can be done.

Parameters
  • ind1 (Individual) – The individual to crossover with ind2.

  • ind2 (Individual) – The individual to crossover with ind1.

  • max_length (int, optional (default=None)) – The first individual in the returned tuple has at most max_length primitives. Requires both provided individuals to contain at most max_length primitives.

Raises

ValueError

  • If there is no valid crossover function for the two individuals.

  • If max_length is set and either ind1 or ind2 contains more primitives than max_length.


Utilities

Generic

Collection of generic components.

Pareto Front

class gama.utilities.generic.paretofront.ParetoFront(start_list: Optional[List[Any]] = None, get_values_fn: Optional[Callable[[Any], Tuple[Any, ...]]] = None)

A list of tuples in which no one tuple is dominated by another.

Parameters
  • start_list (list, optional (default=None)) – List of items of which to calculate the Pareto front.

  • get_values_fn (Callable, optional (default=None)) – Function that takes an item and returns a tuple of values, such that each should be maximized. If left None, it is assumed that items are already such tuples.
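The dominance rule behind a Pareto front, and the role of get_values_fn, can be sketched in a few lines. The functions dominates and pareto_front are hypothetical illustrations and not the ParetoFront class itself. As in the description above, every value is assumed to be maximized, so a quantity that should be minimized (such as pipeline length) is negated.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(items, get_values_fn=None):
    """Return the items that are not dominated by any other item."""
    # Without get_values_fn, the items are assumed to be value tuples already.
    values = (lambda item: item) if get_values_fn is None else get_values_fn
    return [
        item
        for item in items
        if not any(dominates(values(other), values(item)) for other in items)
    ]


pipelines = [
    {"name": "a", "score": 0.90, "length": 3},
    {"name": "b", "score": 0.90, "length": 1},
    {"name": "c", "score": 0.95, "length": 4},
]
# Maximize the score and minimize the length by negating it.
front = pareto_front(pipelines, get_values_fn=lambda p: (p["score"], -p["length"]))
```

Here pipeline a drops out because b reaches the same score with fewer steps, while b and c both stay because neither is better on both values.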

Stopwatch

class gama.utilities.generic.stopwatch.Stopwatch(timing_function=time.time)

A context manager that keeps track of wall clock time spent.

Parameters

timing_function (Callable (default=time.time)) – The function used to measure time, e.g. time.time or time.process_time
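A context manager with a pluggable timing function can be sketched as follows. SketchStopwatch is a hypothetical stand-in written for illustration, and its elapsed_time attribute name is an assumption rather than a confirmed part of the Stopwatch interface.

```python
import time


class SketchStopwatch:
    """Measure the time spent inside a with-block (illustrative stand-in)."""

    def __init__(self, timing_function=time.time):
        self._time = timing_function
        self._start = None
        self._end = None

    def __enter__(self):
        self._start = self._time()
        self._end = None
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._end = self._time()
        return False  # never suppress exceptions raised in the block

    @property
    def elapsed_time(self):
        if self._start is None:
            return 0.0  # the stopwatch was never started
        # While the block is still running, measure up to the current time.
        end = self._end if self._end is not None else self._time()
        return end - self._start
```

Passing time.process_time instead of the default would measure CPU time of the current process rather than wall clock time.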

Timekeeper

class gama.utilities.generic.timekeeper.TimeKeeper(total_time: Optional[int] = None)

Simple object that helps keep track of time over multiple activities.

Parameters

total_time (int, optional (default=None)) – The total time available across activities. If set to None, the total_time_remaining property will be unavailable.

AsyncEvaluator

Warning

I’m sure there are better tools out there, but I have yet to find a minimal, easy multi-processing tool. I tried using the built-in ProcessPoolExecutor, but it had shortcomings, such as not being able to cancel jobs while they were running.

class gama.utilities.generic.async_evaluator.AsyncEvaluator(n_workers: Optional[int] = None, memory_limit_mb: Optional[int] = None, logfile: Optional[str] = None, wait_time_before_forced_shutdown: int = 10)

Manages subprocesses on which arbitrary functions can be evaluated.

The function and all its arguments must be picklable. Using the same AsyncEvaluator in two different contexts raises a RuntimeError.

defaults: Dict, optional (default=None)

Default parameter values shared between all submit calls. This allows these defaults to be transferred only once per process, instead of twice per call (to and from the subprocess). Only supports keyword arguments.

Parameters
  • n_workers (int, optional (default=None)) – Maximum number of subprocesses to run for parallel evaluations. Defaults to AsyncEvaluator.n_jobs, using all cores unless overwritten.

  • memory_limit_mb (int, optional (default=None)) – The maximum number of megabytes that this process and its subprocesses may use in total. If None, no limit is enforced. There is no guarantee the limit is not violated.

  • logfile (str, optional (default=None)) – If set, recorded resource usage will be written to this file.

  • wait_time_before_forced_shutdown (int (default=10)) – Number of seconds to wait between asking the worker processes to shut down and terminating them forcefully if they failed to do so.