API

GAMA

GamaClassifier

class gama.GamaClassifier(config=None, scoring='neg_log_loss', *args, **kwargs)

Gama with adaptations for (multi-class) classification.

Parameters

scoring (str, Metric or Tuple) – Specifies the metric or metrics to optimize. A string is converted to a Metric. A tuple must specify every metric with the same type (e.g. all str). See Metrics for built-in metrics.
regularize_length (bool (default=True)) – If True, add pipeline length as an optimization metric. Short pipelines should then be preferred over long ones.
max_pipeline_length (int, optional (default=None)) – If set, limit the maximum number of steps in any evaluated pipeline. Encoding and imputation are excluded.
config (Dict) – Specifies available components and their valid hyperparameter settings. For more information, see GAMA Search Space Configuration.
random_state (int, optional (default=None)) – Seed for the random number generators used in the process. However, with n_jobs > 1, there will be randomization introduced by multi-processing. For reproducible results, set this and use n_jobs=1.
max_total_time (positive int (default=3600)) – Time in seconds that can be used for the fit call.
max_eval_time (positive int, optional (default=None)) – Time in seconds that can be used to evaluate any one single individual. If None, set to 0.1 * max_total_time.
n_jobs (int, optional (default=None)) – The number of parallel processes that may be created to speed up fit. Accepted values are positive integers, -1 or None. If -1 is specified, multiprocessing.cpu_count() processes are created. If None is specified, multiprocessing.cpu_count() / 2 processes are created.
max_memory_mb (int, optional (default=None)) – Sets the total amount of memory GAMA is allowed to use (in megabytes). If not set, GAMA will use as much as it needs. GAMA is not guaranteed to respect this limit at all times, but it should never violate it for too long.
verbosity (int (default=logging.WARNING)) – Sets the level of log messages to be automatically output to terminal.
search (BaseSearch (default=AsyncEA())) – Search method to use to find good pipelines. Should be instantiated.
post_processing (BasePostProcessing (default=BestFitPostProcessing())) – Post-processing method to create a model after the search phase. Should be an instantiated subclass of BasePostProcessing.
output_directory (str, optional (default=None)) – Directory to use to save GAMA output. This includes both intermediate results during search and logs. This directory must be empty or not exist. If set to None, generate a unique name (“gama_HEXCODE”).
store (str (default='logs')) –
Determines which data is stored after each run:
- 'nothing': keep nothing from this run
- 'models': keep only cache with models and predictions
- 'logs': keep only the logs
- 'all': keep logs and cache with models and predictions
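
The defaults documented above for max_eval_time and n_jobs can be summarized in a few lines. This is a minimal sketch of the documented rules, not GAMA's own code: the helper names resolve_max_eval_time and resolve_n_jobs are hypothetical, and rounding cpu_count() / 2 down with a floor of one process is an assumption.

```python
import multiprocessing
from typing import Optional


def resolve_max_eval_time(max_total_time: int, max_eval_time: Optional[int] = None) -> float:
    """Return the per-pipeline evaluation budget in seconds."""
    if max_eval_time is None:
        # Documented default: 10% of the total time budget.
        return 0.1 * max_total_time
    return float(max_eval_time)


def resolve_n_jobs(n_jobs: Optional[int] = None) -> int:
    """Translate the documented n_jobs values into a process count."""
    n_cpus = multiprocessing.cpu_count()
    if n_jobs is None:
        # Assumption: integer division, never fewer than one process.
        return max(1, n_cpus // 2)
    if n_jobs == -1:
        return n_cpus
    if n_jobs > 0:
        return n_jobs
    raise ValueError("n_jobs must be a positive integer, -1 or None.")
```

For reproducible runs, the random_state entry above suggests combining a fixed seed with n_jobs=1.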

GamaRegressor

class gama.GamaRegressor(config=None, scoring='neg_mean_squared_error', *args, **kwargs)

Gama with adaptations for regression.

Metrics

If you have a custom scoring function, you can define your own Metric.

Metric

class gama.utilities.metrics.Metric(scorer: Union[_BaseScorer, str])

A thin layer around the scorer class of scikit-learn.

MetricType

class gama.utilities.metrics.MetricType(value)

Metric types supported by GAMA.

CLASSIFICATION: int = 1: discrete target

REGRESSION: int = 2: continuous target

Search Methods

AsynchronousSuccessiveHalving

class gama.search_methods.AsynchronousSuccessiveHalving(reduction_factor: Optional[int] = None, minimum_resource: Optional[Tuple[int, float]] = None, maximum_resource: Optional[Tuple[int, float]] = None, minimum_early_stopping_rate: Optional[int] = None)

Asynchronous Successive Halving Algorithm (ASHA) by Li et al.

paper: https://arxiv.org/abs/1810.05934

Parameters

reduction_factor (int, optional (default=3)) – Reduction factor of candidates between each rung.
minimum_resource (int or float, optional (default=0.125)) – Number of samples to use in the lowest rung. If integer, it specifies the number of rows. If float, it specifies the fraction of the dataset.
maximum_resource (int or float, optional (default=1.0)) – Number of samples to use in the top rung. If integer, it specifies the number of rows. If float, it specifies the fraction of the dataset.
minimum_early_stopping_rate (int (default=0)) – Number of lowest rungs to skip.
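
To see how these four parameters interact, the sketch below lists the resource used at each rung. It assumes, as in the ASHA paper, that the resource grows by reduction_factor from one rung to the next and is capped at maximum_resource. The function rung_resources is a hypothetical helper for illustration, not part of GAMA.

```python
from typing import List, Union

Number = Union[int, float]


def rung_resources(
    minimum_resource: Number = 0.125,
    maximum_resource: Number = 1.0,
    reduction_factor: int = 3,
    minimum_early_stopping_rate: int = 0,
) -> List[Number]:
    """List the resource used at each rung, from the lowest to the top rung."""
    resources = []
    resource = minimum_resource
    while resource < maximum_resource:
        resources.append(resource)
        resource *= reduction_factor
    # The top rung always uses the maximum resource.
    resources.append(maximum_resource)
    # Skip the requested number of lowest rungs, but always keep the top rung.
    skip = min(minimum_early_stopping_rate, len(resources) - 1)
    return resources[skip:]


print(rung_resources())  # [0.125, 0.375, 1.0]
```

With the documented defaults, a candidate is first evaluated on 12.5% of the data, then on 37.5%, and finally on the full dataset.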

AsyncEA

class gama.search_methods.AsyncEA(population_size: Optional[int] = None, max_n_evaluations: Optional[int] = None, restart_callback: Optional[Callable[[], bool]] = None)

Perform asynchronous evolutionary optimization.

Parameters

population_size (int, optional (default=50)) – Maximum number of individuals in the population at any time.
max_n_evaluations (int, optional (default=None)) – If specified, only a maximum of max_n_evaluations individuals are evaluated. If None, the algorithm will be run until interrupted by the user or a timeout.
restart_callback (Callable[[], bool], optional (default=None)) – Function which takes no arguments and returns True if the search should be restarted.
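
A restart_callback only has to be a zero-argument callable that returns a bool. The sketch below builds one that asks for a restart once a fixed interval has passed. The factory restart_every and its clock parameter are hypothetical names for illustration, and how often the search consults the callback is not specified here.

```python
import time
from typing import Callable


def restart_every(
    seconds: float, clock: Callable[[], float] = time.monotonic
) -> Callable[[], bool]:
    """Build a zero-argument callback that returns True once per interval."""
    last_restart = clock()

    def should_restart() -> bool:
        nonlocal last_restart
        now = clock()
        if now - last_restart >= seconds:
            # Reset the reference point so the next restart waits a full interval.
            last_restart = now
            return True
        return False

    return should_restart
```

Such a callback could then be passed as AsyncEA(restart_callback=restart_every(600)).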

RandomSearch

class gama.search_methods.RandomSearch

Perform random search over all possible pipelines.

Post-Processing

NoPostProcessing

class gama.postprocessing.NoPostProcessing(time_fraction: float = 0.0)

Does nothing; no time is reserved for post-processing.

Parameters: time_fraction (float) – Fraction of total time to be reserved for this post-processing step.

BestFitPostProcessing

class gama.postprocessing.BestFitPostProcessing(time_fraction: float = 0.1)

Post-processing technique which trains the single best pipeline found.

Parameters: time_fraction (float) – Fraction of total time to be reserved for this post-processing step.
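
As a rough illustration of what time_fraction means for a run, the sketch below splits a total budget into a reserved post-processing share and the remainder. The helper split_time_budget is hypothetical, and it assumes that everything not reserved is available to the earlier phases. GAMA's real accounting may differ, for example by also spending time on preprocessing.

```python
from typing import Tuple


def split_time_budget(max_total_time: float, time_fraction: float) -> Tuple[float, float]:
    """Return (seconds left for earlier phases, seconds reserved for post-processing)."""
    if not 0.0 <= time_fraction <= 1.0:
        raise ValueError("time_fraction must lie between 0 and 1.")
    reserved = time_fraction * max_total_time
    return max_total_time - reserved, reserved


search_time, post_time = split_time_budget(3600, 0.1)
print(f"search: {search_time:.0f} s, post-processing: {post_time:.0f} s")
# search: 3240 s, post-processing: 360 s
```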

EnsemblePostProcessing

class gama.postprocessing.EnsemblePostProcessing(time_fraction: float = 0.3, ensemble_size: Optional[int] = 25, hillclimb_size: Optional[int] = 10000, max_models: Optional[int] = 200)

Ensemble construction per Caruana et al.

Parameters

time_fraction (float (default=0.3)) – Fraction of total time reserved for Ensemble building.
ensemble_size (int, optional (default=25)) – Total number of models in the ensemble. When a single model is chosen more than once, its weight in the ensemble increases, and each selection counts towards this maximum.
hillclimb_size (int, optional (default=10_000)) – Number of predictions that are used to determine the ensemble score during hillclimbing. If None, use all.
max_models (int, optional (default=200)) – Only consider the best max_models number of models. If None, use all. Consequently also sets the max number of unique models in the ensemble.
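
The idea behind ensemble_size and selection with replacement can be shown with a small greedy selection loop in the spirit of Caruana et al. This is a simplified sketch for regression-style predictions scored by mean squared error. The function names and the data layout (a dict from model name to validation predictions) are assumptions for illustration, not GAMA's implementation.

```python
from collections import Counter
from typing import Dict, List


def mean_squared_error(y_true: List[float], y_pred: List[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def greedy_ensemble(
    predictions: Dict[str, List[float]], y_true: List[float], ensemble_size: int
) -> Counter:
    """Pick models one at a time, with replacement, to minimize ensemble loss.

    Returns how often each model was picked; the count acts as its weight.
    """
    counts: Counter = Counter()
    running_sum = [0.0] * len(y_true)
    for step in range(1, ensemble_size + 1):
        best_name, best_loss = None, float("inf")
        for name, preds in predictions.items():
            # Average prediction of the ensemble if this model were added.
            candidate = [(s + p) / step for s, p in zip(running_sum, preds)]
            loss = mean_squared_error(y_true, candidate)
            if loss < best_loss:
                best_name, best_loss = name, loss
        counts[best_name] += 1
        running_sum = [s + p for s, p in zip(running_sum, predictions[best_name])]
    return counts
```

Because a model may be picked repeatedly, an ensemble of size 25 can contain far fewer than 25 unique models.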

Genetic Programming

Components

Defines the building blocks for Individuals. Individuals represent machine learning pipelines in a back-end agnostic way. An Individual can be converted to its back-end specific representation (e.g. a scikit-learn Pipeline) by calling its pipeline property as long as a function has been provided to convert the individual to it.

Individuals are built with:

Terminals. Definition of a specific value for a specific hyperparameter. Immutable.

Primitives. Definition of a specific algorithm. Immutable.
Defined by Terminal input, output type and operation.

PrimitiveNodes. Mutable for easy operations (e.g. mutation).
An instantiated Primitive with specific Terminals.

Fitness. Stores information about the evaluation of the individual.
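
The relationship between these building blocks can be mirrored in a toy model. The classes below are simplified stand-ins written for illustration (hence the Toy prefix). They are not GAMA's classes, and the meaning given to each field, as well as the use of the string "data" for the raw input, is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, NamedTuple, Tuple, Union


class ToyTerminal(NamedTuple):
    """Immutable: one specific value for one specific hyperparameter."""
    value: object
    output: str       # assumed: name of the hyperparameter
    identifier: str


class ToyPrimitive(NamedTuple):
    """Immutable: one algorithm, described by its inputs and its output type."""
    input: Tuple[str, ...]   # assumed: names of accepted hyperparameters
    output: str              # e.g. "data" or "prediction"
    identifier: str


@dataclass
class ToyPrimitiveNode:
    """Mutable: a primitive with concrete terminals and an upstream node."""
    primitive: ToyPrimitive
    data_node: Union["ToyPrimitiveNode", str]  # "data" marks the raw input
    terminals: List[ToyTerminal] = field(default_factory=list)


def describe(node: Union[ToyPrimitiveNode, str]) -> str:
    """Render a chain of nodes as nested calls, estimator outermost."""
    if isinstance(node, str):
        return node
    args = ", ".join(f"{t.output}={t.value!r}" for t in node.terminals)
    inner = describe(node.data_node)
    return f"{node.primitive.identifier}({inner}" + (f", {args})" if args else ")")


scaler = ToyPrimitiveNode(ToyPrimitive((), "data", "StandardScaler"), "data")
tree = ToyPrimitiveNode(
    ToyPrimitive(("max_depth",), "prediction", "DecisionTreeClassifier"),
    scaler,
    [ToyTerminal(3, "max_depth", "DecisionTreeClassifier.max_depth")],
)
print(describe(tree))  # DecisionTreeClassifier(StandardScaler(data), max_depth=3)
```

Keeping terminals and primitives immutable while nodes stay mutable means a mutation only has to swap references inside a node.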

Individual

class gama.genetic_programming.components.Individual(main_node: PrimitiveNode, to_pipeline: Optional[Callable] = None)

Collection of PrimitiveNodes which together specify a machine learning pipeline.

Parameters

main_node (PrimitiveNode) – The first node of the individual (the estimator node).
to_pipeline (Callable, optional (default=None)) – A function which can convert this individual into a machine learning pipeline. If not provided, the pipeline property will be unavailable.

Primitive

class gama.genetic_programming.components.Primitive(input: Tuple[str, ...], output: str, identifier: Callable)

Defines an operator which takes input and produces output.

E.g. a preprocessing or classification algorithm.

Create new instance of Primitive(input, output, identifier)

PrimitiveNode

class gama.genetic_programming.components.PrimitiveNode(primitive: Primitive, data_node: Union[PrimitiveNode, str], terminals: List[Terminal])

An instantiation for a Primitive with specific Terminals.

Parameters

primitive (Primitive) – The Primitive type of this PrimitiveNode.
data_node (PrimitiveNode) – The PrimitiveNode that specifies all preprocessing before this PrimitiveNode.
terminals (List[Terminal]) – A list of terminals matching the primitive.

Terminal

class gama.genetic_programming.components.Terminal(value: object, output: str, identifier: str)

Specifies a specific value for a specific type or input.

E.g. a value for a hyperparameter for an algorithm.

Create new instance of Terminal(value, output, identifier)

Mutation

Contains mutation functions for genetic programming. Each mutation function takes an individual and modifies it in-place.

gama.genetic_programming.mutation.mut_insert(individual: Individual, primitive_set: dict) → None

Mutate an Individual in-place by inserting a PrimitiveNode at a random location.

The new PrimitiveNode will not be inserted as root node.

Parameters

individual (Individual) – Individual to mutate in-place.
primitive_set (dict) – A dictionary defining the set of primitives and terminals.

gama.genetic_programming.mutation.mut_replace_primitive(individual: Individual, primitive_set: dict) → None

Mutates an Individual in-place by replacing one of its Primitives.

Parameters

individual (Individual) – Individual to mutate in-place.
primitive_set (dict) – A dictionary defining the set of primitives and terminals.

gama.genetic_programming.mutation.mut_replace_terminal(individual: Individual, primitive_set: dict) → None

Mutates an Individual in-place by replacing one of its Terminals.

Parameters

individual (Individual) – Individual to mutate in-place.
primitive_set (dict) – A dictionary defining the set of primitives and terminals.

gama.genetic_programming.mutation.mut_shrink(individual: Individual, primitive_set: Optional[dict] = None, shrink_by: Optional[int] = None) → None

Mutates an Individual in-place by removing any number of primitive nodes.

Primitive nodes are removed from the preprocessing end.

Parameters

individual (Individual) – Individual to mutate in-place.
primitive_set (dict, optional) – Not used. Present to create a matching function signature with other mutations.
shrink_by (int, optional (default=None)) – Number of primitives to remove. Must be at least one less than the number of primitives in individual. If None, a random number of primitives is removed.
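
The shrinking behaviour can be sketched on a pipeline represented as a plain list of step names. This toy version assumes that "the preprocessing end" means the steps closest to the raw data, and that shrink_by has to be smaller than the number of steps so the final estimator survives. The function shrink_pipeline is hypothetical and does not operate on GAMA Individuals.

```python
import random
from typing import List, Optional


def shrink_pipeline(steps: List[str], shrink_by: Optional[int] = None) -> None:
    """Remove steps in place from the preprocessing end of a pipeline.

    `steps` is ordered from the first preprocessing step to the final estimator.
    """
    n_steps = len(steps)
    if n_steps < 2:
        raise ValueError("Cannot shrink a pipeline with fewer than two steps.")
    if shrink_by is None:
        # Remove a random number of steps, leaving at least the estimator.
        shrink_by = random.randint(1, n_steps - 1)
    if not 1 <= shrink_by < n_steps:
        raise ValueError(f"shrink_by must be between 1 and {n_steps - 1}.")
    # Drop the earliest preprocessing steps; the estimator always remains.
    del steps[:shrink_by]
```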

gama.genetic_programming.mutation.random_valid_mutation_in_place(individual: Individual, primitive_set: dict, max_length: Optional[int] = None) → Callable

Apply a random valid mutation in place.

The random mutation can be one of:

mut_replace_primitive

mut_replace_terminal, if the individual has at least one terminal

mut_shrink, if the individual has at least two primitives

mut_insert, if it would not exceed max_length when specified.

Parameters

individual (Individual) – An individual to be mutated in-place.
primitive_set (dict) – A dictionary defining the set of primitives and terminals.
max_length (int, optional (default=None)) – If specified, impose a maximum length on the new individual.

Returns

The mutation function used.

Return type

Callable
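
The conditions listed above can be written down as a small selection routine. The sketch works on counts rather than on real Individuals and returns mutation names instead of functions. Both helpers are hypothetical, and the exact length check for insertion is an assumption.

```python
import random
from typing import List, Optional


def valid_mutations(
    n_primitives: int, n_terminals: int, max_length: Optional[int] = None
) -> List[str]:
    """List the names of the mutations that apply to an individual of this shape."""
    mutations = ["mut_replace_primitive"]  # always applicable
    if n_terminals > 0:
        mutations.append("mut_replace_terminal")
    if n_primitives > 1:
        mutations.append("mut_shrink")
    # Assumption: inserting adds exactly one primitive.
    if max_length is None or n_primitives < max_length:
        mutations.append("mut_insert")
    return mutations


def pick_mutation(
    n_primitives: int, n_terminals: int, max_length: Optional[int] = None
) -> str:
    """Choose one applicable mutation uniformly at random."""
    return random.choice(valid_mutations(n_primitives, n_terminals, max_length))
```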

Crossover

Functions which take two Individuals and produce at least one new Individual.

gama.genetic_programming.crossover.crossover_primitives(ind1: Individual, ind2: Individual) → Tuple[Individual, Individual]

Crossover two individuals by exchanging any number of preprocessing steps.

Parameters

ind1 (Individual) – The individual to crossover with ind2.
ind2 (Individual) – The individual to crossover with ind1.
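
Exchanging preprocessing steps can be pictured on pipelines stored as lists of step names. In this toy sketch each pipeline is cut at a random point and the leading preprocessing segments are swapped, so every child keeps its own final estimator. The function swap_preprocessing and the list representation are assumptions for illustration and do not reflect how GAMA stores Individuals.

```python
import random
from typing import List, Tuple


def swap_preprocessing(
    p1: List[str], p2: List[str], rng: random.Random
) -> Tuple[List[str], List[str]]:
    """Return two new pipelines with their leading preprocessing segments swapped.

    Pipelines are ordered from the first preprocessing step to the final estimator.
    """
    # Cut points stop before the last element, so each estimator stays in place.
    cut1 = rng.randint(0, len(p1) - 1)
    cut2 = rng.randint(0, len(p2) - 1)
    child1 = p2[:cut2] + p1[cut1:]
    child2 = p1[:cut1] + p2[cut2:]
    return child1, child2
```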

gama.genetic_programming.crossover.crossover_terminals(ind1: Individual, ind2: Individual) → Tuple[Individual, Individual]

Crossover two individuals in-place by exchanging two Terminals.

Terminals must share output type but have different values.

Parameters

ind1 (Individual) – The individual to crossover with ind2.
ind2 (Individual) – The individual to crossover with ind1.

gama.genetic_programming.crossover.random_crossover(ind1: Individual, ind2: Individual, max_length: Optional[int] = None) → Tuple[Individual, Individual]

Random valid crossover between two individuals in-place, if it can be done.

Parameters

ind1 (Individual) – The individual to crossover with ind2.
ind2 (Individual) – The individual to crossover with ind1.
max_length (int, optional (default=None)) – The first individual in the returned tuple has at most max_length primitives. Requires both provided individuals to contain at most max_length primitives.

Raises

ValueError –

- If there is no valid crossover function for the two individuals.
- If max_length is set and either ind1 or ind2 contains more primitives than max_length.

Utilities

Generic

Collection of generic components.

Pareto Front

class gama.utilities.generic.paretofront.ParetoFront(start_list: Optional[List[Any]] = None, get_values_fn: Optional[Callable[[Any], Tuple[Any, ...]]] = None)

A list of tuples in which no one tuple is dominated by another.

Parameters

start_list (list, optional (default=None)) – List of items of which to calculate the Pareto front.
get_values_fn (Callable, optional (default=None)) – Function that takes an item and returns a tuple of values, such that each should be maximized. If left None, it is assumed that items are already such tuples.
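
What "no one tuple is dominated by another" means can be shown with a short filter over tuples whose values should all be maximized. This is a minimal standalone sketch. The functions dominates and pareto_front are hypothetical and independent of the ParetoFront class.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(items: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the tuples that are not dominated by any other tuple."""
    front: List[Tuple[float, ...]] = []
    for item in items:
        if any(dominates(kept, item) for kept in front):
            continue  # an existing member is strictly better
        # Remove members that the new item makes obsolete.
        front = [kept for kept in front if not dominates(item, kept)]
        front.append(item)
    return front


# Example values: (score, negative pipeline length), both to be maximized.
print(pareto_front([(1, 5), (2, 4), (1, 4), (3, 1), (0, 0)]))
# [(1, 5), (2, 4), (3, 1)]
```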

Stopwatch

class gama.utilities.generic.stopwatch.Stopwatch(timing_function=<built-in function time>)

A context manager that keeps track of wall clock time spent.

Parameters: timing_function (Callable (default=time.time)) – The function used to measure time, e.g. time.time or time.process_time
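
The context-manager pattern with a pluggable timing function looks roughly like the following. SimpleStopwatch and its elapsed property are hypothetical names for this sketch. The attributes of the actual Stopwatch class are not shown in this reference and may differ.

```python
import time
from typing import Callable, Optional


class SimpleStopwatch:
    """Measure the time spent inside a `with` block."""

    def __init__(self, timing_function: Callable[[], float] = time.time):
        self._time = timing_function
        self._start: Optional[float] = None
        self._end: Optional[float] = None

    def __enter__(self) -> "SimpleStopwatch":
        self._start = self._time()
        self._end = None
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> bool:
        self._end = self._time()
        return False  # never suppress exceptions

    @property
    def elapsed(self) -> float:
        """Seconds since the block started, frozen once the block has ended."""
        if self._start is None:
            raise RuntimeError("The stopwatch has not been started.")
        end = self._end if self._end is not None else self._time()
        return end - self._start
```

Passing time.process_time instead of time.time would measure CPU time rather than wall clock time.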

Timekeeper

class gama.utilities.generic.timekeeper.TimeKeeper(total_time: Optional[int] = None)

Simple object that helps keep track of time over multiple activities.

Parameters: total_time (int, optional (default=None)) – The total time available across activities. If set to None, the total_time_remaining property will be unavailable.

AsyncEvaluator

Warning

I’m sure there are better tools out there, but I have yet to find a minimal, easy multi-processing tool. I tried using the built-in ProcessPoolExecutor, but it had shortcomings such as not being able to cancel jobs while they were running.

class gama.utilities.generic.async_evaluator.AsyncEvaluator(n_workers: int = 1, memory_limit_mb: Optional[int] = None, logfile: Optional[str] = None, wait_time_before_forced_shutdown: int = 10)

Manages subprocesses on which arbitrary functions can be evaluated.

The function and all its arguments must be picklable. Using the same AsyncEvaluator in two different contexts raises a RuntimeError.

defaults: Dict, optional (default=None): Default parameter values shared between all submit calls. This allows these defaults to be transferred only once per process, instead of twice per call (to and from the subprocess). Only supports keyword arguments.

Parameters

n_workers (int (default=1)) – Maximum number of subprocesses to run for parallel evaluations.
memory_limit_mb (int, optional (default=None)) – The maximum number of megabytes that this process and its subprocesses may use in total. If None, no limit is enforced. There is no guarantee the limit is not violated.
logfile (str, optional (default=None)) – If set, recorded resource usage will be written to this file.
wait_time_before_forced_shutdown (int (default=10)) – Number of seconds to wait between asking the worker processes to shut down and terminating them forcefully if they failed to do so.
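
Since the function and all its arguments must be picklable, it can help to check a job before handing it to subprocesses. The helper below is a hypothetical pre-flight check built on the standard pickle module. It is not part of AsyncEvaluator.

```python
import math
import pickle
from typing import Any, Callable, Tuple


def is_picklable(fn: Callable, args: Tuple[Any, ...] = ()) -> bool:
    """Return True if the function and its arguments survive pickling."""
    try:
        pickle.dumps((fn, args))
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


print(is_picklable(math.sqrt, (4.0,)))    # True
print(is_picklable(lambda x: x, (4.0,)))  # False
```

Functions defined at module level pickle by reference, while lambdas and locally defined functions typically do not.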