API
GAMA
GamaClassifier
- class gama.GamaClassifier(config=None, scoring='neg_log_loss', *args, **kwargs)
- Gama with adaptations for (multi-class) classification.
- Parameters
- scoring (str, Metric or Tuple) – Specifies the metric, or metrics, to optimize towards. A string will be converted to Metric. A tuple must specify each metric with the same type (e.g. all str). See Metrics for built-in metrics. 
- regularize_length (bool (default=True)) – If True, add pipeline length as an optimization metric. Short pipelines should then be preferred over long ones. 
- max_pipeline_length (int, optional (default=None)) – If set, limit the maximum number of steps in any evaluated pipeline. Encoding and imputation are excluded. 
- config (Dict) – Specifies available components and their valid hyperparameter settings. For more information, see GAMA Search Space Configuration. 
- random_state (int, optional (default=None)) – Seed for the random number generators used in the process. However, with n_jobs > 1, multi-processing introduces additional randomization. For reproducible results, set this and use n_jobs=1.
- max_total_time (positive int (default=3600)) – Time in seconds that can be used for the fit call.
- max_eval_time (positive int, optional (default=None)) – Time in seconds that can be used to evaluate any single individual. If None, set to 0.1 * max_total_time. 
- n_jobs (int, optional (default=None)) – The number of parallel processes that may be created to speed up fit. Accepted values are positive integers, -1 or None. If -1 is specified, multiprocessing.cpu_count() processes are created. If None is specified, multiprocessing.cpu_count() / 2 processes are created.
- max_memory_mb (int, optional (default=None)) – Sets the total amount of memory GAMA is allowed to use (in megabytes). If not set, GAMA will use as much as it needs. GAMA is not guaranteed to respect this limit at all times, but it should never violate it for too long. 
- verbosity (int (default=logging.WARNING)) – Sets the level of log messages to be automatically output to terminal. 
- search (BaseSearch (default=AsyncEA())) – Search method to use to find good pipelines. Should be instantiated. 
- post_processing (BasePostProcessing (default=BestFitPostProcessing())) – Post-processing method to create a model after the search phase. Should be an instantiated subclass of BasePostProcessing. 
- output_directory (str, optional (default=None)) – Directory to use to save GAMA output. This includes both intermediate results during search and logs. This directory must be empty or not exist. If set to None, generate a unique name (“gama_HEXCODE”). 
- store (str (default='logs')) – Determines which data is stored after each run:
- 'nothing': keep nothing from this run 
- 'models': keep only the cache with models and predictions 
- 'logs': keep only the logs 
- 'all': keep logs and the cache with models and predictions 
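The parameter list above states two fallback rules: n_jobs=None means half the CPU count, and max_eval_time=None means 10% of max_total_time. The sketch below restates them as code. The helper names resolve_n_jobs and resolve_max_eval_time are hypothetical (they are not part of GAMA), and rounding cpu_count() / 2 down while keeping at least one process is an assumption.

```python
import multiprocessing
from typing import Optional


def resolve_n_jobs(n_jobs: Optional[int]) -> int:
    """Translate the documented n_jobs values into a process count (hypothetical helper)."""
    cpus = multiprocessing.cpu_count()
    if n_jobs is None:
        # Documented as cpu_count() / 2; rounding down with a minimum of 1 is assumed.
        return max(cpus // 2, 1)
    if n_jobs == -1:
        return cpus
    if n_jobs > 0:
        return n_jobs
    raise ValueError(f"n_jobs must be a positive integer, -1 or None, not {n_jobs}.")


def resolve_max_eval_time(max_eval_time: Optional[float], max_total_time: float = 3600) -> float:
    """If max_eval_time is None, fall back to 10% of max_total_time, as documented."""
    if max_eval_time is None:
        return 0.1 * max_total_time
    return max_eval_time
```

With the defaults, a single evaluation may therefore take up to 360 seconds of the one-hour budget.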
 
 
 
 
GamaRegressor
Metrics
If you have a custom scoring function, you can define your own Metric.
Metric
MetricType
Search Methods
AsynchronousSuccessiveHalving
- class gama.search_methods.AsynchronousSuccessiveHalving(reduction_factor: Optional[int] = None, minimum_resource: Optional[Tuple[int, float]] = None, maximum_resource: Optional[Tuple[int, float]] = None, minimum_early_stopping_rate: Optional[int] = None)
- Asynchronous Successive Halving Algorithm (ASHA) by Li et al.
- Paper: https://arxiv.org/abs/1810.05934
- Parameters
- reduction_factor (int, optional (default=3)) – Reduction factor of candidates between each rung. 
- minimum_resource (int or float, optional (default=0.125)) – Number of samples to use in the lowest rung. If integer, it specifies the number of rows. If float, it specifies the fraction of the dataset. 
- maximum_resource (int or float, optional (default=1.0)) – Number of samples to use in the top rung. If integer, it specifies the number of rows. If float, it specifies the fraction of the dataset. 
- minimum_early_stopping_rate (int (default=0)) – Number of lowest rungs to skip. 
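The two rules behind these parameters, as described by Li et al., can be sketched as follows. First, rung k receives minimum_resource × reduction_factor^(minimum_early_stopping_rate + k), with rungs added while that amount does not exceed maximum_resource. Second, a configuration is promoted once it ranks in the top 1/reduction_factor of the results completed in its rung. This follows the paper rather than GAMA's source, so GAMA's exact rung boundaries may differ. The function names, the dictionary layout, and the "higher score is better" convention are assumptions.

```python
from typing import Dict, List, Optional, Set, Tuple


def rung_resources(reduction_factor: int = 3, minimum_resource: float = 0.125,
                   maximum_resource: float = 1.0,
                   minimum_early_stopping_rate: int = 0) -> List[float]:
    """Rung k gets minimum_resource * eta ** (s + k), as long as it fits in maximum_resource."""
    if reduction_factor < 2 or minimum_resource <= 0:
        raise ValueError("Need reduction_factor >= 2 and a positive minimum_resource.")
    resources: List[float] = []
    k = 0
    while True:
        resource = minimum_resource * reduction_factor ** (minimum_early_stopping_rate + k)
        if resource > maximum_resource * (1 + 1e-9):  # tolerance for float round-off
            return resources
        resources.append(resource)
        k += 1


def find_promotion(results: Dict[int, List[Tuple[str, float]]],
                   promoted: Dict[int, Set[str]],
                   reduction_factor: int,
                   top_rung: int) -> Optional[Tuple[str, int]]:
    """Find a not-yet-promoted configuration in the top 1/eta of its rung.

    Higher rungs are checked first. None means no promotion is possible, in which
    case an asynchronous worker would start a new configuration in the bottom rung.
    """
    for rung in sorted(results, reverse=True):
        if rung >= top_rung:
            continue  # configurations in the top rung cannot be promoted any further
        ranked = sorted(results[rung], key=lambda item: item[1], reverse=True)
        for config, _score in ranked[: len(ranked) // reduction_factor]:
            if config not in promoted.get(rung, set()):
                return config, rung
    return None
```

Because promotions are decided from whatever results have completed so far, no worker ever waits for a full rung to finish. That is the "asynchronous" part of the name.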
 
 
AsyncEA
- class gama.search_methods.AsyncEA(population_size: Optional[int] = None, max_n_evaluations: Optional[int] = None, restart_callback: Optional[Callable[[], bool]] = None)
- Perform asynchronous evolutionary optimization.
- Parameters
- population_size (int, optional (default=50)) – Maximum number of individuals in the population at any time. 
- max_n_evaluations (int, optional (default=None)) – If specified, at most max_n_evaluations individuals are evaluated. If None, the algorithm runs until it is interrupted by the user or times out.
- restart_callback (Callable[[], bool], optional (default=None)) – Function which takes no arguments and returns True if the search should restart. 
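A sequential simulation of the steady-state loop shows how population_size, max_n_evaluations and restart_callback interact. This is a simplified sketch, not GAMA's implementation. GAMA evaluates individuals asynchronously in parallel and uses its own selection scheme. Here, individuals are evaluated one at a time, parents come from binary tournament selection, and the worst individual is replaced. The function async_ea_sketch and its evaluate, random_individual and mutate arguments are hypothetical.

```python
import random
from typing import Callable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def async_ea_sketch(evaluate: Callable[[T], float],
                    random_individual: Callable[[random.Random], T],
                    mutate: Callable[[T, random.Random], T],
                    population_size: int = 50,
                    max_n_evaluations: Optional[int] = None,
                    restart_callback: Optional[Callable[[], bool]] = None,
                    seed: int = 0) -> List[Tuple[float, T]]:
    """Steady-state evolution; with max_n_evaluations=None it runs until interrupted."""
    if population_size < 2:
        raise ValueError("population_size must be at least 2 for tournament selection.")
    rng = random.Random(seed)
    population: List[Tuple[float, T]] = []  # (fitness, individual); higher fitness is better
    n_evaluations = 0
    while max_n_evaluations is None or n_evaluations < max_n_evaluations:
        if restart_callback is not None and restart_callback():
            population = []  # restart: discard the population and sample afresh
        if len(population) < population_size:
            candidate = random_individual(rng)  # still filling the population
        else:
            first, second = rng.sample(population, 2)  # binary tournament
            parent = first if first[0] >= second[0] else second
            candidate = mutate(parent[1], rng)
        population.append((evaluate(candidate), candidate))
        n_evaluations += 1
        if len(population) > population_size:
            population.remove(min(population, key=lambda entry: entry[0]))
    return sorted(population, key=lambda entry: entry[0], reverse=True)
```

For example, maximizing -(x - 3)² over floats converges towards x = 3 when mutate adds small Gaussian noise.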
 
 
RandomSearch
Post-Processing
NoPostProcessing
BestFitPostProcessing
EnsemblePostProcessing
- class gama.postprocessing.EnsemblePostProcessing(time_fraction: float = 0.3, ensemble_size: Optional[int] = 25, hillclimb_size: Optional[int] = 10000, max_models: Optional[int] = 200)
- Ensemble construction per Caruana et al.
- Parameters
- time_fraction (float (default=0.3)) – Fraction of total time reserved for Ensemble building. 
- ensemble_size (int, optional (default=25)) – Total number of models in the ensemble. When a single model is chosen more than once, its weight in the ensemble increases, and every selection counts towards this maximum. 
- hillclimb_size (int, optional (default=10_000)) – Number of predictions that are used to determine the ensemble score during hillclimbing. If None, use all.
- max_models (int, optional (default=200)) – Only consider the best max_models models. If None, use all. Consequently, this also sets the maximum number of unique models in the ensemble.
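The greedy procedure of Caruana et al. is forward selection with replacement: at every step, add whichever model most improves the averaged ensemble. The sketch below shows it together with the roles of ensemble_size, hillclimb_size and max_models. It rests on several assumptions. Predictions are plain lists of regression outputs scored by mean squared error. The hillclimb_size rows are drawn at random. Ties go to the model ranked first. GAMA works with its own prediction cache and scoring metric, and ensemble_selection_sketch is a hypothetical name.

```python
import random
from collections import Counter
from typing import Dict, List, Optional, Sequence


def mse(predictions: Sequence[float], targets: Sequence[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def ensemble_selection_sketch(predictions: Dict[str, List[float]], y: List[float],
                              ensemble_size: int = 25, hillclimb_size: Optional[int] = 10_000,
                              max_models: Optional[int] = 200, seed: int = 0) -> Counter:
    """Greedy forward selection with replacement; returns how often each model was picked."""
    rng = random.Random(seed)
    rows = list(range(len(y)))
    if hillclimb_size is not None and hillclimb_size < len(rows):
        rows = rng.sample(rows, hillclimb_size)  # score the ensemble on a subset of predictions
    y_sub = [y[i] for i in rows]
    preds = {name: [p[i] for i in rows] for name, p in predictions.items()}
    # Only the best max_models models (by their own score) are candidates.
    candidates = sorted(preds, key=lambda name: mse(preds[name], y_sub))
    if max_models is not None:
        candidates = candidates[:max_models]
    weights: Counter = Counter()
    summed = [0.0] * len(y_sub)  # running sum of the selected models' predictions
    for size in range(1, ensemble_size + 1):
        best = min(candidates,
                   key=lambda name: mse([(s + p) / size for s, p in zip(summed, preds[name])], y_sub))
        weights[best] += 1  # picking a model again increases its weight
        summed = [s + p for s, p in zip(summed, preds[best])]
    return weights
```

The resulting ensemble averages the models' predictions, weighting each model by weights[name] / ensemble_size. Two models that err in opposite directions can therefore be combined into a better predictor than either alone.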
 
 
Genetic Programming
Components
Defines the building blocks for Individuals.
Individuals represent machine learning pipelines in a back-end agnostic way.
An Individual can be converted to its back-end specific representation
(e.g. a scikit-learn Pipeline) through its pipeline property,
as long as a function has been provided to perform this conversion.
Individuals are built with:
- Terminals. Definition of a specific value for a specific hyperparameter. Immutable.
- Primitives. Definition of a specific algorithm. Immutable. Defined by Terminal input, output type and operation.
- PrimitiveNodes. An instantiated Primitive with specific Terminals. Mutable for easy operations (e.g. mutation).
- Fitness. Stores information about the evaluation of the individual.
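The mutability split described above can be illustrated with plain dataclasses. Terminal and Primitive are frozen, so changing a hyperparameter means swapping in a new Terminal, whereas a PrimitiveNode can be edited in place. These are simplified stand-ins written for illustration, not GAMA's classes. Field names such as value and identifier, and the Fitness fields, are assumptions.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Any, List, Tuple, Union


@dataclass(frozen=True)
class Terminal:
    """A specific value for a specific hyperparameter (immutable)."""
    output: str  # the hyperparameter this value is for, e.g. "n_neighbors"
    value: Any


@dataclass(frozen=True)
class Primitive:
    """A specific algorithm (immutable): terminal inputs, output type and operation."""
    input: Tuple[str, ...]  # names of the hyperparameters it takes
    output: str             # e.g. "prediction" or "data"
    identifier: str         # name of the operation, e.g. "KNeighborsClassifier"


@dataclass
class PrimitiveNode:
    """A Primitive instantiated with specific Terminals (mutable)."""
    primitive: Primitive
    data_node: Union["PrimitiveNode", str]  # preceding node, or a marker for the raw data
    terminals: List[Terminal]


@dataclass
class Fitness:
    """Evaluation results of an individual (illustrative fields)."""
    values: Tuple[float, ...]
    wallclock_seconds: float


knn = Primitive(("n_neighbors",), "prediction", "KNeighborsClassifier")
node = PrimitiveNode(knn, "data", [Terminal("n_neighbors", 5)])
node.terminals[0] = Terminal("n_neighbors", 7)  # mutate the node by replacing the terminal
```

Assigning node.terminals[0].value directly would raise FrozenInstanceError. This is what keeps a Terminal safe to share between individuals.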
Individual
- class gama.genetic_programming.components.Individual(main_node: PrimitiveNode, to_pipeline: Optional[Callable] = None)
- Collection of PrimitiveNodes which together specify a machine learning pipeline.
- Parameters
- main_node (PrimitiveNode) – The first node of the individual (the estimator node). 
- to_pipeline (Callable, optional (default=None)) – A function which can convert this individual into a machine learning pipeline. If not provided, the pipeline property will be unavailable.
 
 
Primitive
PrimitiveNode
- class gama.genetic_programming.components.PrimitiveNode(primitive: Primitive, data_node: Union[PrimitiveNode, str], terminals: List[Terminal])
- An instantiation of a Primitive with specific Terminals.
- Parameters
- primitive (Primitive) – The Primitive type of this PrimitiveNode. 
- data_node (PrimitiveNode or str) – The PrimitiveNode that specifies all preprocessing before this PrimitiveNode. 
- terminals (List[Terminal]) – A list of terminals matching the primitive.
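Each PrimitiveNode points to its data_node, so an Individual can list its steps by walking from main_node towards the data. A to_pipeline function then only needs to reverse that walk. This sketch uses simplified, hypothetical classes: a node holds a step name and a hyperparameter dict, and the string "data" marks the raw input. The converter returns a list of steps rather than a scikit-learn Pipeline, so the example runs without extra dependencies.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple, Union

DATA = "data"  # hypothetical marker for the raw input data


@dataclass
class PrimitiveNode:
    name: str
    data_node: Union["PrimitiveNode", str] = DATA
    hyperparameters: Dict[str, Any] = field(default_factory=dict)


class Individual:
    def __init__(self, main_node: PrimitiveNode,
                 to_pipeline: Optional[Callable[["Individual"], Any]] = None):
        self.main_node = main_node  # the estimator node
        self._to_pipeline = to_pipeline

    @property
    def primitives(self) -> Iterator[PrimitiveNode]:
        """Yield nodes from the estimator back towards the data."""
        node: Union[PrimitiveNode, str] = self.main_node
        while isinstance(node, PrimitiveNode):
            yield node
            node = node.data_node

    @property
    def pipeline(self) -> Any:
        if self._to_pipeline is None:
            raise AttributeError("pipeline is unavailable: no to_pipeline function was provided.")
        return self._to_pipeline(self)


def to_step_list(individual: Individual) -> List[Tuple[str, Dict[str, Any]]]:
    """Hypothetical converter: steps in execution order, preprocessing first."""
    return [(node.name, node.hyperparameters) for node in reversed(list(individual.primitives))]


scaler = PrimitiveNode("StandardScaler")
model = PrimitiveNode("LogisticRegression", scaler, {"C": 1.0})
individual = Individual(model, to_step_list)
```

Raising AttributeError makes hasattr(individual, "pipeline") return False when no converter was given. This is one way to model an "unavailable" property.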
 
 
Terminal
Mutation
Contains mutation functions for genetic programming. Each mutation function takes an individual and modifies it in-place.
- gama.genetic_programming.mutation.mut_insert(individual: Individual, primitive_set: dict) -> None
- Mutate an Individual in-place by inserting a PrimitiveNode at a random location.
- The new PrimitiveNode will not be inserted as the root node.
- Parameters
- individual (Individual) – Individual to mutate in-place. 
- primitive_set (dict) – A dictionary defining the set of primitives and terminals.
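On a simplified representation, insertion at a random location other than the root looks like this. The individual is a list of (name, hyperparameters) steps in execution order, with the estimator (the root node) as the last entry. Both the representation and the preprocessors argument, a mapping from algorithm name to candidate values per hyperparameter, are assumptions standing in for GAMA's Individual and primitive_set. mut_insert_sketch is a hypothetical name.

```python
import random
from typing import Any, Dict, List, Tuple

Step = Tuple[str, Dict[str, Any]]  # (algorithm name, hyperparameter values)


def mut_insert_sketch(steps: List[Step], preprocessors: Dict[str, Dict[str, List[Any]]],
                      rng: random.Random) -> None:
    """Insert a random preprocessing step in-place, never after the final estimator."""
    name = rng.choice(sorted(preprocessors))
    hyperparameters = {hp: rng.choice(values) for hp, values in preprocessors[name].items()}
    # Positions 0 .. len(steps) - 1 all lie before the estimator, so it stays the root.
    position = rng.randint(0, len(steps) - 1)
    steps.insert(position, (name, hyperparameters))
```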
 
 
- gama.genetic_programming.mutation.mut_replace_primitive(individual: Individual, primitive_set: dict) -> None
- Mutate an Individual in-place by replacing one of its Primitives.
- Parameters
- individual (Individual) – Individual to mutate in-place. 
- primitive_set (dict) – A dictionary defining the set of primitives and terminals.
 
 
- gama.genetic_programming.mutation.mut_replace_terminal(individual: Individual, primitive_set: dict) -> None
- Mutate an Individual in-place by replacing one of its Terminals.
- Parameters
- individual (Individual) – Individual to mutate in-place. 
- primitive_set (dict) – A dictionary defining the set of primitives and terminals.
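Replacing a Terminal means giving one hyperparameter a different value that is still valid for its algorithm. In this sketch, an individual is a list of (name, hyperparameters) steps, and search_space maps each algorithm and hyperparameter to its candidate values. Both structures are assumptions standing in for GAMA's Individual and primitive_set, and mut_replace_terminal_sketch is a hypothetical name.

```python
import random
from typing import Any, Dict, List, Tuple

Step = Tuple[str, Dict[str, Any]]  # (algorithm name, hyperparameter values)


def mut_replace_terminal_sketch(steps: List[Step], search_space: Dict[str, Dict[str, List[Any]]],
                                rng: random.Random) -> None:
    """Give one hyperparameter a different valid value, in place."""
    # Only hyperparameters with at least one alternative value can be replaced.
    candidates = [
        (index, hp)
        for index, (name, hyperparameters) in enumerate(steps)
        for hp, value in hyperparameters.items()
        if any(option != value for option in search_space[name][hp])
    ]
    if not candidates:
        raise ValueError("The individual has no terminal that can be replaced.")
    index, hp = rng.choice(candidates)
    name, hyperparameters = steps[index]
    current = hyperparameters[hp]
    hyperparameters[hp] = rng.choice([option for option in search_space[name][hp] if option != current])
```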
 
 
- gama.genetic_programming.mutation.mut_shrink(individual: Individual, primitive_set: Optional[dict] = None, shrink_by: Optional[int] = None) -> None
- Mutate an Individual in-place by removing any number of primitive nodes.
- Primitive nodes are removed from the preprocessing end.
- Parameters
- individual (Individual) – Individual to mutate in-place. 
- primitive_set (dict, optional) – Not used. Present to create a matching function signature with other mutations. 
- shrink_by (int, optional (default=None)) – Number of primitives to remove. Must be at least one less than the number of primitives in individual, so that the estimator node remains. If None, a random number of primitives is removed.
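Here is a sketch of the shrink rule on a list of step names in execution order, with the estimator last. It assumes that "the preprocessing end" means the steps closest to the raw data, so removing shrink_by steps becomes a slice from the front. A ValueError is raised when shrink_by would also remove the estimator, that is, when it is not smaller than the number of primitives. mut_shrink_sketch is a hypothetical name.

```python
import random
from typing import List, Optional


def mut_shrink_sketch(steps: List[str], shrink_by: Optional[int] = None,
                      rng: Optional[random.Random] = None) -> None:
    """Remove preprocessing steps in-place, starting at the data end; the estimator is kept."""
    n_primitives = len(steps)
    if n_primitives < 2:
        raise ValueError("An individual needs at least two primitives to shrink.")
    if shrink_by is None:
        shrink_by = (rng or random.Random()).randint(1, n_primitives - 1)
    if not 1 <= shrink_by < n_primitives:
        raise ValueError(f"Can't shrink an individual of size {n_primitives} by {shrink_by}.")
    del steps[:shrink_by]
```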
 
 
- gama.genetic_programming.mutation.random_valid_mutation_in_place(individual: Individual, primitive_set: dict, max_length: Optional[int] = None) -> Callable
- Apply a random valid mutation in place.
- The random mutation can be one of:
- mut_replace_primitive
- mut_replace_terminal, if the individual has at least one terminal
- mut_shrink, if the individual has at least two primitives
- mut_insert, if it would not exceed max_length when specified.
 - Parameters
- individual (Individual) – An individual to be mutated in-place. 
- primitive_set (dict) – A dictionary defining the set of primitives and terminals. 
- max_length (int, optional (default=None)) – If specified, impose a maximum length on the new individual. 
 
- Returns
- The mutation function used. 
- Return type
- Callable 
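The conditions in this list can be checked before a mutation is drawn, so only applicable mutations are ever chosen. In this sketch, the mutations are passed in as a dictionary of callables keyed by the documented function names, and an individual is a list of (name, hyperparameters) steps. Both are assumptions made for illustration. valid_mutation_names and random_valid_mutation_sketch are hypothetical helpers.

```python
import random
from typing import Any, Callable, Dict, List, Optional, Tuple

Step = Tuple[str, Dict[str, Any]]  # (algorithm name, hyperparameter values)


def valid_mutation_names(n_primitives: int, n_terminals: int,
                         max_length: Optional[int] = None) -> List[str]:
    """Names of the mutations whose documented preconditions hold."""
    names = ["mut_replace_primitive"]
    if n_terminals >= 1:
        names.append("mut_replace_terminal")
    if n_primitives >= 2:
        names.append("mut_shrink")
    if max_length is None or n_primitives + 1 <= max_length:
        names.append("mut_insert")  # an insertion adds exactly one primitive
    return names


def random_valid_mutation_sketch(steps: List[Step],
                                 mutations: Dict[str, Callable[[List[Step]], None]],
                                 max_length: Optional[int] = None,
                                 rng: Optional[random.Random] = None) -> Callable[[List[Step]], None]:
    """Apply one randomly chosen valid mutation in place and return the function used."""
    n_terminals = sum(len(hyperparameters) for _, hyperparameters in steps)
    names = valid_mutation_names(len(steps), n_terminals, max_length)
    mutation = mutations[(rng or random.Random()).choice(names)]
    mutation(steps)
    return mutation
```

Returning the function that was applied lets a caller record which operator produced each offspring.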
 
Crossover
Functions which take two Individuals and produce at least one new Individual.
- gama.genetic_programming.crossover.crossover_primitives(ind1: Individual, ind2: Individual) -> Tuple[Individual, Individual]
- Cross over two individuals by exchanging any number of preprocessing steps.
- Parameters
- ind1 (Individual) – The individual to cross over with ind2. 
- ind2 (Individual) – The individual to cross over with ind1. 
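If each individual is a list of step names in execution order with the estimator last, exchanging preprocessing steps amounts to swapping prefixes at one cut point per parent, while each estimator stays in place. This representation is an assumption. GAMA works on linked PrimitiveNodes instead, and crossover_at and crossover_primitives_sketch are hypothetical names. Cut points that would exchange identical prefixes are skipped, because they would produce copies of the parents.

```python
import random
from typing import List, Tuple


def crossover_at(ind1: List[str], ind2: List[str], cut1: int,
                 cut2: int) -> Tuple[List[str], List[str]]:
    """Exchange the preprocessing prefixes ind1[:cut1] and ind2[:cut2]; estimators stay last."""
    if not (0 <= cut1 < len(ind1) and 0 <= cut2 < len(ind2)):
        raise ValueError("Cut points must leave each estimator (the last step) in place.")
    return ind2[:cut2] + ind1[cut1:], ind1[:cut1] + ind2[cut2:]


def crossover_primitives_sketch(ind1: List[str], ind2: List[str],
                                rng: random.Random) -> Tuple[List[str], List[str]]:
    """Pick cut points that actually exchange something, then build both offspring."""
    cuts = [(c1, c2) for c1 in range(len(ind1)) for c2 in range(len(ind2))
            if ind1[:c1] != ind2[:c2]]
    if not cuts:
        raise ValueError("These individuals have no preprocessing steps to exchange.")
    return crossover_at(ind1, ind2, *rng.choice(cuts))
```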
 
 
- gama.genetic_programming.crossover.crossover_terminals(ind1: Individual, ind2: Individual) -> Tuple[Individual, Individual]
- Cross over two individuals in-place by exchanging two Terminals.
- The Terminals must share an output type but have different values.
- Parameters
- ind1 (Individual) – The individual to cross over with ind2. 
- ind2 (Individual) – The individual to cross over with ind1. 
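Terminal crossover swaps one pair of values that belong to the same kind of hyperparameter and currently differ. This sketch approximates a Terminal's output type with the pair (algorithm name, hyperparameter name) and represents individuals as lists of (name, hyperparameters) steps. Both choices are assumptions for illustration, and crossover_terminals_sketch is a hypothetical name.

```python
import random
from typing import Any, Dict, List, Tuple

Step = Tuple[str, Dict[str, Any]]  # (algorithm name, hyperparameter values)


def crossover_terminals_sketch(ind1: List[Step], ind2: List[Step],
                               rng: random.Random) -> Tuple[List[Step], List[Step]]:
    """Swap one pair of same-type hyperparameter values between two individuals, in place."""
    # Output type is approximated here by (algorithm name, hyperparameter name).
    pairs = [
        (hps1, hps2, hp)
        for name1, hps1 in ind1
        for name2, hps2 in ind2
        if name1 == name2
        for hp in hps1
        if hp in hps2 and hps1[hp] != hps2[hp]
    ]
    if not pairs:
        raise ValueError("No terminals share an output type while having different values.")
    hps1, hps2, hp = rng.choice(pairs)
    hps1[hp], hps2[hp] = hps2[hp], hps1[hp]
    return ind1, ind2
```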
 
 
- gama.genetic_programming.crossover.random_crossover(ind1: Individual, ind2: Individual, max_length: Optional[int] = None) -> Tuple[Individual, Individual]
- Perform a random valid crossover between two individuals in-place, if one is possible.
- Parameters
- ind1 (Individual) – The individual to cross over with ind2. 
- ind2 (Individual) – The individual to cross over with ind1. 
- max_length (int, optional (default=None)) – The first individual in the returned tuple has at most max_length primitives. Requires both provided individuals to contain at most max_length primitives.
 
- Raises
- If there is no valid crossover function for the two individuals.
- If max_length is set and either ind1 or ind2 contains more primitives than max_length.
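The max_length contract can be enforced by filtering cut points before choosing one. If the first child is ind2[:cut2] + ind1[cut1:], its length is cut2 + len(ind1) - cut1. This sketch considers only prefix-swapping crossover on lists of step names (estimator last), which is a simplification. random_crossover_sketch is a hypothetical name, and raising ValueError is an illustrative choice for the two documented error cases.

```python
import random
from typing import List, Optional, Tuple


def random_crossover_sketch(ind1: List[str], ind2: List[str], max_length: Optional[int] = None,
                            rng: Optional[random.Random] = None) -> Tuple[List[str], List[str]]:
    """Prefix-swap crossover whose first child has at most max_length steps."""
    if max_length is not None and max(len(ind1), len(ind2)) > max_length:
        raise ValueError(f"Both individuals must contain at most {max_length} primitives.")
    options = [
        (cut1, cut2)
        for cut1 in range(len(ind1))
        for cut2 in range(len(ind2))
        if ind1[:cut1] != ind2[:cut2]  # skip swaps that change nothing
        and (max_length is None or cut2 + len(ind1) - cut1 <= max_length)
    ]
    if not options:
        raise ValueError("No valid crossover exists for these individuals.")
    cut1, cut2 = (rng or random.Random()).choice(options)
    # Only the first child is bounded by max_length, matching the description above.
    return ind2[:cut2] + ind1[cut1:], ind1[:cut1] + ind2[cut2:]
```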
 
 
Utilities
Generic
Collection of generic components.
Pareto Front
- class gama.utilities.generic.paretofront.ParetoFront(start_list: Optional[List[Any]] = None, get_values_fn: Optional[Callable[[Any], Tuple[Any, ...]]] = None)
- A list of tuples in which no one tuple is dominated by another.
- Parameters
- start_list (list, optional (default=None)) – List of items from which to calculate the Pareto front. 
- get_values_fn (Callable, optional (default=None)) – Function that takes an item and returns a tuple of values, such that each should be maximized. If left None, it is assumed that items are already such tuples. 
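An incremental way to maintain such a front is to reject a new item if a member dominates it, and otherwise add it and drop any members it dominates. Because dominance is transitive, a rejected item never needs to be reconsidered. The function names below are hypothetical and this is not GAMA's implementation. The get_values_fn example, which maximizes score and minimizes length through (score, -length), is just one illustrative use.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


def dominates(a: Sequence[Any], b: Sequence[Any]) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(items: List[Any],
                 get_values_fn: Optional[Callable[[Any], Tuple[Any, ...]]] = None) -> List[Any]:
    """Keep the items whose value tuples are not dominated by any other item."""
    values = get_values_fn if get_values_fn is not None else (lambda item: item)
    front: List[Any] = []
    for item in items:
        if any(dominates(values(member), values(item)) for member in front):
            continue  # dominated: by transitivity it can never re-enter the front
        # The new item may dominate existing members; drop those.
        front = [member for member in front if not dominates(values(item), values(member))]
        front.append(item)
    return front


pipelines = [{"name": "a", "score": 0.9, "length": 3},
             {"name": "b", "score": 0.9, "length": 1},
             {"name": "c", "score": 0.8, "length": 1}]
shortest_best = pareto_front(pipelines, get_values_fn=lambda p: (p["score"], -p["length"]))
```

Here "b" dominates both others. It matches "a" on score with a shorter pipeline, and it beats "c" on score at equal length.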
 
 
Stopwatch
Timekeeper
- class gama.utilities.generic.timekeeper.TimeKeeper(total_time: Optional[int] = None)
- Simple object that helps keep track of time over multiple activities.
- Parameters
- total_time (int, optional (default=None)) – The total time available across activities. If set to None, the total_time_remaining property will be unavailable.
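A time keeper of this kind can be sketched with a context manager per activity and a property that subtracts elapsed time from the budget. TimeKeeperSketch, its activity method, and the injectable clock are hypothetical. The clock makes the example deterministic. Raising RuntimeError when total_time is None is an illustrative way to make the property unavailable.

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, Optional


class TimeKeeperSketch:
    """Tracks elapsed time per named activity against an optional overall budget."""

    def __init__(self, total_time: Optional[float] = None,
                 clock: Callable[[], float] = time.monotonic):
        self.total_time = total_time
        self._clock = clock
        self._start = clock()
        self.activity_durations: Dict[str, float] = {}

    @contextmanager
    def activity(self, name: str) -> Iterator[None]:
        """Time the enclosed block and add it to the named activity's total."""
        started = self._clock()
        try:
            yield
        finally:
            elapsed = self._clock() - started
            self.activity_durations[name] = self.activity_durations.get(name, 0.0) + elapsed

    @property
    def total_time_remaining(self) -> float:
        if self.total_time is None:
            raise RuntimeError("total_time_remaining is unavailable when total_time is None.")
        return self.total_time - (self._clock() - self._start)
```

A search phase could run inside keeper.activity("search") and consult total_time_remaining to decide how much time to leave for post-processing.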
 
AsyncEvaluator
Warning
I’m sure there are better tools out there, but I have yet to find a minimal, easy multi-processing tool. I tried using the built-in ProcessPoolExecutor, but it had shortcomings, such as not being able to cancel jobs while they were running.
- class gama.utilities.generic.async_evaluator.AsyncEvaluator(n_workers: int = 1, memory_limit_mb: Optional[int] = None, logfile: Optional[str] = None, wait_time_before_forced_shutdown: int = 10)
- Manages subprocesses on which arbitrary functions can be evaluated.
- The function and all its arguments must be picklable. Using the same AsyncEvaluator in two different contexts raises a RuntimeError.
- defaults: Dict, optional (default=None)
- Default parameter values shared between all submit calls. This allows these defaults to be transferred only once per process, instead of twice per call (to and from the subprocess). Only supports keyword arguments. 
 - Parameters
- n_workers (int (default=1)) – Maximum number of subprocesses to run for parallel evaluations. 
- memory_limit_mb (int, optional (default=None)) – The maximum number of megabytes that this process and its subprocesses may use in total. If None, no limit is enforced. There is no guarantee the limit is not violated. 
- logfile (str, optional (default=None)) – If set, recorded resource usage will be written to this file. 
- wait_time_before_forced_shutdown (int (default=10)) – Number of seconds to wait between asking the worker processes to shut down and terminating them forcefully if they failed to do so.
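The defaults mechanism and the picklability requirement can be illustrated without starting any processes. A worker receives the shared defaults once, when it starts, and each task then carries only its own keyword arguments. SketchWorker, encode_task and weighted_sum are hypothetical names. A real evaluator would pass these payloads through inter-process queues rather than call run directly.

```python
import pickle
from typing import Any, Callable, Dict


def encode_task(fn: Callable[..., Any], **kwargs: Any) -> bytes:
    """Serialize one call; fails early if fn or an argument cannot be pickled."""
    try:
        return pickle.dumps((fn, kwargs))
    except (pickle.PicklingError, AttributeError, TypeError) as error:
        raise ValueError(f"{fn!r} or one of its arguments cannot be pickled.") from error


class SketchWorker:
    """Stands in for one subprocess that receives the shared defaults once, at start-up."""

    def __init__(self, defaults: Dict[str, Any]):
        # Round-trip through pickle to mimic the single transfer to a subprocess.
        self._defaults = pickle.loads(pickle.dumps(defaults))

    def run(self, payload: bytes) -> Any:
        fn, kwargs = pickle.loads(payload)
        return fn(**{**self._defaults, **kwargs})  # per-call keyword arguments win


def weighted_sum(x: float, weight: float = 1.0) -> float:
    """Example task; it must be defined at module level to be picklable."""
    return weight * x
```

Lambdas and locally defined functions fail in encode_task, because pickle stores functions by reference to an importable name.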