Module mimir.attacks.quantile

Implementation of the attack proposed in 'Scalable Membership Inference Attacks via Quantile Regression' https://arxiv.org/pdf/2307.03694.pdf

Classes

class CustomTrainer (alpha_fpr, **kwargs)
class CustomTrainer(Trainer):
    def __init__(
        self,
        alpha_fpr,
        **kwargs,
    ):
        super().__init__(**kwargs)
        self.alpha_fpr = alpha_fpr

    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.pop("labels")
        # Forward pass
        outputs = model(**inputs)
        logits = outputs.get("logits")
        # Pinball (quantile) loss: over-predictions (logits > labels) are
        # penalized with weight alpha_fpr and under-predictions with weight
        # (1 - alpha_fpr), so the regressor converges to the
        # (1 - alpha_fpr)-quantile of the label (score) distribution.
        loss = ch.mean(
            ch.max(
                self.alpha_fpr * (logits - labels),
                (1 - self.alpha_fpr) * (labels - logits),
            )
        )
        return (loss, outputs) if return_outputs else loss
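
The loss above is the pinball (quantile) loss from the paper. As a quick sanity check (a standalone sketch, not part of the module; it only assumes import torch as ch, matching the alias used above), the constant prediction minimizing this loss is the (1 - alpha_fpr)-quantile of the labels, so roughly an alpha_fpr fraction of scores exceed it:

import torch as ch

def pinball_loss(preds, labels, alpha):
    # Same expression as CustomTrainer.compute_loss above
    return ch.mean(ch.max(alpha * (preds - labels), (1 - alpha) * (labels - preds)))

alpha_fpr = 0.1
labels = ch.randn(100_000)  # stand-in for non-member scores
candidates = ch.linspace(-3, 3, 601)
losses = ch.stack([pinball_loss(c, labels, alpha_fpr) for c in candidates])
best = candidates[losses.argmin()]
# Both values are ~1.28 for N(0, 1): the 0.9-quantile, leaving ~10% above it
print(best.item(), ch.quantile(labels, 1 - alpha_fpr).item())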

Trainer is a simple but feature-complete training and eval loop for PyTorch, optimized for 🤗 Transformers.

Args

model ([PreTrainedModel] or torch.nn.Module, optional): The model to train, evaluate or use for predictions. If not provided, a model_init must be passed.

Note: [Trainer] is optimized to work with the [PreTrainedModel] provided by the library. You can still use your own models defined as torch.nn.Module as long as they work the same way as the 🤗 Transformers models.

args ([TrainingArguments], optional): The arguments to tweak for training. Will default to a basic instance of [TrainingArguments] with the output_dir set to a directory named tmp_trainer in the current directory if not provided.

data_collator (DataCollator, optional): The function to use to form a batch from a list of elements of train_dataset or eval_dataset. Will default to [default_data_collator] if no processing_class is provided, or to an instance of [DataCollatorWithPadding] if the processing_class is a feature extractor or tokenizer.

train_dataset (Union[torch.utils.data.Dataset, torch.utils.data.IterableDataset, datasets.Dataset], optional): The dataset to use for training. If it is a [~datasets.Dataset], columns not accepted by the model.forward() method are automatically removed.

Note that if it's a torch.utils.data.IterableDataset with some randomization and you are training in a distributed fashion, your iterable dataset should either use an internal attribute generator that is a torch.Generator for the randomization that must be identical on all processes (and the Trainer will manually set the seed of this generator at each epoch) or have a set_epoch() method that internally sets the seed of the RNGs used.
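
As an illustration of the set_epoch() pattern, a minimal hypothetical sketch (the ShuffledStream class is invented for illustration and is not part of this module or of transformers):

import torch
from torch.utils.data import IterableDataset

class ShuffledStream(IterableDataset):
    # Hypothetical iterable dataset whose shuffling is re-seeded identically
    # on all processes each epoch.
    def __init__(self, data, seed=0):
        self.data, self.seed, self.epoch = data, seed, 0

    def set_epoch(self, epoch):
        # Called at the start of each epoch so all ranks shuffle identically
        self.epoch = epoch

    def __iter__(self):
        g = torch.Generator().manual_seed(self.seed + self.epoch)
        for i in torch.randperm(len(self.data), generator=g):
            yield self.data[i]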

eval_dataset (Union[torch.utils.data.Dataset, dict[str, torch.utils.data.Dataset], datasets.Dataset], optional): The dataset to use for evaluation. If it is a [~datasets.Dataset], columns not accepted by the model.forward() method are automatically removed. If it is a dictionary, it will evaluate on each dataset, prepending the dictionary key to the metric name.

processing_class (PreTrainedTokenizerBase or BaseImageProcessor or FeatureExtractionMixin or ProcessorMixin, optional): Processing class used to process the data. If provided, it will be used to automatically process the inputs for the model, and it will be saved along with the model to make it easier to rerun an interrupted training or reuse the fine-tuned model. This supersedes the tokenizer argument, which is now deprecated.

model_init (Callable[[], PreTrainedModel], optional): A function that instantiates the model to be used. If provided, each call to [~Trainer.train] will start from a new instance of the model as given by this function.

The function may take zero arguments, or a single one containing the optuna/Ray Tune/SigOpt trial object, to be able to choose different architectures according to hyperparameters (such as layer count, sizes of inner layers, dropout probabilities, etc.).
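
A minimal sketch of such a function (the model name and regression head are illustrative, not something this module prescribes):

from transformers import AutoModelForSequenceClassification

def model_init(trial=None):
    # trial is the optuna/Ray Tune/SigOpt object during hyperparameter
    # search; hyperparameters could be read from it here.
    return AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=1  # single-output regression head
    )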

compute_loss_func (Callable, optional): A function that accepts the raw model outputs, labels, and the number of items in the entire accumulated batch (batch_size * gradient_accumulation_steps) and returns the loss. For example, see the default loss function used by [Trainer].

compute_metrics (Callable[[EvalPrediction], Dict], optional): The function that will be used to compute metrics at evaluation. Must take an [EvalPrediction] and return a dictionary mapping metric names to metric values. Note: when passing [TrainingArguments] with batch_eval_metrics set to True, your compute_metrics function must take a boolean compute_result argument. This will be triggered after the last eval batch to signal that the function needs to calculate and return the global summary statistics rather than accumulating the batch-level statistics.

callbacks (List of [TrainerCallback], optional): A list of callbacks to customize the training loop. Will add those to the list of default callbacks detailed here.

If you want to remove one of the default callbacks used, use the [`Trainer.remove_callback`] method.

optimizers (tuple[torch.optim.Optimizer, torch.optim.lr_scheduler.LambdaLR], optional, defaults to (None, None)): A tuple containing the optimizer and the scheduler to use. Will default to an instance of [AdamW] on your model and a scheduler given by [get_linear_schedule_with_warmup] controlled by args.

optimizer_cls_and_kwargs (tuple[Type[torch.optim.Optimizer], dict[str, Any]], optional): A tuple containing the optimizer class and keyword arguments to use. Overrides optim and optim_args in args. Incompatible with the optimizers argument.

Unlike optimizers, this argument avoids the need to place model parameters on the correct devices before initializing the Trainer.

preprocess_logits_for_metrics (Callable[[torch.Tensor, torch.Tensor], torch.Tensor], optional): A function that preprocesses the logits right before caching them at each evaluation step. Must take two tensors, the logits and the labels, and return the logits once processed as desired. The modifications made by this function will be reflected in the predictions received by compute_metrics.

Note that the labels (second parameter) will be None if the dataset does not have them.
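
For example, a common use (a sketch, not something this module defines) is to shrink the cached logits to class predictions so evaluation does not accumulate the full logits tensor:

def preprocess_logits_for_metrics(logits, labels):
    # Keep only the argmax; labels may be None if the dataset has none
    if isinstance(logits, tuple):
        logits = logits[0]
    return logits.argmax(dim=-1)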

Important attributes:

- **model** -- Always points to the core model. If using a transformers model, it will be a [`PreTrainedModel`]
  subclass.
- **model_wrapped** -- Always points to the most external model in case one or more other modules wrap the
  original model. This is the model that should be used for the forward pass. For example, under DeepSpeed,
  the inner model is wrapped in DeepSpeed and then again in torch.nn.DistributedDataParallel. If the inner
  model hasn't been wrapped, then self.model_wrapped is the same as self.model.
- **is_model_parallel** -- Whether or not a model has been switched to a model parallel mode (different from
  data parallelism, this means some of the model layers are split on different GPUs).
- **place_model_on_device** -- Whether or not to automatically place the model on the device - it will be set
  to False if model parallel or deepspeed is used, or if the default
  TrainingArguments.place_model_on_device is overridden to return False.
- **is_in_train** -- Whether or not a model is currently running train (e.g. when evaluate is called while
  in train)

Ancestors

  • transformers.trainer.Trainer

Methods

def compute_loss(self, model, inputs, return_outputs=False)
def compute_loss(self, model, inputs, return_outputs=False):
    labels = inputs.pop("labels")
    # forward pass
    outputs = model(**inputs)
    logits = outputs.get("logits")
    loss = ch.mean(
        ch.max(
            self.alpha_fpr * (logits - labels),
            (1 - self.alpha_fpr) * (labels - logits),
        )
    )
    return (loss, outputs) if return_outputs else loss

How the loss is computed by Trainer. By default, all models return the loss in the first element.

Args

model (nn.Module): The model to compute the loss for.

inputs (dict[str, Union[torch.Tensor, Any]]): The input data for the model.

return_outputs (bool, optional, defaults to False): Whether to return the model outputs along with the loss.

num_items_in_batch (Optional[torch.Tensor], optional): The number of items in the batch. If num_items_in_batch is not passed,

Returns

The loss of the model, along with its output if return_outputs was set to True.

Subclass and override for custom behavior. If you are not using num_items_in_batch when computing your loss, make sure to set self.model_accepts_loss_kwargs to False. Otherwise, the loss calculation might be slightly inaccurate when performing gradient accumulation.

class QuantileAttack (config, model: Model, alpha: float)
class QuantileAttack(Attack):
    """
    Implementation of the attack proposed in 'Scalable Membership Inference Attacks via Quantile Regression'
    https://arxiv.org/pdf/2307.03694.pdf
    """

    def __init__(self, config, model: Model, alpha: float):
        """
        alpha (float): Desired FPR
        """
        ref_model = QuantileReferenceModel(
            config, name="Sreevishnu/funnel-transformer-small-imdb"
        )
        super().__init__(config, model, ref_model)
        self.alpha = alpha

    def _train_quantile_model(self, dataset):
        def tokenize_function(examples):
            return self.ref_model.tokenizer(
                examples["text"], padding="max_length", truncation=True
            )

        tokenized_dataset = dataset.map(tokenize_function, batched=True)
        training_args = TrainingArguments(
            output_dir="quantile_ref_model",
            evaluation_strategy="epoch",
            num_train_epochs=1,
        )

        def compute_metrics(eval_pred):
            predictions, labels = eval_pred
            rmse = mean_squared_error(labels, predictions, squared=False)
            return {"rmse": rmse}

        trainer = CustomTrainer(
            alpha_fpr=self.alpha,
            model=self.ref_model.model,
            args=training_args,
            train_dataset=tokenized_dataset,
            eval_dataset=tokenized_dataset,
            compute_metrics=compute_metrics,
        )
        # Train quantile model
        trainer.train()

    def prepare(self, known_non_members):
        """
        Step 1: Use non-member dataset, collect confidence scores for correct label.
        Step 2: Train a quantile regression model that takes X as input and predicts quantile. Use pinball loss
        Step 3: Test by checking if member: score is higher than output of quantile regression model.
        """

        # Step 1: Use non-member dataset, collect confidence scores for correct label.
        # Get likelihood scores from target model for known_non_members
        # Note that these non-members should be different from the ones in testing
        scores = [self.target_model.get_ll(x) for x in known_non_members]
        # Construct a dataset out of this to be used in Huggingface, with
        # "text" containing the actual data, and "labels" containing the scores
        dataset = Dataset.from_dict({"text": known_non_members, "labels": scores})

        # Step 2: Train a quantile regression model that takes X as input and predicts quantile. Use pinball loss
        self._train_quantile_model(dataset)

    def attack(self, document, **kwargs):
        # Step 3: Test by checking if member: score is higher than output of quantile regression model.

        # Get likelihood score from target model for doc
        ll = self.target_model.get_ll(document)

        # Return ll - quantile_model(doc)
        tokenized = self.ref_model.tokenizer(document, return_tensors="pt")
        # Shift items in the dictionary to the correct device
        tokenized = {k: v.to(self.ref_model.model.device, non_blocking=True) for k, v in tokenized.items()}
        # The regression model's single logit is the predicted score quantile
        quantile_score = self.ref_model.model(**tokenized)
        quantile_score = quantile_score.logits.item()

        # We want higher score to be non-member
        return quantile_score - ll

Implementation of the attack proposed in 'Scalable Membership Inference Attacks via Quantile Regression' https://arxiv.org/pdf/2307.03694.pdf

alpha (float): Desired FPR
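
End-to-end, usage looks roughly like the sketch below (config and target_model come from the surrounding mimir framework; non_members_train and candidate are illustrative names):

# Hypothetical usage; config and target_model are mimir framework objects.
attack = QuantileAttack(config, target_model, alpha=0.05)

# Fit the quantile regressor on known non-members; these must be disjoint
# from the records scored below.
attack.prepare(known_non_members=non_members_train)

# Lower scores indicate membership; thresholding at 0 targets FPR ~= alpha.
score = attack.attack(candidate)
is_member = score < 0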

Ancestors

  • Attack

Methods

def prepare(self, known_non_members)
def prepare(self, known_non_members):
    """
    Step 1: Use non-member dataset, collect confidence scores for correct label.
    Step 2: Train a quantile regression model that takes X as input and predicts quantile. Use pinball loss
    Step 3: Test by checking if member: score is higher than output of quantile regression model.
    """

    # Step 1: Use non-member dataset, collect confidence scores for correct label.
    # Get likelihood scores from target model for known_non_members
    # Note that these non-members should be different from the ones in testing
    scores = [self.target_model.get_ll(x) for x in known_non_members]
    # Construct a dataset out of this to be used in Huggingface, with
    # "text" containing the actual data, and "labels" containing the scores
    dataset = Dataset.from_dict({"text": known_non_members, "labels": scores})

    # Step 2: Train a quantile regression model that takes X as input and predicts quantile. Use pinball loss
    self._train_quantile_model(dataset)

Step 1: Use the non-member dataset to collect confidence scores for the correct label.

Step 2: Train a quantile regression model that takes X as input and predicts the quantile, using pinball loss.

Step 3: Test membership by checking whether the score is higher than the output of the quantile regression model.
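
Because attack() returns quantile_score - ll and the regressor is trained toward the (1 - alpha)-quantile of non-member scores, thresholding the attack score at zero should yield a false-positive rate near alpha. A sketch of checking this empirically (non_members_heldout is an illustrative name; it must be disjoint from the calibration set passed to prepare()):

scores = [attack.attack(x) for x in non_members_heldout]
empirical_fpr = sum(s < 0 for s in scores) / len(scores)
print(f"empirical FPR: {empirical_fpr:.3f} (target alpha: {attack.alpha})")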

Inherited members