Quickstart

Make sure Persine and its dependencies are installed before you get started!

Introduction

In this example, we visit a specific YouTube video and click the next-up video three times to see where it leads us. We then save the results for later analysis.

from persine import PersonaEngine

engine = PersonaEngine()

with engine.persona() as persona:
    persona.run('https://www.youtube.com/watch?v=hZw23sWlyG0')
    persona.run('youtube:next_up#3')
    persona.history.to_csv('history.csv')
    persona.recommendations.to_csv('recs.csv')

Using personas

Persine is built around an engine that stores all of your global settings, and personas that represent the individual users who browse the web. Personas are always generated by an engine.

engine = PersonaEngine()
persona = engine.persona()

By default, personas are single-use and their browsing history will be discarded after your script is run. If you give the persona a name, though, it will save the browsing and recommendation history so you can resume the session later.

persona = engine.persona('Mulberry')

For example, naming a persona might be useful in conjunction with signing in to YouTube, allowing you to imitate a real user watching videos over multiple weeks.
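
As a rough sketch of that multi-session pattern, the helper below runs one sitting under a named persona. It assumes a named persona works in a with block the same way the unnamed one in the introduction does; the quickstart only shows the two separately. The function name watch_session and its arguments are made up for illustration, and engine is expected to be a PersonaEngine.

```python
def watch_session(engine, name, actions):
    """Run one browsing session under a named persona.

    Assumption: engine.persona(name) can be used as a context manager,
    just like the unnamed engine.persona() in the introduction.
    """
    with engine.persona(name) as persona:
        for action in actions:
            persona.run(action)
```

Calling watch_session(engine, 'Mulberry', [...]) once per day with a different list of videos would then, under that assumption, keep adding to the same saved history.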

Visiting pages and running commands

If you’d like to visit a single page or run a single action, use run:

persona.run('https://www.youtube.com/watch?v=hZw23sWlyG0')

If you’d like to visit several pages but are too lazy to write a loop, you can use run_batch:

# Run several actions in a row
persona.run_batch([
    'https://www.youtube.com/watch?v=hZw23sWlyG0',
    'youtube:next_up#10',
    'https://www.youtube.com/watch?v=hZw23sWlyG0'
])

Each call to .run executes an action. An action can be visiting a URL or doing something on the page, such as liking a video or clicking the next-up video link. Learn more about actions under the bridges section.
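
To make the two kinds of action string concrete, here is a small sketch that splits one into its apparent parts. It is not Persine's own parser, only an illustration of the format used in this quickstart. The name describe_action is hypothetical, and reading the trailing #n as a repeat count (defaulting to 1 when absent) is an assumption based on the 'youtube:next_up#3' example.

```python
from urllib.parse import urlparse


def describe_action(action):
    """Split an action string into its apparent parts.

    Illustration only: this mirrors the strings shown in the quickstart,
    not how Persine handles them internally.
    """
    if urlparse(action).scheme in ("http", "https"):
        return {"kind": "url", "url": action}

    # Bridge commands look like "<bridge>:<command>", optionally followed
    # by "#<n>" (assumed to be a repeat count, as in "next_up#3").
    bridge, _, rest = action.partition(":")
    command, _, count = rest.partition("#")
    return {
        "kind": "command",
        "bridge": bridge,
        "command": command,
        "count": int(count) if count else 1,
    }
```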

Starting and stopping the browser

You can use with to automatically start/stop the browser. It makes life easy:

# Automatically start/stop the browser
with engine.persona() as persona:
    persona.run('https://www.youtube.com/watch?v=hZw23sWlyG0')
    persona.run('youtube:next_up#3')

If you don’t use with, the browser will automatically launch with your first .run statement. You should manually call .quit() when you’re done:

# Visit a page
persona.run('https://www.youtube.com/watch?v=hZw23sWlyG0')

# Quit Chrome
persona.quit()
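
If an action fails partway through, a manual .quit() at the end of the script never runs and the browser may be left open. One way to guard against that is a try/finally wrapper, sketched below. The helper name run_then_quit is hypothetical; it only relies on the .run and .quit methods shown above.

```python
def run_then_quit(persona, actions):
    """Run each action, then quit the browser even if one of them fails."""
    try:
        for action in actions:
            persona.run(action)
    finally:
        # Without a with block, nothing else closes the browser for us.
        persona.quit()
```

This is the same guarantee the with form gives you, so it is mostly useful when a with block does not fit the shape of your script.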

Using the results

History is a record of every command you’ve run and every page you’ve visited, while recommendations are what you’ve been recommended (videos, products, etc., depending on the site you’re visiting).

Right now recommendations also include ads and unrelated promoted content. I’m on the fence about whether they should stay or go.

For convenience, you can use .to_df() to see history and recommendations as pandas DataFrames.

persona.recommendations.to_df()
persona.history.to_df()

If you’d prefer to do your analysis elsewhere, you can save them as CSV files.

persona.recommendations.to_csv('recs.csv')
persona.history.to_csv('hist.csv')
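
The column names in these files are not listed here, so a reasonable first step is to look at the header before writing any analysis. The sketch below does that with only the standard library. It assumes the CSV starts with a header row and is UTF-8 encoded; summarize_csv is a hypothetical helper, not part of Persine.

```python
import csv


def summarize_csv(path):
    """Return (column names, number of data rows) for a CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        row_count = sum(1 for _ in reader)
        # fieldnames is filled in from the header row once reading starts;
        # it stays None for an empty file, hence the fallback.
        return reader.fieldnames or [], row_count
```

For example, summarize_csv('recs.csv') returns the list of columns and the number of recommendations saved.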

Headless mode

By default, Chrome runs in headless mode, which means you won’t see what the browser is doing as it clicks around. If we’d like to watch Chrome in action, we can turn off headless mode.

engine = PersonaEngine(headless=False)

When running in non-headless mode, Persine automatically installs uBlock Origin so we don’t have to deal with ads.

Headless mode doesn’t support extensions, so by default our invisible Chrome is unfortunately watching ads. We should probably switch to Firefox but it has its own problems.

Bridges

Bridges are site-specific scrapers that tell Persine what to click and what to scrape. They’re also in charge of site-specific commands like youtube:like.

Currently Persine has a fully featured bridge for YouTube and an Amazon bridge that is still under development.