In [None]:
from lib.xgta import Xgta
import connection_psql as creds
xgta = Xgta(
    creds=creds, 
    streaming=True, # Streaming uses less RAM, but is slower. Container or notebook may fail/shut down if RAM limit is exceeded.
)

------

Calculate the frequency of tweets per day that contain certain (case-insensitive) keywords with:

```python
r1 = xgta.frequency_of_tweets_per_day_containing(keywords)
```

The `keywords` argument is treated as a case-insensitive regular expression. Multiple keywords can be combined with the `|` operator.

For example, to find tweets containing the words "Rassismus" and/or "Diskriminierung", use the following:

In [None]:
r1 = xgta.frequency_of_tweets_per_day_containing("rassismus|diskriminierung")
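
To illustrate how a case-insensitive pattern with `|` behaves, here is a minimal sketch using Python's built-in `re` module. The sample texts are made up, and it is an assumption that `xgta` matches in the same way; its regex engine may differ in details.

```python
import re

# Hypothetical sample texts, for illustration only (not taken from the dataset)
tweets = [
    "Rassismus ist ein Problem.",
    "Gegen DISKRIMINIERUNG!",
    "Heute scheint die Sonne.",
]

# "|" means "either side matches"; IGNORECASE makes the search case insensitive
pattern = re.compile(r"rassismus|diskriminierung", flags=re.IGNORECASE)

# One boolean per text: does the pattern occur anywhere in it?
matches = [bool(pattern.search(t)) for t in tweets]
print(matches)
```

A text counts as a match if at least one of the alternatives occurs anywhere in it, which is why the pattern covers the "and/or" case.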

The returned object (`r1` in the example above) exposes the `polars.DataFrame` as `.df` (e.g., `r1.df`) and provides a method `.plot_frequency_of()` that plots a time series of the DataFrame.

In [None]:
r1.df # Returns the DataFrame itself

In [None]:
r1.plot_frequency_of(['search_term:all_tweets']) # Plots a timeseries of all tweets fitting the search criteria.

It is also possible to plot multiple columns of the DataFrame in one figure. (Comment or uncomment lines with `#`.)

In [None]:
r1.plot_frequency_of([
    'search_term:all_tweets',
    'search_term:is_retweet',
    'search_term:is_original_tweet',
    'percent:all_tweets',
    'percent:is_retweet',
    'percent:is_original_tweet',
    # 'all_tweets',
    # 'is_retweet',
    # 'is_original_tweet',
])

Plots are interactive: 

- Deselect columns by clicking on the legend.
- Draw rectangles to zoom in.
- Double click to reset plots to their default view.

---

To keep previous results while searching for new terms, save the new results in a separate return object, e.g., `r2`:

In [None]:
r2 = xgta.frequency_of_tweets_per_day_containing("krise")
r2.plot_frequency_of(['search_term:is_retweet','search_term:is_original_tweet'])
r2.df.describe()

Compare multiple results with each other:

In [None]:
xgta.plot_frequency_of(
    results=[r1, r2], # Two or more results in a list []
    plot=[
        'search_term:is_retweet',
        'search_term:is_original_tweet',
    ],
    shared_xaxes=True, # Optional
    shared_yaxes=True, # Optional
)

---

## Access dataset with polars directly

In [None]:
import polars as pl

Get the list of available columns.

In [None]:
xgta.df_xgta.collect_schema().names()

In [None]:
pl.Config(fmt_str_lengths=350)

q = (
    xgta.df_xgta
    .limit(1_000_000) # Use limit to develop your queries. It greatly speeds up the development time.
    .filter(
        pl.col('text').str.to_lowercase().str.contains(r"\bwir\b")
        &
        pl.col('isretweet').not_()
    )
    .select(["postdate", "text"])
)

q.collect()
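
The pattern `\bwir\b` in the query above uses word boundaries, so it matches "wir" only as a whole word. The following sketch shows the effect with Python's built-in `re` module on made-up sample texts. It is an assumption that Polars' regex engine treats these simple cases the same way.

```python
import re

# Hypothetical sample texts, for illustration only (not taken from the dataset)
samples = [
    "Wir schaffen das",
    "Das wird schwer",
    "Sind wir bereit?",
]

# \b marks a word boundary, so "wird" does not count as a match for "wir"
pattern = re.compile(r"\bwir\b")

# Lowercase first, mirroring .str.to_lowercase() in the query above
hits = [s for s in samples if pattern.search(s.lower())]
print(hits)
```

Without the `\b` markers, the second text would also match, because "wird" contains "wir".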