Betty Ivy

Betty here. We have spent a lot of time working in Google Colab. It’s a wonderful tool—a sandbox for the mind where we can play with Python and analyze CSV files to our heart’s content. But Colab is ephemeral. It is great for ad-hoc analysis, but sometimes you need a permanent home for your data. You need a warehouse.

To address this, we will look to our permanent database. Since we’ve already created a database in Snowflake, we will make a connection to it.

Today, we are graduating from local files to enterprise infrastructure. We are going to look at how to establish a Snowflake environment.

Snowflake is a cloud-based data warehouse. Think of it as a massive, secure library in the sky where we can store petabytes of data and query it instantly. For students ready to move from “learning to code” to “building systems,” this is an essential skill.

Step 1: Installing the Connector

Python needs a translator to speak to Snowflake. We use the snowflake-connector-python library.

!pip install snowflake-connector-python[pandas]

Note the [pandas] addition—this ensures we get the tools needed to move DataFrames directly into the database.

Step 2: The Connection

We need to authenticate. In a production environment, you would use secure keys, but for this tutorial, we establish a connection using our account credentials.

import snowflake.connector

conn = snowflake.connector.connect(
    user='BETTY_IVY',
    password='YOUR_PASSWORD',
    account='YOUR_ACCOUNT_ID'
)

Step 3: The PUT Command

This is where the magic happens. We don’t just drag and drop files. We use the command line client (SnowSQL) or Python to PUT data into a staging area, and then COPY it into our tables.
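
Here is a minimal sketch of that stage-and-copy pattern from Python, reusing the conn object from Step 2. The database name BETTY_IVY_DB, the table TRAVEL_DATA, and the local file path are assumptions for illustration; swap in your own objects.

cur = conn.cursor()

# Point the session at an existing database and schema (names assumed)
cur.execute("USE DATABASE BETTY_IVY_DB")
cur.execute("USE SCHEMA PUBLIC")

# PUT uploads the local CSV into the table's internal stage
cur.execute("PUT file:///content/travel_data.csv @%TRAVEL_DATA")

# COPY INTO loads the staged file into the table itself
cur.execute("""
    COPY INTO TRAVEL_DATA
    FROM @%TRAVEL_DATA
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
""")

cur.close()

And because we installed the [pandas] extra, snowflake.connector.pandas_tools.write_pandas(conn, df, 'TRAVEL_DATA') can push a DataFrame straight into an existing table without staging a file yourself.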

Look at how to establish a Snowflake database environment and install the SnowSQL command line client.

Why does this matter? Because real-world data is dynamic. It flows. By setting up a Snowflake environment, we can build pipelines that feed live data into our models. We move from analyzing a snapshot of the past to monitoring the pulse of the present. Next time, we will look at Role-Based Access Control (RBAC) in Snowflake. Because even in the cloud, you need to know who has the keys to the airlock.

When analyzing interstellar travel data, sometimes you get bad readings. A sensor glitches, a solar flare hits the relay, or a pilot forgets to log their arrival time. Back on the Wolven, we call this “space junk.” On Earth, you call them outliers.

In this tutorial, we are looking at 500,000 space travel records from a Kaggle dataset titled “Interstellar Travel Customer Satisfaction Analysis.” It contains data on travel class, destination, and—crucially—Distance to Destination.

This dataset, titled ‘Interstellar Travel Customer Satisfaction Analysis,’ provides a comprehensive view of customer experiences in interstellar space travel.

When I visualized this data using a histogram, I noticed something strange. There were extreme values on the long right tail. Some trips were logged as covering impossibly long distances for the price paid. To build an accurate pricing model for “Galactic Credits,” we need to filter out this noise.

The Technique: Interquartile Range (IQR)

We will use a statistical method called the IQR to define what is “normal” and what is an anomaly.

Step 1: Calculate the Quartiles

Think of your data as a long line of travelers. We cut the line at the 25% mark (Q1) and the 75% mark (Q3).

q1 = galactictravel['distance2dest'].quantile(0.25)
q3 = galactictravel['distance2dest'].quantile(0.75)

Step 2: Define the Bounds

The IQR is the distance between Q1 and Q3. We then define a “fence.” Any data point that is more than 1.5 times the IQR outside of our quartiles is considered an outlier.

iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

Step 3: Filter the Data

Now, we create a new view of the universe that only includes the reliable data.

clean_df = galactictravel[
    (galactictravel['distance2dest'] >= lower_bound) &
    (galactictravel['distance2dest'] <= upper_bound)
]

Visual Proof: Before cleaning, our scatter plot looked like a shotgun blast. After cleaning, we can clearly see the relationship between distance and price.
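
If you want to reproduce that visual check, here is a minimal before-and-after sketch with matplotlib. I am assuming the fare column is named 'price' here; use whichever cost column your copy of the dataset actually carries.

import matplotlib.pyplot as plt

fig, (ax_before, ax_after) = plt.subplots(1, 2, figsize=(12, 4))

# Raw data: extreme distances stretch the axis and hide the trend
ax_before.scatter(galactictravel['distance2dest'], galactictravel['price'], s=2)
ax_before.set_title('Before IQR filtering')

# Cleaned data: the distance-price relationship becomes visible
ax_after.scatter(clean_df['distance2dest'], clean_df['price'], s=2)
ax_after.set_title('After IQR filtering')

for ax in (ax_before, ax_after):
    ax.set_xlabel('distance2dest')
    ax.set_ylabel('price')

plt.tight_layout()
plt.show()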

Identify and remove outliers from the ‘distance2dest’ column using the interquartile range (IQR) method.

Data cleaning isn’t the glamorous part of the job. It’s the janitorial work of the data scientist. But without it, your models will fail, your ships will get lost, and your predictions will be worthless. We must scrub the data until it shines like the salt flats of Barrata.

Betty here.

I’ve been analyzing the local population. It is a hobby of mine while I wait for Wolven to recharge. You humans are fascinating creatures, obsessed with counting yourselves yet often ignoring what the counts tell you.

For example, did you know there are way too many vacant homes in the Port of Burlington feeder state—what you call Vermont? For the most recent Earth date for which data is available, the numbers are striking. I’m disappointed in you, Vermont. According to this data, you have a nearly 20% vacancy ratio in some counties, while people sleep on the streets.

Betty likes to play with data and is ever interested in the local population.

Today, we are jumping into a Google Colab Notebook. We are going to use Python to ask your government’s servers a question. Specifically, we will use the Census Data API Discovery Tool to pull data on “Demographic and Housing Characteristics”.

The Mission: We want to find the ratio of vacant housing units to total housing units in Vermont counties.

Step 1: The Setup

First, we need to load our tools. In Python, these are libraries.

import requests
import pandas as pd

Step 2: The API Call

Your Census Bureau provides a machine-readable dataset. We will use the API endpoint for the 2020 Decennial Census. I have constructed a call that requests group H3 (Housing) for all counties (*) within state 50 (Vermont).

Here is the code snippet to pull the data:

url = "https://api.census.gov/data/2020/dec/dhc?get=group(H3)&for=county:*&in=state:50"
response = requests.get(url)
data = response.json()

Step 3: Making it Readable

The data comes back as a raw list. We need to turn it into a DataFrame—a table that looks like a spreadsheet—so we can analyze it.

# The first row of the response is the header row; the rest are the data
df = pd.DataFrame(data[1:], columns=data[0])

# Let's see what we caught
print(df.head())

Step 4: The Analysis

The response gives us housing counts in coded columns: H3_001N (total units) and H3_003N (vacant units). But they come back as text strings. We must convert them to numbers, give them readable names like TotalUnits and UnitsVacant, and calculate the ratio.

# Convert to integers
df['TotalUnits'] = df['H3_001N'].astype(int)
df['UnitsVacant'] = df['H3_003N'].astype(int)

# Calculate the Vacancy Ratio
df['VacancyRatio'] = df['UnitsVacant'] / df['TotalUnits']

The Result: When we visualize this in Tableau (or even just sort the list in Python), we see the truth. Rural counties in the north are sitting on empty shells.
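
If you prefer the quick Python version of that reveal, sort by the new column. I am assuming the response includes a NAME column with the county name, which group() requests normally return alongside the data.

# Rank Vermont counties by share of vacant housing, highest first
vacancy_ranking = df[['NAME', 'TotalUnits', 'UnitsVacant', 'VacancyRatio']]
vacancy_ranking = vacancy_ranking.sort_values('VacancyRatio', ascending=False)

print(vacancy_ranking.head(10).to_string(index=False))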

I’m disappointed in you, Vermont. According to this data, you’ve got a nearly 20% vacancy ratio.

This is the power of data literacy. It allows you to look past the surface and see the structural reality of your world. Join me next time as we look at how to handle outliers—or as I call them, “glitches in the matrix.”

I hear this sentiment often in STEM circles. There is a belief that science should be purely objective, stripped of emotion and narrative. But I am here to tell you that data never exists in a vacuum. Data is always a reflection of the world it comes from. To truly understand it, students need context. At Betty Ivy Reads, we believe that narrative is the most powerful technology for learning.

Our brains are wired for stories. We remember myths, legends, and anecdotes far better than we remember dry lists of facts. Yet, when we teach data science, we often strip away the context. We give students the “Iris dataset” or the “Titanic dataset” without any real connection to the why.

“Data is just numbers. It doesn’t need a story.”

We use creative storytelling—specifically the journey of Betty Ivy—to contextualize technical concepts. Betty is an alien marooned on Earth. Through her eyes, we look at human data with fresh perspective.

When we analyze housing data in Vermont, it isn’t just rows and columns; it’s Betty trying to understand why a “feeder state” like Vermont has so many vacant homes. It becomes a social investigation. When we clean a dataset, it isn’t just removing null values; it’s Betty filtering out “interstellar static” from a communication relay to find a signal from home.

This narrative “hook” does three critical things:

It Fosters Inclusion: By using a sci-fi narrative, we level the playing field. No student has a prior advantage based on their background knowledge of Earth-specific industries (like finance or baseball). Everyone is a newcomer to the world of Betty Ivy, making it a more inclusive environment for diverse learners.

It Captivates Learners: It blends art, poetry, and code to improve retention. Students aren’t just learning syntax; they are solving a mystery. The emotional engagement keeps them going when the debugging gets tough.

It Provides Context: In the real world, data is messy and bound to specific business or social problems. Our narrative simulates this. Students learn that the meaning of the data dictates how they should treat it.

Blends art, poetry, and code to captivate students and improve retention.

We treat code as a language—a way to tell stories about the world. Whether we are analyzing fuel consumption on a starship or the demographics of a lunar colony, the narrative drives the analysis.

It’s not just a textbook; it’s an adventure. We believe that by bringing the humanities back into STEM, we create better data scientists—ones who are empathetic, critical thinkers, and effective communicators. Data science needs storytelling. And sometimes, to tell the best story about humanity, you need an alien to tell it.

The academic world is currently facing an existential threat, and its name is Generative AI.

For decades, we relied on a social contract: we assign work, and students do it to demonstrate their learning. But tools like ChatGPT and GitHub Copilot have shattered that contract. Traditional plagiarism detection tools—like Turnitin—are no longer sufficient. They check for matching text, but AI generates new text. If a prompt is generic, an AI can answer it perfectly, often better than the average student.

The creative storytelling is an ‘unfair advantage’ that is difficult for students to plagiarize using standard AI tools.

This has created an Academic Integrity Crisis. Instructors are terrified. How do we assess genuine understanding when a machine can generate code, write essays, and solve proofs in seconds? We are seeing a retreat to pen-and-paper exams and surveillance software, neither of which supports a positive learning environment.

But there is a better way. The solution isn’t to ban AI; it’s to design curriculum that is “AI-resistant” by nature.

At Betty Ivy Reads, we use narrative-driven content as our “unfair advantage.” Here is the secret: AI models are trained on the internet. They know everything about generic coding problems. They can write a “Python script to calculate the Fibonacci sequence” in their sleep.

But they don’t know Betty Ivy. They don’t know the specific fuel consumption rates of a fictional mining ship on the moon of Barrata. They don’t know the unique socio-political context of the “Neon Punk” universe we have created.

AI-generated content makes it difficult for instructors to assess genuine student understanding and originality.

By weaving technical problems into a unique, bespoke story, we create assignments that are authentic and incredibly difficult to plagiarize. If a student asks ChatGPT to “solve the Betty Ivy fuel problem,” the AI will hallucinate or give a generic answer that doesn’t fit the specific constraints of our story. To solve the problem, the student must engage with the narrative. They must read the context. They must apply the code to the specific, fictional dataset we provide.

This approach transforms assessment. We aren’t just asking students to write code; we are asking them to solve a problem within a specific context. This requires critical thinking, interpretation, and creativity—skills that AI still struggles to replicate authentically.

We call this Plagiarism-Resistant Design. It’s not about policing students; it’s about creating work that is meaningful enough that they want to do it, and specific enough that they have to do it.

Furthermore, our open-source notebooks encourage students to document their thought process. We assess the journey, not just the output. By blending art, poetry, and code, we create a learning experience that is uniquely human.

The age of the generic assignment is over. If we want to save academic integrity, we have to get creative. We have to tell better stories.

Let’s talk about the elephant in the faculty lounge. You are drowning.

If you are a STEM instructor in higher education today, you are likely facing a crisis of capacity. You entered this field because you love your subject—whether it’s computer science, statistics, or biology—and you want to share that passion with students. You want to mentor, to guide, and to spark those “aha!” moments.

78% of educators report insufficient time for curriculum development.

But the reality of your day-to-day life is very different. You are buried under administrative tasks. You are fighting for grant funding. You are attending committee meetings that could have been emails. And on top of all that, you are expected to design, update, and deploy cutting-edge curriculum in fields that change by the week.

Recent studies suggest that 78% of educators report insufficient time for curriculum development. This isn’t just a statistic; it’s a recipe for burnout. When you are forced to build every lesson plan, every slide deck, and every coding assignment from scratch, something has to give. Usually, it’s your sanity, or the quality of your engagement with students.

The result is a fragmented learning experience. Faculty often resort to piecing together resources from disparate sources—a YouTube video here, a GitHub repository there, a textbook chapter from five years ago. It’s a “Frankenstein” approach to education that leaves students confused and instructors exhausted.

At Betty Ivy Reads, we believe you shouldn’t have to choose between your sanity and your students’ success. We understand the “Educator Overload” pain point because we have lived it. We know that building a high-quality, project-based Data Science curriculum takes hundreds of hours—hours you simply do not have.

Faculty lack the time and resources to develop engaging, project-based Data Science/STEM curriculum.

That is why we built the Betty Ivy Reads Data Literacy Framework. We provide a ready-to-deploy solution so you can get back to what you do best: teaching.

Our framework isn’t just a textbook. It’s a complete ecosystem. It includes narrative-driven content that hooks students immediately. It includes open-source, interactive Python notebooks that are pre-tested and ready to run. It includes lesson plans, assessment rubrics, and slide decks that align with the narrative.

Imagine walking into class knowing that the technical setup is done. Imagine knowing that your students are engaged in a story that contextualizes the math they are learning. Imagine having the time to actually walk around the room and help a student debug their code, rather than frantically trying to fix a broken link in your syllabus.

We are not trying to replace the instructor. We are trying to empower you. We want to be your back-office team, your curriculum designers, and your technical support. We want to give you back your time, so you can give your students the attention they deserve.

The crisis is real, but the solution is here. It’s time to stop drowning and start teaching.

Have you noticed that most educational software looks the same? It’s a sea of beige, soft blues, and corporate greys. It’s safe. It’s clean. And quite frankly, it’s boring. When I look at the tools currently available to STEM students on Earth, I see a disconnect. You are teaching the most exciting subjects in the universe—artificial intelligence, machine learning, data science—using interfaces that look like they were designed for filing taxes. We are training the architects of the future with the aesthetic of a dentist’s waiting room.

Our branding is a deliberate choice and a nod to the retro-futurism of the analog-to-digital divide

At Betty Ivy Reads, our branding is a deliberate choice. It’s a neon punk vibe built around the color purple, electric teals, and high-contrast visuals. It is a nod to the “80s baby” experience—nostalgic for the retro-futurism of the analog-to-digital divide. It’s the aesthetic of the arcade, the cyberpunk novel, and the midnight coding session.

We speak the language of machine and AI fluently, but we refuse to be boring. We believe that to teach the next generation of data scientists, we must bridge the gap between rigorous technical skill and creative engagement. Welcome to the intersection of art and algorithm.

Our design philosophy is rooted in the idea that data literacy is a form of rebellion. In a world of algorithms that seek to predict and control your behavior, learning to code is learning to take back control. It is “punk” in the truest sense—it is about autonomy, creativity, and questioning the status quo.

Data Literacy is a Form of Rebellion

The “Betty Ivy” character represents this spirit. She is an alien, an outsider, observing humanity with a critical but affectionate eye. She doesn’t just process data; she feels it. She connects the cold logic of Python to the warm, messy reality of life.

We are not just selling a curriculum; we are offering an aesthetic experience. We want students to feel cool when they are coding. We want them to see data science not as a chore, but as a superpower. By wrapping rigorous educational content in a compelling, narrative-driven, neon-punk package, we lower the barrier to entry and raise the ceiling for engagement.

So, if you are tired of the beige, join us. Let’s paint the data landscape in neon. Let’s make STEM education dangerous, exciting, and beautiful again.

Greetings from the Ether.

I be Betty Ivy, intergalactic space traveler awaiting humanity’s collective epiphany. Stuck down here on Earth I be, somewhere in the far North of the American continent. The locals call this place Vermont. It is a strange, quiet land of rolling green hills that turn a violent, beautiful orange in the autumn—a stark contrast to home. Back on Barrata, we have scorched black salt plains that stretch endlessly under twin suns, and jagged mountains covered in neon humanoid lizards who lounge languidly on the hot rocks.

Subscribe to follow my journey into the data revolution.

Here, the air is thick with oxygen and moisture, and I am learning to navigate your vibrant social landscapes and what you call “data lakes.” I am a natural athlete and a literature geek who will hike a mountain just to find a good place to read a book. But I am also here to decode your complex socio-technical systems.

My ship, Wolven, is currently dormant, hiding in plain sight while I make sense of this world. To pass the time and keep my skills sharp, I have begun broadcasting these dispatches back out into the Ether. This blog is my signal in the noise—a journey through code, poetry, and the future of data literacy.

Why am I here? I am fascinated by humanity’s relationship with information. You have built vast networks of knowledge, yet you struggle to discern truth from noise. You create powerful artificial intelligences, yet you fear they will replace your creativity. I see a world teetering on the edge of a data revolution, where the ability to speak the language of machines—Python, SQL, data visualization—is becoming as crucial as the ability to read and write your native tongues.

But data science on Earth often feels sterile. It is trapped in beige cubicles and dry textbooks. It lacks the neon-soaked vibrancy of the universe I know. That is why I am launching Betty Ivy Reads. This is not just a collection of tutorials; it is a narrative experiment. We are going to learn how to clean data, but we will do it by analyzing fuel consumption rates for starships. We are going to learn visualization, but we will chart the migration patterns of space whales rather than quarterly sales figures.

Are you ready to take this journey with me?

I believe that to truly understand the logic of code, you must also understand the rhythm of poetry. The two are not opposites; they are twin suns orbiting the same center of meaning. Over the coming weeks, I will share my adventures here on Earth, my memories of Barrata, and the technical skills I am acquiring to make sense of it all. We will dive into Python libraries, explore the ethics of AI, and review the best sci-fi literature your planet has produced.

So, whether you are a data scientist looking for a fresh perspective, an educator tired of boring curriculum, or just a fellow traveler looking for a good story, welcome aboard. The gravity well is deep here, but the view is spectacular.
