⚡ Apache Spark


🚀 Setup & SparkSession

⚙ïļ SparkSession

from pyspark.sql import SparkSession Import SparkSession
spark = SparkSession.builder.appName("App").getOrCreate() Create session
.master("local[*]") Local mode (all cores)
.config("spark.executor.memory", "4g") Set executor memory
.config("spark.driver.memory", "2g") Set driver memory
spark.stop() Stop session
sc = spark.sparkContext Get SparkContext

📊 DataFrame Basics

➕ Creating DataFrames

df = spark.read.csv("file.csv", header=True, inferSchema=True) Read CSV
df = spark.read.json("file.json") Read JSON
df = spark.read.parquet("file.parquet") Read Parquet
df = spark.createDataFrame(data, schema) From Python data
df = spark.createDataFrame(pandas_df) From Pandas
df.write.csv("output.csv", header=True) Write CSV (creates a directory of part files)
df.write.parquet("output.parquet") Write Parquet

📋 DataFrame Info

df.show() Show first 20 rows
df.show(n, truncate=False) Show n rows (no truncate)
df.printSchema() Print schema
df.columns Column names
df.dtypes Column types
df.count() Row count
df.describe().show() Statistics

🔍 Selection & Filtering

📌 Column Selection

df.select("col1", "col2") Select columns
df.select(df.col1, df.col2) Select (Column objects)
from pyspark.sql.functions import col Import col function
df.select(col("col1")) Select with col()
df.selectExpr("col1", "col2 * 2 as doubled") Select with expressions
df.drop("col1") Drop column

🔧 Filtering

df.filter(df.col > 10) Filter rows
df.filter("col > 10") Filter (string expr)
df.where(df.col == "value") Where (alias)
df.filter((df.col1 > 10) & (df.col2 < 20)) Multiple conditions
df.filter(df.col.isin([1, 2, 3])) Filter with isin
df.filter(df.col.isNull()) Filter null
df.filter(df.col.isNotNull()) Filter not null
df.filter(df.col.like("%pattern%")) Filter with like

🔄 Transformations

➕ Column Operations

df.withColumn("new", df.col * 2) Add/replace column
df.withColumnRenamed("old", "new") Rename column
df.withColumn("col", df.col.cast("integer")) Cast type
df.withColumn("col", F.when(condition, value).otherwise(other)) Conditional column
df.withColumn("col", F.lit("constant")) Constant value
df.fillna(0) Fill null values
df.na.drop() Drop null rows

📋 Row Operations

df.distinct() Remove duplicates
df.dropDuplicates(["col1"]) Drop duplicates by column
df.orderBy("col") Sort ascending
df.orderBy(df.col.desc()) Sort descending
df.limit(10) Limit rows
df.sample(fraction=0.1) Random sample

📊 Aggregation & Grouping

📈 Aggregation

from pyspark.sql import functions as F Import functions
df.agg(F.sum("col")) Sum
df.agg(F.avg("col")) Average
df.agg(F.count("col")) Count
df.agg(F.min("col"), F.max("col")) Min/Max
df.agg(F.countDistinct("col")) Count distinct

👥 Group By

df.groupBy("col").count() Count per group
df.groupBy("col").sum("value") Sum per group
df.groupBy("col").agg(F.avg("val"), F.max("val")) Multiple aggregations
df.groupBy("col1", "col2").count() Group by multiple
df.groupBy("col").pivot("pivot_col").sum("val") Pivot table

🔗 Joins & Unions

ðŸĪ Joins

df1.join(df2, "key") Inner join on key
df1.join(df2, df1.k1 == df2.k2) Join on expression
df1.join(df2, "key", "left") Left join
df1.join(df2, "key", "right") Right join
df1.join(df2, "key", "outer") Full outer join
df1.join(df2, "key", "left_anti") Left anti join
df1.crossJoin(df2) Cross join

➕ Union

df1.union(df2) Union (keep duplicates)
df1.unionByName(df2) Union by column names
df1.intersect(df2) Intersection
df1.subtract(df2) Subtract

🗄ïļ SQL Queries

💾 Spark SQL

df.createOrReplaceTempView("table") Create temp view
spark.sql("SELECT * FROM table") Run SQL query
spark.sql("SELECT col, COUNT(*) FROM table GROUP BY col") SQL aggregation
df.createOrReplaceGlobalTempView("gtable") Global temp view
spark.sql("SELECT * FROM global_temp.gtable") Query global view

🔧 Built-in Functions

📝 String Functions

F.upper(col) / F.lower(col) Upper/lower case
F.trim(col) / F.ltrim(col) / F.rtrim(col) Trim whitespace
F.length(col) String length
F.substring(col, start, len) Substring (start is 1-based)
F.concat(col1, col2) Concatenate
F.split(col, pattern) Split string
F.regexp_replace(col, pattern, replacement) Regex replace

📅 Date Functions

F.current_date() Current date
F.current_timestamp() Current timestamp
F.year(col) / F.month(col) / F.dayofmonth(col) Extract date parts
F.datediff(end, start) Date difference
F.date_add(col, days) Add days
F.to_date(col, format) String to date

💡 Tips & Best Practices

✨ Useful Tips

  • Prefer Parquet over CSV/JSON: it is columnar, compressed, and preserves the schema
  • Cache DataFrames that are reused across multiple actions (df.cache())
  • Broadcast-join small lookup tables (F.broadcast(small_df)) to avoid a shuffle
  • Partition data on columns you filter by frequently
  • Use df.explain() to inspect the physical query plan
  • Prefer built-in functions over Python UDFs, which bypass Catalyst optimization and serialize rows to Python