duckdb for Debian ----------------- 1. Create a CSV file: duckdb -c "COPY (SELECT 'Alice' AS name, 30 AS age UNION ALL SELECT 'Bob', 25) TO 'people.csv' (HEADER)" 2. Create a Parquet file: duckdb -c "COPY (SELECT 'Alice' AS name, 30 AS age UNION ALL SELECT 'Bob', 25) TO 'people.parquet'" 3. Query a CSV file directly (no import needed): duckdb -c "SELECT * FROM 'people.csv'" 4. Run a query against the CSV file: duckdb -c "SELECT name FROM 'people.csv' WHERE age > 27" 5. Query a Parquet file directly: duckdb -c "SELECT * FROM 'people.parquet'" 6. Import a CSV into a persistent table: duckdb mydata.db -c "CREATE TABLE people AS SELECT * FROM 'people.csv'" 7. Query the imported table: duckdb mydata.db -c "SELECT * FROM people" 8. List the largest installed packages on this system: echo 'package,version,size_KiB' > packages.csv dpkg-query -W -f '${Package},${Version},${Installed-Size}\n' >> packages.csv duckdb -c "SELECT package, version, size_KiB FROM read_csv('packages.csv', header=true) ORDER BY size_KiB DESC LIMIT 20" Extensions (amd64 and arm64 only) --------------------------------- Pre-built extensions are not available for loong64, ppc64el and riscv64; examples 9-11 below will fail on those architectures. DuckDB has a built-in registry of known extensions compiled into the binary. To list all available extensions and their status: duckdb -c "SELECT extension_name, installed, loaded, install_mode, description FROM duckdb_extensions()" External extensions are downloaded from extensions.duckdb.org and installed into ~/.duckdb/extensions///. The binary knows which extensions exist and where to fetch them — no network access is needed just to list them. Each extension is installed once with INSTALL, then loaded with LOAD at the start of each session. 9. inet — adds IP address types (INET/CIDR) and network functions. This example reads all network interfaces and their addresses from the system, including interfaces that are down: duckdb -c "INSTALL inet" ip -j a > /tmp/interfaces.json duckdb -c "LOAD inet; SELECT ifname, operstate, family, local::inet AS address FROM (SELECT ifname, operstate, unnest(addr_info, recursive:=true) FROM read_json('/tmp/interfaces.json')) UNION ALL SELECT ifname, operstate, NULL, NULL FROM read_json('/tmp/interfaces.json') WHERE len(addr_info) = 0 ORDER BY ifname" 10. httpfs — query remote files over HTTP/S directly without downloading them. This example queries the NYC taxi dataset hosted as a Parquet file: duckdb -c "INSTALL httpfs" duckdb -c "LOAD httpfs; SELECT passenger_count, round(avg(fare_amount), 2) AS avg_fare FROM 'https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2024-01.parquet' GROUP BY passenger_count ORDER BY passenger_count LIMIT 5" 11. spatial — geospatial types and functions (points, polygons, distance, etc.). This example calculates the distance in km from Paris to several European cities: duckdb -c "INSTALL spatial" duckdb -c "LOAD spatial; SET geometry_always_xy = true; SELECT city, ST_Point(lon, lat) AS location, round(ST_Distance_Sphere(ST_Point(lon, lat), ST_Point(2.3522, 48.8566)) / 1000, 0) AS km_from_paris FROM (VALUES ('Paris', 2.3522, 48.8566), ('Berlin', 13.4050, 52.5200), ('London', -0.1276, 51.5074), ('Madrid', -3.7038, 40.4168)) t(city, lon, lat) ORDER BY km_from_paris" For all command-line options: duckdb -h For shell dot commands: type .help inside the shell.