| Title: | Access 'Wharton Research Data Services' ('WRDS') |
|---|---|
| Description: | Provides simple functions for accessing data from 'Wharton Research Data Services' ('WRDS'), a widely used financial database in academic research. Includes credential management via the system keyring, database tools, and functions for downloading generic tables, 'Compustat' fundamentals, and linking tables. |
| Authors: | Ulrich Atz [aut, cre, cph] (ORCID: <https://orcid.org/0000-0002-1719-3780>) |
| Maintainer: | Ulrich Atz <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.1.9000 |
| Built: | 2026-05-29 17:01:26 UTC |
| Source: | https://github.com/statzhero/wrds |
Displays a glimpse-like summary of a WRDS table showing column names,
types, and human-readable labels, similar to dplyr::glimpse().
describe_table(wrds, library, table, n = 20, max_cols = 25)describe_table(wrds, library, table, n = 20, max_cols = 25)
wrds |
A |
library |
Character. The name of the library (schema). |
table |
Character. The name of the table. |
n |
Integer. Number of sample rows to fetch for value preview. Default is 20. |
max_cols |
Integer. Maximum number of columns to display. Default is 25. |
Invisibly returns a list with components:
A data frame with column_name, data_type, and label
Table description, or NA if unavailable
Row count
A data frame with sample rows (if n > 0)
## Not run: wrds <- wrds_connect() describe_table(wrds, "comp", "funda") wrds_disconnect(wrds) ## End(Not run)## Not run: wrds <- wrds_connect() describe_table(wrds, "comp", "funda") wrds_disconnect(wrds) ## End(Not run)
Downloads company-level static data from Compustat including header SIC codes, NAICS codes, state of incorporation, and other identifying information.
get_company( wrds, region = c("na", "global"), columns = NULL, n = Inf, lazy = FALSE )get_company( wrds, region = c("na", "global"), columns = NULL, n = Inf, lazy = FALSE )
wrds |
A |
region |
One of |
columns |
Character vector of columns to return. Defaults to key identifiers and classification codes. |
n |
Maximum number of rows to return. Defaults to |
lazy |
If |
The sic column contains the "header" SIC code, which is the company's
most recent SIC classification stored as a character. For historical SIC
codes that change over time, use get_compustat() with fill_sic = TRUE,
which coalesces the historical sich (integer) with the header sic.
A tibble with company header data. Default columns vary by region:
North America (from comp.company):
gvkey, conm: Identifiers
sic, naics: Industry classifications (character)
state, fic, loc: Geographic information
Global (from comp.g_company):
gvkey, conm: Identifiers
sic, naics: Industry classifications
fic, loc: Geographic information
get_compustat() for fundamentals data with optional SIC filling
## Not run: wrds <- wrds_connect() # Get company header data company <- get_company(wrds) # Get global companies g_company <- get_company(wrds, region = "global") # Lazy query get_company(wrds, lazy = TRUE) |> dplyr::filter(sic == "7370") |> dplyr::collect() wrds_disconnect(wrds) ## End(Not run)## Not run: wrds <- wrds_connect() # Get company header data company <- get_company(wrds) # Get global companies g_company <- get_company(wrds, region = "global") # Lazy query get_company(wrds, lazy = TRUE) |> dplyr::filter(sic == "7370") |> dplyr::collect() wrds_disconnect(wrds) ## End(Not run)
Downloads financial statement data from Compustat with standard filters for clean, analysis-ready data.
get_compustat( wrds, frequency = c("annual", "quarterly"), region = c("na", "global"), start_date = NULL, end_date = NULL, columns = NULL, add_columns = NULL, indfmt = "INDL", consol = "C", fill_sic = FALSE, n = Inf, lazy = FALSE )get_compustat( wrds, frequency = c("annual", "quarterly"), region = c("na", "global"), start_date = NULL, end_date = NULL, columns = NULL, add_columns = NULL, indfmt = "INDL", consol = "C", fill_sic = FALSE, n = Inf, lazy = FALSE )
wrds |
A |
frequency |
One of |
region |
One of |
start_date |
Start date for filtering. Character string in
|
end_date |
End date for filtering. Character string in
|
columns |
Character vector of columns to return, replacing the defaults.
Use |
add_columns |
Character vector of additional columns to include beyond
the defaults. Ignored if |
indfmt |
Industry format filter. Defaults to |
consol |
Consolidation level. Defaults to |
fill_sic |
If |
n |
Maximum number of rows to return. Defaults to |
lazy |
If |
Default filters follow standard practice for most research applications.
Region-specific filters are applied automatically based on region:
datafmt: "STD" for North America, "HIST_STD" for Global
popsrc: "D" (domestic) for North America, "I" (international) for Global
North America and Global data have different structures and should not be combined without careful column harmonization.
A tibble with Compustat fundamentals. Default columns vary by region:
North America (from comp.funda / comp.fundq):
Identifiers: gvkey, cusip, tic, conm, datadate
Time: fyear/fyearq, fyr/fqtr
Income: ni/niq, ib/ibq, oiadp/oiadpq, revt/revtq
Balance sheet: at/atq, lt/ltq, seq/seqq, ceq/ceqq
Market: csho/cshoq, prcc_f/prccq
Other: sale/saleq, capx/capxy, che/cheq, dlc/dlcq, dltt/dlttq
Industry: sich (historical SIC); sic (when fill_sic = TRUE, coalesced
from sich and header SIC)
Global (from comp.g_funda / comp.g_fundq):
Identifiers: gvkey, isin, conm, datadate
Geography: loc, fic, exchg
Similar financial variables (with some differences, e.g., nit/nitq
instead of ni/niq)
link_ccm() for CRSP-Compustat linking, get_company() for
company header data
## Not run: wrds <- wrds_connect() # Annual North America fundamentals funda <- get_compustat(wrds) # Quarterly with date filter fundq <- get_compustat(wrds, frequency = "quarterly", start_date = "2020-01-01", end_date = "2023-12-31" ) # Global annual g_funda <- get_compustat(wrds, region = "global") # Lazy query for further filtering get_compustat(wrds, lazy = TRUE) |> dplyr::filter(fyear >= 2020) |> dplyr::select(gvkey, datadate, at, lt) |> dplyr::collect() # Fill missing SIC codes with header SIC from comp.company funda_sic <- get_compustat(wrds, fill_sic = TRUE) # Preview first 100 rows before full download preview <- get_compustat(wrds, n = 100) wrds_disconnect(wrds) ## End(Not run)## Not run: wrds <- wrds_connect() # Annual North America fundamentals funda <- get_compustat(wrds) # Quarterly with date filter fundq <- get_compustat(wrds, frequency = "quarterly", start_date = "2020-01-01", end_date = "2023-12-31" ) # Global annual g_funda <- get_compustat(wrds, region = "global") # Lazy query for further filtering get_compustat(wrds, lazy = TRUE) |> dplyr::filter(fyear >= 2020) |> dplyr::select(gvkey, datadate, at, lt) |> dplyr::collect() # Fill missing SIC codes with header SIC from comp.company funda_sic <- get_compustat(wrds, fill_sic = TRUE) # Preview first 100 rows before full download preview <- get_compustat(wrds, n = 100) wrds_disconnect(wrds) ## End(Not run)
Generic function to download data from any table in the WRDS database. Returns a lazy table by default, allowing you to build queries with dplyr before collecting.
get_table(wrds, library, table, columns = NULL, n = Inf, lazy = TRUE)get_table(wrds, library, table, columns = NULL, n = Inf, lazy = TRUE)
wrds |
A |
library |
Character. The name of the library (schema), e.g., |
table |
Character. The name of the table within the library. |
columns |
Character vector of columns to return. If |
n |
Maximum number of rows to return. Defaults to |
lazy |
If |
This function provides generic access to any WRDS table. For commonly-used tables with standard research filters, prefer the specialized functions:
get_compustat() for Compustat fundamentals with standard filters
get_company() for company header data
link_ccm() for CRSP-Compustat linking
The lazy table can be filtered, selected, and mutated using dplyr verbs, which are translated to SQL and executed on the server:
get_table(wrds, "crsp", "msf") |> filter(date >= "2025-01-01") |> select(permno, date, ret, prc) |> collect()
A tbl_lazy object (if lazy = TRUE) or a tibble (if lazy = FALSE).
describe_table() to explore table structure,
list_tables() to list available tables in a library
## Not run: wrds <- wrds_connect() # Preview table structure first describe_table(wrds, "crsp", "msf") # Get a lazy table and build your query get_table(wrds, "crsp", "msf") |> dplyr::filter(date >= "2025-01-01") |> dplyr::select(permno, date, ret, prc, vol) |> dplyr::collect() # Collect immediately with specific columns get_table(wrds, "crsp", "dsf", columns = c("permno", "date", "ret", "prc"), lazy = FALSE, n = 1000 ) # Access any table in any library get_table(wrds, "ibes", "statsum_epsus") |> dplyr::filter(fpedats >= "2025-01-01") |> dplyr::collect() wrds_disconnect(wrds) ## End(Not run)## Not run: wrds <- wrds_connect() # Preview table structure first describe_table(wrds, "crsp", "msf") # Get a lazy table and build your query get_table(wrds, "crsp", "msf") |> dplyr::filter(date >= "2025-01-01") |> dplyr::select(permno, date, ret, prc, vol) |> dplyr::collect() # Collect immediately with specific columns get_table(wrds, "crsp", "dsf", columns = c("permno", "date", "ret", "prc"), lazy = FALSE, n = 1000 ) # Access any table in any library get_table(wrds, "ibes", "statsum_epsus") |> dplyr::filter(fpedats >= "2025-01-01") |> dplyr::collect() wrds_disconnect(wrds) ## End(Not run)
Downloads the CCM (CRSP-Compustat Merged) linking table that maps CRSP PERMNOs to Compustat GVKEYs with valid date ranges.
link_ccm( wrds, linktype = c("LC", "LU", "LS"), linkprim = c("P", "C"), n = Inf, lazy = FALSE )link_ccm( wrds, linktype = c("LC", "LU", "LS"), linkprim = c("P", "C"), n = Inf, lazy = FALSE )
wrds |
A |
linktype |
Character vector. Types of links to include.
Defaults to
|
linkprim |
Character vector. Link primacy filters.
Defaults to
|
n |
Maximum number of rows to return. Defaults to |
lazy |
If |
The linking table comes from crsp.ccmxpf_lnkhist. Missing linkenddt
values indicate ongoing links and are replaced with the maximum date in the
table for easier date-range joins.
To use the link, join on gvkey and ensure your observation date falls
within the linkdt to linkenddt range.
A tibble with columns:
Compustat company identifier
CRSP permanent security identifier
Start date of the link
End date of the link (missing values replaced with max date)
Type of link
Link primacy
Ian Gow, Financial Accounting Research, Chapter on Identifiers: https://iangow.github.io/far_book/identifiers.html
## Not run: wrds <- wrds_connect() ccm <- link_ccm(wrds) # Join with Compustat data compustat <- get_compustat(wrds) compustat |> dplyr::inner_join(ccm, by = dplyr::join_by(gvkey)) |> dplyr::filter(datadate >= linkdt, datadate <= linkenddt) wrds_disconnect(wrds) ## End(Not run)## Not run: wrds <- wrds_connect() ccm <- link_ccm(wrds) # Join with Compustat data compustat <- get_compustat(wrds) compustat |> dplyr::inner_join(ccm, by = dplyr::join_by(gvkey)) |> dplyr::filter(datadate >= linkdt, datadate <= linkenddt) wrds_disconnect(wrds) ## End(Not run)
Downloads the WRDS-provided linking table that maps IBES tickers to CRSP PERMNOs with valid date ranges and match quality scores.
link_ibes_crsp(wrds, max_score = 5L, n = Inf, lazy = FALSE)link_ibes_crsp(wrds, max_score = 5L, n = Inf, lazy = FALSE)
wrds |
A |
max_score |
Maximum match quality score to include. Defaults to better matches:
|
n |
Maximum number of rows to return. Defaults to |
lazy |
If |
The linking table comes from wrdsapps_link_crsp_ibes.ibcrsphist.
To use the link, join on ticker and ensure your observation date falls
within the sdate to edate range.
A tibble with columns:
IBES ticker
CRSP permanent security identifier
Start date of the link
End date of the link
Match quality score (1 = best, 6 = worst)
WRDS IBES-CRSP Linking Table Documentation: https://wrds-www.wharton.upenn.edu/documents/796/IBES_CRSP_Linking_Table_by_WRDS.pdf
## Not run: wrds <- wrds_connect() ibes_link <- link_ibes_crsp(wrds) # Join with IBES data on ticker and date range ibes_data |> dplyr::inner_join(ibes_link, by = dplyr::join_by(ticker)) |> dplyr::filter(date >= sdate, date <= edate) wrds_disconnect(wrds) ## End(Not run)## Not run: wrds <- wrds_connect() ibes_link <- link_ibes_crsp(wrds) # Join with IBES data on ticker and date range ibes_data |> dplyr::inner_join(ibes_link, by = dplyr::join_by(ticker)) |> dplyr::filter(date >= sdate, date <= edate) wrds_disconnect(wrds) ## End(Not run)
Returns a tibble of WRDS schemas the user has access to, with human-readable product names where available.
list_subscriptions(wrds)list_subscriptions(wrds)
wrds |
A |
A tibble with columns schema and product.
## Not run: wrds <- wrds_connect() list_subscriptions(wrds) wrds_disconnect(wrds) ## End(Not run)## Not run: wrds <- wrds_connect() list_subscriptions(wrds) wrds_disconnect(wrds) ## End(Not run)
Returns a tibble of table names within a WRDS library (schema), with human-readable descriptions where available.
list_tables(wrds, library)list_tables(wrds, library)
wrds |
A |
library |
Character. The name of the library (schema) to query. |
A tibble with columns table and description.
## Not run: wrds <- wrds_connect() list_tables(wrds, "comp") wrds_disconnect(wrds) ## End(Not run)## Not run: wrds <- wrds_connect() list_tables(wrds, "comp") wrds_disconnect(wrds) ## End(Not run)
Extracts the first two characters from SIC codes to create broader industry classifications.
sic_2digit(sic)sic_2digit(sic)
sic |
A numeric or character vector of SIC codes. |
SIC codes are hierarchical: the first two digits represent major industry groups (e.g., "54" = Retail-Food Stores), while the full 4-digit code provides more specific classifications (e.g., "5412" = Retail-Convenience Stores).
A character vector of 2-digit SIC codes.
# Convenience Stores (SIC 5412) -> Retail-Food Stores (54) sic_2digit(5412) # [1] "54" sic_2digit(c(5412, 5400)) # [1] "54" "54"# Convenience Stores (SIC 5412) -> Retail-Food Stores (54) sic_2digit(5412) # [1] "54" sic_2digit(c(5412, 5400)) # [1] "54" "54"
Establishes a connection to the WRDS PostgreSQL server using credentials stored securely in the system keyring.
wrds_connect(user_key = "wrds_user", password_key = "wrds_pw", keyring = NULL)wrds_connect(user_key = "wrds_user", password_key = "wrds_pw", keyring = NULL)
user_key |
Name of the keyring entry storing the WRDS username.
Defaults to |
password_key |
Name of the keyring entry storing the WRDS password.
Defaults to |
keyring |
Optional keyring name. If |
Credentials must be set up before first use with wrds_set_credentials().
The connection uses bigint = "numeric" so that 64-bit integers from
PostgreSQL are returned as doubles, which avoids overflow and works
well with tidyverse functions.
A DBIConnection object for the WRDS PostgreSQL database.
wrds_disconnect(), wrds_set_credentials()
## Not run: wrds <- wrds_connect() list_subscriptions(wrds) wrds_disconnect(wrds) ## End(Not run)## Not run: wrds <- wrds_connect() list_subscriptions(wrds) wrds_disconnect(wrds) ## End(Not run)
Closes a WRDS database connection.
wrds_disconnect(wrds)wrds_disconnect(wrds)
wrds |
A |
Invisibly returns TRUE if disconnection was successful.
## Not run: wrds <- wrds_connect() wrds_disconnect(wrds) ## End(Not run)## Not run: wrds <- wrds_connect() wrds_disconnect(wrds) ## End(Not run)
A mapping of WRDS schema names to human-readable product names.
Used by list_subscriptions() to enrich results, and available for
users who want to look up product names directly.
wrds_productswrds_products
A data frame with 613 rows and 2 columns:
WRDS schema or product code
Human-readable product name
https://wrds-www.wharton.upenn.edu/users/products/
Interactively stores WRDS username and password in the system keyring for secure, persistent storage.
wrds_set_credentials( user_key = "wrds_user", password_key = "wrds_pw", keyring = NULL )wrds_set_credentials( user_key = "wrds_user", password_key = "wrds_pw", keyring = NULL )
user_key |
Name for the username keyring entry. Defaults to |
password_key |
Name for the password keyring entry. Defaults to |
keyring |
Optional keyring name. If |
This function prompts for username and password interactively. Credentials are stored securely using the operating system's keyring (Keychain on macOS, Credential Manager on Windows, Secret Service on Linux).
Invisibly returns TRUE on success.
## Not run: wrds_set_credentials() ## End(Not run)## Not run: wrds_set_credentials() ## End(Not run)
Interactively updates the WRDS password stored in the system keyring without changing the username.
wrds_update_password(password_key = "wrds_pw", keyring = NULL)wrds_update_password(password_key = "wrds_pw", keyring = NULL)
password_key |
Name for the password keyring entry. Defaults to |
keyring |
Optional keyring name. If |
Invisibly returns TRUE on success.
wrds_set_credentials(), wrds_connect()
## Not run: wrds_update_password() ## End(Not run)## Not run: wrds_update_password() ## End(Not run)