One file, one row per report with everything, even revision history and deleted reports.
It is meant for serious data analysts who want to improve their results, but now anyone can search VAERS more easily too.
Each week, the VAERS data from the CDC arrives as many files covering multiple years: one file of symptom-entry labels, another of vaccine info, and a third for the other data, including the free-text description of the harm, with USA and “foreign” reports kept separate.
Casual information-gatherers putting on their data hat are often stymied by that complexity but now anyone can do it. For example, how many reports mention carditis? One query and done, see below. Fertility issues anyone? How many reports had the ‘DIED’ field changed in either direction? Pretty easy. Every word that was ever entered and released publicly is there.
You’ll be surprised at what you find, one way or another or both. I was.
Everything is combined in one flat file. What is a flat file? Example …
In this VAERS report, five injections were entered, so the CDC’s 2021VAERSVAX.csv file has five lines for it, one per lot (dose).
However, in the result, that becomes a single line containing all of the information about that report including changes. Easy to gather information with searches.
The other CDC file with multiple lines per VAERS_ID is named like 2021VAERSSYMPTOMS.csv.
Those 8 rows of 39 symptoms all apply to this 38-year-old woman reported in 1046447, and the flat form of it looks like this:
That’s how the flat file works: every report on a single line. Here, it's that simple, despite scads of web pages contaminating the definition of a 'flat file'.
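To make the idea concrete, here is a minimal Python sketch of that flattening. It is not the actual build script: the sample rows, the `flatten_rows` helper and the choice of `|` to join values are assumptions made for illustration. Only the column names VAERS_ID, VAX_LOT and VAX_DOSE_SERIES come from the CDC data.

```python
import csv
import io
from collections import defaultdict

# Made-up sample in the shape of a CDC VAERSVAX file: several lines per VAERS_ID.
SAMPLE_VAX_CSV = """VAERS_ID,VAX_LOT,VAX_DOSE_SERIES
1000001,AB123,1
1000001,CD456,2
1000002,EF789,1
"""

def flatten_rows(rows, key="VAERS_ID", joiner="|"):
    """Collapse all rows sharing `key` into one dict per key, joining the values."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for column, value in row.items():
            if column != key:
                grouped[row[key]][column].append(value)
    return {
        report_id: {column: joiner.join(values) for column, values in columns.items()}
        for report_id, columns in grouped.items()
    }

# Hypothetical usage: two input lines for report 1000001 become one flat row.
flat = flatten_rows(csv.DictReader(io.StringIO(SAMPLE_VAX_CSV)))
for report_id, columns in flat.items():
    print(report_id, columns)
```

With one row per report, a search only ever has to look at a single line to see everything entered for that VAERS_ID.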
For easy word searches.
These are some numbers (of reports) for certain strings of characters, casting a wide initial net:
23513 stroke
24395 embol
30264 cereb
35474 vascul
48959 menstr
63941 lung|pulmon
69574 thromb|clot
148278 heart
178395 cardi
A more sophisticated net with regular expressions (regex) for pattern matching with wildcards …
207,384 female fertility and babies
\b(?:fo*et[ua]|perinatal|natal|neonat|embryo|baby|infant|stillb|born(?!e)|unborn|birth|abortion|gestat|trimester|metrorrhagia|menorrhagia|miscarr|vagin|uvul|vulv|ovar|oophor|uter[aiu]|cervix|endomet|parametritis|clitor|ca*esare|fallop|intermenstr|premenstr|menstr|menses|irregular period|matern|contracept|breast|mammary|mastit|lactat|low birth weight|pregnan|postmenopaus|preeclampsia|premature (?:rupture|labo|baby|deliv)|nipple (?:pain|swell)|donor sperm)
319,239 heart and vascular
\b(?:amyloidos|aneurysm|aorto|arrhythm|arter|atrial|atrio|beriberi|bundle branch|cardi|myocard|pericard|chagas|chordae tendinae|coronar|coxsackie|cytomegalovirus|dressler|dyssynchrony|ecg|ekg|enterovirus|eosinophilic|fistula|globinaemia|globinuria|glycogen storage|haemorrhage|heart|holt\-oram|iron overload|ischaemic|kearns\-sayre|orthostatic|papillary|peripartum|pleuroperi|pulmonale|pulseless electrical|rhabdomy|sarcoido|septal|septum|shoshin|shunt|sigmoid|steatosis|syphilit|tamponade|thromb|athero|uraemic|valve disorder|vein|ventric|fibril|cor pulmon|purpur|vascul|circulat|defib)
Again, these are an initial wide net.
Those searches are looking through everything involved in each report, all fields.
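A search of that kind can be sketched in a few lines of Python. The rows below are made up, the `count_reports` helper is hypothetical, and case-insensitive matching is an assumption, so treat this as an illustration of the approach rather than the code that produced the numbers above.

```python
import re

# Made-up rows standing in for lines of the flat file (one dict per report).
reports = [
    {"VAERS_ID": "1", "SYMPTOM_TEXT": "Chest pain, myocarditis suspected", "HISTORY": ""},
    {"VAERS_ID": "2", "SYMPTOM_TEXT": "Sore arm", "HISTORY": "Blood clot in 2019"},
    {"VAERS_ID": "3", "SYMPTOM_TEXT": "Headache", "HISTORY": "None"},
    {"VAERS_ID": "4", "SYMPTOM_TEXT": "Cardiac arrest", "HISTORY": "Deep vein thrombosis"},
]

def count_reports(rows, pattern):
    """Count rows in which any field matches `pattern` (case-insensitive by assumption)."""
    regex = re.compile(pattern, re.IGNORECASE)
    return sum(1 for row in rows if any(regex.search(value) for value in row.values()))

print(count_reports(reports, r"cardi"))                    # plain string of characters
print(count_reports(reports, r"thromb|clot"))              # either of two strings
print(count_reports(reports, r"\b(?:myocard|pericard)"))   # word-start regex, as in the nets above
```

Each report is counted once no matter how many fields or how many times the pattern matches, which is why the totals are numbers of reports, not numbers of hits.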
Download
The Excel file below has all of the VAERS data for the covid era. It opens fine in the free alternative LibreOffice. Since spreadsheets have a limit of 1,048,576 rows and there are over 1.5 million covid vaccine reports, it was necessary to separate the records into two sheets. Meanwhile, the size of an Excel file is limited only by one’s system memory. The two sheets can be combined into one csv by anyone who prefers that. Rows are sorted by cell_edits, status, changes, and VAERS_ID.
Parent Directory: https://univaers.com/download/flatfile/
Run Output: HTM_run_output_vaers_flatfile.htm
… An html page you can open right now with a click (or tap). Most of the VAERS_IDs are clickable and open the full report at Medalerts in a new tab. It can be useful for comprehending the CDC changes quickly, since all notable changes are present there. It is wordy because it was written as the operations were happening, showing what’s going on.
Results in Excel: XLSX_VAERS_FLATFILE.zip
Open Source Python Code: py_vaers_flatfile_build_all.txt (rename to run it)
Statistics added 2023-06-23: https://univaers.com/download/flatfile/stats.csv
… Charting the columns, the numbers each week, could be informative. The file is used each new week for totals (row ‘All’).
Some Details
Numbers below are based on CDC download as of Friday, June 9, 2023.
Most changes by CDC staff are grammar and spelling fixes plus translations to English, and that part is good. CDC also removes names of places and people. Many changes can be tough to comprehend, such as removing something and then placing it back later. Sometimes fields are even blank on release and filled in later. The htm file above shows all significant changes.
~1.56 million reports of vaccine harm**
7,042 writeups changed (in SYMPTOM_TEXT)
30,337 deleted reports preserved/noted
13 restored after being deleted
30,199 reports never published (bug fixed)
29,613 cells edited significantly by CDC.
542,206 delayed (held back, only released later)
1,036 cells that were blank but filled in later
1,541 cells completely blanked out, unless you count the half million erased in the 11-11 purge of 2022: about a gigabyte of information, gone, but preserved here
Over 330,000 ages filled in, indicated in writeups but the age field was left blank (that’s from other code)
29,372,096 trivial cell changes of mere non-letter punctuation, ignored
Hard to believe, but 1,323,039 repeat sentences in the writeups are removed, a total of 53 MB. (Most extreme: 6,192 characters removed in VAERS_ID 1645697)
Symptom entries had the most changes through 2021 aside from bulk changes elsewhere involving the word vsafe
** CDC could and should move the administrative reports, where no known harm to any person is being reported, into a different file. It should also keep deleted reports in a public file with a column noting the reason for deletion.
Number of significant changes in each field
7042 SYMPTOM_TEXT
4137 OTHER_MEDS
3317 LAB_DATA
3062 ALLERGIES
2911 HISTORY
1944 CUR_ILL
1806 VAX_DOSE_SERIES
1390 SPLTTYPE
1129 symptom_entries
635 VAX_LOT
355 PRIOR_VAX
308 VAX_ROUTE
302 VAX_TYPE
300 VAX_NAME
282 VAX_MANU
185 DIED
94 AGE_YRS
84 STATE
60 NUMDAYS
58 VAX_SITE
50 CAGE_YR
33 OFC_VISIT
28 SEX
18 ER_ED_VISIT
17 V_ADMINBY
16 RECOVD
10 CAGE_MO
9 HOSPITAL
7 HOSPDAYS
5 DATEDIED
5 DISABLE
3 L_THREAT
3 ONSET_DATE
3 VAX_DATE
2 BIRTH_DEFECT
2 TODAYS_DATE
1 RECVDATE
0 ER_VISIT
0 V_FUNDBY
0 X_STAY
0 FORM_VERS
0 RPT_DATE
Clarifying
New columns:
cell_edits: Increases each time a cell in that report is changed (but ignoring trivial changes like punctuation)
status: Deleted, Restored and/or Delayed (release was held back)
changes: Which fields were changed in what way, and when.
A fourth new column, symptom_entries, holds the entries from the SYMPTOMS files, all combined as one string. The delimiter _|_ marks each entry.
Example:
_|_Cardiac disorder_|_Electrocardiogram_|_Heart rate increased_|_
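For anyone working with that column in code, splitting it back into individual entries is straightforward. The sketch below uses the example string above; the helper names `split_symptom_entries` and `join_symptom_entries` are made up for illustration and are not taken from the build script.

```python
DELIMITER = "_|_"

def split_symptom_entries(cell):
    """Return the individual symptom entries from a symptom_entries cell."""
    return [entry for entry in cell.split(DELIMITER) if entry]

def join_symptom_entries(entries):
    """Rebuild the cell, with a delimiter before, between and after the entries."""
    if not entries:
        return ""
    return DELIMITER + DELIMITER.join(entries) + DELIMITER

example = "_|_Cardiac disorder_|_Electrocardiogram_|_Heart rate increased_|_"
entries = split_symptom_entries(example)
print(entries)
print(join_symptom_entries(entries) == example)
```

Because every entry has a delimiter on both sides, a search for a string such as `_|_Death_|_` matches that exact entry and not a longer entry that merely contains the word.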
In the run output (and also ‘changes’ column):
<> means not the same: previous week on the left, current week on the right.
[] (open-close square brackets) means the field is empty.
On the other hand, nothingness on one side means you’re looking at an excerpt of only the changes, like this ...
symptom_entries 1395844 _|_Death_|_ <>
(That was the only one removed of the six symptom entries. Click the VAERS_ID number above to see the report.)
When more than one record has the same change, they look like this:
3 DIED [] <> Y [1196932, 1350545, 1526319]
Otherwise like this:
DIED 2180632 Y <> []
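If you want to tally these lines yourself, the notation can be parsed with a pair of regular expressions. This is a minimal sketch that assumes the exact spacing shown in the three examples above. The patterns, the `parse_change` helper and the dict it returns are my own illustration, not part of the build script, and real run output may need looser patterns.

```python
import re

# Grouped form: COUNT FIELD OLD <> NEW [id, id, ...]
GROUPED = re.compile(
    r"^(?P<count>\d+) (?P<field>\w+) (?P<old>.*?) <> (?P<new>.*?) "
    r"\[(?P<ids>\d+(?:, \d+)*)\]$"
)
# Single form: FIELD VAERS_ID OLD <> NEW (either side may be missing in an excerpt)
SINGLE = re.compile(r"^(?P<field>\w+) (?P<vaers_id>\d+) (?P<old>.*?) ?<> ?(?P<new>.*)$")

def side(text):
    """'[]' is an empty field (returned as ''); a missing side is returned as None."""
    if text == "[]":
        return ""
    return text if text else None

def parse_change(line):
    """Parse one change line into field, VAERS_IDs, old value and new value."""
    line = line.strip()
    match = GROUPED.match(line)
    if match:
        ids = [int(i) for i in match["ids"].split(", ")]
    else:
        match = SINGLE.match(line)
        if not match:
            return None
        ids = [int(match["vaers_id"])]
    return {"field": match["field"], "ids": ids,
            "old": side(match["old"]), "new": side(match["new"])}

for line in ["3 DIED [] <> Y [1196932, 1350545, 1526319]",
             "DIED 2180632 Y <> []",
             "symptom_entries 1395844 _|_Death_|_ <>"]:
    print(parse_change(line))
```

Keeping "empty field" and "side not shown" as different values (an empty string versus None) preserves the distinction the notation makes between [] and nothingness.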
CDC did fill in the DIED field as Y in some reports after they were initially released, like those 3 above.
They removed _|_Death_|_ as a symptom 9 times and added it twice.
When a report is deleted or restored, the cell_edits field increases by 42 as a way of calling attention to it (that’s the number of CDC columns).
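The counting rule can be sketched as follows. This is a guess at the logic, not the actual code: it assumes a change is "trivial" when the two versions are identical once everything except letters and digits is stripped, and the helper names and sample rows are hypothetical. Only the bump of 42 comes from the text above.

```python
import re

CDC_COLUMNS = 42  # bump applied when a report is deleted or restored

def letters_only(text):
    """Drop punctuation, spaces and underscores so only letters and digits remain."""
    return re.sub(r"[\W_]+", "", text)

def is_significant(old, new):
    """A change counts only if something other than punctuation or spacing differs."""
    return letters_only(old) != letters_only(new)

def updated_cell_edits(cell_edits, old_row, new_row, deleted_or_restored=False):
    """Add one per significantly changed cell, plus 42 for a deletion or restoration."""
    changed = sum(
        1 for column in old_row
        if is_significant(old_row[column], new_row.get(column, ""))
    )
    return cell_edits + changed + (CDC_COLUMNS if deleted_or_restored else 0)

# Made-up rows: DIED is filled in (significant), the writeup only loses a period (trivial).
last_week = {"DIED": "", "SYMPTOM_TEXT": "Headache, fever."}
this_week = {"DIED": "Y", "SYMPTOM_TEXT": "Headache; fever"}
print(updated_cell_edits(0, last_week, this_week))
print(updated_cell_edits(1, this_week, this_week, deleted_or_restored=True))
```

Since rows are sorted by cell_edits, a bump as large as the column count pushes deleted and restored reports toward one end of the sort where they are easy to spot.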
There are some abbreviations made, like C19 Pfizer-BionT instead of … COVID19 (COVID19 (PFIZER-BIONTECH))
Repeat sentences are replaced by ..^..
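One way such a replacement could work is sketched below. How the real code defines a sentence or a repeat is not stated here, so the splitting rule (end punctuation followed by whitespace), the case-insensitive comparison and the `mark_repeat_sentences` name are all assumptions.

```python
import re

MARKER = "..^.."

def mark_repeat_sentences(text):
    """Replace any sentence already seen earlier in the text with MARKER."""
    # Split after ., ! or ? when followed by whitespace (a simplifying assumption).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    seen = set()
    result = []
    for sentence in sentences:
        key = sentence.lower()
        if key in seen:
            result.append(MARKER)
        else:
            seen.add(key)
            result.append(sentence)
    return " ".join(result)

writeup = "Patient had fever. Patient had fever. Recovered."
print(mark_repeat_sentences(writeup))
```

The marker keeps a visible trace of where text was dropped, so a reader can tell a shortened writeup from one that never had the repetition.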
Other
I keep running out of disk space here. I need to find a place for my code, where I will make it update automatically each week, which I can do, and someone to pay for it, which I can’t do. It needs at least 16 GB RAM and a 6 TB HD.
Some might even call it entertaining, sharp analysis of VAERS by a colleague at this channel ...
If you use this flat file, I’d love to hear about it in the comments. Any other comments are welcome too. The goal is efficiency, accuracy, real numbers brought to light.
My man! I'm already using your flat file for "up-coding" all these serious SAE's they have sitting in the None of Above bucket. Like this: https://imgur.com/gallery/uBZz4sf. WYSIWYG like Openvaers & Medalerts, then some more dynamic views like vaersaware and your stuff! We don't need no stink'n badges!
Tremendous work, thank you