Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Jeph Herrin <info@flyingbuttress.net>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: -collapsetofile-

Date: Fri, 28 Feb 2014 13:48:24 -0500

I think you can get all the info you need to write Stata files here: http://www.stata.com/help.cgi?dta_115
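The following is a minimal, untested Mata sketch of option 1 from the message quoted below: writing a .dta file directly with binary writes. It follows the format-115 layout described at the link above (109-byte header, descriptors, variable labels, expansion fields, then the data). It makes several assumptions. Every variable is numeric and stored as double (type code 255). Output is little-endian (LOHI). The data label, time stamp, variable labels, value labels, and characteristics are left empty. The function names `dta_pad()` and `write_dta115()` are hypothetical. The binary encoding relies on Mata's `bufio()` functions (`bufbyteorder()`, `fbufput()`).

```stata
mata:
// Hypothetical helper: return s padded with binary zeros to exactly n bytes,
// truncating if needed so the field stays null-terminated.
string scalar dta_pad(string scalar s, real scalar n)
{
    if (strlen(s) >= n) return(substr(s, 1, n - 1) + char(0))
    return(s + (n - strlen(s)) * char(0))
}

// Hypothetical writer (a sketch, not Stata's -save-): store real matrix X
// (rows = observations, columns = variables) as a format-115 .dta file,
// with every variable stored as double and no labels or characteristics.
void write_dta115(string scalar fn, string rowvector names, real matrix X)
{
    real scalar   fh, j, k
    colvector     C

    k = cols(X)
    if (length(names) != k) _error(3200, "need one name per column of X")
    if (fileexists(fn)) unlink(fn)      // fopen(..., "w") refuses existing files
    fh = fopen(fn, "w")
    C  = bufio()
    bufbyteorder(C, 2)                  // 2 = LOHI (little-endian)

    // Header: ds_format, byteorder, filetype, unused, nvar, nobs
    fbufput(C, fh, "%1bu", (115, 2, 1, 0))
    fbufput(C, fh, "%2bu", k)
    fbufput(C, fh, "%4bu", rows(X))
    fwrite(fh, dta_pad("", 81))         // data_label (empty)
    fwrite(fh, dta_pad("", 18))         // time_stamp (left empty in this sketch)

    // Descriptors
    fbufput(C, fh, "%1bu", J(1, k, 255))                        // typlist: 255 = double
    for (j = 1; j <= k; j++) fwrite(fh, dta_pad(names[j], 33))  // varlist
    fbufput(C, fh, "%2bu", J(1, k + 1, 0))                      // srtlist: unsorted
    for (j = 1; j <= k; j++) fwrite(fh, dta_pad("%10.0g", 49))  // fmtlist
    for (j = 1; j <= k; j++) fwrite(fh, dta_pad("", 33))        // lbllist (none)
    for (j = 1; j <= k; j++) fwrite(fh, dta_pad("", 81))        // variable labels

    // Expansion fields (characteristics): write only the terminator
    fbufput(C, fh, "%1bu", 0)           // data_type 0
    fbufput(C, fh, "%4bu", 0)           // len 0

    // Data: bufio writes matrix elements row by row, i.e. one observation
    // after another, which is the order the format expects
    fbufput(C, fh, "%8z", X)
    fclose(fh)
}
end
```

For example, after computing collapsed results into a Mata matrix `X`, one might run `mata: write_dta115("dataforgraph1.dta", ("byvar1", "thisvar"), X)` and then `use dataforgraph1, clear`. String by-variables, value labels, and the characteristics mentioned in the thread would each need the corresponding sections of the specification.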

cheers, Jeph

On 2/28/2014 1:19 PM, Nick Cox wrote:

-save- is part of the executable

    . which save
    built-in command:  save

and so its code is not accessible to users.

Nick
njcoxstata@gmail.com

On 28 February 2014 18:06, Andrew Maurer <Andrew.Maurer@qrm.com> wrote:

Hi Statalist,

I've written a pair of programs, -collapsetofile- and -recover-, to let users "collapse" data to a file without destroying the dataset in memory, as -collapse- does. I don't know if anyone else will have a use for this, but it will save me a lot of computer time when dealing with large datasets. I would be very interested in any input or comments on how to improve coding efficiency or style (the code is still a bit rough).

- ado file (collapsetofile.ado): http://codepad.org/DcwtvDEb
- ado file (recover.ado): http://codepad.org/csZhQvb0
- sthlp file (collapsetofile.sthlp): http://codepad.org/AsKC79uK

The biggest improvement would come from being able to save directly to a .dta. I assume that this would require either:

1) looking at the format/header/footer of Stata .dta files in clear text and fwrite()'ing it from Mata, and/or
2) looking at the source for a command like -save- and just copying that (is the source for -save- available?)

Before writing this, I found myself waiting for hours when graphing summary statistics of large datasets with sequences of:

    use fulldata  // this could be >10gb
    preserve
    collapse (sum) thisvar thatvar, by(byvar1 byvar2)
    ... some data manipulation
    twoway line...
    restore
    preserve
    collapse (sum) anothervar yetanothervar, by(byvar3)
    ... some data manipulation
    twoway line...
    restore
    ...
    preserve
    collapse (sum) more vars, by(byvar10)
    ... some data manipulation
    twoway line...
    restore

For a 20gb dataset with 10 graphs, that makes 10 preserves/restores × 20gb = 200gb written to disk by -preserve- and another 200gb read back by -restore-.

-collapsetofile- writes just the collapsed data to be graphed to a file, with no other disk reads/writes:

    use fulldata
    collapsetofile (sum) thisvar thatvar using dataforgraph1, by(byvar1 byvar2)
    collapsetofile (sum) anothervar yetanothervar using dataforgraph2, by(byvar3)
    ...
    collapsetofile (sum) more vars, by(byvar10)
    recover dataforgraph1, clear
    ... some data manipulation
    twoway line...
    ...
    recover dataforgraph2, clear
    ... some data manipulation
    twoway line...
    ...

Thanks to Nick Cox for mentioning the importance of saving characteristics/metadata with the dataset. Thanks to Sergiy Radyakin for making me realize that I could never write a Mata program that computes statistics "by" variables as fast as Stata's -_mean- in -collapse-, since Stata's built-in C code can take advantage of parallelization, while Mata code cannot.

    *
    *   For searches and help try:
    *   http://www.stata.com/help.cgi?search
    *   http://www.stata.com/support/faqs/resources/statalist-faq/
    *   http://www.ats.ucla.edu/stat/stata/
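For Stata 16 and later only (frames were introduced well after this 2014 thread), here is a sketch of the same workflow that avoids both the full-dataset -preserve-/-restore- round trip and any intermediate file. -frame put- copies just the listed variables into a new in-memory frame. -collapse- then runs there, leaving the full dataset in the default frame untouched. This is a swapped-in alternative, not -collapsetofile-. The toy data stand in for -use fulldata-, and the variable names are the placeholders from the message above. Memory must hold the copied variables alongside the full dataset.

```stata
* Stata 16 or later (frames). Toy stand-in for -use fulldata-; the variable
* names are the placeholders from the message above.
clear
set obs 1000
generate byvar1   = mod(_n, 10)
generate byvar2   = mod(_n, 2)
generate thisvar  = runiform()
generate thatvar  = runiform()
generate othervar = rnormal()      // not needed for this graph

* Copy only the variables this graph needs into a new frame (in memory)
frame put thisvar thatvar byvar1 byvar2, into(graph1)

* Collapse and graph inside that frame; the default frame is untouched
frame graph1 {
    collapse (sum) thisvar thatvar, by(byvar1 byvar2)
    twoway line thisvar byvar1 if byvar2 == 0
}

* Discard the collapsed copy before building the next graph
frame drop graph1
```

Each further graph repeats the -frame put-/-collapse-/-frame drop- cycle with its own variables and by() list, so the full dataset is never written to or reread from disk.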


**References**:

- **st: -collapsetofile-**, *From:* Andrew Maurer <Andrew.Maurer@qrm.com>
- **Re: st: -collapsetofile-**, *From:* Nick Cox <njcoxstata@gmail.com>
